# Performance Research This directory contains all the files used during research into the performance of various alternate implementations for the wcswidth commands. ## Algorithms Five different algorithms are evaluated. |Algorithm |The result is determined ... | |--- |--- | |`map` |... by using the numeric codepoint to index into a __very large__ list | |`binary` |... by a binary tree of nested `if`-checks | |`ternary` |... by a binary tree of nested ternary expressions | |`2map` |... through direct indexing in two lists, using redundancies in the results | |`linear` |... through a linear series of `if`-checks | The `linear` algorithm is the easiest to generate, it can be done in a single pass over the database file. All others need some internal data structures and phases to convert the information in various ways before emitting code. ## Chosen algorithm The chosen algorithm is `binary`. Of the five algorithms looked at `linear` is the slowest by far, with the second slowest (`2map`) already about 25 times faster. The fastest algorithm, `map` is even about 46 times faster. That speed unfortunately comes with a factor 126 size overhead, i.e. 2 orders of magnitude. Because of this `map` is removed from consideration too. Of the three left `2map` is not only the slowest, it also has a larger size overhead than the other two. The performance was actually a surprise. Initial expectation was that its two list indexing ops, plus the math for it, would be faster than running through the tree of conditionals used by `binary` and `ternary`. For `binary` and `ternay` the size of the code is actually more strongly influenced by the chosen formatting than the Tcl commands themselves. I.e. a good chunk of the overhead comes from formatting the Tcl code for readability. Forsaking that brings both down to the size of `linear`. Because of that code size is ignored now, and `binary` is chosen for speed. It is a tick faster than `ternary`. The reason for that is currently not known. ## Detailed information ### General speed, code size Notes: - The algorithms are ordered by time, from fastest down to slowest. - The overhead and gain values are computed relative to the base algorithm, `linear`. The incremental (`*/I`) values are computed relative to the algorithm in the row below (or itself, for the bottom row). - Time unit is seconds. - Due to the benchmarking repeating each codepoint a thousand times the `Points/s` is computed as `1114112*1000/(time)`. |Algorithm |Code Size |Overhead |Overhead/I |Time |Points/s |Gain |Gain/I | |--- |---: |---: |---: |---: |---: |---: |---: | |map |4492047 |126.00 |104.48 |851.58 |1308295.80 |48.11 |1.53 | |binary |42994 |1.21 |1.30 |1304.00 |854379.71 |31.42 |1.07 | |ternary |33037 |0.93 |0.50 |1396.33 |797886.46 |29.34 |1.11 | |2map |66463 |1.86 |1.86 |1554.56 |716674.91 |26.36 |26.36 | |linear |35651 |1.00 |1.00 |40972.78 |27191.51 |1.00 |1.00 | ### Statistics. Five values are shown, per algorithm and command (type information, width information). - Fastest response, minimum time - Slowest response, maximum time - 50th quartile, median response, time in which half the points are handled. - 90th quartile, time in which 90% of the points are handled. - 99th quartile, time in which 99% of the points are handled. Note - The time unit is microseconds. - For all algorithms the 99th quartile is well below the maximum response time. The maximum response times are likely outliers caused by external factors influencing benchmark operation. Looking at the visualizations across the range indicate the same |Algorithm |Type Min |Max |50% |90% |99% |Width Min |Max |50% |90% |99% | |--- |---: |---: |---: |---: |---: |---: |---: |---: |---: |---: | |map |0.34 |1.05 |0.35 |0.36 |0.38 |0.35 |1.09 |0.35 |0.36 |0.39 | |binary |0.52 |2.15 |0.59 |0.61 |0.64 |0.49 |1.89 |0.51 |0.54 |0.57 | |ternary |0.58 |2.13 |0.66 |0.69 |0.71 |0.5 |1.72 |0.52 |0.56 |0.6 | |2map |0.63 |2.37 |0.66 |0.67 |0.78 |0.64 |2.35 |0.66 |0.67 |0.78 | |linear |0.38 |39.21 |23.8 |24.14 |24.59 |0.41 |22.25 |13.56 |13.72 |13.97 | ## Directory Structure Notes: - The `.csv` files are stored `xz` compressed, due to their large size (20M). - The `.pdf` files are not in the repository. Use `plot.sh` to generate them from the .csv files. Decompress the .csz.xz files before doing so. |Pattern |Notes | |--- |--- | |regen.sh |Regenerate all implementations | |plot.sh |Generate the .pdf files from the other data | | | | |bench.sh |Benchmark all implementations | |bench/bench.gnuplot |Plot configuration to visualize a bench result | |bench/bench.tcl |Benchmark a specific implementation | |bench/stats.tcl |Compute statistics from bench results | |bench/stats-tables.tcl |Compute the tables shown in this README. | |bench/quartiles.tcl |Compute response time quartiles | | | | |check.sh |Check implementations for result mismatches | |check/check.tcl |Check implementations for result mismatches | | | | |svs.sh |Generate size/speed plots | |svs/all.csv |Size/speed for all implementations - Manual | |svs/without-map.csv |As above, with `map` removed due its very large size | |svs/svs.gnuplot |Plot configuration | |svs/svs.pdf(.xz) |Resulting plot | | | | |compare.sh |Various comparisons, quartile plots, etc. | |compare/compare.gnuplot|Plot configuration | |compare/compare.pdf(.xz)|Resulting plot | | | | |(alg)/build.tcl |Generator script | |(alg)/wcswidth.tcl |Generated implementation | |(alg)/bench.csv(.xz) |Benchmarking result | |(alg)/bench-stats.csv |Result statistics | |(alg)/bench.pdf(.xz) |Visualized results | |(alg)/quartiles.csv |Results from `bench/quartiles.tcl` |