The result for text is the time it takes to UTF-8 encode a nicely packed lazy text value of length 100'000'000. The other times, for blaze, base, via-text, and utf8-string, measure how long it takes to UTF-8 encode a String of length 100'000'000 using, respectively, blaze-builder, GHC 7.0.2's base library, packing to a lazy text value and encoding that, and the utf8-string library. Sadly, utf8-light supports only strict ByteStrings and, hence, is not really usable for UTF-8 encoding a String that long. Therefore, the time given for utf8-light is the time it takes to encode a String of length 1'000'000 a hundred times.
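To make the compared code paths concrete, here is a minimal sketch of the via-text variant next to a Builder-based variant. It is not the benchmark code itself. It assumes the text package and a bytestring version that ships `Data.ByteString.Builder`, whose `stringUtf8` stands in for blaze-builder's `Blaze.ByteString.Builder.Char.Utf8.fromString`. The names `encodeViaText`, `encodeViaBuilder`, and `sampleInput` are made up for illustration.

```haskell
module Main (main) where

import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- "via-text" variant: pack the String into a lazy Text value first,
-- then UTF-8 encode that value.
encodeViaText :: String -> L.ByteString
encodeViaText = TLE.encodeUtf8 . TL.pack

-- Builder variant: UTF-8 encode the String directly, without an
-- intermediate Text value. (Assumption: bytestring's Builder is used
-- here in place of blaze-builder.)
encodeViaBuilder :: String -> L.ByteString
encodeViaBuilder = B.toLazyByteString . B.stringUtf8

-- Hypothetical input: n characters cycling through code points that
-- need 1, 2, 3, and 4 bytes in UTF-8 (U+0061, U+00E4, U+20AC, U+1D11E).
sampleInput :: Int -> String
sampleInput n = take n (cycle "a\228\8364\119070")

main :: IO ()
main = do
  let s = sampleInput 1000000
  -- Both variants should report the same number of encoded bytes.
  print (L.length (encodeViaText s))
  print (L.length (encodeViaBuilder s))
```

For actual timings, one would wrap the two functions in a benchmarking harness such as criterion and force the whole result, for example by taking its length as done above.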
My conclusions after this experiment are the following:
- Encoding implementations based on [Word8] lists are likely to result in suboptimal performance.
- It is not worth packing a String into a Text value first if it is encoded again right away.
- The work I'm currently putting into integrating blaze-builder with the bytestring library is really worth the effort. Compared to the text benchmark, which uses a better (packed) representation of Char sequences, we are only a factor of 1.6 slower. Moreover, it might even be worth a try to replace the String encoding functions in the base library with corresponding Builders to gain these additional 50 percent of speed. As an additional benefit, we could even think of executing Builders directly on the buffer associated with a Handle and, therefore, output byte streams denoted by Builders without any buffer allocations.
- It would be very interesting to see how well we fare against other languages. If a reader implements the same benchmark in C, C++, Java, Python, ... I'd be very glad to publish the obtained results here.
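The idea from the third conclusion, running a Builder directly on the buffer of a Handle, can be sketched as follows. This is only an illustration under the assumption that a bytestring version providing `Data.ByteString.Builder` and its `hPutBuilder` function is available; it is not the blaze-builder version benchmarked above. The helper `linesUtf8` is a hypothetical name.

```haskell
module Main (main) where

import qualified Data.ByteString.Builder as B
import System.IO
  ( BufferMode (BlockBuffering)
  , hSetBinaryMode
  , hSetBuffering
  , stdout
  )

-- Hypothetical payload: every String is UTF-8 encoded and followed by
-- a newline character.
linesUtf8 :: [String] -> B.Builder
linesUtf8 = foldMap (\l -> B.stringUtf8 l <> B.charUtf8 '\n')

main :: IO ()
main = do
  -- Binary mode and block buffering give the Builder a plain byte
  -- buffer on the Handle to write into.
  hSetBinaryMode stdout True
  hSetBuffering stdout (BlockBuffering Nothing)
  -- Write the encoded bytes to stdout via the Handle.
  B.hPutBuilder stdout (linesUtf8 ["blaze", "b\252ilder"])
```

Whether this actually avoids all intermediate buffer allocations depends on the Handle's buffering mode and on the size of the output, so it is worth measuring rather than assuming.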