Wednesday, November 10, 2010

Defragmenting lazy bytestrings

In the last post, I introduced the blaze-builder library and stressed the point that it is important to ensure a large average chunk size for the constructed lazy bytestrings. In this post, I'll give a concrete example of why ensuring a large average chunk size matters.

You probably know the nice zlib library that allows you to compress a lazy bytestring with a single call to

    compress :: L.ByteString -> L.ByteString

I will use compress to illustrate that small chunk sizes can be costly. Actually, they can be so costly that it is  worth it to first "defragment" the lazy bytestring before compressing it. Using the blaze-builder library, defragmentation is easily defined as follows.

    defragment :: L.ByteString -> L.ByteString
    defragment = toLazyByteStringfromLazyByteString

The builder created using fromLazyByteString copies the chunks up to a size of 8kb and insert them directly in the output stream otherwise. This way we can guarantee a minimal average chunk size of 4kb no matter when the output buffer is flushed due to a direct insertion of an 8kb block.

The following plot shows the measured times in boxplot format for defragmenting ("compaction only"), direct compression, and compression with preceding compaction of 200kb of data represented as a lazy bytestring for different fixed chunk sizes.

As in my previous post, the benchmarks were measured on a Core2 Duo T7500 with 2GB RAM and Linux 2.6.32-24 i686 and GHC 6.12.3. The corresponding measurement log can be found here. A log-log plot exhibiting more information on the behaviour of defragment for larger chunk sizes can be found here.

The above plot shows that compress profits heavily from defragmentation. Sadly, I do not yet know the cause for the significant slowdown of compress for lazy bytestrings with a small average chunk size. I guess it is a combined effect of the cost of the FFI calls (how big are they actually?) and perhaps some implementation overhead stemming from large amount of state being threaded through the implementation of compress. Comments and clarifications are very much welcome.

Note that as for filesystems, an even better solution than using regular defragmentation is to avoid fragmentation in the first place, which you can achieve by using a Builder for constructing lazy bytestrings. Note also that fromLazyByteString currently does not wrap bytestrings around buffer boundaries, which results in some unnecessarily spilled memory for medium (1kb - 8kb) chunk sizes. I'll implement that in the near future and will post the new measurements here.

1 comment:

  1. Nice work, Simon! Nothing like a combination of a nice performance tip and some good data to back it up to put a smile on my face :-)