tag:blogger.com,1999:blog-37340047134731602502024-02-19T00:22:41.762-08:00λ-viewexperience programming through a functional lens...Simon Meierhttp://www.blogger.com/profile/17873513558398372917noreply@blogger.comBlogger5125tag:blogger.com,1999:blog-3734004713473160250.post-10679199663994826872012-01-20T15:26:00.000-08:002012-01-20T15:28:23.441-08:00Talk: A guided tour through the bytestring libraryYesterday, I held a talk about the <a href="http://hackage.haskell.org/package/bytestring">bytestring</a> library at the <a href="http://www.meetup.com/HaskellerZ/">Zurich HaskellerZ</a> meetup group. The goal of the talk was to enable the audience to write efficient code based on the bytestring library; i.e., explain enough about internals of the library and GHC such that they can judge the cost of the operations on bytestrings both in terms of time and space. The talk also covers the <a href="http://people.inf.ethz.ch/meiersi/downloads/private/bytestring-0.10.0.0-public/">new bytestring builder</a> (based on <a href="http://hackage.haskell.org/package/blaze-builder">blaze-builder</a>), which is currently under review for inclusion in the next release of the bytestring library.<br />
<br />
Here are the <a href="http://meiersi.github.com/HaskellerZ/meetups/2012%2001%2019%20-%20The%20bytestring%20library/slides.html">slides</a> and their corresponding <a href="http://meiersi.github.com/HaskellerZ/meetups/2012%2001%2019%20-%20The%20bytestring%20library/handout.html">handout version</a> for interested readers.Simon Meierhttp://www.blogger.com/profile/17873513558398372917noreply@blogger.com0tag:blogger.com,1999:blog-3734004713473160250.post-13799786344510014192011-03-10T15:43:00.000-08:002011-03-10T15:43:40.022-08:00Efficient UTF-8 EncodingBelgien beer and UTF-8 encoding: what a coincidence. Well, this post is just a quick post about the results for UTF-8 encoding 100'000'000 characters to /dev/null using the various options that our current Haskell ecosystem offers. It happensto be written after a very nice Belgian evening spent in the awesome <a href="http://www.minimum-bouldering.ch/">Minimum climbing gym</a> in <a href="http://maps.google.ch/maps?q=47.344828,8.535186&num=1&sll=47.343944,8.530114&sspn=0.015122,0.032015&ie=UTF8&ll=47.344246,8.535089&spn=0.004878,0.01134&z=17">Zurich Wollishofen</a>. Here's the <a href="https://github.com/meiersi/blaze-builder/blob/master/benchmarks/Utf8IO.hs">code</a> and the corresponding <a href="https://github.com/meiersi/blaze-builder/blob/master/Makefile#L38">Makefile</a>. The following results are the wall-time results obtained on my Core 2 Duo laptop with 2GB RAM using <a href="http://www.haskell.org/ghc/">GHC 7.0.2</a> on a 32bit Linux 2.6.32-29 installation.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">text 1.08s</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">blaze 1.70s</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">base 2.63s</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">via-text 5.97s</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">utf8-string 47.99s</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">utf8-light 40.18s</span><br />
<br />
The result for <a href="http://hackage.haskell.org/package/text"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">text</span></a> is the time it takes to UTF-8 encode a nicely packed lazy text value of length 100'000'000. The other times for <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">blaze</span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">base</span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">via-text</span>, and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">utf8-string</span> measure how long it takes to UTF-8 encode a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">String</span> of length 100'000'000 using <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.1.4/doc/html/Blaze-ByteString-Builder-Char-Utf8.html">blaze-builder</a></span>, GHC 7.0.2's <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/System-IO.html#v:hSetEncoding">base library</a>, packing to a lazy text value and encoding it, and using the <a href="http://hackage.haskell.org/package/utf8-string">utf8-string</a> library. Sadly, <a href="http://hackage.haskell.org/package/utf8-light"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">utf8-light</span></a> supports only strict <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ByteString</span>s and, hence, it's not really usable to UTF-8 encode a String that long. Therefore, the time given for utf8-light is the time it takes to encode a String of length 1'000'000 a hundred times.<br />
<br />
My conclusions after this experiment are the following:<br />
<br />
<ol><li>Using <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">[Word8]</span> list based encoding implementations is likely to result in suboptimal performance.</li>
<li>It is not worth packing a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">String</span> first to a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Text</span> value, if it is encoded right away again.</li>
<li>The current work I'm spending on integrating blaze-builder with the bytestring library is really worth the effort. Compared to the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">text</span> benchmark, which uses a better (packed) representation of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Char</span> sequences, we are only a factor 1.6 slower. Moreover, it might even be worth a try to replace the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">String</span> encoding functions in the base library by according <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.1.4/doc/html/Blaze-ByteString-Builder.html">Builder</a></span>s to gain these additional 50 percent of speed. As an additional benefit, we could even think of executing <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Builder</span>s directly on the buffer associated to a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Handle</span> and, therefore, output byte streams denoted by <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Builder</span>s <i>without any buffer allocations</i>.</li>
<li>It would be very interesting to see how well we fare against other languages. If a reader would implement the same benchmark in C, C++, Java, Python, ... I'd be very glad to publish the obtained results here.</li>
</ol>Simon Meierhttp://www.blogger.com/profile/17873513558398372917noreply@blogger.com1tag:blogger.com,1999:blog-3734004713473160250.post-44271194941600496362010-11-18T15:09:00.000-08:002010-11-18T15:09:33.393-08:00On the design of an efficient builder monoidLately (<a href="http://lambda-view.blogspot.com/2010/11/blaze-builder-library-faster.html">here</a> and <a href="http://lambda-view.blogspot.com/2010/11/defragmenting-lazy-bytestrings.html">here</a>), I reported on promising results about the efficient construction of lazy bytestrings with a large average chunk size using the blaze-builder library. In the meantime, I have done some more research, which further confirmed the strength of the design underlying the <a href="http://hackage.haskell.org/package/blaze-builder">blaze-builder</a> library. However before reporting on more results, I'd like to present you its design in the hope that you can apply the underlying ideas to other performance critical areas you are working on.<br />
<br />
The <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Foreign</span> module provides us with all the functions we require for working with pointers of various forms. This includes the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ForeignPtr</span> pointers used by strict bytestrings to reference their underlying buffer.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> import Foreign</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> import Data.Monoid</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> import qualified Data.ByteString as S</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> import qualified Data.ByteString.Internal as S</span><br />
<br />
<b>A builder should represent a sequence of bytes such that the following two objectives are maximized:</b><br />
<br />
<ul><li>the efficiency of writing the represented sequence of bytes to a sequence of buffers (e.g., the chunks of a lazy bytestring) and </li>
<li>the efficiency of appending two builders.</li>
</ul><br />
In our design, the work of a builder is handled by build steps. Once we tell a build step where it can start to write and where the buffer ends, the build step will write all the bytes it represents and return a build signal that tells us how to proceed.<br />
<br />
<ul><li>If the build signal is <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><b>Done pf</b></span>, then the build step has completed its work and the next free byte in the buffer is pointed to by <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">pf</span>. </li>
<li>If the build signal is <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><b>BufferFull requiredSize pf nextStep</b></span>, then the build step has filled the buffer up to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">pf</span> and now requires a new buffer with at least the size <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">requiredSize</span>. If we were creating a lazy bytestring, we would now ship of the full buffer as a chunk and allocate a new buffer that we can pass to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">nextStep</span>.</li>
<li>The build signal <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><b>InsertByteString pf bs nextStep</b></span>, tells us that the build step has filled the buffer up to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">pf</span> and would now like to insert a bytestring directly into the output sequence of buffers. The idea behind this signal is that it allows us to avoid copying large bytestrings.</li>
</ul><br />
<br />
A builder is just a build step parametrized over the build step that should be executed after the builder has done its output.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> type BuildStep = Ptr Word8 -- first free byte of buffer</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><span class="Apple-style-span" style="white-space: pre;"> </span> -> Ptr Word8 -- first byte after buffer</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><span class="Apple-style-span" style="white-space: pre;"> </span> -> IO BuildSignal</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> data BuildSignal = -- next free byte</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> Done !(Ptr Word8) </span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> | BufferFull !Int !(Ptr Word8) !BuildStep</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> | InsertByteString !(Ptr Word8) !S.ByteString !BuildStep</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> </span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> newtype Builder = Builder (BuildStep -> BuildStep)</span><br />
<br />
Based on these definitions, we can easily define a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Monoid</span> instance that models concatenation of builders. An empty builder just returns the build step to be executed afterwards. We append two builders by telling the first builder that it should call the second builder once its done and by telling the second builder that it should call the continuation builder <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">k</span> once it is done.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> instance Monoid Builder where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> mempty = Builder id</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> mappend (Builder b1) (Builder b2) = Builder $ \k -> b1 (b2 k)</span><br />
<br />
Writing an actual builder requires some care, as we are working in the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">IO</span> monad and, hence, the safety belts are off. We describe the construction of builders that serialize values that do not have to be wrapped over buffer boundaries using the notion of a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Write</span>. A value <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Write size io</span> denotes an <i>atomic</i> write to a buffer of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">size</span> bytes, which can be executed by a call <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">io</span> with a pointer to the first byte that should be written.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> data Write = Write Int (Ptr Word8 -> IO ())</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"></span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> writeWord8 :: Word8 -> Write</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> writeWord8 x = Write 1 (\pf -> poke pf x)</span><br />
<br />
Defining writes is simple. Constructing a builder from a write is also not difficult once you have understood how to define a function such that it can call the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">BufferFull</span> signal with a reference to itself.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWrite :: Write -> Builder</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWrite (Write size io) =</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> Builder step</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> step !k !pf !pe</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> | pf' <= pe = do io pf</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> k pf' pe</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> | otherwise = do return $ BufferFull size pf (step k)</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> pf' = pf `plusPtr` size</span><br />
<br />
Once such basic builder constructors are defined, the code quickly looses it's C-like style, while retaining good performance.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWord8 :: Word8 -> Builder</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWord8 = fromWrite . writeWord8</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWord8s :: [Word8] -> Builder</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWord8s = mconcat . map fromWord8</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> buildABC :: Builder</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> buildABC = fromWord8s [65..90]</span><br />
<br />
The missing piece is now a driver function that actually runs a builder. A very nice property of our builder design is that it <b>completely decouples the allocation strategy for the output buffers from the actual writing to them</b>. Hence, whatever buffer you have ready, you can tell a builder to fill it. The only caveat is that a build step may require a large buffer than you can provide. However, all builders I implemented up to now can be wrapped at almost every point. Therefore, they would even work with a very small output buffer and the above caveat likely never applies; i.e., expensive solutions to get rid of it are OK.<br />
<br />
An additional advantage of separating the buffer allocation strategy from the buffer filling is that the whole <b>state of the allocation strategy is kept out of the performance critical functions</b>; i.e., the concatenation of builders and their implementation itself. This is one of the core differences to the builder implementation provided by <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/src/Data-Binary-Builder.html#Builder">Data.Binary.Builder</a></span> from the binary package.<br />
<br />
The driver I implement here runs a builder and executes an <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">IO</span> action on each full buffer, which we represent by a strict bytestring. We can for example use this driver to send the bytes represented by a builder over the network using <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/network/2.3/doc/html/Network-Socket-ByteString.html">Network.Socket.ByteString</a></span>. This driver is slightly simpler to implement than the generation of lazy bytestrings. Moreover, this driver has the nice property that no <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">unsafePerformIO</span> is involved, which means that there are no other semantic pitfalls than the ones already provided by the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">IO</span> monad ;-)<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> toByteStringIOWith :: Int -> (S.ByteString -> IO ()) -> Builder -> IO () </span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> toByteStringIOWith bufSize io (Builder b) = </span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fillBuffer bufSize (b finalStep)</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> finalStep pf _ = return $ Done pf</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fillBuffer !size step = do</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> S.mallocByteString size >>= fill</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fill fpbuf = do</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> let !pf = unsafeForeignPtrToPtr fpbuf</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> -- safe due to later reference of fpbuf</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> -- BETTER than withForeignPtr, as we lose a tail call otherwise</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> signal <- step pf (pf `plusPtr` size)</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> case signal of</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> Done pf' -> io $ S.PS fpbuf 0 (pf' `minusPtr` pf)</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> BufferFull minSize pf' nextStep -> do</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> io $ S.PS fpbuf 0 (pf' `minusPtr` pf)</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fillBuffer (max bufSize minSize) nextStep</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> InsertByteString pf' bs nextStep -> do</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> io $ S.PS fpbuf 0 (pf' `minusPtr` pf)</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> io $ bs</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fillBuffer bufSize nextStep</span><br />
<br />
The implementation is rather straightforward. We construct a final step that we can hand over to the builder. Then we allocate a buffer and fill it using the given step. We handle the signals as described above and recurse using a tail call to fill the next buffer. The tail call is important to get good performance and to not waste stack space. A misplaced (outermost) <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">withForeignPtr</span> can easily destroy this property. Therefore, I switched to using <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">unsafeForeignPtrToPtr</span> and reasoning directly about when it is safe to extract the pointer of a foreign pointer.<br />
<br />
Lets test our driver with an artificially small chunk size.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> testABC :: IO ()</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> testABC = toByteStringIOWith 10 (putStrLn . show) buildABC</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> *BuilderDesign> testABC</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> "ABCDEFGHIJ"</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> "KLMNOPQRST"</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> "UVWXYZ"</span><br />
<br />
Yihaa, nice :-) Now, you have seen the core design of the builder monoid provided by he blaze-builder library. <b>It gains a lot of its performance from its simplicity and the option to safely break the abstraction and implement build steps directly when needed.</b> What I was hiding in the above definitions are the {-# UNPACK #-} pragmas and a (not yet so judicious) use of {-# INLINE #-} pragmas. See the <a href="https://github.com/meiersi/blaze-builder">source and the accompanying benchmarks of the blaze-builder library</a> for more information on them.<br />
<br />
Let me give two last examples of how we improve performance without loosing code maintainability. The key to these examples is the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Write</span> abstraction. Writing of a list of elements calls for a typical optimization: move invariant data out of the inner loop. We can implement it once and for all as follows.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWriteList :: (a -> Write) -> [a] -> Builder</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> fromWriteList write = </span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> \xs -> Builder $ step xs</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> step xs0 !k !pf0 !pe0 = go xs0 pf0</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> go [] !pf = k pf pe0</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> go xs@(x':xs') !pf</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> | pf' <= pe0 = io pf >> go xs' pf'</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> | otherwise = return $ BufferFull size pf (step xs k)</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> !pf' = pf `plusPtr` size</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> Write size io = write x'</span><br />
<br />
Using inlining, we can give the compiler enough information such that it can optimize the code for the actual implementations of calls like <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">fromWriteList writeWord8</span>.<br />
<br />
Another optimization often used in the blaze-builder library is to <b>ensure that costly operations are amortized over enough bytes being output</b>. For example, this is why it is OK for buffer wrapping to be a bit more expensive. For consecutive writes, we can reduce the cost of appending them using the following Monoid instance.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> instance Monoid Write where</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> mempty = Write 0 (const $ return ())</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> mappend (Write l1 f1) (Write l2 f2) = Write (l1 + l2) $ \ptr -> do</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> f1 ptr</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> f2 (ptr `plusPtr` l1)</span><br />
<br />
If the compiler knows all the implementations of the writes, then he will produce very nice straight-line code for builders constructed from mappend'ed writes. A typical example would be the serialization of a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Word16</span> represented by two bytes <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">b1</span> and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">b2</span>, as <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">fromWrite (writeWord8 b0 `mappend` writeWord8 b1).</span><br />
<br />
Well, that's it for now. I hope you did enjoy this excursion into some not always Haskell-like code. For me, the development of this library was (and still is) very entertaining and inspiring and I'd love to hear about how it fares in practice.Simon Meierhttp://www.blogger.com/profile/17873513558398372917noreply@blogger.com0tag:blogger.com,1999:blog-3734004713473160250.post-26412670438095297442010-11-10T13:40:00.000-08:002010-11-11T01:09:02.947-08:00Defragmenting lazy bytestringsIn the <a href="http://lambda-view.blogspot.com/2010/11/blaze-builder-library-faster.html">last post</a>, I introduced the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/package/blaze-builder-0.2.0.1">blaze-builder</a></span> library and stressed the point that it is important to ensure a large average chunk size for the constructed lazy bytestrings. In this post, I'll give a concrete example of why ensuring a large average chunk size matters.<br />
<br />
You probably know the nice <a href="http://hackage.haskell.org/package/zlib/">zlib</a> library that allows you to compress a lazy bytestring with a single call to<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> <a href="http://hackage.haskell.org/packages/archive/zlib/0.5.2.0/doc/html/Codec-Compression-Zlib.html#v:compress">compress</a> :: L.ByteString -> L.ByteString</span><br />
<br />
I will use compress to illustrate that small chunk sizes can be costly. Actually, they can be so costly that it is worth it to first "defragment" the lazy bytestring before compressing it. Using the blaze-builder library, defragmentation is easily defined as follows.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> defragment :: L.ByteString -> L.ByteString</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> defragment = <a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.1/doc/html/Blaze-ByteString-Builder.html#v:toLazyByteString">toLazyByteString</a> . </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.1/doc/html/Blaze-ByteString-Builder-ByteString.html#v:fromLazyByteString">fromLazyByteString</a></span><br />
<br />
The builder created using <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">fromLazyByteString</span> copies the chunks up to a size of 8kb and insert them directly in the output stream otherwise. This way we can guarantee a minimal average chunk size of 4kb no matter when the output buffer is flushed due to a direct insertion of an 8kb block.<br />
<br />
The following plot shows the measured times in boxplot format for defragmenting ("compaction only"), direct compression, and compression with preceding compaction of 200kb of data represented as a lazy bytestring for different fixed chunk sizes.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiffRAoQACIZorEeWbgq9jk0XeegwkXqO6VvEzil4fyABg-PPXpQ9M2TRzfciCc0A8fkRHKHM37XC-TjzUwjTQSPixnNISlQOV9EEnkzHsyZf8Gbe3wDZDPRzAtayZcFp0BPCBDNFnT38U/s1600/compressing-200kb-of-data-scaling-%2528boxplot%252Clog-x%2529640x480.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiffRAoQACIZorEeWbgq9jk0XeegwkXqO6VvEzil4fyABg-PPXpQ9M2TRzfciCc0A8fkRHKHM37XC-TjzUwjTQSPixnNISlQOV9EEnkzHsyZf8Gbe3wDZDPRzAtayZcFp0BPCBDNFnT38U/s1600/compressing-200kb-of-data-scaling-%2528boxplot%252Clog-x%2529640x480.png" /></a></div><br />
As in my previous post, the benchmarks were measured on a Core2 Duo T7500 with 2GB RAM and Linux 2.6.32-24 i686 and GHC 6.12.3. The corresponding measurement log can be found <a href="https://gist.github.com/671490#file_compression_benchmark_results.txt">here</a>. A log-log plot exhibiting more information on the behaviour of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">defragment</span> for larger chunk sizes can be found <a href="http://picasaweb.google.com/lh/photo/HQkeAvCnf1qDzTLmpJoJ1G3HmNKmvCZV2w4tMH0gGeY?feat=directlink">here</a>.<br />
<br />
The above plot shows that <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">compress</span> profits heavily from defragmentation. Sadly, I do not yet know the cause for the significant slowdown of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">compress</span> for lazy bytestrings with a small average chunk size. I guess it is a combined effect of the cost of the FFI calls (how big are they actually?) and perhaps some implementation overhead stemming from large amount of state being threaded through the implementation of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">compress</span>. Comments and clarifications are very much welcome.<br />
<br />
Note that as for filesystems, an even better solution than using regular defragmentation is to <b>avoid fragmentation in the first place</b>, which you can achieve by <b>using a Builder for constructing lazy bytestrings</b>. Note also that <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">fromLazyByteString</span> currently does not wrap bytestrings around buffer boundaries, which results in some unnecessarily spilled memory for medium (1kb - 8kb) chunk sizes. I'll implement that in the near future and will post the new measurements here.Simon Meierhttp://www.blogger.com/profile/17873513558398372917noreply@blogger.com1tag:blogger.com,1999:blog-3734004713473160250.post-4960966047498105352010-11-07T05:09:00.000-08:002010-11-07T10:01:43.842-08:00The blaze-builder library: faster construction of bytestrings<div>Hi, I am <a href="http://people.inf.ethz.ch/meiersi">Simon Meier</a>, a swiss Haskell enthusiast currently pursuing his PhD in computer science at <a href="http://www.ethz.ch/">ETH Zurich</a>. In this blog post, I'll introduce you to the <a href="http://hackage.haskell.org/package/blaze-builder">blaze-builder</a> library.<br />
<br />
</div><div>The blaze-builder library provides you with a Builder type that you can use to <i>efficiently</i> construct sequences of bytes represented in a packed form as a strict or lazy bytestring. Hence, typical use cases for a Builder are saving your application data in a space efficient binary form to a file or sending a response to some request over the network.<br />
<br />
</div><div>Probably, you know about the <a href="http://hackage.haskell.org/package/binary">binary</a> package, which also provides a Builder type in the <a href="http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Builder.html">Data.Binary.Builder</a> module targeting exactly the same usecase as our Builder. This is no coincidence. During this year's Google Summer of Code, <a href="http://jaspervdj.be/">Jasper Van der Jeugt</a> and I developed the blaze-builder library to overcome performance shortcomings of Data.Binary.Builder with respect to the specific needs of the <a href="http://jaspervdj.be/blaze">blaze-html</a> HTML generation library. Since then, I have restructured the blaze-builder library to serve as a drop-in replacement for Data.Binary.Builder, <i>which it improves upon with respect to both speed as well as expressivity</i>.<br />
<br />
</div><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: x-large;">Usage example</span></span><br />
We start by importing the necessary modules. We also define a convenient abbreviation for <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">mappend</span>, which actually will become part of the base library according to rumors I heard at this years ZuriHac.<br />
<br />
<script src="https://gist.github.com/664979.js?file=BuilderImports.hs">
</script> Our example is about serializing a <i>very</i> simple representation of a person to a sequence of bytes. As usual, this serialization also requires us to fix the encoding format. We encode strings using UTF-8 and prefix them with their length encoded as a 32bit little-endian integer to make parsing unambiguous. We also encode the age of a person as a 32bit little-endian integer. I guess the code speaks for itself.<br />
<br />
<script src="https://gist.github.com/664979.js?file=SerializePeople.hs">
</script> The above code is typical for serialization code based on builders. One uses the predefined functions for creating builders with a fixed encoding format from standard Haskell values. These builders are then combined using the functions from the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Monoid</span> typeclass. Builders essentially store the recipe for building their corresponding sequence of bytes. Once one needs a concrete representation of this sequence of bytes, one just calls <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder.html#v:toLazyByteString">toLazyByteString</a></span> or <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder.html#v:toByteString">toByteString</a></span> to execute that recipe.<br />
<br />
The benefit of using builders to construct a bytestring is twofold: <i>First</i>, appending two builders is an <i>O(1) </i>operation, which is also efficient in absolute terms, as it corresponds to a single function call. <i>Second</i>, when constructing the resulting lazy bytestring the blaze-builder makes sure that the average chunk size is large. A large average chunk size is important to make good use of cache prefetching in later processing steps (e.g. compression) and it also reduces the sytem call overhead when writing the resulting lazy bytestring to a file or sending it over the network.<br />
<br />
For example, the above code results in the following sequence of chunk sizes.<br />
<br />
<script src="https://gist.github.com/664979.js?file=ExampleChunkSizes.hs">
</script> The 170001 bytes represented by <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">lazyBinaryCloneVillage</span> feature an average chunk size of ~24kb. The first buffer is only ~4kb large, because for short output sequences the buffer allocation cost is significant. <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">toLazyByteString</span> compensates this cost by allocating the first buffer with the minimal expected chunk size. Note that these chunk sizes reflect the default settings of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">toLazyByteString</span>, which is optimized to yield efficient and well-chunked results for all lengths of output sequences. If you know more about your typical serialization tasks, then you can tune these settings to your favor.<br />
<br />
Speaking of efficiency, I'm quite sure you would also like to see some benchmark figures. I'm not going to present the figures for the above example. Not because they are embarassing; they are not. However, without good competition, the interpretation of benchmark figures is difficult; and currently, I don't know of a good competitor for the above usecase. However, we can also use builders to pack a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">[Word8]</span> list into a strict or lazy bytestring; and there, we definitely do have good competitors.<br />
<br />
<span class="Apple-style-span"><span class="Apple-style-span" style="font-size: x-large;">Packing [Word8]</span></span><br />
For our benchmark, we use the following implementations for packing <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">[Word8]</span> lists. <script src="https://gist.github.com/664979.js?file=PackingWord8Lists.hs">
</script><br />
The implementations <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/Data-ByteString.html#v%3Apack">S.pack</a></span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/Data-ByteString-Lazy.html#v%3Apack">L.pack</a></span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">declPackLazy</span>, and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">binaryDeclPackLazy</span> are trivial. The implementations <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packStrict</span> and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packLazy</span> make use of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder-Word.html#v:fromWord8s">fromWord8s</a> :: [Word8] -> Builder</span>, which is a very efficient function to serialize lists of bytes, as the following plot shows.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih3SaV2azbtqhnDnC77rB2PU7kkIes3tiPr65GORE3SgcyVouoAFVsODl3UVBdDKOnqUJ6KIEoiM-J364uoAifbE-uF5dAk7AbsDA_nWDQ0RfIGAKJ1fWV9h7Iqefv092gEcJ1_NBGE3o/s1600/packing-%5Bword8%5D-scaling-640x480.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih3SaV2azbtqhnDnC77rB2PU7kkIes3tiPr65GORE3SgcyVouoAFVsODl3UVBdDKOnqUJ6KIEoiM-J364uoAifbE-uF5dAk7AbsDA_nWDQ0RfIGAKJ1fWV9h7Iqefv092gEcJ1_NBGE3o/s1600/packing-%5Bword8%5D-scaling-640x480.png" /></a></div><br />
The plot is a log-log plot of the mean time for packing <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">[Word8]</span> lists using the above implementations when being run on a Core2 Duo T7500 with 2GB RAM and Linux 2.6.32-24 i686 and GHC 6.12.3. I created this plot by adapting <a href="http://www.serpentine.com/blog/">Bryon O'Sullivan's</a> excellent <a href="http://hackage.haskell.org/package/criterion">Criterion</a> benchmarking library to handle scaling benchmarks (cf. <a href="https://github.com/meiersi/blaze-builder/blob/master/Criterion/ScalingBenchmark.hs">ScalingBenchmarks.hs</a>). In the spirit of Criterion, I also generate a <a href="http://en.wikipedia.org/wiki/Box_plot">boxplot</a> version for every scaling benchmark (using more transparent lines to draw the quartiles and whiskers), which allows us to judge the quality of the measurements. The <a href="http://picasaweb.google.com/lh/photo/ABkR1Zurb_MQVHoOJWO4jG3HmNKmvCZV2w4tMH0gGeY?feat=directlink">boxplot version of the above plot</a> shows that nothing went wrong during its measurement.<br />
<br />
Note that the mean times are plotted with respect to a logarithmic scale. Hence, a constant difference between two graphs means a <i>constant factor improvement</i>. As you can see from the <a href="https://gist.github.com/664979#file_packing_word8_lists_results.txt">measurement log</a>, using blaze-builder is a definitive win for output sequences longer than 1kb: <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packStrict</span> beats <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">S.pack</span> by almost a factor 2 and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packLazy</span> beats <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">L.pack</span> by a factor 10 and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">binaryDeclPackLazy</span> by a factor 92 (!).<br />
<br />
The crucial ingredient for this improvement is the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">fromWord8s</span> function. <script src="https://gist.github.com/664979.js?file=fromWord8s.hs">
</script> It is constructed using the <a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder-Write.html#t:Write"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Write</span> abstraction</a> Jasper introduced during his work on blaze-html. The function <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder-Write.html#v:fromWrite8List">fromWrite8List</a></span> forces and writes eight list elements at a time, which allows the compiler to bundle the actual writes to the output buffer.<br />
<br />
For shorter output sequences, the improvement gained from using blaze-builder gets smaller and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">S.pack</span> is even faster for very short sequences. The following plot, its <a href="http://picasaweb.google.com/lh/photo/vr9lrsLdloyQion59V-jIW3HmNKmvCZV2w4tMH0gGeY?feat=directlink">boxplot version</a>, and the <a href="https://gist.github.com/664979#file_packing_short_word8_lists_results.txt">measurement log</a> give a more detailed comparison for such short sequences.<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3eAa98ghDtPhi72FgNZEwESYKNbiFmpbp33O9ahocnWntxC3cNKfUTNtPs8cWuDOMcp3CzFX_aHIGIiEYlXgW5gkFPepqTnTdv0NNwwLD4dlxSZslS60ueEpyD-1LqsKm01YFPhyphenhyphenqiRI/s1600/packing-short-%5Bword8%5D-lists-scaling-640x480.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3eAa98ghDtPhi72FgNZEwESYKNbiFmpbp33O9ahocnWntxC3cNKfUTNtPs8cWuDOMcp3CzFX_aHIGIiEYlXgW5gkFPepqTnTdv0NNwwLD4dlxSZslS60ueEpyD-1LqsKm01YFPhyphenhyphenqiRI/s1600/packing-short-%5Bword8%5D-lists-scaling-640x480.png" /></a><br />
<br />
<br />
<br />
<br />
The results are not surprising when comparing the implementations: <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packStrict</span> uses <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder.html#v:toByteString">toByteString</a></span>, which simply runs <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder.html#v:toLazyByteString">toLazyByteString</a></span> and copies all chunks into a single buffer of the appropriate size. Hence, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packStrict</span> is always slightly slower than <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packLazy</span>. The <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/src/Data-ByteString.html#pack">S.pack</a></span> function from <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Data.ByteString</span> works in two passes over the input list: first, it determines the length of the list and then it copies all bytes to the allocated buffer. Traversing linked lists of bytes is costly and pays off only for very short lists, as there the output buffer allocation cost is dominant. The peak of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">packLazy</span> at 64 bytes stems from the fact that it first allocates a 64 byte buffer which is copied to a 4kb buffer once its clear that more than 64 bytes are output. This is done to compensate the buffer allocation cost for very short output sequences. It can be switched off using <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://hackage.haskell.org/packages/archive/blaze-builder/0.2.0.0/doc/html/Blaze-ByteString-Builder.html#v:toLazyByteStringWith">toLazyByteStringWith</a></span>, if required.<br />
<br />
<span class="Apple-style-span" style="font-size: x-large;">Conclusions</span><br />
The blaze-builder library provides an <i>expressive and efficient</i> way to construct both lazy as well as strict bytestrings. The <a href="https://github.com/meiersi/blaze-builder/tree/master/benchmarks">accompanying benchmarks</a> show that it improves (often significantly) in all cases over <a href="http://hackage.haskell.org/packages/archive/binary/0.5.0.2/doc/html/Data-Binary-Builder.html"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Data.Binary.Builder</span></a> from the <a href="http://hackage.haskell.org/package/binary">binary</a> package. The benchmarks presented in this post also show that the implementation of blaze-builder compares favorably against special purpose functions for packing <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">[Word8]</span> lists; <i>on a Core2 Duo T7500 with 2GB RAM and Linux 2.6.32-24 i686 and GHC 6.12.3. </i>Yeah, that's what the benchmarks state ;-). However, I expect that the conclusions drawn from them stay also valid for most other settings. For example, the GHC-7.0.1 release candidate makes D<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ata.ByteString.Builder</span> run a bit faster, but still not as fast as blaze-builder.<br />
<br />
During the work on blaze-html, I learned from several benchmarks that ensuring a large average chunk size is <i>very important</i> for lazy bytestrings to be efficient. However, many encoding functions on Hackage produce bytestrings or lazy bytestrings. Hence, we have to copy their result again to guarantee large average chunk sizes, which is a waste of resources. Hence, I suggest that encoding functions produce a builder instead of strict or lazy bytestrings. Apart from guaranteeing a fast append and a large average chunk size, this change also simplifies and generalizes the encoding code, as it separates the buffer allocation strategy from the encoding function.<br />
<br />
In order for such a change to be effective, I suggest that the <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring">bytestring</a> library itself provides an implementation of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Data.ByteString.Builder<span class="Apple-style-span" style="font-family: 'Times New Roman';">,</span> </span>which would provide a blessed way to incrementally create bytestrings. The blaze-builder library offers one possible implementation path for such a bytestring builder. If the community would see it fit, then I'd be happy to port the builder parts to the bytestring library. The string encodings currently provided by blaze-builder would then move into their own libraries.<br />
<br />
Well that's it for now. I will publish more of the experiments I have done during the work on blaze-builder once I find some more time. I'm also looking very much forward to your feedback.<br />
<br />
Happy packing :-pSimon Meierhttp://www.blogger.com/profile/17873513558398372917noreply@blogger.com5