.NET StringBuilder — Fast, but not as fast as you think!


I recently ran into a situation where I was tasked to profile some .NET code and do some optimizations anywhere hot spots popped up. I was amazed to find out that one of the BIGGEST offenders in our code block was a simple call to StringBuilder.Append(char). I had to take a step back and scratch my head and wonder if my profiler was confused.

I re-ran some tests using the Stopwatch class to hard-code some timing metrics into the application, and they confirmed the findings. What’s up? How could a class that everyone says you can use to your heart’s content for string concatenation be failing me?
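For anyone who wants to try the same kind of check, here is a minimal sketch of a Stopwatch measurement around a character-by-character copy. The class name, method name, and the 600k test input are made up for illustration and are not taken from the original application. Timings will vary by machine and runtime, so treat the output as a rough indication only.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class AppendTiming
{
    // Hypothetical helper: copies the input into a StringBuilder one char
    // at a time and reports how long the loop took.
    public static string CopyCharByChar(string input, out TimeSpan elapsed)
    {
        StringBuilder output = new StringBuilder();
        Stopwatch watch = Stopwatch.StartNew();

        for (int i = 0; i < input.Length; i++)
        {
            output.Append(input[i]); // one Append(char) call per character
        }

        watch.Stop();
        elapsed = watch.Elapsed;
        return output.ToString();
    }

    public static void Main()
    {
        // Assumed test input: 600k of filler characters.
        string input = new string('x', 600 * 1024);

        TimeSpan elapsed;
        string copy = CopyCharByChar(input, out elapsed);

        Console.WriteLine("{0} chars appended in {1} ms",
            copy.Length, elapsed.TotalMilliseconds);
    }
}
```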

Turns out, it was a mix of misuse and a common misconception about the StringBuilder Class.

One of the first things you learn while picking up .NET is that the StringBuilder Class is your friend when it comes to concatenating large strings in memory. When you build a big string piece by piece, it beats the pants off of repeated String.Concat and String.Format calls, because it is a mutable object that appends into an in-memory buffer instead of allocating a brand new string on every operation.

I used JetBrains dotTrace to help profile the application and it was very evident from the get-go that StringBuilder was causing the whole process to slow down.

The nature of my application was basically reading in a text buffer 1 character at a time, and using the StringBuilder as an output buffer. So for a 1k file, the method Append(char) would be called 1,024 times. A 600k file would call Append(char) 614,400 times.

So why was I getting burned in execution time? The issue turned out to be twofold.

First, there’s an overhead cost to every call. I don’t care how lightweight your method is, if you’re calling it SIX HUNDRED THOUSAND TIMES, it’s going to take a bit. Let alone a method that manages a string buffer in memory. So no matter how fast StringBuilder actually is, Append is not a free call, and you should factor that overhead in when architecting your solution.

Architecture brings me to my second point. While writing each character individually made sense initially, it turns out it was just lazy :P The optimized route would have been calling Append with a SUBSTRING of the input buffer. That way we avoid the overhead of many calls by writing all the necessary data in one big blob.

So 600,000 calls to StringBuilder.Append(char) become only a few hundred calls to StringBuilder.Append(string.Substring(start, count)). Sure, the Substring method itself has overhead (it allocates a new string each time), but it’s still less than the thousands of calls to Append(char) that we’re saving ourselves :)
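To make the idea concrete, here is a rough sketch of the two approaches side by side. The tab-expanding transform and all class and method names are invented for illustration, since the real application's logic isn't shown here. The point is only that unchanged runs of characters get appended in one call instead of one call per character.

```csharp
using System;
using System.Text;

public static class RunAppend
{
    // Hypothetical transform: replace each tab with four spaces and copy
    // everything else unchanged, one Append call per character.
    public static string ExpandTabsPerChar(string input)
    {
        StringBuilder output = new StringBuilder();
        foreach (char c in input)
        {
            if (c == '\t') output.Append("    ");
            else output.Append(c);
        }
        return output.ToString();
    }

    // Same transform, but each unchanged run between tabs is appended
    // with a single Append(Substring(...)) call.
    public static string ExpandTabsInRuns(string input)
    {
        StringBuilder output = new StringBuilder();
        int runStart = 0; // index where the current unchanged run begins

        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != '\t') continue;

            // Flush the run that ended just before this tab, if any.
            if (i > runStart)
                output.Append(input.Substring(runStart, i - runStart));

            output.Append("    ");
            runStart = i + 1;
        }

        // Flush whatever is left after the last tab.
        if (runStart < input.Length)
            output.Append(input.Substring(runStart, input.Length - runStart));

        return output.ToString();
    }

    public static void Main()
    {
        string sample = "a\tbc\t\td";
        // Both approaches should produce identical output.
        Console.WriteLine(ExpandTabsPerChar(sample) == ExpandTabsInRuns(sample));
    }
}
```

One more thing worth knowing: StringBuilder also has an Append(string, int, int) overload that takes a start index and a count, so output.Append(input, runStart, i - runStart) should do the same job without creating the temporary substring at all.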

Conclusion?

StringBuilder is fast, but it’s not free. Take this into consideration when utilizing it while appending large data sets in small chunks. :)

Cheers!


  1. #1 by agnain on August 8, 2009 - 3:27 AM

    If you know the exact length of the output string you can use a char array.

    This way you can set each char at its position without any buffer resizing (which occurs in StringBuilder.Append unless you set the StringBuilder’s capacity on instantiation).

    If you don’t know the exact length, you may approximate it so that the StringBuilder resizes its buffer less.

    I guess your app changes every char it reads, doesn’t it? If not, I really don’t understand why the hell you would read chars one by one only to write them out unchanged, one by one, with a StringBuilder.

    BTW you could try many other ways of doing what you do, with Streams, buffering, etc.

    What is your app really doing with these chars?
