Posts Tagged WinZip

Why does the Microsoft .NET implementation of GZip compression suck?

I’ve been working on a self contained patch generator using C#. For the patch data payload, I wanted to use GZip (or deflate) to compress the payload, this way if you generated a patch for a 1MB file, the patch file wouldn’t be +1MB (5 bytes per change).

I did some quick tests to compare compression ratios on a simulated data payload:

Original Uncompressed Payload: 288k
GZipped using System.IO.Compression:
171k
Zipped using WinZip with “Super Fast” Compression: 105k
7-Zipped using 7-Zip with “Maximum” Compression: 61k

I had to do some research and dig into the MSDN libraries before the answer was finally revealed from the horses mouth:

“The compression functionality in DeflateStream and GZipStream is exposed as a stream. Data is read in on a byte-by-byte basis, so it is not possible to perform multiple passes to determine the best method for compressing entire files or large blocks of data. The DeflateStream and GZipStream classes are best used on uncompressed sources of data. If the source data is already compressed, using these classes may actually increase the size of the stream.”

It’s a bit disappointing that Microsoft itself would not recommend it’s own built in compression routine, and even go so far to say that it may “actually increase the size of the stream”. I hope in the upcoming version of .NET 3.0, they address some shortcomings of this area of the framework. Perhaps an exposed method where you can pass a byte[] which can be compressed using multiple passes and Adaptive Huffman Encoding.

I know there are several alternatives out there, such as assemblies which support the ZIP compression format, but wasn’t the goal of .NET to move away from DLL hell? ;-)

In the meantime, I’ll probably be sticking with the Microsoft implementation of GZip until I take some time to create my own Huffman encoding routine. I think a good place to start would be this book titled Data Compression: The Complete Reference. I remember thumbing through it once at a Borders book store and thinking it was a really great reference. At the time though, I had no use for such a resource.

Quick note to my readers: I understand that there are plenty of online resources for Adaptive Huffman Encoding and other compression methods using C# that have already been created. Personally, I learn by doing. So when I do my own research and create something from the ground up it really helps me understand how it all works.

, , , , ,

1 Comment