Best Tool to Compress Virtual Machines

While working in virtualized environments, we need to pass around virtual machines (a.k.a. virtual appliances) from time to time. Most of the virtual machines I’ve seen for downloading are compressed to save storage and network bandwidth.

Not all compression algorithms are created equal in terms of compression ratio, compression speed, and decompression speed. For documents and small programs the difference rarely matters. But it matters a lot for virtual machines, whose virtual disk files are much larger than normal files. Even a small percentage improvement can add up to significant savings in storage and bandwidth.

I recently compared several compression formats on a Linux-based virtual machine, Linux being the main OS type for virtual machines. The following table shows what I found. Note that this is not meant to be a comprehensive comparison.

Format      Size (bytes)     Size relative to 7z
Raw         1,052,683,412
7z            280,234,852    100%
zip           363,012,043    129.5%
bzip2         363,169,420    129.6%
gzip          360,911,625    128.8%
tar.gzip      363,008,979    129.5%

As the table shows, 7z has by far the best compression ratio for this virtual machine: the other formats, such as Zip, come out about 30% larger, which means the 7z archive is about 23% smaller than they are. This is consistent with the data for other types of files shown on the 7Zip home page.
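The percentages in the table can be checked directly from the byte counts. The following is a small sketch using awk; the helper names `pct_of` and `pct_saved` are made up for illustration and are not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helpers for checking the numbers in the table.

# pct_of SIZE BASE: SIZE as a percentage of BASE, one decimal place.
pct_of() {
    awk -v s="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * s / b }'
}

# pct_saved SMALL LARGE: how much smaller SMALL is than LARGE, in percent.
pct_saved() {
    awk -v s="$1" -v l="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - s / l) }'
}

pct_of    363012043 280234852    # zip relative to 7z
pct_saved 280234852 363012043    # 7z saving relative to zip
pct_of    280234852 1052683412   # 7z relative to the raw image
```

With the table's numbers this prints 129.5, 22.8 and 26.6. The first two describe the same gap from opposite directions: a zip file 29.5% larger than the 7z file is the same thing as a 7z file 22.8% smaller than the zip file.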

Given that 7Zip compresses better and is free (how can you beat that?), it should be the de facto tool for compressing virtual appliances. If you find a better tool, please feel free to share.

It’s worth pointing out that 7Zip has not only a nice GUI but also a command line interface that lets you do the same things as the GUI. This helps a lot when integrating it with an automated release system.

7Zip has also been ported to Linux as a command-line-only tool, which is good enough for most use cases, such as supporting a Linux-based build system. Check it out here. Update: IBM developerWorks has an article, "How to use 7zip on Linux command Line."
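For a build or release script, the command line usage might look like the sketch below. It assumes a `7z` binary is on the PATH (some Linux packages install it as `7za` or `7zr` instead, hence the `SEVENZIP` variable, which is my own convention). The function names and any archive or directory names are hypothetical; the commands `a`, `x`, `t` and the switches `-t7z`, `-mx=9`, `-o`, `-y` are standard 7-Zip options.

```shell
#!/bin/sh
# Sketch only: function names and the SEVENZIP variable are illustrative.
# Override SEVENZIP if your distribution ships the binary as 7za or 7zr.
SEVENZIP="${SEVENZIP:-7z}"

# compress_vm ARCHIVE DIR: pack DIR into ARCHIVE in 7z format at level 9 (ultra).
compress_vm() {
    "$SEVENZIP" a -t7z -mx=9 "$1" "$2"
}

# extract_vm ARCHIVE DEST: unpack ARCHIVE into DEST, answering yes to prompts.
# Note that 7z expects no space between -o and the output directory.
extract_vm() {
    "$SEVENZIP" x -o"$2" -y "$1"
}

# verify_vm ARCHIVE: test archive integrity without extracting it.
verify_vm() {
    "$SEVENZIP" t "$1"
}
```

A release job could then call something like `compress_vm myappliance.7z myappliance` followed by `verify_vm myappliance.7z` before uploading. Level 9 gives the smallest output at the cost of more time and memory, so a lower `-mx` value may be a better trade-off for very large appliances.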

  1. June 11th, 2012 at 01:17 | #1

    Hi Steve,

    Don’t you have the compression times written down somewhere, so we can compare the compression/time ratio too?

    Would be great 😉

  2. June 11th, 2012 at 01:31 | #2

    Hey Timo,
    Good idea. I did not find a noticeable difference between 7Zip and Zip on the same virtual machine I tried. Compression/extraction time, in my opinion, is not as important as the size ratio unless the difference is too big, which is not the case here.
    Steve

  3. xabu
    June 11th, 2012 at 01:45 | #3

    can you try VeeamZip and post the result?

  4. June 11th, 2012 at 01:46 | #4

    Yes, that makes sense, your VM is pretty small here, that was why I was asking :-)

    I’m unfortunately dealing with huge vApps/VMs, so it can take a while.

  5. June 11th, 2012 at 01:57 | #5

    Sure. Do you know where I can get it?
    Steve

  6. June 11th, 2012 at 02:02 | #6

    OK. I now see where you are coming from. When your VM gets bigger, any difference in time can be amplified.
    Since you have huge VMs, do you want to give them a try and let us know how much they differ time-wise? I can add a new table and credit you. :-)
    Steve

  7. June 11th, 2012 at 02:58 | #7

    @Steve and @Timo you may want to look at eXdupe -> http://www.exdupe.com/
    It uses deduplication techniques as opposed to compression techniques such as 7ZIP’s LZMA (?) algorithm…

    Cheers,
    Didier

  8. Frank Terhaar-Yonkers
    June 11th, 2012 at 07:29 | #8

    Far more important than the compression algorithm/tool is the content of the filesystem(s)/vmdk. This includes not only filesystem space allocated to files but any unallocated space. Before I compress a VM I run the following:
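    # Zero-fill the free space on each filesystem so unused blocks compress well
    for i in list-of-file-system-root-dirs; do
        cd "$i" || continue
        # dd exits non-zero once the disk is full, so remove the file unconditionally
        dd if=/dev/zero of=aaaaaaaaaajunk bs=1M; sync; /bin/rm -f aaaaaaaaaajunk
    done
    poweroff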

    Now, you obviously need to make a decision if you want the disk space fully allocated or not as this is not good for incrementally allocated vmdk. However even if the filesystems are large, these VMs will compress down to practically nothing.

    Enjoy – Frank

  9. June 11th, 2012 at 21:48 | #9

    Thanks Didier,
    Just tried the exdupe, and got 379,240,163 bytes for the same virtual machine. I think it may perform the best for multiple similar virtual machines.
    Steve

  10. xabu
    June 14th, 2012 at 12:11 | #10

    @Steve Jin
    it’s available for free on veeam website

  11. June 14th, 2012 at 20:39 | #11

    Thanks! Will check it out and get back to you.
    Steve

  12. Peter
    December 29th, 2016 at 04:32 | #12

    Hi, I did a lot of testing: the most important thing is to clean up _inside_ the image; after that, 7zip at the ultra level is all you need. Find my detailed post here with updated benchmarking data:
    https://wiitez.blogspot.sg/2016/12/vm-image-compression-and-optimization.html
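
The zero-fill advice in comments #8 and #12 is easy to sanity-check without a real VM. The sketch below is only an illustration under assumptions: the file names, the 1 MiB size and the `gz_size` helper are arbitrary, gzip stands in for whichever compressor you use, and random bytes are a worst-case stand-in for stale data left behind in freed blocks.

```shell
#!/bin/sh
# gz_size FILE: size in bytes of FILE after gzip compression (hypothetical helper).
gz_size() {
    gzip -c "$1" | wc -c | tr -d ' '
}

work=$(mktemp -d)

# 1 MiB of random bytes: a stand-in for stale data left in deleted blocks.
dd if=/dev/urandom of="$work/stale.bin" bs=1024 count=1024 2>/dev/null
# 1 MiB of zeros: what free space looks like after the dd zero-fill.
dd if=/dev/zero of="$work/zeroed.bin" bs=1024 count=1024 2>/dev/null

stale=$(gz_size "$work/stale.bin")
zeroed=$(gz_size "$work/zeroed.bin")
echo "random: $stale bytes, zeros: $zeroed bytes"

rm -rf "$work"
```

The random file should stay at roughly its original size while the zeroed file shrinks to a tiny fraction of it, which is why scrubbing free space inside the guest can matter more than the choice of compressor.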
