Best Tool to Compress Virtual Machines

While working in virtualized environments, we need to pass around virtual machines (a.k.a. virtual appliances) from time to time. Most of the virtual machines I’ve seen available for download are compressed to save storage and network bandwidth.

Not all compression algorithms are created equal in terms of compression ratio, compression speed, and decompression speed. In most cases the difference hardly matters for documents and small programs. But it matters a lot for virtual machines, whose virtual disk files are much larger than normal files. Even a small percentage improvement can yield significant savings in storage and bandwidth.


I recently compared several compression formats using a Linux-based virtual machine, Linux being the main OS type for virtual appliances. The following table shows what I found. Note that this is not meant to be a comprehensive comparison.

Format      Size (bytes)      Size relative to 7z
Raw         1,052,683,412
7z            280,234,852     100%
zip           363,012,043     129.5%
bzip2         363,169,420     129.6%
gzip          360,911,625     128.8%
tar.gzip      363,008,979     129.5%

As the table shows, 7z has by far the best compression ratio for virtual machines: its archive is about 23% smaller than the others such as Zip (put the other way, the others are about 30% larger than the 7z archive). This is consistent with the results for other types of files shown on the 7-Zip home page.
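The percentages can be rechecked directly from the byte counts in the table. The following is a minimal sketch and not part of the original measurement; the helper names relative_pct and saving_pct are made up for illustration, and it assumes awk is available.

```shell
#!/bin/sh
# Hypothetical helpers for rechecking the table; byte counts are taken from it.

# relative_pct SIZE BASE: SIZE as a percentage of BASE, one decimal place.
relative_pct() {
  awk -v size="$1" -v base="$2" 'BEGIN { printf "%.1f%%\n", 100 * size / base }'
}

# saving_pct SIZE OTHER: how much smaller SIZE is than OTHER, as a percentage.
saving_pct() {
  awk -v size="$1" -v other="$2" 'BEGIN { printf "%.1f%%\n", 100 * (1 - size / other) }'
}

relative_pct 363012043 280234852    # zip relative to 7z   -> 129.5%
saving_pct   280234852 363012043    # 7z saving versus zip -> 22.8%
saving_pct   280234852 1052683412   # 7z saving versus raw -> 73.4%
```

Note that "zip is about 30% larger than 7z" and "7z is about 23% smaller than zip" describe the same two numbers; which one applies depends on the baseline you divide by.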

Given that 7-Zip is both better and free (how can you beat that?), it should be the de facto tool for compressing virtual appliances. Should you find better tools, please feel free to share.
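If you want to try other tools on your own virtual disk, a comparison can be sketched roughly as follows. This is an illustration, not the procedure used for the table above: the function names and the file name my-vm.vmdk are hypothetical, and it only covers compressors that can write to standard output (gzip, bzip2 and xz all accept -9 and -c).

```shell
#!/bin/sh
# Hypothetical helpers for comparing compressed sizes of a single file.

# raw_bytes FILE: size of FILE in bytes.
raw_bytes() {
  wc -c < "$1" | tr -d '[:space:]'
}

# compressed_bytes FILE CMD [ARGS...]: bytes produced when CMD compresses
# FILE from stdin to stdout. Nothing is written to disk.
compressed_bytes() {
  file=$1
  shift
  "$@" < "$file" | wc -c | tr -d '[:space:]'
}

# report FILE: print one line per tool that is installed on this machine.
report() {
  printf '%-8s %s\n' raw "$(raw_bytes "$1")"
  for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%-8s %s\n' "$tool" "$(compressed_bytes "$1" "$tool" -9 -c)"
    fi
  done
}
```

Calling report my-vm.vmdk prints the raw size followed by one line per available tool. Prefixing a single call with the shell's time keyword also gives a rough idea of compression time, which matters more as the VM gets bigger.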

It’s worth pointing out that 7-Zip has not only a nice GUI but also a command line interface that lets you do the same things as the GUI. This helps a lot when integrating it with an automated release system.
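As a rough idea of what the command line looks like in a release script, here is a minimal sketch. The wrapper names pack_vm, check_vm and unpack_vm and the paths in the usage note are hypothetical; the 7z commands and switches used (a, t, x, -t7z, -mx=9, -o) are standard ones, but check them against the version you have installed.

```shell
#!/bin/sh
# Hypothetical wrappers around the 7z command line tool.

# pack_vm DIR ARCHIVE: archive the VM folder DIR into ARCHIVE.
pack_vm() {
  # a = add to archive, -t7z = 7z format, -mx=9 = highest compression level
  7z a -t7z -mx=9 "$2" "$1"
}

# check_vm ARCHIVE: test the integrity of ARCHIVE without extracting it.
check_vm() {
  7z t "$1"
}

# unpack_vm ARCHIVE DEST: extract ARCHIVE with full paths into DEST.
unpack_vm() {
  # x = extract with full paths, -o<dir> = output directory (no space after -o)
  7z x "$1" -o"$2"
}
```

A release step could then run pack_vm my-vm my-vm.7z && check_vm my-vm.7z, and the consumer would run unpack_vm my-vm.7z restored. Level 9 trades time for size; a lower -mx value may be a better fit for very large VMs.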

7-Zip has also been ported to Linux as a command-line-only tool, which is good enough for most use cases, such as supporting a Linux-based build system. Check it out here. Update: IBM developerWorks has an article, "How to use 7zip on Linux command Line".


11 Comments

  1. Posted June 11, 2012 at 1:17 am

    Hi Steve,

    Do you have the compression times written down somewhere, so we can also compare the compression/time ratio?

    Would be great 😉

  2. Posted June 11, 2012 at 1:31 am

    Hey Timo,
    Good idea. I did not find a noticeable difference between 7Zip and Zip on the virtual machine I tried. In my opinion, compression and extraction times are not as important as the size ratio unless the difference is very large, which is not the case here.
    Steve

  3. xabu
    Posted June 11, 2012 at 1:45 am

    can you try VeeamZip and post the result?

  4. Posted June 11, 2012 at 1:46 am

    Yes, that makes sense, your VM is pretty small here, that was why I was asking :-)

    I’m unfortunately dealing with huge vApps/VMs, so it can take a while.

  5. Posted June 11, 2012 at 1:57 am

    Sure. Do you know where I can get it?
    Steve

  6. Posted June 11, 2012 at 2:02 am

    OK. I now see where you are coming from. As your VMs get bigger, any difference in time is amplified.
    Since you have huge VMs, would you like to give them a try and let us know how the times differ? I can add a new table and credit you. :-)
    Steve

  7. Posted June 11, 2012 at 2:58 am

    @Steve and @Timo you may want to look at eXdupe -> http://www.exdupe.com/
    It uses deduplication techniques as opposed to compression techniques such as 7ZIP’s LZMA (?) algorithm…

    Cheers,
    Didier

  8. Frank Terhaar-Yonkers
    Posted June 11, 2012 at 7:29 am

    Far more important than the compression algorithm/tool is the content of the filesystem(s)/vmdk. This includes not only filesystem space allocated to files but any unallocated space. Before I compress a VM I run the following:
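    # list-of-file-system-root-dirs is a placeholder for the mount points to zero out
    for i in list-of-file-system-root-dirs; do
      cd "$i" || continue
      # dd exits non-zero once the disk is full, so chain with ';' rather than
      # '&&'; otherwise the junk file would never be removed
      dd if=/dev/zero of=aaaaaaaaaajunk bs=1M; sync; /bin/rm -f aaaaaaaaaajunk
    done
    poweroff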

    Now, you obviously need to decide whether you want the disk space fully allocated, because zero-filling forces an incrementally allocated vmdk to grow to its full size. However, even if the filesystems are large, these VMs will compress down to practically nothing.

    Enjoy – Frank

  9. Posted June 11, 2012 at 9:48 pm

    Thanks Didier,
    Just tried the exdupe, and got 379,240,163 bytes for the same virtual machine. I think it may perform the best for multiple similar virtual machines.
    Steve

  10. xabu
    Posted June 14, 2012 at 12:11 pm

    @Steve Jin
    it’s available for free on veeam website

  11. Posted June 14, 2012 at 8:39 pm

    Thanks! Will check it out and get back to you.
    Steve



    Me: Steve Jin, VMware vExpert who authored the VMware VI and vSphere SDK, published by Prentice Hall, and created the de facto open source vSphere Java API while working at VMware engineering. Companies like Cisco, EMC, NetApp, HP, Dell, and VMware are among the users of the API and other tools I developed for their products, internal IT orchestration, and test automation.