Best Tool to Compress Virtual Machines
While working in virtualized environments, we need to pass around virtual machines (a.k.a. virtual appliances) from time to time. Most of the virtual machines I’ve seen offered for download are compressed to save storage and network bandwidth.
Not all compression algorithms are created equal in terms of compression ratio, compression speed, and decompression speed. In most cases, the difference doesn’t matter much for documents and small programs. But it matters a lot for virtual machines, whose virtual disk files are much larger than normal files. Even a small percentage improvement can yield significant savings on storage and bandwidth.
I recently compared several compression formats on a Linux-based virtual machine, Linux being the main OS type for virtual machines. The following table shows what I found. Note that this is not meant to be a comprehensive comparison.
Format | Size (bytes) | Size relative to 7z |
Raw | 1,052,683,412 | |
7z | 280,234,852 | 100% |
zip | 363,012,043 | 129.5% |
bzip2 | 363,169,420 | 129.6% |
gzip | 360,911,625 | 128.8% |
tar.gzip | 363,008,979 | 129.5% |
As the table shows, 7Z has by far the best compression ratio for this virtual machine: its archive is about 23% smaller than the others, such as Zip (put the other way, the others are roughly 29% to 30% larger than the 7Z archive). This is consistent with the results for other types of files shown on the 7Zip home page.
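The two ways of reading the table (how much larger Zip is than 7Z, and how much smaller 7Z is than Zip) are easy to mix up, so here is a small sketch of the arithmetic. The helper names `relative_size` and `savings` are made up for illustration; the byte counts are taken from the table above.

```shell
#!/bin/sh
# relative_size SIZE BASELINE: SIZE as a percentage of BASELINE (one decimal).
relative_size() {
    awk -v s="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * s / b }'
}

# savings SIZE BASELINE: how much smaller SIZE is than BASELINE, in percent.
savings() {
    awk -v s="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - s / b) }'
}

relative_size 363012043 280234852    # zip as a percentage of 7z
savings 280234852 363012043          # how much smaller 7z is than zip
savings 280234852 1052683412         # how much smaller 7z is than the raw VM
```

The first call reproduces the 129.5% figure in the table; the second shows that the same gap, measured from the Zip side, is closer to 23% than 30%.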
Given that 7Zip compresses better and is free (how can you beat that?), it should be the de facto tool for compressing virtual appliances. Should you find a better tool, please feel free to share.
It’s worth pointing out that 7Zip has not only a nice GUI but also a command line interface that lets you do the same things as the GUI. This helps a lot when integrating it with an automated release system.
7Zip has also been ported to Linux as a command-line-only tool, which is good enough for most use cases, such as supporting Linux-based build systems. Check it out here. Update: IBM DeveloperWorks has an article, How to use 7zip on Linux command Line.
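As a rough sketch of what the command line integration might look like in a build or release script, the two functions below wrap the `7z` binary. The function names, the `SEVENZIP` variable, and the paths in the usage note are assumptions made for illustration; the switches (`a`, `x`, `-t7z`, `-mx=9`, `-o`) are standard 7-Zip ones, and the binary may be called `7za` or `7zr` depending on the package installed.

```shell
#!/bin/sh
# Name of the 7-Zip binary; override it if your system ships 7za or 7zr instead.
SEVENZIP=${SEVENZIP:-7z}

# compress_vm VM_DIR ARCHIVE: pack a VM directory into a .7z archive.
compress_vm() {
    # a     = add files to an archive
    # -t7z  = use the 7z container format
    # -mx=9 = highest ("ultra") compression level
    "$SEVENZIP" a -t7z -mx=9 "$2" "$1"
}

# extract_vm ARCHIVE DEST_DIR: unpack an archive, keeping its directory layout.
extract_vm() {
    # x  = extract with full paths
    # -o = output directory (no space between the switch and the path)
    "$SEVENZIP" x "$1" -o"$2"
}
```

A release script could then call, for example, `compress_vm ./my-appliance my-appliance.7z` and later `extract_vm my-appliance.7z ./restored`.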
Hi Steve,
Do you have the compression times written down somewhere, so we can compare compression/time ratios too?
Would be great 😉
Hey Timo,
Good idea. I did not find a noticeable difference in time between 7Zip and Zip on the virtual machine I tried. Compression and extraction times are, in my opinion, not as important as the size ratio unless the difference is very large, which is not the case here.
Steve
Can you try VeeamZip and post the results?
Yes, that makes sense. Your VM is pretty small here, which is why I was asking.
Unfortunately, I’m dealing with huge vApps/VMs, so it can take a while.
Sure. Do you know where I can get it?
Steve
OK. I now see where you’re coming from. When your VM gets bigger, any difference in time is amplified.
Since you have huge VMs, would you like to give the tools a try and let us know how the times differ? I can add a new table and credit you.
Steve
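For anyone who wants to collect such timings, a minimal sketch could look like the following. `time_cmd` is a made-up helper, it only resolves whole seconds via `date +%s`, and the demo at the bottom uses gzip on a small stream of zeros purely as a placeholder for a real VM and a real tool.

```shell
#!/bin/sh
# time_cmd LABEL CMD [ARGS...]: run CMD, discard its output, and print the
# label followed by the elapsed wall-clock time in whole seconds.
time_cmd() {
    label=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null
    end=$(date +%s)
    echo "$label $((end - start))s"
}

# Placeholder workload: gzip 1 MiB of zeros. Swap in the real compression
# command and VM files to get meaningful numbers.
time_cmd "gzip-demo" sh -c 'head -c 1048576 /dev/zero | gzip -c'
```

Running each tool several times on the same VM and comparing the printed times alongside the resulting archive sizes would give the compression/time picture asked about above.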
@Steve and @Timo you may want to look at eXdupe -> http://www.exdupe.com/
It uses deduplication techniques as opposed to compression techniques such as 7ZIP’s LZMA (?) algorithm…
Cheers,
Didier
Far more important than the compression algorithm/tool is the content of the filesystem(s)/vmdk. This includes not only filesystem space allocated to files but any unallocated space. Before I compress a VM I run the following:
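# Replace the placeholder below with the mount points of the filesystems inside the VM.
for i in list-of-file-system-root-dirs; do
    # Fill the free space with zeros. dd exits with an error once the disk is
    # full, so the junk file is removed unconditionally instead of with &&.
    dd if=/dev/zero of="$i/aaaaaaaaaajunk" bs=1M
    sync
    /bin/rm -f "$i/aaaaaaaaaajunk"
done
poweroff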
Now, you obviously need to decide whether you want the disk space fully allocated, because zero-filling is not good for an incrementally allocated (thin-provisioned) vmdk: writing to every free block makes the virtual disk grow to its full size. However, even if the filesystems are large, these VMs will compress down to practically nothing.
Enjoy – Frank
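To illustrate why zeroed free space matters so much, here is a small sketch that compresses the same amount of zeros and of random bytes. It uses gzip as a stand-in for 7Zip, and `/dev/zero` and `/dev/urandom` as stand-ins for cleaned and leftover free blocks; the helper name `compressed_size` and the 10 MiB sample size are arbitrary choices for the example.

```shell
#!/bin/sh
# compressed_size SOURCE BYTES: read BYTES bytes from SOURCE, gzip them,
# and print the size of the compressed stream in bytes.
compressed_size() {
    head -c "$2" "$1" | gzip -c | wc -c | tr -d ' '
}

zeros=$(compressed_size /dev/zero 10485760)       # stands in for zeroed free space
random=$(compressed_size /dev/urandom 10485760)   # stands in for leftover data

echo "10 MiB of zeros compresses to $zeros bytes"
echo "10 MiB of random data compresses to $random bytes"
```

The zero stream shrinks to a tiny fraction of its original size while the random stream does not shrink at all, which is the effect the cleanup loop above exploits.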
Thanks Didier,
Just tried eXdupe and got 379,240,163 bytes for the same virtual machine. I think it may perform best with multiple similar virtual machines.
Steve
@Steve Jin
It’s available for free on the Veeam website.
Thanks! Will check it out and get back to you.
Steve
Hi, I did a lot of testing. The most important step is to clean up _inside_ the image; after that, 7zip at the ultra level is all you need. You can find my detailed post, with updated benchmarking data, here:
https://wiitez.blogspot.sg/2016/12/vm-image-compression-and-optimization.html