Tuesday, January 28, 2014

Compression Techniques for Hadoop

I am currently researching various options to do compression in Hadoop and found the following articles useful:
My project needs compression that has minimal performance impact and still gives decent compression gains. We looked at splittable LZO. However, this requires:
  • A custom codec to be compiled in your environment (Linux)
  • The library files to be copied to all nodes
  • An update to the Hadoop site config file
  • A reference to your codec in your MapReduce job
  • If Hive is used, a table creation command that refers to the LZO compression codec for both READ and WRITE
Given all these requirements, my client decided to use bzip2, as it is native to Hadoop and does not require additional libraries to be installed. I have attached a quick comparison of the two solutions:
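
Separately from that comparison, if you want a rough feel for the ratio-versus-speed trade-off on your own data before touching a cluster, a minimal sketch like the one below may help. It is only an illustration under some assumptions: it uses Python's standard-library bz2 module, it uses zlib at level 1 purely as a stand-in for a fast codec (LZO is not in the Python standard library, so this is not an LZO measurement), and the sample payload and the measure helper are hypothetical names made up for this example. Numbers from a laptop will not match what Hadoop's codecs do on a real cluster.

```python
import bz2
import time
import zlib


def measure(name, compress, data):
    """Return (name, compression ratio, seconds) for one codec on one payload."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    return name, len(data) / len(packed), elapsed


# Hypothetical sample: repetitive log-like text. Replace with bytes from a real file.
sample = b"2014-01-28 12:00:00 INFO job_0001 map 100% reduce 100%\n" * 20000

results = [
    measure("bzip2", bz2.compress, sample),
    # zlib level 1 is only a stand-in for a fast codec; it is not LZO.
    measure("zlib-1 (stand-in)", lambda d: zlib.compress(d, 1), sample),
]

for name, ratio, seconds in results:
    print(f"{name:<20} ratio={ratio:8.1f}x  time={seconds * 1000:8.2f} ms")
```

To try it on real data, swap the sample line for something like `sample = open("part-00000", "rb").read()` (the file name here is just a placeholder) and compare the ratio and time columns against what your job can tolerate.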
