I am currently researching various options for compression in Hadoop and found the following articles useful:
- An overview of Hadoop compression: http://comphadoop.weebly.com/index.html
- A SlideShare presentation from Hadoop Summit: http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2
- Splittable LZO compression at Twitter: http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
- A Microsoft IT white paper on compression in Hadoop: http://download.microsoft.com/download/1/C/6/1C66D134-1FD5-4493-90BD-98F94A881626/Compression%20in%20Hadoop%20(Microsoft%20IT%20white%20paper).docx
Setting up a custom codec such as splittable LZO involves several steps:
- Compile the codec's native libraries for your environment (Linux)
- Copy the library files to all nodes in the cluster
- Update the Hadoop site configuration file to register the codec
- Additionally, reference the codec in your MapReduce job configuration
- If Hive is used, your table creation command has to reference the LZO compression codec for both reads and writes
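To make the config-file step concrete, here is a sketch of what registering the codec in `core-site.xml` might look like, assuming the Twitter/Cloudera hadoop-lzo package (the `com.hadoop.compression.lzo.*` class names come from that library; adjust them for whatever codec you actually compiled):

```xml
<!-- Sketch only: register LZO alongside the default codecs in core-site.xml -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

The native `.so` files the codec was compiled against must also be on `java.library.path` on every node, which is why the copy-to-all-nodes step matters.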
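Referencing the codec from a MapReduce job could look roughly like the driver sketch below. This assumes the hadoop-lzo jar is on the job classpath; the job name and paths are placeholders, and it won't run outside a Hadoop environment:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.hadoop.compression.lzo.LzopCodec; // from the hadoop-lzo package

public class LzoJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lzo-output-example");
        job.setJarByClass(LzoJobDriver.class);

        // Hypothetical input/output paths for illustration
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        // Compress job output with the LZO (lzop) codec
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, LzopCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that plain `.lzo` files only become splittable after you build an index for them (hadoop-lzo ships an indexer and an LZO-aware text input format for that purpose).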
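For the Hive step, a table definition might look like the following sketch. The table name and column are hypothetical; the input format class comes from the hadoop-lzo package, so verify the class names against the version you compiled:

```sql
-- Sketch: a Hive table over LZO-compressed text files (READ side)
CREATE TABLE logs (line STRING)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- WRITE side: enable compressed output for the session before inserting
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
```

Without the session-level settings, Hive would read LZO data correctly but write uncompressed output.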