Hadoop Simplified: January 2014

Thursday, January 30, 2014

How to Get Job Status?

There are several ways to get the status of your Hadoop jobs. I will discuss the Hadoop 2.0 options here:

The YARN ResourceManager web UI. Example - http://serverhostingyarn:8088/cluster/app/application_1390931417866_0020
The job history directory in HDFS. Example - hadoop fs -ls /user/history/done/ | grep "1390931417866_0020"
The yarn command line. Example - yarn application -status application_1390931417866_0020
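The options above can also be scripted. The following is a minimal sketch, not a definitive implementation. It assumes the ResourceManager exposes its REST API under /ws/v1/cluster/apps/ on the same host and port as the web UI, and that the JSON response carries "state" and "finalStatus" fields under an "app" key. The helper names (to_application_id, status_url, status_command, summarize) are hypothetical and chosen only for illustration.

```python
import json
import re

# Matches IDs such as application_1390931417866_0020 or job_1390931417866_0020.
_ID_PATTERN = re.compile(r"^(?:application|job)_(\d+)_(\d+)$")


def to_application_id(some_id):
    """Convert a job_... or application_... ID to the application_... form."""
    match = _ID_PATTERN.match(some_id)
    if match is None:
        raise ValueError("unrecognized ID: %r" % some_id)
    return "application_%s_%s" % match.groups()


def status_url(rm_address, some_id):
    """Build the (assumed) ResourceManager REST URL for one application."""
    return "http://%s/ws/v1/cluster/apps/%s" % (rm_address, to_application_id(some_id))


def status_command(some_id):
    """Build the equivalent yarn command line as an argument list."""
    return ["yarn", "application", "-status", to_application_id(some_id)]


def summarize(response_body):
    """Pull the state and final status out of a REST response body."""
    app = json.loads(response_body)["app"]
    return app["state"], app["finalStatus"]
```

To use it against a live cluster, fetch status_url("serverhostingyarn:8088", "application_1390931417866_0020") with urllib.request.urlopen and pass the response body to summarize, or hand the list from status_command to subprocess.run on a node where the yarn client is installed.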

Tuesday, January 28, 2014

Appending Files in Hadoop

Can I really append to a file in HDFS, or do I always need to replace the file? See http://hadoop4mapreduce.blogspot.com/2012/08/two-methods-to-append-content-to-file.html

We had a client that wanted to merge data coming from the same source into a single large file. The challenge was how to handle concurrency, because the append process is not thread safe. The solution we used was to batch all the small files into a larger file periodically.
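The batching idea can be sketched as follows. This is only an illustration under stated assumptions: it works on the local filesystem as a stand-in for HDFS, it assumes a single periodic process performs the merge (so no two writers touch the large file at once), and merge_batch is a hypothetical helper name, not the code we used for the client.

```python
import os
import shutil
import tempfile


def merge_batch(small_files, target_path):
    """Concatenate small_files, in the given order, into target_path.

    The merged data goes to a temporary file in the target directory first
    and is then renamed into place, so readers never see a half-written file.
    """
    target_dir = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as merged:
            for path in small_files:
                with open(path, "rb") as part:
                    shutil.copyfileobj(part, merged)
        # Replace the target in one step once all parts are written.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the partial temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target_path
```

Because only the batch job writes the large file, the concurrency problem moves out of the append path: producers drop small files into a landing directory, and the periodic merge is the single writer.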

Compression Techniques for Hadoop

I am currently researching various options to do compression in Hadoop and found the following articles useful:

http://comphadoop.weebly.com/index.html
SlideShare: http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2
Splittable LZO: http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
Read this Microsoft article: http://download.microsoft.com/download/1/C/6/1C66D134-1FD5-4493-90BD-98F94A881626/Compression%20in%20Hadoop%20(Microsoft%20IT%20white%20paper).docx

My project needs are minimal performance impact and decent compression gains. We looked at splittable LZO. However, this requires:

Compiling a custom codec in your environment (Linux)
Copying the library files to all nodes
Updating the Hadoop site config file
Referring to the codec in your MapReduce jobs
If Hive is used, referring to the LZO compression codec in the table creation command for both READ and WRITE

Given all these requirements, my client decided to use Bzip2, as it ships with Hadoop and does not require additional libraries to be installed. I have attached a quick comparison of the two solutions.
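For the Bzip2 route, enabling compressed job output comes down to configuration. The sketch below is written in Python purely for illustration. It assumes the Hadoop 2 property names mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec, and it assumes the job accepts generic -D options (for example, a job launched through ToolRunner). The function names are hypothetical.

```python
# Codec class bundled with Hadoop; no extra native library is needed.
BZIP2_CODEC = "org.apache.hadoop.io.compress.BZip2Codec"


def bzip2_output_properties():
    """Return the (assumed) Hadoop 2 properties that turn on Bzip2 job output."""
    return {
        "mapreduce.output.fileoutputformat.compress": "true",
        "mapreduce.output.fileoutputformat.compress.codec": BZIP2_CODEC,
    }


def as_generic_options(properties):
    """Render properties as -D key=value arguments, sorted for stable output."""
    args = []
    for key in sorted(properties):
        args.extend(["-D", "%s=%s" % (key, properties[key])])
    return args
```

The list from as_generic_options(bzip2_output_properties()) can be spliced into a hadoop jar command line after the main class name. Compare this with the LZO route, where the same properties would point at a codec class that first has to be built and distributed to every node.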

Welcome to My Blog

Welcome to my blog. Through this blog, I will share real-world scenarios, issues, and solutions for Hadoop-based projects.
