Tuesday, February 18, 2014

HDFS Data Federation

My client needs to restrict access to certain HDFS directories by user and group. This seems to be a common requirement: you want to isolate portions of HDFS for, say, a department or a set of users.

A federated NameNode sounds like a perfect solution for this; however, federation has its own issues and limitations. It does not work well with shared network storage. I would also like to know who is actually running it in production.

One alternative is to use the POSIX-style owner/group/others permissions and restrict certain directories to specific groups. This provides basic security, but it is harder to manage and does not give the clean isolation that a federated NameNode does.
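To make the permissions approach concrete, here is a minimal sketch of carving out one directory per department with the standard hdfs dfs -mkdir, -chown and -chmod commands. The function name restrict_dir and any paths, users and groups you pass to it are hypothetical, and the HDFS variable exists only so the commands can be dry-run with a stub. Mode 770 gives the owner and the group full access and everyone else none. Keep in mind that only the HDFS superuser can change a file's owner, so this would typically be run as the hdfs user, and that with the default simple authentication the user identity comes from the client's operating system, so these checks are only as strong as the cluster's authentication.

```shell
#!/bin/sh
# Minimal sketch: isolate one HDFS directory per department with
# plain owner/group/others permission bits.
# HDFS defaults to the real CLI; set it to a stub (or "echo") for a dry run.
HDFS="${HDFS:-hdfs}"

# restrict_dir DIR OWNER GROUP   (hypothetical helper)
#   Creates DIR (and any missing parents), hands it to OWNER:GROUP and sets
#   mode 770: owner and group get rwx, everyone else gets no access.
#   Stops at the first failing command and returns non-zero.
restrict_dir() {
  dir=$1
  owner=$2
  group=$3
  $HDFS dfs -mkdir -p "$dir" || return 1
  # Changing the owner requires the HDFS superuser.
  $HDFS dfs -chown "$owner:$group" "$dir" || return 1
  $HDFS dfs -chmod 770 "$dir"
}
```

For example, restrict_dir /data/finance finadmin finance would leave /data/finance usable only by finadmin and members of the finance group, apart from the HDFS superuser. New files and sub-directories inherit the group of their parent directory (HDFS follows the BSD rule), and because others cannot traverse a 770 directory, its contents stay out of reach for everyone outside the group.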


Some links to read

http://www.slideshare.net/huguk/hdfs-federation-hadoop-summit2011


  

Thursday, February 6, 2014

YARN Command Line

Starting with Hadoop 2.0, we can use YARN to run MapReduce as an application.


  • Running WordCount using YARN
         $ yarn jar $HADOOP_HOME/HadoopSamples.jar \
          mr.wordcount -libjars $MYLIBS/custom-libs.jar
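For comparison, the WordCount example bundled with the Hadoop 2.x distribution can be launched the same way. The sketch below assumes the usual tarball layout ($HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-<version>.jar); the function name run_wordcount and the input and output paths are hypothetical. It removes the output directory first, because a MapReduce job fails if its output directory already exists, and then prints the reducer output. The YARN and HDFS variables exist only so the commands can be replaced with stubs for a dry run.

```shell
#!/bin/sh
# Minimal sketch: run the WordCount example bundled with Hadoop 2.x on YARN
# and print its result. YARN/HDFS default to the real CLIs; override them
# with stubs for a dry run.
YARN="${YARN:-yarn}"
HDFS="${HDFS:-hdfs}"

# run_wordcount INPUT OUTPUT   (hypothetical helper; HDFS paths)
run_wordcount() {
  input=$1
  output=$2
  # Assumes the standard tarball layout; takes the first matching jar.
  for jar in "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    break
  done
  [ -f "$jar" ] || { echo "examples jar not found under $HADOOP_HOME" >&2; return 1; }
  # A MapReduce job fails if its output directory already exists.
  $HDFS dfs -rm -r -f "$output"
  $YARN jar "$jar" wordcount "$input" "$output" || return 1
  # Reducer output files are named part-r-NNNNN; quoting leaves the glob to HDFS.
  $HDFS dfs -cat "$output/part-r-*"
}
```

For example, run_wordcount /user/me/books /user/me/wc-out. One thing worth knowing about the custom-jar command above: -libjars is a generic option, so it only takes effect when the job's driver parses its arguments through ToolRunner (GenericOptionsParser); otherwise it is handed to the program as an ordinary argument.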

Tuesday, February 4, 2014

Avro Data Serialization in Hadoop