- Avro is a data serialization system
- http://avro.apache.org/
- Key advantage is that it supports schema evolution.
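- The schema is stored along with the data, in the header of each Avro data file.
- The schema is expressed in JSON format.
- Data is always written with a writer schema. A reader can supply its own, possibly different, schema when it opens an Avro file.
- Avro resolves the differences between the two schemas at read time, which is how it handles schema evolution.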
- YouTube video: http://www.youtube.com/watch?v=EBV4C-P3G94
- IBM article: http://www.ibm.com/developerworks/library/bd-avrohadoop/
- MapReduce and Avro example: http://avro.apache.org/docs/current/mr.html#Example%3A+ColorCount
- Other links:
- Hive has a SerDe for Avro, so it can query data that is in Avro format: http://www.michael-noll.com/blog/2013/07/04/using-avro-in-mapreduce-jobs-with-hadoop-pig-hive/
- MSDN article on how to use this with C#: http://code.msdn.microsoft.com/Schema-Evolution-In-Avro-240f0a7a
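
To make the schema evolution point concrete, here is a rough sketch of how a record written with one schema can be read with a newer one. It uses only the Python standard library and does not call the Avro library. The `User` record, its field names, and the `resolve_record` helper are made up for illustration. It covers only two of Avro's resolution rules (fields the reader does not know are dropped, and fields missing from the written data are filled from the reader's default) and ignores type promotion, aliases, and the binary encoding.

```python
import json

# Writer schema: what the data was written with (hypothetical "User" record).
WRITER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": "int"}
  ]
}
""")

# Reader schema: a later version that drops favorite_number and adds
# favorite_color with a default value.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_color", "type": "string", "default": "unknown"}
  ]
}
""")


def resolve_record(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema.

    Illustrative helper, not part of any Avro API.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists in the written data: copy it through.
            resolved[name] = record[name]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            resolved[name] = field["default"]
        else:
            # No written value and no default: the schemas are incompatible.
            raise ValueError(f"no value or default for field {name!r}")
    # Writer fields that the reader schema does not list are simply ignored.
    return resolved


old_record = {"name": "Alyssa", "favorite_number": 256}
new_view = resolve_record(old_record, WRITER_SCHEMA, READER_SCHEMA)
print(new_view)  # {'name': 'Alyssa', 'favorite_color': 'unknown'}
```

The practical takeaway is that a field added to a schema later should carry a default, otherwise data written with the old schema can no longer be read with the new one.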
Tips and Tricks to build a Hadoop eco system. References to good articles on Hadoop based solutions. Topics include: Hadoop architecture, Hive, SQL on Hadoop, Compression, Metadata.
Tuesday, February 4, 2014
Avro Data Serialization in Hadoop