Tips and tricks for building a Hadoop ecosystem. References to good articles on Hadoop-based solutions. Topics include Hadoop architecture, Hive, SQL on Hadoop, compression, and metadata.
Thursday, September 22, 2016
Sunday, August 28, 2016
How to avoid passwords in your Sqoop scripts
For those who use Sqoop and want to keep passwords out of their scripts, here is a good article on using a JCEKS (Java Cryptography Extension KeyStore) credential store:
https://www.mapr.com/blog/key-tips-managing-passwords-sqoop
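As a rough sketch of the general approach, the snippet below builds the two command lines involved: one that stores the password in a JCEKS keystore with `hadoop credential create`, and one that runs `sqoop import` with `--password-alias` instead of `--password`. The keystore path, alias, JDBC URL, user name, table, and target directory are hypothetical placeholders, and the two helper functions are illustrative only. The commands are built as argument lists and printed, not executed.

```python
import shlex

# Hypothetical values for illustration only; substitute your own.
PROVIDER = "jceks://hdfs/user/etl/mydb.password.jceks"
ALIAS = "mydb.password.alias"


def credential_create_cmd(alias, provider):
    """Build the command that stores a password under `alias` in a JCEKS keystore.

    `hadoop credential create` prompts for the password interactively,
    so the password does not need to appear in the script.
    """
    return ["hadoop", "credential", "create", alias, "-provider", provider]


def sqoop_import_cmd(jdbc_url, username, table, target_dir, alias, provider):
    """Build a Sqoop import that resolves the password from the keystore alias."""
    return [
        "sqoop", "import",
        # Generic -D options go right after the tool name.
        "-Dhadoop.security.credential.provider.path=" + provider,
        "--connect", jdbc_url,
        "--username", username,
        "--password-alias", alias,
        "--table", table,
        "--target-dir", target_dir,
    ]


def render(cmd):
    """Render an argument list as a shell-safe string for display."""
    return " ".join(shlex.quote(part) for part in cmd)


create_cmd = credential_create_cmd(ALIAS, PROVIDER)
import_cmd = sqoop_import_cmd(
    "jdbc:mysql://db.example.com/sales", "etl_user",
    "orders", "/data/raw/orders", ALIAS, PROVIDER,
)

print(render(create_cmd))
print(render(import_cmd))
```

Building the commands as lists keeps the script free of any plaintext password: only the alias and the keystore location are referenced.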
Thursday, August 25, 2016
Sunday, August 21, 2016
Amazon QuickSight can now be used to analyze your billing data
AWS now lets customers analyze their cost and usage data with the new QuickSight product: AWS Cost and Usage Report data can be uploaded directly into Amazon Redshift and Amazon QuickSight. Here is the latest article from AWS.
https://aws.amazon.com/about-aws/whats-new/2016/08/aws-cost-and-usage-report-data-is-now-easy-to-upload-directly-into-amazon-redshift-and-amazon-quicksight/
Wednesday, August 10, 2016
Cask CDAP 3.5 released
CDAP (Cask Data Application Platform) seems to be a good data lake tool and is open source.
Slides: http://www.slideshare.net/caskdata/webinar-whats-new-in-cdap-35
Monday, August 8, 2016
Big Data Roles and Responsibilities
Here is a good list of roles and responsibilities for Big Data:
http://www.kdnuggets.com/2015/11/different-data-science-roles-industry.html
Friday, May 20, 2016
SQL on Hadoop - Trafodion
http://trafodion.apache.org/quickstart.html
http://trafodion.apache.org/architecture-overview.html
Thursday, March 10, 2016
My Kafka Blog
https://datafloq.com/read/realize-real-time-analytics-iot-monetization-kafka/1930
PL/SQL
HPL/SQL is a great addition to Hive: you can now work with data from Hive and an RDBMS at the same time.
http://www.hplsql.org/home
Sunday, February 28, 2016
Kafka and Spotify
https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/
Friday, February 26, 2016
Apache NiFi aka DataFlow
http://www.infoworld.com/article/2975833/hadoop/hortonworks-buys-better-hadoop-data-flow-management.html
Thursday, February 25, 2016
Hive Streaming
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
Hive streaming ingest leverages Hive's transaction capability but is limited to tables stored in ORC format. At the time of writing, it supports only Storm and Flume as sources.
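To make the table requirements concrete, here is a small sketch that generates DDL for a table that could serve as a streaming target. It assumes the requirements described on the linked wiki page for the streaming API at the time: the table is stored as ORC, is bucketed, and has the `transactional` table property set to `true`. The function name, table name, and columns are hypothetical.

```python
def streaming_table_ddl(table, columns, bucket_col, num_buckets):
    """Return CREATE TABLE DDL for a bucketed, transactional ORC table.

    `columns` is a list of (name, hive_type) pairs. The bucketing column
    must be one of the declared columns.
    """
    names = [name for name, _ in columns]
    if bucket_col not in names:
        raise ValueError(f"bucket column {bucket_col!r} is not a table column")
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")

    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} ({col_defs})\n"
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )


# Hypothetical table used only to show the generated statement.
ddl = streaming_table_ddl(
    "web_events",
    [("event_id", "BIGINT"), ("user_id", "STRING"), ("payload", "STRING")],
    bucket_col="event_id",
    num_buckets=4,
)
print(ddl)
```

Generating the DDL from one helper is a convenient way to keep every streaming target table consistent on the storage format and table properties.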
Tuesday, February 23, 2016
Monday, February 15, 2016
Workflow Design and Execution Engines - Luigi, Airflow, Pinball
http://bytepawn.com/luigi-airflow-pinball.html
Thursday, February 4, 2016
StreamSets - Open Source Flume interface
https://streamsets.com
Azkaban open source workflow
Azkaban has improved and is worth another look:
https://azkaban.github.io
Sunday, January 24, 2016
ClickStream Data Analytics end-to-end
http://hortonworks.com/hadoop-tutorial/how-to-visualize-website-clickstream-data/
Coursera Big Data Training
Affordable online training from a reputable university.
https://www.coursera.org/specializations/big-data?utm_medium=onlineads&utm_campaign=Big+Data&utm_source=fb&nan_pid=1846868993