Posted by: Ruben123 January 5, 2017
DevOps or BigData Hadoop
In addition to fdpower: If you want to be a Hadoop developer, you need to know the commonly used tools and components in the Hadoop ecosystem. Being a Hadoop/big data developer can be fairly simple or quite complex; it depends on which side of the Hadoop ecosystem you are comfortable working with.

Consider Hadoop as a black box, and let's say there is a left side of the box (1), the box itself (2), a right side of the box (3), and an enclosure for the black box (4).

Left Side of the box (1): This is often termed big data ingestion in the world of Hadoop. Tools like Apache Sqoop, Kafka, Flume, Flafka, HDF (Hortonworks), etc. are used here.

Box itself (2): Nowadays there are various open source tools (Hive, Pig, etc.) that are wrappers around old-but-gold MapReduce; you don't really need to write core MR code in Java to execute your job. You should still consider learning the basic architecture of Hadoop: YARN, ZooKeeper, Mappers, Reducers, Partitioners, RecordReader, RecordWriter, jobs, the JobTracker, etc., along with some core Java.
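
To make the Mapper, Partitioner, and Reducer roles a bit more concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (mapper, partitioner, reducer, run_job) and the sample input are made up for illustration, and the partitioner uses a simple character-sum hash only so the result is deterministic. It just imitates the flow: map each record to key/value pairs, route each key to a reducer, then aggregate per key.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partitioner(key, num_reducers):
    """Decide which reducer receives a key (illustrative hash, not Hadoop's)."""
    return sum(ord(ch) for ch in key) % num_reducers


def reducer(key, values):
    """Reduce step: collapse all values seen for one key into a total."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Simulate map -> partition/shuffle -> reduce in a single process."""
    # One bucket per (hypothetical) reducer; each groups values by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partitioner(key, num_reducers)][key].append(value)

    # Each partition is reduced independently, with its keys in sorted order.
    result = {}
    for part in partitions:
        for key in sorted(part):
            word, total = reducer(key, part[key])
            result[word] = total
    return result


counts = run_job(["big data big cluster", "data node"])
print(counts)
```

On a real cluster the same three roles run on different machines, and tools like Hive or Pig generate this kind of job for you from a query or script.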

Right Side of the box (3): This is where you apply your business logic; it is the home of Big Data analysts and Big Data Scientists, a sort of fancy name in the Hadoop world. These guys treat Hadoop as a distributed storage unit. Pig & Hive are widely used for batch processing; tools like Spark, HBase, Storm, Phoenix (a SQL layer for HBase) and reporting tools like Pentaho, SAS, etc. are widely used. If you have a PhD in Physics, Mathematics, or even Biology or Chemistry, you will be working on this side of Hadoop: extracting data from Hadoop through a bridge (IPython Notebook) or Java and implementing your complex machine learning, image analysis, etc.

Box Enclosure (4): This is where the DevOps or sysadmin role comes into play. Companies prefer a preconfigured Hadoop cluster from either Cloudera or Hortonworks and buy subscriptions for support from these guys. Cloud services like AWS, Microsoft Azure, Rackspace, etc. also provide excellent support for Cloudera or Hortonworks.

P.S. It turned out to be a long post. This is a little insight from a person who has a background in Hadoop/Big Data; I hope it helps.