Thursday, 14 August 2014

Hadoop - new course coming soon!

I'm pleased to announce that recording of my next Virtual Pair Programmers course is now almost complete. The course covers Hadoop for Java Developers. If you haven't heard of Hadoop (is there anyone who hasn't?), it is a framework for distributing the processing of large amounts of data across a network of computers.

The course assumes some basic Java knowledge, but no prior knowledge of Hadoop or its map-reduce programming model.

Once recording is complete, there will be an edit phase and some post-production work to do, but the likely running order is as follows:


  • An overview of what Hadoop is, and introducing the concept of the map-reduce programming model
  • Getting to grips with map-reduce, including creating some map-reduce code in standard Java
  • Hadoop operating modes, and how to set up and install Hadoop
  • Creating our first Hadoop Map-Reduce job
  • The Hadoop Distributed File System (HDFS)
  • Understanding the map-reduce process flow, including combine and shuffle
  • Looking at map-reduce job configuration options, such as file formats and runtime options
  • Creating custom data types 
  • Chaining multiple jobs, and adding extra Map steps to jobs 
  • Optimising jobs
  • Working with JDBC databases
  • Unit testing (with MRUnit)
  • Secondary Sorting (sorting the values as well as the keys)
  • Joining Data from multiple files
  • Using the Amazon EMR service
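To give a flavour of what "map-reduce code in standard Java" can look like before Hadoop enters the picture, here is a minimal sketch of a word count split into map, shuffle and reduce phases. It is only an illustration of the general idea and is not taken from the course; the class and method names (`MapReduceSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical, and no Hadoop APIs are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the map-reduce word-count pattern (no Hadoop involved).
class MapReduceSketch {

    // A simple key/value pair, standing in for the pairs a mapper emits.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Map phase: turn one line of input into (word, 1) pairs.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, keeping keys sorted.
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the map, shuffle and reduce phases over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Pair> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the cat sat", "the cat ran");
        System.out.println(wordCount(input)); // prints {cat=2, ran=1, sat=1, the=2}
    }
}
```

Here everything runs in one JVM; the point of a framework like Hadoop is that the map and reduce steps can instead run on many machines, with the grouping step handled between them.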
The course has a number of real-world examples throughout and two large case studies to work through, so there are lots of practical exercises. As well as model answers and sample code throughout, I'm also including some templates that I use for my own map-reduce jobs, which you'll be able to reuse in your own projects.

If you are a Microsoft Windows user, you need to know that installing Hadoop on Windows is hard, so in the course I ask you to use a virtual machine running Linux... and I'll talk you through how to install and configure that... no prior knowledge of Linux is required. Mac and Linux users can either install Hadoop directly or use a virtual machine; all the options are covered.

The course should be going live some time in September, so keep an eye on this blog or the Virtual Pair Programmers Facebook page for more information.
