Big Data for Beginners (Online)

The Big Data for Beginners class is an intensive course, delivered over four half days, that introduces you to the Big Data ecosystem. The class begins with the basic concepts behind how Hadoop systems are laid out and how they function. On day two, we will look at the types of work you might consider Hadoop for and whether or not it is the right tool for the job. We’ll also look at data systems such as HCatalog, Hive, and NoSQL options, and then begin working through options for onsite deployment. On day three, we will continue with deployments but cross over to cloud deployments. On day four, the class wraps up with the various tools for manipulating and visualizing data, along with optimization techniques for your environment.

IMPORTANT: When you purchase a seat in this class, you also receive 2 hours of our BI Virtual Mentor service for assistance with YOUR project work. This remote service lets you work ONE on ONE with our BI mentors to accelerate YOUR project and further improve YOUR skills. Your Virtual Mentor hours can be used anytime during business hours in the 12 months following your class. To set up a session, download this simple questionnaire, fill it out, and send it as an attachment to This unique Pragmatic Works offering ensures that your training will translate into real-world success for YOUR projects.

All virtual training students are provided with a recording of the class, which is available for 7 days after the class ends.


    Day 1 - The Basics

  1. Class Introduction
  2. Big Data Basics
    • Understanding the basic layout
    • Covering the hardware layout
    • Breaking down Cloud vs On-Prem
    • Progressing from MapReduce to YARN/Stinger
  3. Hadoop Ecosystem
    • It's not your average file system
    • Structuring your data
    • Transforming your data
    • Piping your data in

    Day 2 - Onsite Deployment

  4. Data Storage Systems
    • Understanding how HCatalog and Hive store data
    • What NoSQL options are and how to use them
  5. What problems can Big Data help me with?
    • Making Big Data work for you
    • What Big Data will not give you
  6. Onsite Deployment Options
    • Deploying to Linux hardware
    • Deploying to Windows hardware

    Day 3 - Cloud Deployment

  7. Cloud Deployment Options
    • HDInsight (PaaS)
      • Manual menu-based deployment
      • PowerShell deployment
      • Automated deployment
    • Azure Virtual Machine (IaaS)
      • Recommended virtual machine templates
      • Deployment automation

    Day 4 - Data Manipulation and Optimizations

  8. Data Manipulation Tools
    • Storing data using Hive
    • Transforming data with Pig
    • Using Sqoop to get data directly from a relational database
    • Collecting log data with Flume
  9. Optimization
    • Optimizing your use of Hive
    • How the Stinger initiative is helping Hadoop
  10. Data Visualization
    • Extracting data with Power Query
    • Combining data with Power Pivot
    • Visualizing results with Power View


This class is targeted at Systems Administrators, Database Administrators, and anyone else interested in introducing Hadoop into their environment. The class focuses on basic structure and deployment. No previous SQL experience is necessary. A basic understanding of Linux and PowerShell is needed to follow the deployment sections. A basic knowledge of Excel, Power Pivot, and Power Query will be helpful for the data visualization section.


  • PowerShell 4.0
  • Azure PowerShell cmdlets
  • Access to Microsoft Hyper-V (or another virtualization host) to create virtual machines. Hyper-V is included at no additional cost with the 64-bit Pro and Enterprise editions of Windows 8 and later
  • Excel 2013 add-ins for Power Pivot and Power Query
