Cloud 1 - Scaling

A study of the cloud computing technologies available for scalability. Topics include fault tolerance and load balancing at the network, data, and web server levels. The course introduces students to the technologies, their benefits, and how to leverage them. The class includes labs and optional take-home assignments in which students apply this knowledge to real-world scenarios. The course uses the Java programming language.


Prerequisites:

Course requires the ability to use the Java programming language.


Course Duration:

4 days (32 hours) of classroom time


Appropriate Roles:

Advanced Technical

Optional: Technical


Required Textbooks and Materials:

Tom White, Hadoop: The Definitive Guide

http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449389732/ref=pd_sim_b_1


     



Upon completion of this course the student will be able to:



Syllabus:

  1. Network Load balancing and fault tolerance

    1. Network Topology

    2. Load Balancing

    3. Fault Tolerance

  2. Google Map/Reduce distributed computing

    1. Comparison with other systems

    2. Mapper

    3. Reducer

    4. Merge

  3. Apache Hadoop distributed file system using Map/Reduce

    1. History of Hadoop

    2. Scaling

      1. Combiner Functions

      2. Running a Distributed Job

    3. HDFS

      1. HDFS Concepts

        1. Blocks

        2. Namenodes and Datanodes

      2. Java Interface

        1. Reading Data

        2. Writing Data

        3. Querying

        4. Deleting Data

        5. Data Integrity

        6. Serialization

    4. How MapReduce Works

      1. Anatomy of a MapReduce Job Run

      2. Failures

      3. Job Scheduling

      4. Data Types and Formats

  4. Apache HBase distributed database

    1. HBase Overview

    2. HBase Data Model

    3. Java Clients

    4. Example Schemas

    5. Example Queries

    6. Differences between HBase and RDBMS

  5. Database replication

    1. Replication

      1. Replication in Distributed Systems

    2. Modes of Replication

    3. Oracle RAC

  6. Remote Procedure Calls

    1. RMI, Java Remote Method Invocation

    2. Serialization

    3. Remote Interfaces

    4. XML-RPC

    5. REST

  7. J2EE

    1. Java 2 Platform

    2. Servlets

    3. JSP, JavaServer Pages

    4. EJB, Enterprise JavaBeans

    5. JMS, Java Message Service

    6. JNDI, Java Naming and Directory Interface

    7. JBoss, J2EE Server

      1. JBoss Features

      2. JBoss Setup

      3. Using JBoss

  8. Federation

    1. Enterprise Beans

      1. Message Driven Beans

    2. JMS as a Resource

    3. Temporary Destinations

  9. JBoss Cache

    1. Cluster

    2. Cache

    3. JBoss AOP

  10. Implementing cloud scaling technologies

    1. Developing a MapReduce Application

      1. Configuration API

      2. Unit Testing

      3. Running Locally

      4. Running on a cluster

      5. Workflows

    2. Setting up a Hadoop Cluster

      1. Network Topology

      2. Cluster setup and installation

      3. Hadoop Configuration

      4. Security

      5. Benchmarking

    3. HBase

      1. Installation

      2. Test Drive



Agenda:

  1. Cloud Computing Overview

    1. [Cloud Computing Overview.ppt]

    2. What is Cloud Computing

    3. Need for a Solution

    4. Available Technologies

    5. Lab #0 Sticky Sessions

    6. Learning Objectives

      1. [Syllabus_Intro_Cloud_Computing.doc]

  2. Network Solutions

    1. [Load Balancing.ppt]

    2. Load Balancing

    3. Need for a Solution

    4. Session Replication

      1. Sticky Sessions

      2. Replicated Sessions

      3. Server Clusters

        1. IP Multicast

      4. IP Sockets

    5. 4 Categories of State Management

      1. Stateless

      2. Conversational

      3. Cached

      4. Singleton

  3. XA Transactions

    1. [XA Transactions.ppt]

    2. ACID

      1. Atomic Transactions

        1. Example

        2. Implementation

      2. Orthogonality

      3. Isolation

        1. Phantom Reads

        2. Isolation Levels

          1. Serializable

          2. Repeatable Read

          3. Read Committed

          4. Read Uncommitted

        3. Examples

    3. Two Phase Commit

      1. Protocol

      2. Assumptions

      3. Initiation

      4. Examples

      5. Disadvantages

  4. Database Replication

    1. [Database Replication.ppt]

    2. Replication

      1. Data Replication

      2. Database Replication

      3. Replicated Servers

      4. Transparency

      5. Active vs Passive Replication

      6. Multi-Master Replication

      7. Load Balancing

      8. Backup

    3. Replication in Distributed Systems

      1. Transactional Replication

      2. State Machine Replication

      3. Virtual Synchrony

      4. Performance Comparison

    4. Modes

      1. Master/Slave

      2. Master/Master

      3. Lazy Replication

      4. Multi-Master Replication

        1. Benefits

        2. Disadvantages

        3. Methods

        4. Example

    5. Wrap Up Example

    6. Oracle RAC, Relational Database Cloud

      1. [Oracle RAC.ppt]

      2. Overview

        1. Oracle RAC Definition

        2. Goal of Oracle RAC

        3. Shared Storage

      3. Configuration

      4. Installing and Configuring Oracle Clustering File System

      5. Cluster Ready Services

      6. Installing Oracle with Real Application Clusters

      7. Oracle RAC Wrap up

  5. RPC, Remote Procedure Calls

    1. [RPC.ppt]

    2. Overview

      1. RPC Definition

      2. Goal of RPC

      3. History of RPC

    3. Methodology

      1. Client / Server

      2. Local vs Remote

      3. Interface Description Language

    4. Java Remote Method Invocation

      1. RMI Overview

      2. CORBA vs. RMI

      3. java.rmi

      4. Jini

    5. Serialization

      1. Marshalling

      2. Serialization Advantages

      3. Serialization Disadvantages

      4. XML Serialization

      5. Serialization in Programming

        1. Language Support

        2. java.io.Serializable

        3. Example

        4. Dynamic Code Loading

    6. Remote Interfaces

      1. Java Standards

      2. Remote Interfaces

      3. Remote Example

      4. Implementing a Remote Interface

      5. Passing Objects in RMI

      6. Making the Remote Object Available

    7. XML-RPC Protocol

      1. Protocol Overview

      2. History of XML-RPC

      3. XML-RPC Implementations

    8. SOAP Protocol

    9. REST Protocol

      1. REST Overview

      2. Simplicity of REST

      3. RESTful Resources

      4. HTTP and REST

      5. REST vs RPC

    10. EJB

      1. Session Beans

      2. Example

    11. RPC Summary

  6. J2EE

    1. [J2EE_Overview.ppt]

    2. Java 2 Platform

      1. Java Versions

      2. J2EE Technologies

      3. J2EE Components

    3. Servlets

      1. Servlet Overview

      2. Anatomy of a Servlet

      3. Servlet Example

    4. JSP, JavaServer Pages

      1. JSP Overview

      2. JSP Example

    5. EJB, Enterprise JavaBeans

      1. EJB Overview

      2. Anatomy of an EJB

      3. Types of Beans

        1. Entity Bean

        2. Session Beans

          1. States

        3. Message Beans

    6. JMS, Java Message Service

      1. JMS Overview

      2. Reasons to use JMS

    7. JDBC, Data Access API

    8. JNDI, Java Naming and Directory Interface

      1. JNDI Overview

      2. JNDI Layers

      3. JNDI Common Uses

    9. J2EE Application Structure

    10. J2EE Deployment Structure

  7. JBoss, J2EE Server

    1. [Jboss.ppt]

    2. J2EE Review

    3. JBoss Features

    4. JBoss Setup

      1. JBoss Installation

      2. JBoss Setup

      3. JBoss Datasources

    5. Using JBoss

      1. JBoss Web Server

      2. JBoss JMS Setup

      3. JBoss Default Ports

      4. JBoss Administration Console

      5. JBoss Application Deployment

    6. JBoss Security

      1. JAAS

        1. JAAS Login Modules

    7. Breit Example

      1. Building Breit

    8. JBoss Advantages and Disadvantages

  8. Federation

    1. Building a Federated Query System, Maven and EJB

      1. [Maven_and_EJB.ppt]

      2. Maven

        1. Maven Overview

        2. Maven Objectives

        3. Installing Maven

        4. POM files

        5. Maven Phases

        6. Example: Building Breit

      3. Enterprise Beans

        1. Creating a Session Bean

        2. Remote Interface

        3. The Bean Class

        4. Message Driven Beans

        5. Calling from a Servlet

        6. The Servlet Class

        7. The EAR File

    2. [Federation.ppt]

    3. Federation Goal

    4. JMS as a Resource

      1. JMS Overview

      2. JMS Clients

      3. Producer / Consumer

      4. EJB and JMS

      5. Asynchronous

      6. JMS Example

      7. Loosely Coupled

      8. JMS Messaging Domains

    5. Message Driven Beans

      1. JMS: Entity and Session Beans

      2. Message Driven Beans

        1. MDB Overview

        2. MDB Characteristics

        3. EJB3 MDB

        4. EJB Example

      3. Temporary Destinations

        1. Temporary Destination Overview

        2. Temporary Destination Limitations

        3. Temporary Queue Architecture

      4. Example Federated Query

    6. Federation Lab Assignment

    7. Federation Wrap Up

  9. Data Replication

    1. Aspect Oriented Programming

      1. Basic Overview on AOP

        1. [Aspect_Oriented_Programming.ppt]

        2. AOP Overview

        3. AOP Terminology

        4. The Need for AOP

          1. The Problem, Why AOP?

          2. The Solution

        5. Join Point models

        6. Implementation

        7. Terminology Review

      2. Lecture Notes, JavaWorld article on AOP

        1. [www.javaworld.com - jw-0118-aspect.doc]

    2. JBoss Cache

      1. [Jboss_Cache.ppt]

      2. Cache Overview

        1. Define Cache

        2. Define Cluster

        3. Why Cache

        4. Why Cluster

      3. JBoss Cache

        1. Flavors of JBoss Cache

        2. Searchable Cache

        3. JBoss Cache Overview

          1. JBoss Cache Users

          2. JBoss Cache Architecture

          3. JBoss Cache Goal

          4. JBoss Cache Features

        4. TreeCache Architecture

        5. JBoss AOP

          1. AOP Features

          2. Dynamic AOP

          3. TreeCache AOP

            1. About TreeCache AOP

            2. TreeCache AOP API

            3. TreeCache AOP Mapping

            4. Replication

              1. Replication in TreeCache API

        6. JBoss Cache Overview

        7. More info: JGroups presentation on JBoss Cache

          1. [Jboss_Cache.ppt]

      4. More Info: New version, JBoss Cache now known as Infinispan

        1. Quick Start guide to Infinispan 4.1.x

          1. [Community_ 5 minute tutorial on Infinispan.doc]

        2. Architecture guide to Infinispan

          1. [Community_ Architectural Overview.doc]

  10. Map Reduce

    1. [Map Reduce and Hadoop.ppt]

    2. Map Reduce Overview

      1. Map Reduce Overview

      2. Map and Reduce Steps

      3. Why use Map Reduce

      4. Map Reduce Users

    3. Map Reduce

      1. Map Step

      2. Reduce Step

      3. M & R pieces

      4. Counting Words Example

        1. Stage 1: Mapper

        2. Stage 2: Reducer

        3. Example Wrap Up

    4. Map Reduce Features

      1. Fault Tolerance

      2. Ordering Guarantee

      3. Partitioning Function

      4. Combiner Function

      5. Counters

    5. Hadoop

      1. Hadoop Overview

      2. Hadoop Configuration

      3. Hadoop Example

        1. Grep

        2. Preparing Hadoop for Grep

        3. Using Grep in Hadoop

        4. Grep Example Overview

        5. Word Count

    6. Map Reduce Merge

      1. Map Reduce Limitations

      2. Merge

      3. Merge Terms

      4. Configurable Iterators

    7. More Information, Google’s published paper on Map Reduce

      1. [mapreduce-osdi04_MapReduce Simplified Data Processing on Large Clusters Original Paper.doc]

  11. Hadoop

    1. [Hadoop.ppt]

    2. Hadoop Overview

    3. Map Reduce Overview

      1. System Overview

      2. Process Flow Diagram

      3. Launching a Map Reduce Job

        1. Client

    4. Terminology

      1. Input Format and Output Format

        1. Example

      2. Job Client

      3. Job Tracker

      4. Task Tracker

      5. Task

      6. Task Runner

    5. Mapper

      1. Creating the Mapper

      2. Example

    6. What is Writable?

    7. Input Formats

      1. Reading Data

      2. FileInputFormat

      3. Filtering File Inputs

      4. Record Readers

      5. Input Split Size

    8. Writing Input

      1. Sending Data To Reducers

      2. Writable Comparator

    9. Sending Data to the Client

      1. Partitioner

      2. Reducer

        1. Example

      3. Output Format

    10. Example: N-Gram Generator

      1. N-Gram Overview

      2. Map-Reduce Process

      3. N-Gram Requirements

      4. High Level Data Flow

      5. Executable Example (for Extra Practice Assignment)

        1. Downloading Hadoop N-Gram Example

        2. Running Example

        3. Difference from Word Count

        4. Changes Needed

          1. New RecordReader

          2. New InputFormat

          3. Output.collect

          4. “Find” Mapper / Reducer

          5. “Prune” Mapper / Reducer

          6. Connecting different Map/Reduce Jobs

          7. Counters

          8. JobConf

            1. Find

            2. Prune

          9. Design Questions

    11. More information: Hadoop Setup and Configuration Guides

      1. Hadoop Single Node install and configuration guide

        1. [Hadoop_single_node_setup.doc]

      2. Hadoop Cluster install and configuration guide

        1. [Hadoop_cluster_setup.doc]

      3. Hadoop Map Reduce Framework Tutorial

        1. [Hadoop_mapred_tutorial.doc]

  12. HDFS Guides

    1. HDFS Overview

      1. [Hadoop_hdfs_user_guide.doc]

      2. HDFS Purpose

      3. HDFS Overview

      4. Prerequisites

      5. Web Interface

      6. Shell Commands

      7. Secondary NameNode

      8. Checkpoint Node

      9. Backup Node

      10. Import Checkpoint

      11. Rebalancer

      12. Rack Awareness

      13. Safemode

      14. Fsck

      15. Upgrade and Rollback

      16. File Permissions and Security

      17. Scalability

      18. Related Documentation

    2. More Information: HDFS Architecture Guide

      1. [Hadoop_hdfs_design.doc]

  13. HBase

    1. HBase Overview

      1. [michael_stack-hbase.ppt]

      2. What is HBase

      3. HBase Data Model

      4. HBase Implementation

      5. Using HBase

      6. Projects Powered-by HBase

    2. More Information: Google’s published Bigtable paper, the basis for HBase concepts

      1. [bigtable-osdi06_Google_HBase.doc]

    3. Complete HBase guide

      1. [HBase_book.html]

      2. Chapter 1 covered in class, further chapters recommended for self study

      3. Chapter 1, Getting Started

        1. Introduction

        2. Quick Start

          1. Download and unpacking

          2. Start HBase

          3. Shell Exercises

          4. Stopping HBase

          5. Where to Go Next

        3. The Not-so-quick Start Guide

          1. Requirements

          2. HBase run modes: Standalone and Distributed

          3. Example Configurations

  14. Scaling Wrap Up

    1. Review / Test, Class Exercise

      1. [Scaling_Final_Exam_vA.doc] or [Scaling_Final_Exam_vB.doc]

    2. End of Scaling Course Review