Cloud 1 - Scaling
A study of the available “cloud computing technologies” in the field of scalability. Topics include fault tolerance and load balancing at the network, data, and web server levels. Introduces students to the technologies, their benefits, and how to leverage them. Class includes labs and optional take-home assignments in which students apply the knowledge to real-world scenarios. Course uses the Java programming language.
Prerequisites:
Course requires the ability to use the Java programming language.
Course Duration:
4 days (32 hours) classroom time
Appropriate Roles:
Advanced Technical
Optional: Technical
Required Textbooks and Materials:
Tom White, Hadoop: The Definitive Guide
http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449389732/ref=pd_sim_b_1
Upon completion of this course the student will be able to:
Rapidly add scaling and redundancy to an application using existing open source (Apache) technologies.
Understand network level load balancing and fault tolerance.
Specify two mechanisms of load balancing and fault tolerance in relational databases.
Describe fault tolerance and load balancing in HBase and Hadoop
Set up fault tolerance and load balancing in HBase and Hadoop
Demonstrate an understanding of “Map” and “Reduce”
Implement sticky sessions and session replication
Explain the components of an “ACID” XA Transaction
Choose appropriate Database Replication methods
Perform Remote Procedure Calls using several different methods
Understand the J2EE technology stack
Use JBoss, a J2EE server
Understand Federation and use both JMS and Enterprise Beans
Implement JBoss Cache
Syllabus:
Network Load balancing and fault tolerance
Network Topology
Load Balancing
Fault Tolerance
Google Map/Reduce distributed computing
Comparison with other systems
Mapper
Reducer
Merge
Apache Hadoop distributed file system using Map/Reduce
History of Hadoop
Scaling
Combiner Functions
Running a Distributed Job
HDFS
HDFS Concepts
Blocks
Namenodes and Datanodes
Java Interface
Reading Data
Writing Data
Querying
Deleting Data
Data Integrity
Serialization
How MapReduce Works
Anatomy of a MapReduce Job Run
Failures
Job Scheduling
Data Types and Formats
Apache HBase distributed database
HBase Overview
HBase Data Model
Java Clients
Example Schemas
Example Queries
Differences between HBase and RDBMS
Database replication
Replication
Replication in Distributed Systems
Modes of Replication
Oracle RAC
Remote Procedure Calls
RMI, Java Remote Method Invocation
Serialization
Remote Interfaces
XML-RPC
REST
J2EE
Java 2 Platform
Servlets
JSP, JavaServer Pages
EJB, Enterprise JavaBeans
JMS, Java Message Service
JNDI, Java Naming and Directory Interface
JBoss, J2EE Server
JBoss Features
JBoss Setup
Using JBoss
Federation
Enterprise Beans
Message Driven Beans
JMS as a Resource
Temporary Destinations
JBoss Cache
Cluster
Cache
JBoss AOP
Implementing cloud scaling technologies
Developing a MapReduce Application
Configuration API
Unit Testing
Running Locally
Running on a cluster
Workflows
Setting up a Hadoop Cluster
Network Topology
Cluster setup and installation
Hadoop Configuration
Security
Benchmarking
HBase
Installation
Test Drive
Agenda:
Cloud Computing Overview
[Cloud Computing Overview.ppt]
What is Cloud Computing
Need for a Solution
Available Technologies
Lab #0 Sticky Sessions
Learning Objectives
[Syllabus_Intro_Cloud_Computing.doc]
Network Solutions
[Load Balancing.ppt]
Load Balancing
Need for a Solution
Session Replication
Sticky Sessions
Replicated Sessions
Server Clusters
IP Multicast
IP Sockets
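Sticky sessions can be sketched as a balancer that remembers which backend each session was first sent to. This is a minimal illustration, not the configuration of any particular product: the class name, the backend names, and the round-robin choice for new sessions are assumptions. Many real front ends (for example Apache with mod_jk) encode the route in the session cookie instead of keeping a lookup table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sticky-session balancer: a session stays on the backend it was first given. */
class StickyBalancer {
    private final List<String> backends;                         // live backends
    private final Map<String, String> assignments = new HashMap<>(); // sessionId -> backend
    private int next = 0;                                        // round-robin cursor for new sessions

    StickyBalancer(List<String> backends) {
        this.backends = new ArrayList<>(backends);
    }

    /** Returns the backend for a request carrying the given session ID. */
    synchronized String route(String sessionId) {
        String backend = assignments.get(sessionId);
        if (backend == null || !backends.contains(backend)) {
            // New session, or its backend has failed: pick the next live backend.
            backend = backends.get(next % backends.size());
            next++;
            assignments.put(sessionId, backend);
        }
        return backend;
    }

    /** Fault tolerance: remove a failed backend so its sessions are re-routed on their next request. */
    synchronized void markDown(String backend) {
        backends.remove(backend);
    }
}
```

When a backend is marked down, its sessions move to a surviving server, but any state held only in the failed server's memory is lost; that gap is what replicated sessions address.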
4 Categories of State Management
Stateless
Conversational
Cached
Singleton
XA Transactions
[XA Transactions.ppt]
ACID
Atomic Transactions
Example
Implementation
Orthogonality
Isolation
Phantom Reads
Isolation Levels
Serializable
Repeatable Read
Read Committed
Read Uncommitted
Examples
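The four isolation levels differ in which read anomalies they rule out, and JDBC exposes each one as a constant on java.sql.Connection. The enum below is a hypothetical helper that records the SQL-standard definitions; individual databases may be stricter than the standard requires (some prevent phantoms at REPEATABLE READ), so treat the table as the minimum guarantee.

```java
import java.sql.Connection;

/**
 * SQL-standard isolation levels, the JDBC constant for each, and the anomalies
 * each level must prevent. Constants are declared from weakest to strongest.
 */
enum IsolationLevel {
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED, false, false, false),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED, true, false, false),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ, true, true, false),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE, true, true, true);

    final int jdbcConstant;
    final boolean preventsDirtyReads;
    final boolean preventsNonRepeatableReads;
    final boolean preventsPhantomReads;

    IsolationLevel(int jdbcConstant, boolean dirty, boolean nonRepeatable, boolean phantom) {
        this.jdbcConstant = jdbcConstant;
        this.preventsDirtyReads = dirty;
        this.preventsNonRepeatableReads = nonRepeatable;
        this.preventsPhantomReads = phantom;
    }

    /** Returns the weakest level that rules out every anomaly flagged true. */
    static IsolationLevel weakestPreventing(boolean dirty, boolean nonRepeatable, boolean phantom) {
        for (IsolationLevel level : values()) { // iterates weakest first
            if ((!dirty || level.preventsDirtyReads)
                    && (!nonRepeatable || level.preventsNonRepeatableReads)
                    && (!phantom || level.preventsPhantomReads)) {
                return level;
            }
        }
        return SERIALIZABLE; // not reached: SERIALIZABLE prevents all three
    }
}
```

With a live connection the chosen level would be applied through the standard JDBC call connection.setTransactionIsolation(level.jdbcConstant); whether a driver supports a given level can be checked with DatabaseMetaData.supportsTransactionIsolationLevel.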
Two Phase Commit
Protocol
Assumptions
Initiation
Examples
Disadvantages
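The two phases can be illustrated with an in-memory coordinator. This is a sketch under simplifying assumptions: there is no transaction log, no timeout, and no recovery, which also exposes the protocol's main disadvantage, since a coordinator that crashes between the phases leaves prepared participants blocked. The Participant interface and DemoParticipant class are hypothetical; in Java EE the real resource-manager contract is javax.transaction.xa.XAResource.

```java
import java.util.List;

/** Hypothetical resource manager taking part in two-phase commit. */
interface Participant {
    boolean prepare(); // phase 1: get ready to commit, then vote yes (true) or no (false)
    void commit();     // phase 2 if every participant voted yes
    void rollback();   // phase 2 otherwise
}

/** Minimal coordinator: no logging, timeouts or crash recovery. */
class TwoPhaseCoordinator {
    /** Returns true if the transaction committed on all participants. */
    static boolean execute(List<? extends Participant> participants) {
        boolean allPrepared = true;
        // Phase 1 (voting): a single "no" vote is enough to abort.
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2 (completion): the same all-or-nothing decision goes to everyone.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}

/** Demo participant that votes as configured and records its final state. */
class DemoParticipant implements Participant {
    private final boolean vote;
    String state = "ACTIVE";

    DemoParticipant(boolean vote) {
        this.vote = vote;
    }

    @Override public boolean prepare() { state = vote ? "PREPARED" : "ABORTED"; return vote; }
    @Override public void commit() { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
}
```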
Database Replication
[Database Replication.ppt]
Replication
Data Replication
Database Replication
Replicated Servers
Transparency
Active vs Passive Replication
Multi-Master Replication
Load Balancing
Backup
Replication in Distributed Systems
Transactional Replication
State Machine Replication
Virtual Synchrony
Performance Comparison
Modes
Master/Slave
Master/Master
Lazy Replication
Multi-Master Replication
Benefits
Disadvantages
Methods
Example
Wrap Up Example
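Master/slave and lazy replication can be contrasted with a small in-memory sketch. The class names are hypothetical, and the in-memory list stands in for what a real database ships over the network (for example a binary or redo log). Only the master accepts writes; each slave replays the master's ordered log whenever it chooses to pull.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Master: the only node that accepts writes; every write is appended to an ordered log. */
class MasterNode {
    private final Map<String, String> data = new HashMap<>();
    private final List<String[]> log = new ArrayList<>(); // {key, value} in commit order

    synchronized void put(String key, String value) {
        data.put(key, value);
        log.add(new String[] {key, value});
    }

    synchronized String get(String key) {
        return data.get(key);
    }

    /** Log entries from position 'from' onward, i.e. what a slave has not applied yet. */
    synchronized List<String[]> entriesSince(int from) {
        return new ArrayList<>(log.subList(from, log.size()));
    }
}

/** Slave: read-only copy that catches up lazily by replaying the master's log in order. */
class SlaveNode {
    private final Map<String, String> data = new HashMap<>();
    private int applied = 0; // number of log entries already applied

    void pull(MasterNode master) {
        for (String[] entry : master.entriesSince(applied)) {
            data.put(entry[0], entry[1]);
            applied++;
        }
    }

    String get(String key) {
        return data.get(key); // may be stale until the next pull
    }
}
```

Because every slave replays the same writes in the same order, all slaves converge on the master's state; that deterministic replay is also the core idea of state machine replication. The window between a write and the next pull is the staleness lazy replication accepts in exchange for lower write latency.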
Oracle RAC, Relational Database Cloud
[Oracle RAC.ppt]
Overview
Oracle RAC Definition
Goal of Oracle RAC
Shared Storage
Configuration
Installing and Configuring Oracle Cluster File System (OCFS)
Cluster Ready Services
Installing Oracle with Real Application Clusters
Oracle RAC Wrap up
RPC, Remote Procedure Calls
[RPC.ppt]
Overview
RPC Definition
Goal of RPC
History of RPC
Methodology
Client / Server
Local vs Remote
Interface Description Language
Java Remote Method Invocation
RMI Overview
CORBA vs. RMI
java.rmi
Jini
Serialization
Marshalling
Serialization Advantages
Serialization Disadvantages
XML Serialization
Serialization in Programming
Language Support
java.io.Serializable
Example
Dynamic Code Loading
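A round trip through java.io.Serializable shows what marshalling means in practice: the object becomes a byte stream (which RMI would send over the network) and is rebuilt as a new, equivalent object on the other side. The Account class is a hypothetical example; ObjectOutputStream and ObjectInputStream are the standard java.io classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical value class; implementing Serializable opts it into Java serialization. */
class Account implements Serializable {
    private static final long serialVersionUID = 1L; // checked on read to detect incompatible versions

    final String owner;
    final long balanceCents;
    transient String cachedLabel; // transient: not written, so it comes back as null

    Account(String owner, long balanceCents) {
        this.owner = owner;
        this.balanceCents = balanceCents;
    }
}

class SerializationDemo {
    /** Marshals an object graph into bytes. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Unmarshals bytes back into a new object graph. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class not available on the reading side", e);
        }
    }
}
```

Two of the usual disadvantages are visible here: the byte format is Java-specific, and calling readObject on untrusted bytes is a known security risk.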
Remote Interfaces
Java Standards
Remote Interfaces
Remote Example
Implementing a Remote Interface
Passing Objects in RMI
Making the Remote Object Available
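The RMI steps (declare a remote interface, implement it, make the object available, call it through a stub) fit in one small program. For illustration this sketch keeps client and server in one JVM: it creates a registry on an ephemeral port with LocateRegistry.createRegistry(0) and pins stubs to 127.0.0.1. A real deployment would run the registry on a known port (1099 by default) and clients would obtain it with LocateRegistry.getRegistry(host, port). The Greeter names are hypothetical.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

/** Remote interface: extends Remote, and every method declares RemoteException. */
interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

/** Server-side implementation of the remote interface. */
class GreeterImpl implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

class RmiDemo {
    /** Exports a Greeter, binds it in a registry, looks it up, and calls it through the stub. */
    static String callThroughRmi(String name) {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1"); // stubs point at loopback
        GreeterImpl impl = new GreeterImpl();
        try {
            Registry registry = LocateRegistry.createRegistry(0); // 0 = any free port
            try {
                Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
                try {
                    registry.rebind("Greeter", stub);                  // make the object available
                    Greeter remote = (Greeter) registry.lookup("Greeter");
                    return remote.greet(name);                         // marshalled over TCP
                } finally {
                    UnicastRemoteObject.unexportObject(impl, true);    // let RMI threads shut down
                }
            } finally {
                UnicastRemoteObject.unexportObject(registry, true);
            }
        } catch (Exception e) {
            throw new IllegalStateException("RMI call failed", e);
        }
    }
}
```

Because greet takes and returns Strings, those values travel by copy through serialization; a parameter that is itself an exported remote object would travel as a stub instead.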
XML-RPC Protocol
Protocol Overview
History of XML-RPC
XML-RPC Implementations
SOAP Protocol
REST Protocol
REST Overview
Simplicity of REST
RESTful Resources
HTTP and REST
REST vs RPC
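A RESTful resource can be sketched with the JDK's built-in com.sun.net.httpserver server and the java.net.http client (Java 11 or later). The /books/{id} URI scheme and the BookResource class are hypothetical. The point of contrast with RPC is that the resource is named by a URI and manipulated with the standard HTTP verbs, not with application-specific procedure names.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical resource collection supporting GET and PUT on /books/{id}. */
class BookResource {
    private final Map<String, String> books = new ConcurrentHashMap<>();

    HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/books/", this::handle);
        server.start();
        return server;
    }

    private void handle(HttpExchange exchange) throws IOException {
        String id = exchange.getRequestURI().getPath().substring("/books/".length());
        String body = "";
        int status;
        switch (exchange.getRequestMethod()) {
            case "GET": // read the current representation
                String book = books.get(id);
                status = (book == null) ? 404 : 200;
                if (book != null) body = book;
                break;
            case "PUT": // create or replace; repeating the same PUT has the same effect
                books.put(id, new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8));
                status = 204;
                break;
            default:
                status = 405; // method not allowed
        }
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length == 0 ? -1 : bytes.length); // -1 = no body
        if (bytes.length > 0) {
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        }
        exchange.close();
    }
}

class RestDemo {
    /** Starts the resource, PUTs a representation, GETs it back, and returns "status body". */
    static String roundTrip(String id, String text) {
        HttpServer server = null;
        try {
            server = new BookResource().start();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/books/" + id);
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            client.send(HttpRequest.newBuilder(uri).PUT(HttpRequest.BodyPublishers.ofString(text)).build(),
                    HttpResponse.BodyHandlers.discarding());
            HttpResponse<String> response = client.send(HttpRequest.newBuilder(uri).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }
}
```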
EJB
Session Beans
Example
RPC Summary
J2EE
[J2EE_Overview.ppt]
Java 2 Platform
Java Versions
J2EE Technologies
J2EE Components
Servlets
Servlet Overview
Anatomy of a Servlet
Servlet Example
JSP, JavaServer Pages
JSP Overview
JSP Example
EJB, Enterprise JavaBeans
EJB Overview
Anatomy of an EJB
Types of Beans
Entity Bean
Session Beans
States
Message Beans
JMS, Java Message Service
JMS Overview
Reasons to use JMS
JDBC, Data Access API
JNDI, Java Naming and Directory Interface
JNDI Overview
JNDI Layers
JNDI Common Uses
J2EE Application Structure
J2EE Deployment Structure
JBoss, J2EE Server
[Jboss.ppt]
J2EE Review
JBoss Features
JBoss Setup
JBoss Installation
JBoss Setup
JBoss Datasources
Using JBoss
JBoss Web Server
JBoss JMS Setup
JBoss Default Ports
JBoss Administration Console
JBoss Application Deployment
JBoss Security
JAAS
JAAS Login Modules
Breit Example
Building Breit
JBoss Advantages and Disadvantages
Federation
Building a Federated Query System, Maven and EJB
[Maven_and_EJB.ppt]
Maven
Maven Overview
Maven Objectives
Installing Maven
POM files
Maven Phases
Example: Building Breit
Enterprise Beans
Creating a Session Bean
Remote Interface
The Bean Class
Message Driven Beans
Calling from a Servlet
The Servlet Class
The EAR File
[Federation.ppt]
Federation Goal
JMS as a Resource
JMS Overview
JMS Clients
Producer / Consumer
EJB and JMS
Asynchronous
JMS Example
Loosely Coupled
JMS Messaging Domains
Message Driven Beans
JMS: Entity and Session Beans
Message Driven Beans
MDB Overview
MDB Characteristics
EJB3 MDB
EJB Example
Temporary Destinations
Temporary Destination Overview
Temporary Destination Limitations
Temporary Queue Architecture
Example Federated Query
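The temporary-destination pattern (each requester creates a private reply queue and names it in the JMSReplyTo header) can be imitated in plain Java to show the message flow of a federated query. This sketch substitutes java.util.concurrent blocking queues for a JMS provider, so it is an analogy rather than JMS code; with JMS the reply queue would come from Session.createTemporaryQueue(). The QueryRequest, QuerySource, and FederatedQuery names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** A request carrying its own reply destination, like a message with JMSReplyTo set. */
final class QueryRequest {
    final String query;
    final BlockingQueue<String> replyTo;

    QueryRequest(String query, BlockingQueue<String> replyTo) {
        this.query = query;
        this.replyTo = replyTo;
    }
}

/** Hypothetical data source: a consumer thread answering requests from its own queue. */
class QuerySource {
    final BlockingQueue<QueryRequest> inbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    QuerySource(String name) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    QueryRequest request = inbox.take();             // wait for the next request
                    request.replyTo.put(name + ":" + request.query); // answer on the caller's queue
                }
            } catch (InterruptedException e) {
                // stop() was called: leave the loop
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    void stop() {
        worker.interrupt();
    }
}

class FederatedQuery {
    /** Fans the query out to every source and collects the replies that arrive before the timeout. */
    static List<String> ask(List<QuerySource> sources, String query, long timeoutMillis) {
        BlockingQueue<String> replies = new LinkedBlockingQueue<>(); // private to this call
        List<String> results = new ArrayList<>();
        try {
            for (QuerySource source : sources) {
                source.inbox.put(new QueryRequest(query, replies));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (results.size() < sources.size()) {
                String reply = replies.poll(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
                if (reply == null) {
                    break; // timed out: return the partial results gathered so far
                }
                results.add(reply);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

Keeping the reply queue local to one call mirrors the JMS rule that only the connection that created a temporary queue may consume from it, and that the queue disappears when that connection closes.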
Federation Lab Assignment
Federation Wrap Up
Data Replication
Aspect-Oriented Programming
Basic Overview of AOP
[Aspect_Oriented_Programming.ppt]
AOP Overview
AOP Terminology
The Need for AOP
The Problem, Why AOP?
The Solution
Join Point models
Implementation
Terminology Review
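Without a dedicated AOP framework, the core idea (advice woven around join points, here method calls) can be sketched with a JDK dynamic proxy from java.lang.reflect. This is a substitute technique that is much narrower than JBoss AOP or AspectJ: it only intercepts calls made through an interface and has no pointcut language. The Transfer interface and TracingAspect class are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

/** Hypothetical business interface; its implementations contain no logging code. */
interface Transfer {
    int move(int cents);
}

/** "Around" advice that runs before and after every interface method call (the join points). */
class TracingAspect implements InvocationHandler {
    private final Object target;
    private final List<String> log;

    private TracingAspect(Object target, List<String> log) {
        this.target = target;
        this.log = log;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        log.add("before " + method.getName());
        try {
            return method.invoke(target, args); // proceed to the real implementation
        } catch (InvocationTargetException e) {
            throw e.getCause();                 // rethrow the target's own exception
        } finally {
            log.add("after " + method.getName()); // runs on success and on failure
        }
    }

    /** Weaves the aspect around a target at runtime by wrapping it in a proxy. */
    static <T> T weave(Class<T> iface, T target, List<String> log) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new TracingAspect(target, log));
        return iface.cast(proxy);
    }
}
```

The Transfer implementation never mentions tracing; keeping that cross-cutting concern out of business code is the problem AOP sets out to solve.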
Lecture Notes, JavaWorld article on AOP
[www.javaworld.com - jw-0118-aspect.doc]
JBoss Cache
[Jboss_Cache.ppt]
Cache Overview
Define Cache
Define Cluster
Why Cache
Why Cluster
JBoss Cache
Flavors of JBoss Cache
Searchable Cache
JBoss Cache Overview
JBoss Cache Users
JBoss Cache Architecture
JBoss Cache Goal
JBoss Cache Features
TreeCache Architecture
JBoss AOP
AOP Features
Dynamic AOP
TreeCache AOP
About TreeCache AOP
TreeCache AOP API
TreeCache AOP Mapping
Replication
Replication in TreeCache API
JBoss Cache Overview
More Info: JGroups presentation on JBoss Cache
[Jboss_Cache.ppt]
More Info: New version, JBoss Cache now known as Infinispan
Quick Start guide to Infinispan 4.1.x
[Community_ 5 minute tutorial on Infinispan.doc]
Architecture guide to Infinispan
[Community_ Architectural Overview.doc]
Map Reduce
[Map Reduce and Hadoop.ppt]
Map Reduce Overview
Map Reduce Overview
Map and Reduce Steps
Why use Map Reduce
Map Reduce Users
Map Reduce
Map Step
Reduce Step
M & R pieces
Counting Words Example
Stage 1: Mapper
Stage 2: Reducer
Example Wrap Up
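The counting-words example can be traced in plain Java without a cluster. This sketch runs the stages in one process: map, then a shuffle that groups values by key in sorted order, then reduce. The class and method names are our own; this is not the Hadoop Mapper/Reducer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory imitation of the MapReduce word count. */
class WordCount {
    /** Map step: emit (word, 1) for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    /** Reduce step: sum all counts emitted for one word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map over every line, groups values by key (the shuffle), then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // sorted keys, as reducers see them
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }
}
```

For the input lines "the cat sat" and "The dog", the map step emits five (word, 1) pairs and the reduce step produces the = 2 and cat, sat, dog = 1 each.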
Map Reduce Features
Fault Tolerance
Ordering Guarantee
Partitioning Function
Combiner Function
Counters
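Two of these features are small enough to show directly. The partition formula below is the one used by Hadoop's default HashPartitioner (non-negative hash code modulo the number of reduce tasks); the combiner is a hypothetical map-side sum for word counts, and the class name is our own.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShuffleHelpers {
    /** Chooses a reducer for a key: same formula as Hadoop's default HashPartitioner. */
    static int partition(String key, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Combiner: pre-sums word counts on the map side so fewer pairs cross the network. */
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> partial = new HashMap<>();
        for (String key : mapOutputKeys) {
            partial.merge(key, 1, Integer::sum);
        }
        return partial;
    }
}
```

Every occurrence of a key lands on the same reducer because the partition depends only on the key. A combiner must give the same final answer whether the framework runs it zero, one, or several times, which holds for sums but not, for example, for averages.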
Hadoop
Hadoop Overview
Hadoop Configuration
Hadoop Example
Grep
Preparing Hadoop for Grep
Using Grep in Hadoop
Grep Example Overview
Word Count
Map Reduce Merge
Map Reduce Limitations
Merge
Merge Terms
Configurable Iterators
More Information, Google’s published paper on Map Reduce
[mapreduce-osdi04_MapReduce Simplified Data Processing on Large Clusters Original Paper.doc]
Hadoop
[Hadoop.ppt]
Hadoop Overview
Map Reduce Overview
System Overview
Process Flow Diagram
Launching a Map Reduce Job
Client
Terminology
Input Format and Output Format
Example
Job Client
Job Tracker
Task Tracker
Task
Task Runner
Mapper
Creating the Mapper
Example
What is Writable?
Input Formats
Reading Data
FileInputFormat
Filtering File Inputs
Record Readers
Input Split Size
Writing Input
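Input split size is derived from the HDFS block size and two configurable bounds. The sketch below mirrors the arithmetic in Hadoop's FileInputFormat (new API) as we understand it: split size = max(minSize, min(maxSize, blockSize)), and a file's tail is folded into the last split when it is within 10% of a full split. Treat the 1.1 slop factor as an assumption to verify against the Hadoop version in use; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SplitCalculator {
    /** Split size chosen from the block size and the configured minimum and maximum. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Lengths of the splits for one file; avoids creating a tiny final split. */
    static List<Long> splitLengths(long fileLength, long splitSize) {
        final double slop = 1.1; // assumption: mirrors Hadoop's SPLIT_SLOP constant
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > slop) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // last split may be up to 10% larger than splitSize
        }
        return splits;
    }
}
```

With default bounds the split size equals the block size, so each map task normally reads one block; raising the minimum above the block size makes each task read several blocks.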
Sending Data To Reducers
WritableComparator
Sending Data to the Client
Partitioner
Reducer
Example
Output Format
Example: N-Gram Generator
N-Gram Overview
Map-Reduce Process
N-Gram Requirements
High Level Data Flow
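The core of an n-gram generator is the map-side sliding window over tokens; counting is then ordinary word-count-style reduction. This is an in-memory sketch: the tokenization (lower-casing, splitting on whitespace) and the minimum-count filter are our own assumptions and are not necessarily what the course's Find and Prune jobs do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NGrams {
    /** Map side: emit every run of n consecutive tokens as one space-joined n-gram. */
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    /** Reduce side plus a filter: count n-grams and keep those seen at least minCount times. */
    static Map<String, Integer> frequent(List<String> lines, int n, int minCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String gram : ngrams(line, n)) {
                counts.merge(gram, 1, Integer::sum);
            }
        }
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }
}
```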
Executable Example (Extra Practice Assignment)
Downloading Hadoop N-Gram Example
Running Example
Difference from Word Count
Changes Needed
New RecordReader
New InputFormat
output.collect()
“Find” Mapper / Reducer
“Prune” Mapper / Reducer
Connecting different Map/Reduce Jobs
Counters
JobConf
Find
Prune
Design Questions
More information: Hadoop Setup and Configuration Guides
Hadoop Single Node install and configuration guide
[Hadoop_single_node_setup.doc]
Hadoop Cluster install and configuration guide
[Hadoop_cluster_setup.doc]
Hadoop Map Reduce Framework Tutorial
[Hadoop_mapred_tutorial.doc]
HDFS Guides
HDFS Overview
[Hadoop_hdfs_user_guide.doc]
HDFS Purpose
HDFS Overview
Prerequisites
Web Interface
Shell Commands
Secondary NameNode
Checkpoint Node
Backup Node
Import Checkpoint
Rebalancer
Rack Awareness
Safemode
Fsck
Upgrade and Rollback
File Permissions and Security
Scalability
Related Documentation
More Information: HDFS Architecture Guide
[Hadoop_hdfs_design.doc]
HBase
HBase Overview
[michael_stack-hbase.ppt]
What is HBase
HBase Data Model
HBase Implementation
Using HBase
Projects Powered-by HBase
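HBase's logical data model, inherited from the Bigtable design, is a sparse, sorted, multi-dimensional map: row key, then column (family:qualifier), then timestamp, to a value. The in-memory sketch below models only that structure with nested TreeMaps. It is not the HBase client API, real HBase stores byte arrays rather than Strings, and the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of one HBase table: row -> "family:qualifier" -> timestamp -> value. */
class HTableModel {
    // Row keys and columns are sorted; versions are stored newest-first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Latest version of a cell, or null if the cell was never written (absent cells cost nothing). */
    String get(String row, String column) {
        return getAsOf(row, column, Long.MAX_VALUE);
    }

    /** Newest version whose timestamp is <= asOf, or null. */
    String getAsOf(String row, String column, long asOf) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        // In a newest-first map, ceilingEntry(asOf) is the first version at or before asOf.
        Map.Entry<Long, String> version = columns.get(column).ceilingEntry(asOf);
        return version == null ? null : version.getValue();
    }

    /** Row keys in [startRow, stopRow), like a scan with start and stop rows. */
    Set<String> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false).keySet();
    }
}
```

Keeping rows sorted is what makes range scans cheap and is why row-key design matters so much in HBase schemas.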
More Information: Google’s published Bigtable paper, the design on which HBase is based
[bigtable-osdi06_Google_HBase.doc]
Complete HBase guide
[HBase_book.html]
Chapter 1 is covered in class; further chapters are recommended for self-study
Chapter 1, Getting Started
Introduction
Quick Start
Download and unpacking
Start HBase
Shell Exercises
Stopping HBase
Where to Go Next
The Not-so-quick Start Guide
Requirements
HBase run modes: Standalone and Distributed
Example Configurations
Scaling Wrap Up
Review / Test, Class Exercise
[Scaling_Final_Exam_vA.doc] or [Scaling_Final_Exam_vB.doc]
End of Scaling Course Review