The Big Data course introduces you to cloud-based big data solutions such as EMR, Redshift, Kinesis, and the rest of the big data platform. In this course, we show you how to use EMR to process data with the broad ecosystem of Hadoop tools such as Hive and Hue. The course also covers how to create a big data environment; how to use DynamoDB, Redshift, QuickSight, Athena, and Kinesis in combination; and how to apply best practices to design a big data environment for security and cost efficiency.
Course objectives
By taking this course, you will be able to:
Fit cloud vendor solutions into a big data ecosystem
Use Apache Hadoop in an EMR environment
Identify the components of an EMR cluster
Launch and configure an EMR cluster
Use common programming frameworks available for EMR, including Hive, Pig, and Streaming
Improve the usability of EMR with Hue
Use in-memory analytics with Spark on EMR
Choose the right cloud vendor data storage option
Identify the benefits of using Kinesis to process big data in near real time
Effectively store and analyze data with Redshift
Understand and manage the costs and security of big data solutions
Secure big data solutions
Identify options for acquiring, transmitting, and compressing data
Run ad hoc query analysis with Athena
Depict data and queries visually with QuickSight
Orchestrate big data workflows with Cloud vendors Data Pipeline
Target audience
This course is intended for:
Solutions architects and others responsible for designing and implementing big data solutions
Data scientists and data analysts interested in understanding the services and architectural patterns behind big data solutions on the cloud vendor's platform
Prerequisites
We recommend that those attending this course meet the following prerequisites:
Basic familiarity with big data technologies, including Apache Hadoop, MapReduce, HDFS, and SQL / NoSQL queries
Students should complete Big Data Technology Fundamentals online training or have equivalent experience
Experience with core Cloud vendors services and public cloud implementation
Students should complete Cloud vendors Technical Essentials courses or have equivalent experience
Understanding of basic data warehousing, relational database system, and database design concepts
This course is delivered through a combination of:
Instructor-led training (ILT)
Hands-on lab
Hands-on activities
This course allows you to experiment with new technologies and apply what you have learned to your work environment through various practical exercises
Course Outline
Day 1
Big data overview
Big data acquisition and transmission
Big data streaming and Kinesis
Lab 1: Stream and analyze Apache server log data with Kinesis
Big data storage solutions
Big data processing and analysis
Lab 2: Querying S3 Log Data with Athena
Day 2
Apache Hadoop and EMR
Lab 3: Store and query data on DynamoDB
Use EMR
Hadoop programming framework
Lab 4: Processing server logs with Hive on EMR
Web interface on EMR
Lab 5: Running Pig Scripts in Hue on EMR
Apache Spark on EMR
Lab 6: Using Spark on EMR to Process NY Taxi Data
Day 3
Redshift and big data
Visualization and orchestration of big data
Lab 7: Visualizing Data with TIBCO Spotfire
Manage Big Data Expenses
Protect your deployment
Big data design patterns