Hive and Pig allow data in a Hadoop cluster to be managed and manipulated without Java programming experience. Apache Hive is Hadoop’s data warehouse infrastructure. It makes multi-structured data accessible to analysts, database administrators, and others who lack Java programming expertise. Apache Pig applies the fundamentals of familiar scripting languages to the Hadoop cluster. It is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. In this hands-on course, students learn how Apache Pig and Apache Hive enable data transformations and analyses via filters, joins, and user-defined functions. Students also learn how to apply data analytics and business intelligence skills to big data, including how to access, manipulate, and analyze complex data sets using SQL and scripting languages.
Topics include the Pig Latin scripting language, the Grunt shell, Pig user-defined functions (UDFs), and HiveQL, Hive’s SQL dialect, which is used to summarize, query, and analyze large datasets stored in Hadoop. Other Pig topics include Pig’s data model, Pig Latin scripts that sort, group, join, project, and filter data, and load and store functions. Other Hive topics include how to create, alter, and drop databases, tables, views, functions, and indexes; how to load data into and extract data from tables; and how to perform queries, grouping, filtering, joining, and other conventional query operations.
- Hadoop Fundamentals
- Introduction to Hive
- Relational Data Analysis with Hive
- Hive Data Management
- Text Processing with Hive
- Introduction to Pig
- Basic Data Analysis with Pig
- Processing Complex Data with Pig
- Multi-Dataset Operations with Pig
- Choosing the Best Tool for the Job
  - Comparing MapReduce, Pig, Hive, and Relational Databases
  - Which to Choose?
Completion of SQL100 – Introduction to SQL using SQL Server, or prior experience with SQL, is required. Some scripting experience is also recommended.
18 Hours | 3 Days or 6 Nights
Applies Towards the Following Certificates
- Hadoop Big Data 1: Required