IBM DW606G - IBM Open Platform with Apache Hadoop - Updated - Seminar / Course by PROKODA GmbH

Prerequisites: None; however, knowledge of Linux would be beneficial.

Contents

Key Topics:

- Unit 1: IBM Open Platform with Apache Hadoop

Exercise 1: Exploring the HDFS

- Unit 2: Apache Ambari

Exercise 2: Managing Hadoop clusters with Apache Ambari

- Unit 3: Hadoop Distributed File System

Exercise 3: File access and basic commands with HDFS

- Unit 4: MapReduce and YARN

Topic 1: Introduction to MapReduce based on MR1

Topic 2: Limitations of MR1

Topic 3: YARN and MR2

Exercise 4: Creating and coding a simple MapReduce job

Possibly a second, more complex exercise

- Unit 5: Apache Spark

Exercise 5: Working with Spark’s RDDs in a Spark job

- Unit 6: Coordination, management, and governance

Exercise 6: Apache ZooKeeper, Apache Slider, Apache Knox

- Unit 7: Data Movement

Exercise 7: Moving data into Hadoop with Flume and Sqoop

- Unit 8: Storing and Accessing Data

Topic 1: Representing Data: CSV, XML, JSON, and YAML

Topic 2: Open-Source Programming Languages: Pig, Hive, and Others (R, Python, etc.)

Topic 3: NoSQL Concepts

Topic 4: Accessing Hadoop data using Hive

Exercise 8: Performing CRUD operations using the HBase shell

Topic 5: Querying Hadoop data using Hive

Exercise 9: Using Hive to Access Hadoop / HBase Data

- Unit 9: Advanced Topics

Topic 1: Controlling job workflows with Oozie

Topic 2: Search using Apache Solr

No lab exercises

Objectives:

- List and describe the major components of the open-source Apache Hadoop stack and the approach taken by the Open Data Foundation.

- Manage and monitor Hadoop clusters with Apache Ambari and related components.

- Explore the Hadoop Distributed File System (HDFS) by running Hadoop commands.

- Understand the differences between Hadoop 1 (with MapReduce 1) and Hadoop 2 (with YARN and MapReduce 2).

- Create and run basic MapReduce jobs from the command line.

- Explain how Spark integrates into the Hadoop ecosystem.

- Execute iterative algorithms using Spark’s RDD.

- Explain the role of coordination, management, and governance in the Hadoop ecosystem using Apache ZooKeeper, Apache Slider, and Apache Knox.

- Explore common methods for performing data movement (configuring Flume to load log files, and moving data into HDFS from relational databases using Sqoop).

- Understand when to use various data storage formats (flat files, CSV/delimited, Avro/Sequence files, Parquet, etc.).

- Review the differences between the available open-source programming languages typically used with Hadoop (Pig, Hive) and for data science (Python, R).

- Query data from Hive.

- Perform random access on data stored in HBase.

- Explore advanced concepts, including Oozie and Solr.
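
As a rough illustration of the programming model behind the MapReduce objective above, the following is a minimal sketch in plain Python, not Hadoop code and not taken from the course material. It imitates the map, shuffle, and reduce phases of a word count in a single process; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark reads Hadoop data"]))
```

In a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; this sketch only shows how the data flows between the phases.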

Target Audience

This intermediate training course is for those who want a foundation in IBM BigInsights. This includes big data engineers, data scientists, developers or programmers, and administrators who are interested in learning about IBM’s Open Platform with Apache Hadoop.

SG seminar no.: 5254063
