SPAT - Apache Spark Application Performance Tuning - Online Training - English - Webinar by Fast Lane Institute for Knowledge Transfer

Contents

Spark Architecture
  • RDDs
  • DataFrames and Datasets
  • Lazy Evaluation
  • Pipelining
Data Sources and Formats
  • Available Formats Overview
  • Impact on Performance
  • The Small Files Problem
Inferring Schemas
  • The Cost of Inference
  • Mitigating Tactics
Dealing With Skewed Data
  • Recognizing Skew
  • Mitigating Tactics
Catalyst and Tungsten Overview
  • Catalyst Overview
  • Tungsten Overview
Mitigating Spark Shuffles
  • Denormalization
  • Broadcast Joins
  • Map-Side Operations
  • Sort Merge Joins
Partitioned and Bucketed Tables
  • Partitioned Tables
  • Bucketed Tables
  • Impact on Performance
Improving Join Performance
  • Skewed Joins
  • Bucketed Joins
  • Incremental Joins
PySpark Overhead and UDFs
  • PySpark Overhead
  • Scalar UDFs
  • Vector UDFs using Apache Arrow
  • Scala UDFs
Caching Data for Reuse
  • Caching Options
  • Impact on Performance
  • Caching Pitfalls
Workload XM (WXM) Introduction
  • WXM Overview
  • WXM for Spark Developers
What's New in Spark 3.0?
  • Adaptive Number of Shuffle Partitions
  • Skew Joins
  • Convert Sort Merge Joins to Broadcast Joins
  • Dynamic Partition Pruning
  • Dynamic Coalesce Shuffle Partitions
Appendix A: Partition Processing
Appendix B: Broadcasting
Appendix C: Scheduling

Learning Objectives

Students who successfully complete this course will be able to:

  • Understand Apache Spark's architecture, job execution, and how techniques such as lazy evaluation and pipelining can improve runtime performance
  • Evaluate the performance characteristics of core data structures such as RDDs and DataFrames
  • Select the file formats that will provide the best performance for your application
  • Identify and resolve performance problems caused by data skew
  • Use partitioning, bucketing, and join optimizations to improve Spark SQL performance
  • Understand the performance overhead of Python-based RDDs, DataFrames, and user-defined functions
  • Take advantage of caching for better application performance
  • Understand how the Catalyst and Tungsten optimizers work
  • Understand how Workload XM can help troubleshoot and proactively monitor Spark application performance
  • Learn about the new features in Spark 3.0 and specifically how the Adaptive Query Execution engine improves performance


Target Audience

This course is designed for software developers, engineers, and data scientists who have experience developing Spark applications and want to learn how to improve the performance of their code. This is not an introduction to Spark.


Dates and Locations

Date (DD.MM.YYYY)          Time            Duration
Webinar
04.06.2024 - 07.06.2024    09:00 - 17:00   32 h
12.11.2024 - 14.11.2024    09:00 - 17:00   24 h

SG seminar no.: 6723024



Event Information

  • Webinar
  • English
  • Certificate of attendance
  • 32 h
