Apache Spark

1. Spark Basics

2. Spark Architecture

3. Spark Components and Libraries

4. RDDs, DataFrames, and Datasets

5. Key Transformations and Actions

6. Optimizations in Spark

Catalyst Optimizer (for SQL and DataFrames): Uses rule-based and cost-based optimization to produce efficient query plans.
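
To make the two ideas concrete, here is a toy sketch in plain Python. It is not Catalyst's code or API: the tuple-based expression format, the function names, and the cost numbers are all made up for illustration. The rule-based part rewrites an expression tree with small rules (constant folding and simplifying `true AND x`) until nothing changes; the cost-based part picks the candidate plan with the lowest estimated cost.

```python
# Toy illustration only: NOT Catalyst's implementation or API.
# Expressions are nested tuples: ("lit", value), ("col", name), (op, left, right).

def is_true(expr):
    """True only for the boolean literal node ("lit", True)."""
    return expr[0] == "lit" and expr[1] is True

def fold_constants(expr):
    """Rule: replace add/mul of two literals with a single literal."""
    if expr[0] in ("add", "mul") and expr[1][0] == "lit" and expr[2][0] == "lit":
        a, b = expr[1][1], expr[2][1]
        return ("lit", a + b if expr[0] == "add" else a * b)
    return expr

def simplify_and(expr):
    """Rule: `true AND x` and `x AND true` both reduce to x."""
    if expr[0] == "and":
        if is_true(expr[1]):
            return expr[2]
        if is_true(expr[2]):
            return expr[1]
    return expr

RULES = [fold_constants, simplify_and]

def rewrite_once(expr):
    """One bottom-up pass: rewrite the children first, then the node itself."""
    if expr[0] not in ("lit", "col"):
        expr = (expr[0],) + tuple(rewrite_once(child) for child in expr[1:])
    for rule in RULES:
        expr = rule(expr)
    return expr

def optimize(expr):
    """Repeat the rule pass until the tree stops changing (a fixed point)."""
    while True:
        rewritten = rewrite_once(expr)
        if rewritten == expr:
            return expr
        expr = rewritten

def cheapest_plan(candidates):
    """Cost-based step: keep the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda plan: plan["estimated_cost"])

# Hypothetical predicate: true AND (age > 10 + 8)
predicate = ("and", ("lit", True),
             ("gt", ("col", "age"), ("add", ("lit", 10), ("lit", 8))))
optimized = optimize(predicate)

# Hypothetical candidates with invented cost estimates.
candidates = [
    {"name": "sort_merge_join", "estimated_cost": 1_000_000},
    {"name": "broadcast_hash_join", "estimated_cost": 50_000},
]
chosen = cheapest_plan(candidates)
```

In Spark itself, the plans that the optimizer produces for a query can be inspected with `DataFrame.explain()`.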

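Memory Management: A unified memory manager splits each executor's memory between execution (shuffles, joins, sorts, aggregations) and storage (cached data); either side can borrow unused space from the other.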
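
As a rough sketch of how executor memory is divided, the arithmetic below follows Spark's unified memory model with its documented defaults (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5, and a fixed 300 MiB reserve). The function name and the 4 GiB heap are assumptions for illustration, and the heap the JVM actually reports is usually a little below the configured size, so treat the numbers as estimates.

```python
# Back-of-the-envelope sizing; assumes default settings, real values vary.
RESERVED_MIB = 300        # fixed amount Spark sets aside for its own internals
MEMORY_FRACTION = 0.6     # default for spark.memory.fraction
STORAGE_FRACTION = 0.5    # default for spark.memory.storageFraction

def unified_memory_regions(heap_mib,
                           memory_fraction=MEMORY_FRACTION,
                           storage_fraction=STORAGE_FRACTION):
    """Estimate the memory regions (in MiB) for a given executor heap size."""
    usable = heap_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved memory")
    unified = usable * memory_fraction      # shared by execution and storage
    storage = unified * storage_fraction    # cached data protected from eviction
    return {
        "unified_mib": unified,
        "storage_mib": storage,
        "execution_mib": unified - storage,  # initial share; the boundary is soft
        "user_mib": usable - unified,        # user data structures and metadata
    }

# Hypothetical executor with a 4 GiB heap.
regions = unified_memory_regions(4096)
```

With these defaults, a 4 GiB heap leaves roughly 2.2 GiB for the unified region, half of which is storage that execution cannot evict.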