What is Glue ETL?

AWS Glue is a serverless data integration service on AWS. Its ETL (Extract, Transform, Load) jobs read data from a source, reshape it, and write the result to a target, all on managed Apache Spark infrastructure, so there are no clusters to provision or maintain.

Breakdown of AWS Glue as an ETL service:

### Extract (E)
Read data from sources such as Amazon S3, JDBC databases, or tables registered in the AWS Glue Data Catalog.

### Transform (T)
Clean and reshape the data using Glue's built-in transforms (mapping, filtering, joining) or custom PySpark code.

### Load (L)
Write the transformed data to a target such as Amazon S3, Amazon Redshift, or a relational database.

Example ETL Workflow with Glue:

  1. Define Your ETL Job: Configure extract, transform, and load operations.
  2. Run the Job: Glue executes on a serverless Apache Spark environment.
  3. Monitor and Optimize: Use built-in monitoring tools to review performance (a minimal boto3 sketch of these three steps follows).
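
The same three steps can also be driven programmatically. Below is a minimal sketch using boto3, assuming a job named example-etl-job, a placeholder IAM role ARN, and an ETL script already uploaded to S3; all of these names are illustrative and would be replaced with your own values.

```python
import time

import boto3

glue = boto3.client("glue")

# 1. Define the ETL job (the script itself lives in S3; name, role, and path are placeholders).
glue.create_job(
    Name="example-etl-job",
    Role="arn:aws:iam::123456789012:role/YourGlueServiceRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://your-script-bucket/scripts/etl_job.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=2,
)

# 2. Run the job on Glue's serverless Spark environment.
run_id = glue.start_job_run(JobName="example-etl-job")["JobRunId"]

# 3. Monitor: poll the run state until it reaches a terminal status.
while True:
    state = glue.get_job_run(JobName="example-etl-job", RunId=run_id)["JobRun"]["JobRunState"]
    print("Job run state:", state)
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(30)
```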


Simple Python Code Example Using AWS Glue

Here's a simple example of Python code that you might use in an AWS Glue ETL job. This code reads data from an S3 bucket, applies a basic transformation, and then writes the transformed data back to another S3 bucket.

Example: Simple ETL Job Using AWS Glue


```python
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Initialize the Spark and Glue contexts and register the job
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Extract: load JSON data from S3 into a DynamicFrame
input_data = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://your-input-bucket/input-data/"]},
    format="json"
)

# Transform: keep only records where "age" is 30 or greater
filtered_data = Filter.apply(frame=input_data, f=lambda x: x["age"] >= 30)

# Load: write the transformed data back to S3 in JSON format
glueContext.write_dynamic_frame.from_options(
    frame=filtered_data,
    connection_type="s3",
    connection_options={"path": "s3://your-output-bucket/transformed-data/"},
    format="json"
)

# Commit the job so Glue records the run as complete
job.commit()
```


Explanation:

  * The script resolves its job arguments, initializes the Spark and Glue contexts, and registers the job with job.init.
  * create_dynamic_frame.from_options reads the JSON files under the input S3 path into a Glue DynamicFrame.
  * Filter.apply evaluates a row-level predicate and keeps only records whose "age" field is 30 or greater.
  * write_dynamic_frame.from_options writes the filtered records to the output S3 path as JSON, and job.commit() marks the run as complete.

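If you prefer the Spark DataFrame API over Glue's built-in transforms, the same filter can be written by converting the DynamicFrame to a DataFrame and back. This is a minimal sketch that assumes input_data and glueContext are already defined as in the script above.

```python
from awsglue.dynamicframe import DynamicFrame

# Convert the DynamicFrame to a Spark DataFrame, filter it, and wrap it back
# into a DynamicFrame so it can still be written with write_dynamic_frame.
# Assumes input_data and glueContext from the script above.
df = input_data.toDF()
filtered_df = df.filter(df["age"] >= 30)
filtered_data = DynamicFrame.fromDF(filtered_df, glueContext, "filtered_data")
```
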
Usage:

Replace "s3://your-input-bucket/input-data/" and "s3://your-output-bucket/transformed-data/" with your actual S3 bucket paths. This script can be run in AWS Glue as part of a Glue job.

This example shows a basic ETL workflow using AWS Glue, demonstrating how to load data from S3, apply a transformation, and save the result back to S3.