Here's a simple example of Python code that you might use in an AWS Glue ETL job. This code reads data from an S3 bucket, applies a basic transformation, and then writes the transformed data back to another S3 bucket.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
# Initialize Glue Context
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# Load data from S3
input_data = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://your-input-bucket/input-data/"]},
    format="json"
)
# Apply transformation: Filter out records where "age" is less than 30
filtered_data = Filter.apply(frame=input_data, f=lambda x: x["age"] >= 30)
# Write the transformed data back to S3 in JSON format
glueContext.write_dynamic_frame.from_options(
    frame=filtered_data,
    connection_type="s3",
    connection_options={"path": "s3://your-output-bucket/transformed-data/"},
    format="json"
)
# Commit job
job.commit()
A few notes on the script: it first creates a GlueContext, which is needed to interact with AWS Glue, and the Job object is initialized with the job name passed in as an argument. The input data is loaded from S3 with create_dynamic_frame.from_options, and the filtered result is written back to S3 with write_dynamic_frame.from_options. Replace "s3://your-input-bucket/input-data/" and "s3://your-output-bucket/transformed-data/" with your actual S3 bucket paths. This script can be run in AWS Glue as part of a Glue job.
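If you want to create and launch the job programmatically rather than through the Glue console, a minimal boto3 sketch might look like the following. The job name, IAM role, script location, and region are placeholder assumptions; substitute your own values.
import boto3
# Assumption: the ETL script above has been uploaded to S3, and "MyGlueServiceRole"
# is an IAM role with permission to read and write the buckets involved.
glue = boto3.client("glue", region_name="us-east-1")
# Register the ETL script as a Glue Spark job (Python 3).
glue.create_job(
    Name="filter-by-age-job",
    Role="MyGlueServiceRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://your-script-bucket/scripts/filter_by_age.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
)
# Start a run. Glue supplies --JOB_NAME to the script at run time,
# so getResolvedOptions can resolve it without extra arguments here.
response = glue.start_job_run(JobName="filter-by-age-job")
print(response["JobRunId"])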
This example shows a basic ETL workflow using AWS Glue, demonstrating how to load data from S3, apply a transformation, and save the result back to S3.
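As a side note, the script above creates spark = glueContext.spark_session but never actually uses it. If you prefer Spark's DataFrame API for the filtering step, a hedged alternative sketch (reusing input_data and glueContext from the script above, and assuming the records contain an "age" column) would be:
from awsglue.dynamicframe import DynamicFrame
# Convert the DynamicFrame to a Spark DataFrame, filter with the DataFrame API,
# then convert back so the result can still be written with write_dynamic_frame.from_options.
df = input_data.toDF()
filtered_df = df.filter(df["age"] >= 30)
filtered_data = DynamicFrame.fromDF(filtered_df, glueContext, "filtered_data")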