rdd = sc.parallelize([1, 2, 3, 4, 5])
What is the difference between map() and flatMap() in PySpark?

The map() transformation applies a function to each element of an RDD and returns a new RDD with exactly one result per input element. The flatMap() transformation can return zero or more elements for each input element and flattens the results into a single RDD.
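Since a running Spark cluster is not assumed here, the same semantics can be sketched with plain Python lists; the lambdas mirror what would be passed to rdd.map() and rdd.flatMap():

```python
# Illustrating map() vs flatMap() semantics without a Spark session.
data = [1, 2, 3]

# map: exactly one output element per input element,
# like rdd.map(lambda x: x * 2)
mapped = [x * 2 for x in data]

# flatMap: each input may yield several elements, flattened together,
# like rdd.flatMap(lambda x: range(x))
flat_mapped = [y for x in data for y in range(x)]

print(mapped)       # [2, 4, 6]
print(flat_mapped)  # [0, 0, 1, 0, 1, 2]
```

Note that map() over range(x) would instead produce a list of lists; flattening is what distinguishes flatMap().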
What is the difference between transformations and actions in PySpark?

Transformations are lazy operations that define a new RDD from an existing one, such as map() or filter(). Actions trigger the execution of the accumulated transformations and return results to the driver program, such as count() or collect().
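The lazy-evaluation idea can be sketched with Python generator expressions, which likewise build a pipeline that does no work until something consumes it (a rough analogy, not Spark's actual execution model):

```python
# Generators play the role of transformations: nothing runs yet.
nums = range(1, 6)
doubled = (x * 2 for x in nums)        # like a map() transformation
big = (x for x in doubled if x > 4)    # like a filter() transformation

# Consuming the pipeline plays the role of an action such as collect().
result = list(big)
print(result)  # [6, 8, 10]
```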
How do you join two DataFrames in PySpark?

Use the join() method on DataFrames. For example, to perform an inner join on a shared id column:

df1.join(df2, df1.id == df2.id, 'inner')
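What an inner join keeps can be sketched in plain Python with small dictionaries standing in for DataFrame rows (the column names id, name, and dept are illustrative, not from the original):

```python
# Plain-Python sketch of df1.join(df2, df1.id == df2.id, 'inner'):
# only rows whose id appears on both sides survive.
df1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
df2 = [{"id": 2, "dept": "x"}, {"id": 3, "dept": "y"}]

# Index the right side by the join key.
by_id = {row["id"]: row for row in df2}

# Keep left rows with a matching id, merging the matched columns.
joined = [{**left, **by_id[left["id"]]} for left in df1 if left["id"] in by_id]
print(joined)  # [{'id': 2, 'name': 'b', 'dept': 'x'}]
```

Passing 'left', 'right', or 'outer' instead of 'inner' to join() changes which unmatched rows are retained.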