Managed and External Tables in Delta Lake
Managed Table
- Storage Location: The catalog that registers the table (for example, the Hive metastore or Unity Catalog) manages both the data and the metadata. The data files and the Delta transaction log are stored in a location the catalog controls.
- Lifecycle Management: When you create a managed table, you do not specify a path; the storage location is chosen for you, typically inside the database directory. When the table is dropped, both the data and the metadata are deleted automatically.
- Use Case: Suitable for cases where you want Delta Lake to manage the lifecycle of the data, including cleaning up when the table is no longer needed.
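As a minimal sketch, assuming Spark SQL with Delta Lake configured: a table created without a LOCATION clause is a managed table. The helper name `managed_table_ddl` and the table name `sales.events` are hypothetical, and the function only builds the statement text; running the statement would need a Delta-enabled Spark session, which is not created here.

```python
def managed_table_ddl(table: str, columns: dict[str, str]) -> str:
    """Build a Spark SQL statement for a managed Delta table.

    No LOCATION clause is emitted, so the storage path is chosen by the
    catalog rather than by the user.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE {table} ({cols}) USING DELTA"


# Hypothetical table name and columns, for illustration only.
ddl = managed_table_ddl("sales.events", {"id": "BIGINT", "ts": "TIMESTAMP"})
print(ddl)

# With a Delta-enabled SparkSession named `spark` (assumed, not created here):
#   spark.sql(ddl)
#   spark.sql("DROP TABLE sales.events")  # managed: metadata and data both go
```

Leaving the path out is the whole point of the sketch: the absence of LOCATION is what hands the data lifecycle over to the catalog.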
External Table
- Storage Location: The data resides at a path the user specifies (e.g., an S3 or HDFS path), outside the catalog's managed storage. The catalog only records the table's metadata, including that path; it does not manage the physical storage of the data.
- Lifecycle Management: When an external (unmanaged) table is dropped, only the metadata is removed. The data remains intact at its specified storage location.
- Use Case: Ideal when you want to retain control over the data storage location, or the data is shared with other systems.
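A matching sketch for the external case, again assuming Spark SQL with Delta Lake: adding a LOCATION clause to the CREATE TABLE statement makes the table external. The helper name `external_table_ddl`, the table name `sales.events_ext`, and the bucket path are hypothetical placeholders.

```python
def external_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Build a Spark SQL statement for an external Delta table.

    The LOCATION clause pins the data to a user-chosen path, so the
    catalog only registers metadata that points at it.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE {table} ({cols}) USING DELTA LOCATION '{location}'"


# Hypothetical table name and storage path, for illustration only.
ddl = external_table_ddl(
    "sales.events_ext",
    {"id": "BIGINT"},
    "s3://example-bucket/events",
)
print(ddl)

# With a Delta-enabled SparkSession named `spark` (assumed, not created here):
#   spark.sql(ddl)
#   spark.sql("DROP TABLE sales.events_ext")  # external: files stay at the path
```

Because the files survive the drop, the same path can later be registered again as a table, or read directly by other systems that understand the Delta format.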
Summary
Managed Table: The catalog controls both the data and the metadata. Dropping the table deletes the data as well.
External Table: The catalog controls only the metadata, and the user manages the data location. Dropping the table does not remove the data itself.