ETL (Extract, Transform, Load)
- ETL processes are commonly used to move data from OLTP systems into OLAP systems
- It's a system that
extractsthe unstructured data from the production databases,transformsthe data into structure and meaningful data andloadsthe data into an analytical environment - For example, transactional data from an OLTP database could be extracted and aggregated into a data warehouse for analytical processing
Extract
- Collects data from the transactional environment (OLTP)
- Get the raw data (e.g., database log files) and store it in a data lake (e.g., S3 with avro files)

Volume: TB ~ PB /dayVariety: mixed data types from multiple sourcesVelocity: high rate
Transform
- Processes the collected raw data from the
data lakeand outputs data that can be used for analyses - The raw data is
convertedinto materialized views of datasets - These datasets have
contractsorschemasto help on the data analysis - The raw data is divided into small batches of subtasks, and then processes each batch separately
Load
- Loads the data into analytical environments or other environment
- Data Warehouse (Google Big Query)
- Databricks
-
Back to the transactional environment. E.g., (fraud data)
-
Propagates the computed data to
BI toolsanalytical environments (such as Looker, Google Data Studio) and to the production environment -
All the data that was transformed and stored can be accessed through various BI tools
-
Data Governance
- Implementing granular access control of data in the analytical environments
- Managing PII data inventory and data subject rights