Table
Managed Table
- On
managed tables
, the original data is copied/moved into Hive
Schema On Read
: Hive takes unstructured data and apply its schema as it's being read
CREATE TABLE ratings (
userId INT,
movieId INT,
rating INT,
time INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
-- Schema for the raw file imported (it will be applied when data is read)
-- The data is never imported. The schema and how to parse it that is created
LOAD DATA LOCAL INPATH '${env:HOME}/ml/100k/ratings.csv'
OVERWRITE INTO TABLE ratings
LOCAL DATA
: moves data from RDFS into Hive
LOCAL DATA LOCAL
: copies data from local fs into Hive (not dealing with big data)
External Table
- A table to be accessed lives outside of Hive
- Hive doesn't take ownership of the original data
- The original data is left untouched
CREATE EXTERNAL TABLE IF NOT EXISTS ratings (
userId INT,
movieId INT,
rating INT,
time INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/data/ml-100k/movies.csv';