Table

Managed Table

  • On managed tables, the original data is copied/moved into Hive
  • Schema on read: Hive takes unstructured data and applies its schema as the data is being read
CREATE TABLE ratings (
  userId INT,
  movieId INT,
  rating INT,
  time INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;

-- The table above defines the schema for the raw file; it is applied when the data is read.
-- The file is not parsed on load. Only the schema and the parsing rules are created.
LOAD DATA LOCAL INPATH '${env:HOME}/ml/100k/ratings.csv'
OVERWRITE INTO TABLE ratings;
  • LOAD DATA: moves data from HDFS into Hive
  • LOAD DATA LOCAL: copies data from the local file system into Hive (for when you are not dealing with big data)
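
To make the two bullets above concrete, here is a minimal sketch of the non-LOCAL form followed by a query against the managed table. The HDFS path is a hypothetical placeholder and is not taken from these notes; the query only illustrates that the schema is applied when rows are read.

```sql
-- Hypothetical HDFS path, for illustration only.
-- Without LOCAL, the file is moved from its HDFS location into the table's warehouse directory.
LOAD DATA INPATH '/user/hypothetical/ratings.tsv'
OVERWRITE INTO TABLE ratings;

-- The schema is applied only now, as the rows are read.
-- With the default text format, a field that cannot be parsed as INT is returned as NULL.
SELECT movieId, COUNT(*) AS num_ratings
FROM ratings
GROUP BY movieId
ORDER BY num_ratings DESC
LIMIT 10;
```

Because the load step only moves or copies the file, a mismatch between the file and the schema tends to show up as NULL values at query time, not as an error during LOAD DATA.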

External Table

  • The data to be accessed lives outside of Hive
  • Hive doesn't take ownership of the original data
  • The original data is left untouched
CREATE EXTERNAL TABLE IF NOT EXISTS ratings (
  userId INT,
  movieId INT,
  rating INT,
  time INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  -- Note: Hive expects LOCATION to be a directory that holds the data files, not a single file
  LOCATION '/data/ml-100k/movies.csv';
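
The ownership difference is easiest to see when inspecting and dropping the table. The following is a small sketch, assuming the external `ratings` table defined above already exists:

```sql
-- The "Table Type" row reads EXTERNAL_TABLE for an external table
-- and MANAGED_TABLE for a managed one.
DESCRIBE FORMATTED ratings;

-- For an external table this removes only the metastore entry; the files under LOCATION stay.
-- For a managed table the same statement also removes the data from the warehouse directory.
DROP TABLE IF EXISTS ratings;
```

This is why external tables are the usual choice when other tools also read the same files: dropping the Hive table does not affect them.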