Foundation Model
- Foundation Models are trained on a wide variety of input data (often unlabeled data)
- Foundation Models may cost tens of millions of dollars to train
- FMs can be fine-tuned if necessary to better fit the use case

Generative Modalities
- Text
- Image
- Vision
- Embedding
- Some models are multimodal (they work with more than one modality)
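The embedding modality maps inputs to vectors of numbers so that similar inputs land close together. A minimal sketch of how such vectors are compared, using made-up toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors by the angle between them (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings, invented for illustration
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
car = [0.0, 0.8, 0.6, 0.1]

print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # lower: unrelated meanings
```

This similarity comparison is what powers typical embedding use cases such as semantic search and recommendations.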
Large Language Models (LLM)
- Text To Text
- Relies on a foundation model
- Designed to generate coherent human-like text
- We usually interact with an LLM via a prompt
- The generated text is non-deterministic
Context Window - The number of tokens an LLM can consider when generating text (the "size" of your prompt)
- Large context windows require more memory and processing power
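The non-determinism above comes from sampling: at each step the model assigns a score to every candidate next token and picks one at random in proportion to its probability, so repeated runs can produce different text. A minimal sketch, with a made-up vocabulary and made-up scores:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Turn raw scores into probabilities (softmax), then sample one token."""
    scaled = [score / temperature for score in logits.values()]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(list(logits), weights=probs, k=1)[0]

# Hypothetical scores a model might assign after the prompt "The sky is"
logits = {"blue": 3.0, "clear": 2.0, "falling": 0.5}

# Two calls can return different tokens: this is the non-determinism
print(sample_next_token(logits))
print(sample_next_token(logits))
```

Lowering the temperature sharpens the distribution toward the top-scoring token, which is why low-temperature output is more repeatable.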

Diffusion Models
- Text To Image
- Trains using a forward diffusion process - adds noise to the image step by step
- Generates an image from noise by reversing the diffusion process
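A minimal sketch of one step of the forward (noising) process, in the DDPM style: each step slightly shrinks the signal and mixes in Gaussian noise, so after many steps the image is indistinguishable from pure noise. The constant noise level `beta` and the tiny 1-D "image" are illustrative assumptions; real models use a schedule of betas and then learn to invert these steps.

```python
import math
import random

def forward_diffusion_step(pixels, beta, rng=random):
    """One forward step: x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise."""
    keep = math.sqrt(1.0 - beta)
    noise_scale = math.sqrt(beta)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

# Toy 1-D "image" (made up); a real image would be a 2-D grid of pixels
x = [0.9, 0.9, 0.1, 0.1]
for t in range(1000):
    x = forward_diffusion_step(x, beta=0.02)

print(x)  # the original structure is gone: it now looks like random noise
```

Generation runs this in reverse: start from random noise and repeatedly apply a learned denoising step, guided by the text prompt, until an image emerges.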
