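# AWS::S3::Bucket

- Stores objects (files) in buckets (top-level containers, similar to directories)
- Bucket names must be globally unique
- All operations are strongly consistent: after a successful write or delete of an object (PUT/DELETE), a subsequent read (GET) returns the latest version of the object
- Object ownership: historically, an object was owned by the AWS account that uploaded it; buckets created since April 2023 default to the `Bucket owner enforced` setting (ACLs disabled), where the bucket owner owns every object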

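## Objects

- `key`: the object's full path inside the bucket, which uniquely identifies it. E.g., for `s3://my-bucket/my-folder/my-file.txt` the key is `my-folder/my-file.txt` ("folders" are just key prefixes)
- `size`: max object size is 5 TB; a single PUT is limited to 5 GB, so larger objects must use multipart upload (parts of up to 5 GB each)
- `metadata`: list of key-value pairs
- `tags`: up to 10 key-value pairs per object
- `version`: version ID (when versioning is enabled)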
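
Because the key is the whole path inside the bucket, splitting an `s3://` URI into bucket and key is a plain string operation. A minimal Python sketch; `parse_s3_uri` is a hypothetical helper, not an AWS API:

```python
def parse_s3_uri(uri: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key); the key keeps every "folder"."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything before the first "/" is the bucket; the rest (possibly empty) is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


print(parse_s3_uri("s3://my-bucket/my-folder/my-file.txt"))  # ('my-bucket', 'my-folder/my-file.txt')
```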

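## MFA Delete

- Requires an MFA code for destructive versioning operations:
  - permanently deleting an object version
  - changing the bucket's versioning state (e.g., suspending versioning)
- Only the bucket owner's root account can enable/disable MFA Delete, and only through the CLI/API (not the console)

```sh
# enable versioning with MFA Delete (must run with the root account credentials)
# replace the placeholders with the root MFA device ARN/serial number and its current code
aws s3api put-bucket-versioning \
  --bucket "my-bucket" \
  --versioning-configuration "Status=Enabled,MFADelete=Enabled" \
  --mfa "<mfa-device-arn> <mfa-code>"
```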

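## Requester Pays

- The requester (e.g., a third party downloading data from your bucket) is charged for the requests and the data transfer out of the bucket
- The bucket owner still pays for the storage costs
- The requester must be authenticated in AWS (anonymous requests are not allowed) and must accept the charges, e.g., with the `x-amz-request-payer: requester` header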


## Pre-signed URL

- `Pre-Signed URL`: URL that grants temporary access to an object, valid for a limited time (3600 s by default, at most 7 days with Signature Version 4)
- Downloads (GET): the CLI can generate the URL
- Uploads (PUT) and other operations (e.g., POST): use an SDK

```sh
# generate a pre-signed GET URL for an object (valid for 3600 seconds by default)
aws s3 presign "s3://mybucket/myobject.txt" --region "sa-east-1"
# same, but valid for 300 seconds only
aws s3 presign "s3://mybucket/myobject.txt" --region "sa-east-1" --expires-in 300
```
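
SDKs sign these URLs for you (e.g., boto3's `generate_presigned_url`). To show what a pre-signed URL actually carries, here is a stdlib-only Python sketch of Signature Version 4 query-string signing for a GET. It assumes a virtual-hosted-style regional endpoint and long-term credentials; the function name and the credentials in the usage line are placeholders, and real code should use an SDK.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(bucket: str, key: str, region: str, access_key: str, secret_key: str,
                expires_in: int = 3600, now: Optional[datetime.datetime] = None) -> str:
    """SigV4 query-string signature for GET https://<bucket>.s3.<region>.amazonaws.com/<key>."""
    if not 1 <= expires_in <= 604800:  # SigV4 pre-signed URLs are valid for at most 7 days
        raise ValueError("expires_in must be between 1 and 604800 seconds")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date, date_stamp = now.strftime("%Y%m%dT%H%M%SZ"), now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    # The signing metadata travels in the query string, sorted by parameter name.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
                     for k, v in sorted(params.items()))
    path = "/" + quote(key, safe="/-_.~")

    # Canonical request: method, path, query, headers (each ending in "\n"), signed headers, payload.
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Signing key chain: secret -> date -> region -> service -> "aws4_request".
    signing_key = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"


print(presign_get("mybucket", "myobject.txt", "sa-east-1", "AKIDEXAMPLE", "SECRETEXAMPLE", 300))
```

Anyone holding the URL can GET the object until `X-Amz-Expires` elapses. With temporary (STS) credentials the URL also needs an `X-Amz-Security-Token` parameter, which this sketch omits, and it stops working when those credentials expire.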

## Performance

- `3500` PUT/COPY/POST/DELETE per second per prefix
- `5500` GET/HEAD per second per prefix

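- `SSE-KMS` can limit performance: every upload/download of a KMS-encrypted object calls KMS (`GenerateDataKey` / `Decrypt`), and these calls count against the per-region KMS requests-per-second quota
  ![S3 KMS Quota](.images/s3-kms-quota.png)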

- `Multi-part upload`
  - Recommended for files > 100 MB and required for files > 5 GB
  - Uploads parts in parallel to achieve higher throughput

- `S3 Byte-Range Fetches`
  - Optimize reads
  - Parallelize GET requests by fetching different byte ranges of the same object
  - Can also be used to retrieve only a part of the file (e.g., only the header)

- `S3 Select & Glacier Select`
  - Allow `SQL` queries on objects stored in S3 (server-side filtering)
  - Avoid filtering unnecessary data in the application
  - Can query CSV files (also JSON and Parquet)
  - Closed to new customers since July 2024
  - Less network traffic!
    ![S3 Select](.images/s3-select.png)
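
To make the multipart and byte-range bullets concrete, here is a Python sketch that plans the byte ranges used by both techniques. The 100 MiB default part size is an assumption (parts may be 5 MiB to 5 GiB, with at most 10,000 parts per upload); the same ranges can be sent as `Range` headers for parallel GETs, and `prefixes_needed` (a hypothetical helper) applies the per-prefix request rates above.

```python
import math
from typing import List, Tuple

MiB = 1024 ** 2
GiB = 1024 ** 3
MAX_PARTS = 10_000      # maximum number of parts in one multipart upload
MIN_PART = 5 * MiB      # minimum part size (the last part may be smaller)
MAX_PART = 5 * GiB      # maximum part size


def plan_parts(size: int, part_size: int = 100 * MiB) -> List[Tuple[int, int]]:
    """Split `size` bytes into inclusive (first_byte, last_byte) ranges."""
    # Enlarge the parts if the requested size would need more than MAX_PARTS parts.
    part_size = max(part_size, MIN_PART, math.ceil(size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for a multipart upload")
    return [(first, min(first + part_size, size) - 1) for first in range(0, size, part_size)]


def range_headers(size: int, part_size: int = 100 * MiB) -> List[str]:
    """`Range` header values for fetching the same object with parallel GETs."""
    return [f"bytes={first}-{last}" for first, last in plan_parts(size, part_size)]


def prefixes_needed(requests_per_second: int, per_prefix_limit: int = 3500) -> int:
    """Prefixes to spread keys over for a target PUT rate (use 5500 for GET/HEAD)."""
    return math.ceil(requests_per_second / per_prefix_limit)


print(range_headers(250 * MiB))  # ['bytes=0-104857599', 'bytes=104857600-209715199', 'bytes=209715200-262143999']
print(prefixes_needed(10_000))   # 3
```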

## Properties

- <https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-s3-bucket.html>

```yaml
Type: AWS::S3::Bucket
Properties:
  AccelerateConfiguration:
    AccelerateConfiguration
  AccessControl: String
  AnalyticsConfigurations:
    - AnalyticsConfiguration
  BucketEncryption:
    BucketEncryption
  BucketName: String
  CorsConfiguration:
    CorsConfiguration
  IntelligentTieringConfigurations:
    - IntelligentTieringConfiguration
  InventoryConfigurations:
    - InventoryConfiguration
  LifecycleConfiguration:
    LifecycleConfiguration
  LoggingConfiguration:
    LoggingConfiguration
  MetricsConfigurations:
    - MetricsConfiguration
  NotificationConfiguration:
    NotificationConfiguration
  ObjectLockConfiguration:
    ObjectLockConfiguration
  ObjectLockEnabled: Boolean
  OwnershipControls:
    OwnershipControls
  PublicAccessBlockConfiguration:
    PublicAccessBlockConfiguration
  ReplicationConfiguration:
    ReplicationConfiguration
  Tags:
    - Tag
  VersioningConfiguration:
    VersioningConfiguration
  WebsiteConfiguration:
    WebsiteConfiguration
```

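### AccelerateConfiguration

- `S3 Transfer Acceleration (S3TA)`
  - Increases transfer speed over long distances
  - Clients transfer to the nearest AWS edge location, which forwards the data to the bucket's region over the AWS network
  - With S3TA, you pay only for transfers that are actually accelerated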
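
Once acceleration is enabled on the bucket (in CloudFormation, `AccelerateConfiguration: {AccelerationStatus: Enabled}`), clients opt in by using the accelerate endpoint. A small Python sketch; `accelerate_endpoint` is a hypothetical helper, and the name check reflects the S3TA requirement that bucket names be DNS-compliant without periods.

```python
import re

# DNS-compliant bucket name without periods: 3-63 lowercase letters, digits or hyphens,
# starting and ending with a letter or digit (S3TA rejects names containing ".").
_S3TA_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def accelerate_endpoint(bucket: str) -> str:
    """Return the S3 Transfer Acceleration endpoint for a bucket."""
    if not _S3TA_NAME.fullmatch(bucket):
        raise ValueError(f"bucket {bucket!r} cannot be used with Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("my-bucket"))  # https://my-bucket.s3-accelerate.amazonaws.com
```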

### BucketEncryption

- `SSE-S3`
  - SSE: server-side encryption
  - Keys handled and managed by S3 (AES-256)
  - Header `"x-amz-server-side-encryption": "AES256"`
- `SSE-KMS`
  - Keys managed in KMS (Key Management Service)
  - User control, audit trail (CloudTrail), key rotation policy
  - Header `"x-amz-server-side-encryption": "aws:kms"`
- `SSE-C`
  - You manage your own keys
  - Through the CLI or SDK only (not the console)
  - HTTPS is mandatory because the key is sent to AWS
  - The key must be provided in the headers of every request
  - S3 uses the key only to encrypt/decrypt and does not store it
  - Headers `x-amz-server-side-encryption-customer-algorithm: AES256`, `x-amz-server-side-encryption-customer-key` (base64 key) and `x-amz-server-side-encryption-customer-key-MD5`
- `Client-Side Encryption`
  - You encrypt the data before sending it
  - The Amazon S3 Encryption Client (part of the AWS SDKs) can help with that
- A default encryption method can be set at the bucket level (since January 2023, new objects are encrypted with SSE-S3 by default). Each upload can override it, so every object and version can use its own method
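
For SSE-C the client computes the three request headers itself. A stdlib Python sketch (`sse_c_headers` is a hypothetical name; SDKs such as boto3 build these headers from the `SSECustomerKey` parameter and fill in the MD5 automatically):

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Headers S3 expects on every SSE-C request (PUT, GET, HEAD, ...)."""
    if len(key) != 32:
        raise ValueError("SSE-C uses AES-256, so the key must be 32 bytes")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # Base64 MD5 digest of the raw key, so S3 can detect a key corrupted in transit.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }


customer_key = os.urandom(32)  # keep this key safe: without it the object can't be read back
headers = sse_c_headers(customer_key)
```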

### CorsConfiguration

- `CORS (Cross-Origin Resource Sharing)`: lets a web page load resources from another origin
- Web browser policy: cross-origin requests are allowed only if the origin being requested allows them via CORS
- Origin: scheme (protocol) + host (domain) + port. E.g., `https://www.example.com:443`
  - Same origin: `http://example.com/app1` & `http://example.com/app2`
  - Different origins: `http://example.com/` & `http://other.example.com/`
- `Preflight Request`
  - Before non-simple cross-origin requests, the browser sends a preflight request to the target origin to check whether it allows CORS
  - It's an `OPTIONS` request with `Host` (target origin) and `Origin` (source origin) headers
  - The target origin responds with the allowed origin and methods:
    - `Access-Control-Allow-Origin: http://www.source-origin.com`
    - `Access-Control-Allow-Methods: GET, PUT, DELETE`
- `S3 CORS`
  - CORS must be enabled on the target bucket to allow requests from other origins
  - CORS rules are defined under the bucket's Permissions tab

```json
[
  {
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["GET"],
    "AllowedOrigins": ["http://source-bucket.com"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000
  }
]
```

For a preflight request from `http://source-bucket.com`, S3 responds with headers such as:

- `Access-Control-Allow-Origin: http://source-bucket.com`
- `Access-Control-Allow-Methods: GET`
- `Access-Control-Max-Age: 3000`
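
The origin comparison and the rule matching described above can be sketched in Python. This is a simplified model, not S3's implementation: it handles only exact matches (S3 also accepts `*` wildcards in `AllowedOrigins` and `AllowedHeaders`), and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Origin = scheme + host + port (the port defaults from the scheme)."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


def preflight(rules: List[dict], request_origin: str, method: str,
              request_headers: Iterable[str] = ()) -> Optional[Dict[str, str]]:
    """Response headers from the first CORS rule that allows the request, else None."""
    for rule in rules:
        allowed_headers = {h.lower() for h in rule.get("AllowedHeaders", [])}
        if (request_origin in rule["AllowedOrigins"]
                and method in rule["AllowedMethods"]
                and all(h.lower() in allowed_headers for h in request_headers)):
            response = {
                "Access-Control-Allow-Origin": request_origin,
                "Access-Control-Allow-Methods": ", ".join(rule["AllowedMethods"]),
            }
            if "MaxAgeSeconds" in rule:
                response["Access-Control-Max-Age"] = str(rule["MaxAgeSeconds"])
            return response
    return None  # no matching rule: the browser blocks the cross-origin request


rules = [{
    "AllowedHeaders": ["Authorization"],
    "AllowedMethods": ["GET"],
    "AllowedOrigins": ["http://source-bucket.com"],
    "ExposeHeaders": [],
    "MaxAgeSeconds": 3000,
}]
print(preflight(rules, "http://source-bucket.com", "GET", ["Authorization"]))
```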


```html
<html>
  <head>
    <title>My First Webpage</title>
  </head>
  <body>
    <h1>I love coffee</h1>
    <p>Hello world!</p>
    <img src="coffee.jpg" width="500" />

    <!-- CORS demo -->
    <div id="tofetch"></div>
    <script>
      const tofetch = document.getElementById("tofetch");

      // load from URL in the same origin
      fetch("extra-page.html")
        .then((response) => response.text())
        .then((html) => {
          tofetch.insertAdjacentHTML("beforeend", html);
        });

      // load from URL in a different origin (CORS must be enabled on the target bucket)
      fetch("http://target-bucket.s3-website-eu-west-1.amazonaws.com/extra-page.html")
        .then((response) => response.text())
        .then((html) => {
          tofetch.insertAdjacentHTML("beforeend", html);
        });
    </script>
  </body>
</html>
```

### LifecycleConfiguration

- `StorageClass`
  - `Standard - General Purpose`
    - Durability 99.999999999% (11 nines, the same for all storage classes)
    - Availability 99.99%
    - Use cases: big data analytics, mobile & gaming apps, content distribution
  - `Standard - Infrequent Access (IA)`
    - For data accessed infrequently that still needs rapid access when requested
    - Availability 99.9%
    - Use cases: backups
  - `Intelligent Tiering`
    - Small monthly fee for monitoring and auto-tiering
    - Automatically moves objects between access tiers based on usage. E.g., Frequent Access -> Infrequent Access
    - Availability 99.9%
  - `One Zone - Infrequent Access (IA)`
    - Same as Standard IA but stored in a single AZ (data is lost if the AZ is destroyed)
    - Availability 99.5%
    - Use cases: secondary backup copies
  - `Glacier`
    - About $0.004 per GB-month (varies by region) + retrieval cost
    - Each archive can be up to 40 TB
    - Archives are stored in vaults
    - Retrieval options: expedited (1-5 min), standard (3-5 h), bulk (5-12 h)
    - Minimum storage duration of 90 days
  - `Glacier Deep Archive`
    - Even cheaper!
    - Retrieval options: standard (within 12 h), bulk (within 48 h)
    - Minimum storage duration of 180 days
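
The minimum storage duration matters when data is deleted early: S3 still bills the remaining days. A Python sketch of that arithmetic; the 90/180-day minimums come from the notes above, the 30-day minimum for the IA classes is S3's documented value, the 30-day month is a simplification, and the price in the example is only the approximate Glacier figure quoted above.

```python
# Minimum storage duration charged per storage class, in days (classes not listed have none).
MIN_STORAGE_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def billed_days(storage_class: str, days_stored: int) -> int:
    """Days S3 bills for an object deleted (or transitioned away) after `days_stored`."""
    return max(days_stored, MIN_STORAGE_DAYS.get(storage_class, 0))


def storage_cost(gb: float, price_per_gb_month: float, storage_class: str, days_stored: int) -> float:
    """Approximate storage cost, assuming 30-day months."""
    return gb * price_per_gb_month * billed_days(storage_class, days_stored) / 30


# 100 GB kept in Glacier for only 30 days is still billed for 90 days.
print(billed_days("GLACIER", 30))                          # 90
print(round(storage_cost(100, 0.004, "GLACIER", 30), 2))   # 1.2
```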


- The storage class can be set per object (when uploading or afterwards)
- `Lifecycle Rules`
  - Lifecycle rules can be created under the Management tab
  - S3 Analytics (Storage Class Analysis) can recommend when to transition objects, but only from Standard to Standard-IA; it does not move objects itself
  - Transition actions move objects to another storage class (e.g., to Standard IA 60 days after creation and to Glacier 6 months after creation); expiration actions delete them
  - Available rule actions:
    - Transition current versions of objects between storage classes
    - Transition previous versions of objects between storage classes
    - Expire current versions of objects
    - Permanently delete previous versions of objects
    - Delete expired delete markers or incomplete multipart uploads
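
The example transition above (Standard-IA after 60 days, Glacier after about 6 months) can be written as a lifecycle configuration. In this Python sketch, `LIFECYCLE` follows the shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`; the rule ID and the 7-day multipart cleanup are made-up values, and `storage_class_at` is a hypothetical helper that simplifies timing (S3 applies transitions asynchronously, so objects move at or shortly after the threshold).

```python
# Lifecycle configuration (rule ID and day counts are illustrative assumptions).
LIFECYCLE = {
    "Rules": [{
        "ID": "archive-old-objects",
        "Filter": {"Prefix": ""},              # empty prefix: apply to every object
        "Status": "Enabled",
        "Transitions": [
            {"Days": 60, "StorageClass": "STANDARD_IA"},
            {"Days": 180, "StorageClass": "GLACIER"},
        ],
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
    }]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Storage class a current object version should be in after `age_days` under `rule`."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE["Rules"][0]
print([storage_class_at(d, rule) for d in (10, 60, 200)])  # ['STANDARD', 'STANDARD_IA', 'GLACIER']
```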


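### LoggingConfiguration

- Server access logs can be stored in another S3 bucket. Do not store them in the monitored bucket itself: every log write would generate new log entries, creating a logging loop that makes the bucket grow indefinitely
- API calls can also be logged with CloudTrail
- Enable it under Server access logging in the bucket's Properties tab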
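
Server access logging points the source bucket at a separate target bucket. A Python sketch that builds the `BucketLoggingStatus` structure accepted by boto3's `put_bucket_logging`; the bucket names and the `logs/` prefix are placeholders, and `logging_status` is a hypothetical helper.

```python
def logging_status(source_bucket: str, target_bucket: str, prefix: str = "logs/") -> dict:
    """Build a BucketLoggingStatus that ships access logs of `source_bucket` elsewhere."""
    if source_bucket == target_bucket:
        # Logging into the monitored bucket makes every log write produce more logs.
        raise ValueError("use a different target bucket to avoid a logging loop")
    # TargetPrefix groups the log objects; a per-source prefix keeps several buckets apart.
    return {"LoggingEnabled": {"TargetBucket": target_bucket,
                               "TargetPrefix": f"{prefix}{source_bucket}/"}}


print(logging_status("my-bucket", "my-log-bucket"))
# {'LoggingEnabled': {'TargetBucket': 'my-log-bucket', 'TargetPrefix': 'logs/my-bucket/'}}
```

The target bucket also has to allow the S3 logging service principal (`logging.s3.amazonaws.com`) to write to it, usually through its bucket policy.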

### NotificationConfiguration

- `S3 Event Notifications`
  - Event types such as `s3:ObjectCreated:*`, `s3:ObjectRemoved:*`, ...
  - Event notifications are defined under the Properties tab
  - Filter rules (key prefix/suffix) can restrict notifications to certain objects
  - Example use: generate a thumbnail as soon as a `.jpg` file is created in the bucket
  - Events can be sent to SNS topics, SQS queues or Lambda functions for further processing
  - The destination (SNS, SQS, Lambda) must have a resource access policy that allows S3 to publish to it
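
For the thumbnail example, a Lambda function receives the event records and picks the new `.jpg` objects. A Python sketch: the record fields (`eventName`, `s3.bucket.name`, `s3.object.key`) follow the S3 event message structure, while `new_jpg_objects` and the thumbnail step inside `handler` are hypothetical placeholders.

```python
from urllib.parse import unquote_plus


def new_jpg_objects(event: dict) -> list:
    """(bucket, key) pairs of created .jpg objects, taken from an S3 event notification."""
    targets = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        if key.lower().endswith(".jpg"):
            targets.append((bucket, key))
    return targets


def handler(event, context):
    for bucket, key in new_jpg_objects(event):
        # Hypothetical step: download, resize and upload the thumbnail to ANOTHER bucket,
        # otherwise the new .jpg would trigger this function again.
        print(f"thumbnail needed for s3://{bucket}/{key}")


sample_event = {"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "photos/my+cat.jpg"}},
}]}
handler(sample_event, None)  # thumbnail needed for s3://my-bucket/photos/my cat.jpg
```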

### ObjectLockConfiguration

- Places an Object Lock configuration on the specified bucket
- The rule in the Object Lock configuration is applied by default to every new object placed in the bucket
- The default retention can be overridden by explicitly applying a retention period (`Retain Until Date`) to an object version
- Different versions of an object can have different retention modes and periods
- The `DefaultRetention` specifies:
  - Mode
    - `GOVERNANCE`: users can't overwrite or delete a locked version or change its lock settings, unless they have the `s3:BypassGovernanceRetention` permission
    - `COMPLIANCE`: no one (not even the root user) can overwrite or delete a locked version, or shorten its retention, until the retention period expires
  - Period
    - Retention period, in `Days` or `Years`
- `Legal Hold`: a separate per-version lock with no expiry date; it protects the version until it is explicitly removed, independently of the retention mode
- For Glacier archives you can add Vault Lock to enforce a WORM (write once, read many) policy
  - This way your archives cannot be modified or deleted. Good for compliance & audit!
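
The rules above boil down to a decision per object version. A simplified Python model (`can_delete_version` is hypothetical, not an AWS API). It covers deleting a specific version; a plain DELETE without a version ID still just adds a delete marker, and in S3 the governance bypass also requires sending the `x-amz-bypass-governance-retention: true` header.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def can_delete_version(mode: Optional[str], retain_until: Optional[datetime], now: datetime,
                       legal_hold: bool = False, can_bypass_governance: bool = False) -> bool:
    """Whether a locked object version may be deleted or overwritten at `now`."""
    if legal_hold:
        return False  # a legal hold blocks deletion until it is removed, whatever the mode
    if mode is None or retain_until is None or now >= retain_until:
        return True   # no retention, or the retention period has expired
    if mode == "GOVERNANCE":
        return can_bypass_governance  # s3:BypassGovernanceRetention permission
    return False      # COMPLIANCE: nobody, not even root, before retain_until


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
until = now + timedelta(days=30)
print(can_delete_version("GOVERNANCE", until, now, can_bypass_governance=True))  # True
print(can_delete_version("COMPLIANCE", until, now, can_bypass_governance=True))  # False
```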

### PublicAccessBlockConfiguration

- Settings that block public access to buckets and objects (`BlockPublicAcls`, `IgnorePublicAcls`, `BlockPublicPolicy`, `RestrictPublicBuckets`)
- Used to prevent data leaks
- Can also be applied at the account level (for all buckets)

### ReplicationConfiguration

- `CRR`: Cross-Region Replication
- `SRR`: Same-Region Replication
- Copying is asynchronous
- Versioning must be enabled on both the source and destination buckets
- S3 needs an IAM role with permissions to read from the source and replicate into the destination
- Use cases: compliance, lower latency
- After activation, only new objects are replicated (existing objects can be replicated with S3 Batch Replication)
- Replication cannot be chained! Objects replicated from bucket 1 to bucket 2 are not replicated from bucket 2 to bucket 3
- `Replication rule`
  - Configured under Replication rules in the Management tab
  - Can apply to all objects or to a subset selected with a filter
  - For DELETE operations, you can choose whether delete markers are replicated (deletions of a specific version are never replicated)
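
A toy Python model of why replication does not chain: only objects written directly to a source bucket are replicated, and the replica is not picked up by the next rule. Bucket names and the `ORIGINAL`/`REPLICA` markers are illustrative (S3 exposes a comparable `REPLICA` value as the object's replication status); copying is shown as synchronous for simplicity, while real replication is asynchronous.

```python
from typing import Dict, List, Tuple


def put_object(buckets: Dict[str, Dict[str, str]], rules: List[Tuple[str, str]],
               bucket: str, key: str) -> None:
    """Write `key` to `bucket`, then apply replication rules given as (source, destination)."""
    buckets[bucket][key] = "ORIGINAL"
    for source, destination in rules:
        if source == bucket:
            # The copy is marked as a replica; replicas are never replicated again,
            # so the rules are not re-applied to the destination bucket.
            buckets[destination][key] = "REPLICA"


buckets = {"bucket-1": {}, "bucket-2": {}, "bucket-3": {}}
rules = [("bucket-1", "bucket-2"), ("bucket-2", "bucket-3")]
put_object(buckets, rules, "bucket-1", "report.csv")
print(buckets)
# {'bucket-1': {'report.csv': 'ORIGINAL'}, 'bucket-2': {'report.csv': 'REPLICA'}, 'bucket-3': {}}
```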

### VersioningConfiguration

- Once you enable versioning on a bucket, it can never return to an unversioned state
- Versioning can only be suspended once it has been enabled. Suspending it doesn't remove the versions already created
- With versioning, you can easily recover from both unintended user actions and application failures
- Overwriting an object creates a new version
- Objects with version ID `null` were created while versioning was not enabled (or was suspended)
- Deleting an object (without a version ID) adds a delete marker and preserves its previous versions. To remove an object completely, all of its versions must be deleted
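
The version and delete-marker behavior above can be mimicked with a toy in-memory model. This Python sketch is hypothetical and simplified: real version IDs are opaque strings rather than `v1`, `v2`, ..., and S3 answers a GET on a delete marker with an error instead of an empty body.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy in-memory model of a versioned bucket; a body of None is a delete marker."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[str, Optional[str]]]] = {}
        self._ids = itertools.count(1)

    def put(self, key: str, body: Optional[str]) -> str:
        """Every write (including an overwrite) appends a new version and returns its ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        if version_id is None:
            return self.put(key, None)  # plain DELETE: add a delete marker, keep old versions
        # DELETE with a version ID removes that version permanently.
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return None

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        """Body of the requested version (latest by default); None means not found."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            return next((body for vid, body in versions if vid == version_id), None)
        return versions[-1][1] if versions else None


bucket = VersionedBucket()
v1 = bucket.put("notes.txt", "first draft")
v2 = bucket.put("notes.txt", "second draft")   # overwrite -> new version
marker = bucket.delete("notes.txt")            # delete marker hides the object
print(bucket.get("notes.txt"), bucket.get("notes.txt", v1))  # None first draft
bucket.delete("notes.txt", marker)             # removing the marker restores the object
print(bucket.get("notes.txt"))                 # second draft
```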

### WebsiteConfiguration

- S3 can host static websites and make them accessible on the internet
- Static website hosting must be enabled under the bucket properties
- Block Public Access must be disabled for the bucket
- A bucket policy must allow public read access to its objects:

```json
{
  "Id": "Policy1633026957759",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1633026955731",
      "Action": ["s3:GetObject"],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::hvitoi/*",
      "Principal": "*"
    }
  ]
}
```

Example `index.html` served by the website:

```html
<html>
  <head>
    <title>My First Webpage</title>
  </head>
  <body>
    <h1>I love coffee</h1>
    <p>Hello world!</p>
    <img src="coffee.jpg" width="500" />
  </body>
</html>
```
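
The same setup can be produced programmatically. A Python sketch that builds a public-read policy equivalent to the one above plus a website configuration; the shapes match what boto3's `put_bucket_policy(Bucket=..., Policy=...)` and `put_bucket_website(Bucket=..., WebsiteConfiguration=...)` accept, while the helper names, the `Sid` and `error.html` are placeholders.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Bucket policy (as JSON) that lets anyone GET the objects of `bucket`."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",  # objects only, not the bucket itself
        }],
    })


def website_configuration(index: str = "index.html", error: str = "error.html") -> dict:
    """Index and error documents for static website hosting."""
    return {"IndexDocument": {"Suffix": index}, "ErrorDocument": {"Key": error}}


policy = public_read_policy("hvitoi")
print(policy)
```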