AWS::DynamoDB::Table
- Serverless NoSQL database
- Multi-AZ
- Integration with IAM for authentication & authorization
Structure
- DynamoDB is made of Tables (Collection)
- Each table can have an infinite number of Items (Document), each with a maximum size of 400KB
- Each item has Attributes (Field)
Features
- On-demand backup
- Point-in-time recovery (PITR)
- SQL-compatible querying (PartiQL)
- Time to live (TTL)
- In-memory performance (with the DAX caching layer)
- ACID transactions
Integrations
- DynamoDB Streams and Kinesis Data Streams
- CloudWatch
Transactions
- Write to two tables at the same time, or to neither
- Available as of 2018
- Does not use locks! It just fails if any item is modified during the transaction
- You should design your system to handle these write collisions
- Limited to 25 items per transaction
- Transactions roughly double the cost, given that an additional read has to be done to ensure the item hasn't changed (read + commit)
- Uses the TransactGetItems and TransactWriteItems APIs (see the sketch after this list)
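A minimal sketch of the all-or-nothing behavior (the table, key, and attribute names are hypothetical):
import boto3

client = boto3.client('dynamodb')

def confirm_order(order_id):
    try:
        client.transact_write_items(
            TransactItems=[
                {'Put': {
                    'TableName': 'Orders',
                    'Item': {'orderId': {'S': order_id}, 'status': {'S': 'CONFIRMED'}},
                }},
                {'Update': {
                    'TableName': 'Inventory',
                    'Key': {'sku': {'S': 'ABC-1'}},
                    'UpdateExpression': 'SET stock = stock - :one',
                    # abort the whole transaction if we would oversell
                    'ConditionExpression': 'stock >= :one',
                    'ExpressionAttributeValues': {':one': {'N': '1'}},
                }},
            ]
        )
    except client.exceptions.TransactionCanceledException:
        # another writer modified an item (or a condition failed) mid-transaction;
        # both writes were rolled back, so it's safe to retry or give up here
        pass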
Expressions
- https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.OperatorsAndFunctions.html
- Condition expression: specifies which items should be modified; can be used in the PutItem, UpdateItem, and DeleteItem APIs (see the sketch below)
- Filter expression: filters the results of read operations; applicable only to Scan and Query
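A minimal sketch of a condition expression, reusing the hypothetical MyTable names from the read examples below:
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('MyTable')

table.put_item(
    Item={'MyPartitionKey': 'Lala', 'MySortKey': '2019-11-17', 'score': 10},
    # the write fails with ConditionalCheckFailedException
    # if an item with this partition key already exists
    ConditionExpression=Attr('MyPartitionKey').not_exists(),
)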
Read
Scan
- Scan the whole table to find an item
- Consumes lots of RCU
- Filter expressions are applicable only for scan and query operations
import boto3
from boto3.dynamodb.conditions import Attr

def lambda_handler(event, context):
    client = boto3.resource('dynamodb')
    table = client.Table('MyTable')

    # filter expression can be any attribute (not only hash or sort key)
    response = table.scan(
        FilterExpression=Attr('MyKey').eq('USA')
    )

    # conditions can be combined with & (and) / | (or)
    response = table.scan(
        FilterExpression=Attr('MyKey').eq('USA') & Attr('MyDate').begins_with('2019')
    )
Query
- Returns a list of items
- Query operations require at least a hash key
- Additionally, you can use filter expressions to query based on other fields that are not hash/range keys
- It's like a scan, but within a partition only
import boto3
from boto3.dynamodb.conditions import Key

def lambda_handler(event, context):
    client = boto3.resource('dynamodb')
    table = client.Table('MyTable')

    response = table.query(
        KeyConditionExpression=Key('MyPartitionKey').eq('Lala') & Key('MySortKey').gt('2019-01-01')
    )
GetItem
- Returns one specific item
- Requires the hash key and the range key (if any)
import boto3

def lambda_handler(event, context):
    client = boto3.resource('dynamodb')
    table = client.Table('MyTable')

    response = table.get_item(
        Key={
            'MyPartitionKey': 'Lala',
            'MySortKey': '2019-11-17'
        }
    )
TransactGetItems
- Gets multiple items from multiple tables at once. If one of the items changes during this process, the whole transaction fails (see the sketch below)
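A minimal sketch (hypothetical table and key names): read two items from two tables as one consistent snapshot.
import boto3

client = boto3.client('dynamodb')

response = client.transact_get_items(
    TransactItems=[
        {'Get': {'TableName': 'Orders', 'Key': {'orderId': {'S': '123'}}}},
        {'Get': {'TableName': 'Customers', 'Key': {'customerId': {'S': '42'}}}},
    ]
)
# one entry per Get, in the same order; an entry is empty if the item doesn't exist
items = [r.get('Item') for r in response['Responses']]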
Write
PutItem / UpdateItem
- PutItem creates a new item or fully replaces an existing one; UpdateItem modifies attributes of an existing item, e.g.:
aws dynamodb update-item \
    --table-name "proposals" \
    --key '{"status": {"S": "open"}, "expires-at": {"S": "???now"}}' \
    --update-expression "SET nextStartTime = :t" \
    --expression-attribute-values '{":t": {"N": "4"}}'
BatchWriteItem
- Writes up to 25 items in one call; allows partial success: failed writes are returned as UnprocessedItems and should be retried by the caller (see the sketch below)
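A minimal sketch using boto3's batch_writer helper, which buffers puts into BatchWriteItem calls and automatically resends unprocessed items (table and attribute names are hypothetical):
import boto3

table = boto3.resource('dynamodb').Table('MyTable')

with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={
            'MyPartitionKey': f'item-{i}',
            'MySortKey': '2019-11-17',
        })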
TransactWriteItems
- Writes multiple items across multiple tables at once. If any item is modified while the operation is taking place, the whole transaction is aborted (see the sketch in the Transactions section above)
Properties
Type: AWS::DynamoDB::Table
Properties:
  AttributeDefinitions:
    - AttributeDefinition
  BillingMode: String
  ContributorInsightsSpecification:
    ContributorInsightsSpecification
  DeletionProtectionEnabled: Boolean
  GlobalSecondaryIndexes:
    - GlobalSecondaryIndex
  ImportSourceSpecification:
    ImportSourceSpecification
  KeySchema:
    - KeySchema
  KinesisStreamSpecification:
    KinesisStreamSpecification
  LocalSecondaryIndexes:
    - LocalSecondaryIndex
  OnDemandThroughput:
    OnDemandThroughput
  PointInTimeRecoverySpecification:
    PointInTimeRecoverySpecification
  ProvisionedThroughput:
    ProvisionedThroughput
  ResourcePolicy:
    ResourcePolicy
  SSESpecification:
    SSESpecification
  StreamSpecification:
    StreamSpecification
  TableClass: String
  TableName: String
  Tags:
    - Tag
  TimeToLiveSpecification:
    TimeToLiveSpecification
KeySchema
- Each table has a Partition Key (hash) and an optional Sort Key (range)
- Partition Key
  - It's used as input to the hashing function
  - The output of the hashing function tells to which physical partition the item will go
  - The same partition key goes to the same partition (similar to Kafka)
  - This speeds up lookups, given that with the partition key it's possible to know in which partition the data is stored
- Sort Key
  - It's used to tell where DynamoDB will store the data inside the partition
  - This way, the values for the same partition key can be sorted physically within a partition
- The combination of both is the Primary Key (see the sketch below)
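A minimal create_table sketch showing how the key schema is declared (the table and attribute names are hypothetical):
import boto3

boto3.client('dynamodb').create_table(
    TableName='MyTable',
    AttributeDefinitions=[
        {'AttributeName': 'MyPartitionKey', 'AttributeType': 'S'},
        {'AttributeName': 'MySortKey', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'MyPartitionKey', 'KeyType': 'HASH'},  # partition key
        {'AttributeName': 'MySortKey', 'KeyType': 'RANGE'},      # sort key
    ],
    BillingMode='PAY_PER_REQUEST',
)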
GlobalSecondaryIndexes
- Creating a GSI clones the table using a new partition key (GSI Partition Key) and optionally a new sort key
- The main table and the GSI tables are kept in sync
- The original partition key becomes a conventional attribute in the GSI table
- On the GSI table you can then query on attributes (the GSI partition key) that are not the original partition key or sort key
- More efficient! Avoids scanning the whole table
- The RCU / WCU is defined separately for the GSI table
- Allows searching across partitions
- Writes to the main table lead to writes to the GSI, which doubles the cost of writing
- Use a WCU for the GSI equal to the WCU of the main table!
- There will be an inconsistency between the main table and the GSI while it syncs (eventual consistency)
- You can have up to 20 GSIs per table
- It's not possible to write to a GSI directly. It's used only for read operations. The write to a GSI is executed automatically under the hood when you write to the main table
- If you need to update an item by a hash key of the GSI, first get the item by the GSI hash key and then use the hash key of the main table to update it (see the sketch below)
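A minimal sketch of the read-through-GSI, write-through-main-table pattern (the index and attribute names are hypothetical):
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('MyTable')

# read through the GSI
response = table.query(
    IndexName='MyGSI',
    KeyConditionExpression=Key('MyGsiPartitionKey').eq('USA'),
)

# write through the main table's primary key (GSIs are read-only)
for item in response['Items']:
    table.update_item(
        Key={'MyPartitionKey': item['MyPartitionKey'], 'MySortKey': item['MySortKey']},
        UpdateExpression='SET verified = :v',
        ExpressionAttributeValues={':v': True},
    )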
LocalSecondaryIndexes
- An LSI adds an additional sort key (the LSI Sort Key)
- It's an index that has the same partition key as the base table, but a different sort key
- This way, you can fetch an item directly using the partition key + the LSI sort key
- Allows searching within the same partition (or same partition key)
- Can only be defined at table creation time (see the sketch below)
- No extra cost! (this doesn't clone the table like a GSI does)
- You can have up to 5 LSIs per table
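A minimal sketch of declaring an LSI in the create_table call, since it cannot be added later (names are hypothetical):
import boto3

boto3.client('dynamodb').create_table(
    TableName='MyTable',
    AttributeDefinitions=[
        {'AttributeName': 'MyPartitionKey', 'AttributeType': 'S'},
        {'AttributeName': 'MySortKey', 'AttributeType': 'S'},
        {'AttributeName': 'MyLsiSortKey', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'MyPartitionKey', 'KeyType': 'HASH'},
        {'AttributeName': 'MySortKey', 'KeyType': 'RANGE'},
    ],
    LocalSecondaryIndexes=[{
        'IndexName': 'MyLSI',
        'KeySchema': [
            {'AttributeName': 'MyPartitionKey', 'KeyType': 'HASH'},  # same partition key
            {'AttributeName': 'MyLsiSortKey', 'KeyType': 'RANGE'},   # alternative sort key
        ],
        'Projection': {'ProjectionType': 'ALL'},
    }],
    BillingMode='PAY_PER_REQUEST',
)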
AttributeDefinitions
- Define all the attributes that are going to be used for querying
- Each attribute defined here must be in either KeySchema or GlobalSecondaryIndexes
- Attributes created on-the-fly cannot be used for searching
- Data types
  - Scalar Types: String, Number, Binary, Boolean, Null
  - Document Types: List, Map
  - Set Types: String Set, Number Set, Binary Set
{
    "my-string": {
        "S": "aa"
    },
    "my-number": {
        "N": "0"
    },
    "my-boolean": {
        "BOOL": false
    },
    "my-binary": {
        "B": ""
    },
    "my-null": {
        "NULL": true
    },
    "my-string-set": {
        "SS": ["aa", "bb", "cc"]
    },
    "my-number-set": {
        "NS": ["0", "1", "2"]
    },
    "my-binary-set": {
        "BS": ["", ""]
    },
    "my-list": {
        "L": [
            {
                "S": "aa"
            },
            {
                "N": "0"
            }
        ]
    },
    "my-map": {
        "M": {
            "key1": {
                "S": "aa"
            },
            "key2": {
                "N": "0"
            }
        }
    }
}
BillingMode
- It's how you control the table's capacity (read/write throughput)
- Provisioned Mode (default)
  - RCU: read capacity unit
  - WCU: write capacity unit
  - Consumed Capacity: RCU + WCU consumed so far
  - Provisioned Capacity: RCU + WCU total provisioned
    - This is actually what you pay for. It is specified beforehand
    - The total capacity (read or write) is shared (split equally) across all partitions. Therefore it's important to spread the data evenly across the partitions
  - Burst Capacity: an additional pool of capacity that DynamoDB saves to avoid throttling when the consumed capacity slightly exceeds the provisioned capacity. Usually for random spikes in usage. It's transparent to the user/developer
  - Autoscaling (self-adjusting provisioned capacity) can be configured for peak hours. Under the hood, autoscaling is done by CloudWatch alarms that trigger a table config update
  - Adaptive capacity can also be used. With that, a hot partition can borrow capacity from another, idler partition
  - If the capacity is exceeded (throttling), DynamoDB will reject the request
  - There is a hard limit of 3000 RCU and 1000 WCU per partition; if you need to go over it, you need to use DAX (caching layer)
- On-Demand Mode (see the sketch below)
  - Scales automatically based on the workload
  - More expensive!
  - Useful for very unpredictable workloads
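A minimal sketch of switching an existing table between the two modes (the table name and capacity values are hypothetical):
import boto3

client = boto3.client('dynamodb')

# provisioned mode: you pay for the capacity declared up front
client.update_table(
    TableName='MyTable',
    BillingMode='PROVISIONED',
    ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5},
)

# on-demand mode: pay per request, scales automatically
client.update_table(
    TableName='MyTable',
    BillingMode='PAY_PER_REQUEST',
)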
StreamSpecification
- DynamoDB Streams offers an ordered stream of modifications in a table (similar to a Kafka Connect source)
  - INSERT
  - UPDATE
  - REMOVE
- It's a form of change data capture (CDC)
- Captures item-level changes in the table and pushes the changes to DynamoDB Streams
- Events are guaranteed to be in the same order the modifications took place
- The changes can be accessed through the DynamoDB Streams API
- Streams can be sent to
  - Kinesis Data Streams (does not guarantee ordering of the events, and may even duplicate events)
  - AWS Lambda
  - Kinesis Client Library applications
- Data retention of up to 24 hours
- Use cases:
  - React to changes in real-time (e.g., welcome new users)
  - Analytics
  - Real-time dashboards
  - Insert into derivative tables
  - Insert into Elasticsearch
  - Implement cross-region replication
- The event carries, depending on the configured StreamViewType:
  - Keys
  - NewImage
  - OldImage
  - NewImage & OldImage
{
    "Records": [
        {
            "eventId": "1",
            "eventName": "INSERT",
            "eventVersion": "1.0",
            "eventSource": "aws:dynamodb",
            "awsRegion": "us-east-1",
            "dynamodb": {
                "NewImage": {
                    "playerId": {
                        "S": "11111"
                    },
                    "date": {
                        "S": "Aug 10 2022 10:00:00"
                    },
                    "score": {
                        "N": "100"
                    }
                }
            },
            "eventSourceARN": "MyTableARN"
        }
    ]
}
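A minimal sketch of a Lambda handler consuming the event above:
def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            new_image = record['dynamodb']['NewImage']
            # values arrive in DynamoDB's typed format, e.g. {'S': '11111'}
            player_id = new_image['playerId']['S']
            score = int(new_image['score']['N'])
            print(f'player {player_id} scored {score}')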
- IAM role setup
  - For Lambda you can use the pre-built policy AWSLambdaDynamoDBExecutionRole
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                // to access the dynamodb stream
                "dynamodb:DescribeStream",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator",
                "dynamodb:ListStreams",
                // to log on CloudWatch
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
TimeToLiveSpecification
- TTL: automatically expires an item using a configurable timestamp key (see the sketch below)
- The timestamp key must be in epoch seconds format
  - It stores the time when the item will expire
  - Must be an N type key (number)
- The deletion is not immediate. It can take up to 48 hours, and even longer for large tables. Don't rely on it
- Free! Does not consume write throughput
- Consumes burst capacity. That means that if the burst capacity is used up, the deletion will be delayed until it's recovered
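A minimal sketch of enabling TTL and writing an expiring item (the expiresAt attribute name is hypothetical):
import time
import boto3

client = boto3.client('dynamodb')

client.update_time_to_live(
    TableName='MyTable',
    TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'expiresAt'},
)

boto3.resource('dynamodb').Table('MyTable').put_item(Item={
    'MyPartitionKey': 'Lala',
    'MySortKey': '2019-11-17',
    'expiresAt': int(time.time()) + 3600,  # epoch seconds, N type
})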
PointInTimeRecoverySpecification
- Continuous backups of the table that allow restoring it to any point in time within the last 35 days (see the sketch below)
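A minimal sketch of enabling PITR on an existing table:
import boto3

boto3.client('dynamodb').update_continuous_backups(
    TableName='MyTable',
    PointInTimeRecoverySpecification={'PointInTimeRecoveryEnabled': True},
)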