Cosmos DB
- Azure Cosmos DB is a
fully managed document NoSQL database
- CosmosDB
automatically scales storage
,automatically scale compute
- https://hvitoi.documents.azure.com:443/
-
Data can be accessed by the
Data Explorer
-
NoSQL characteristics
-
No need to define schemas
- Querying data lead to performance overhead
- No need to maintain relationships
-
NoSQL db types: key-value pair, document, graph
-
CosmosDB levels
-
Database account
: the cosmos db account Database
: databaseContainer
: like a table (E.g., Users)-
Item
: record -
Cosmos APIs
-
Cosmos DB supports the following APIs to interact with its data
Core (SQL)
MongoDB
Cassandra
Azure Table
Gramlin (graph)
-
Capacity mode
-
Provisioned throughput
: charged based on the amount of storage (Request units). 400 RU/s and 5 GB storage in the free tier Serverless
Consistency levels
- Replication of data for redundancy
- There is a
leader
partition andfollowers
partitions in a single region - The data is then replicated across regions and the data might differ due to the delay on syncing
- Consistency types
Strong
: Guarantee that the data accross regions is equal! Loses performanceBounded staleness
: Reads can lag behind the writes by at mostK
versions of an item or byT
time intervalSession (default)
: reads can be guaranteed for the same sessionConsistent prefix
: delay in the reads of the most recent data but you will never see out of order writesEventual
: Wins on performance but loses on consistency
Partitioning
- Every container has a
partition key
, which is the key in the item that tells to whichpartition
this item belongs to - Once the partition key is defined, the data is stored automatically to the corret partition
- Segregating the data slipt across
physical partitions
makes it easier to fetch data. It limits the scope of search for a particular item - Once you set the partition key, you can't change it!
- The combination
id + partitionKey
uniquely identifies the item within a container - You have to define your partitions so that the Request Units are distributed evenly across the partitions. Do not create overloaded hotspot partitions!
- On designing the partition, try to make it so that the queries fetch the data from only only partition
- E.g.,
SELECT * FROM customer WHERE customer.city="New York"
with customer.city as the partition key -
Rules and good practices for choosing the partition key
-
Properties that do not change
- You cannot change the partition key for a container
- Ensure the property has a high range of values. Reads can be distributed across partitions
-
Your queries should use the partition key. Avoid cross partitions queries
-
Synthetic Partition Key
-
A field inserted into the item to be the partition key
- The synthetic key value is usually composed of combinations of values of other fields E.g.:
{
"year": "2020",
"class": "A",
"key": "2020-A"
}
- You can use as a synthetic key
- A combination of values
- Append to a value a random suffix
- Append to a value a hash prefix
Data Explorer
- View, add, remove items
- The
id
key is automatically assigned with an UUID
{
"id": "3cc408f2-c3bd-4268-a6a6-bd7a75f0ba7e",
"customerId": "C2",
"customerName": "John",
"orders": [
{
"orderId": "O1",
"couse": "AZ-104",
"price": "100"
}
]
}
SQL API
-- c is the alias for the customer container
SELECT * from c
--
SELECT * from c.orders
-- nested query
SELECT o.orderId, o.course
FROM o IN course.orders
WHERE o.course='AZ-104'
-- join
SELECT li.id AS lineitemid, li.price
FROM Order o
JOIN li IN o.line_items
ORDER BY o.address.city ASC
Dotnet connectivity
- Uses the NuGet package
Microsoft.Azure.Cosmos
- Get the
primary connection string
underKeys
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net5.0</TargetFramework>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Azure.Cosmos.Table" Version="1.0.8" />
</ItemGroup>
</Project>
CRUD operations
class Course
{
public string id { get; set; }
public string courseid { get; set; }
public string coursename { get; set; }
public decimal rating { get; set; }
public List<Order> orders { get; set; }
}
using Microsoft.Azure.Cosmos;
using System;
namespace AzureCosmosDB
{
class Program
{
private static readonly string _connection_string = "AccountEndpoint=https://hvitoi.documents.azure.com:443/;AccountKey={account-key};";
private static readonly string _database_name = "appdb";
private static readonly string _container_name = "course";
private static readonly string _partition_key = "/coursename";
static void Main(string[] args)
{
// Get Cosmost account instance
CosmosClient _cosmos_client = new CosmosClient(_connection_string, new CosmosClientOptions() { AllowBulkExecution=true} ); // second parameter is the client options
// Create database
_cosmos_client.CreateDatabaseAsync(_database_name).GetAwaiter().GetResult();
Console.WriteLine("Database created!");
Database _database_client = _cosmos_client.GetDatabase(_database_name); // get database instance
// Create a container
_database_client.CreateContainerAsync(_container_name, _partition_key).GetAwaiter().GetResult(); // upon creation a partition key must be defined
Console.WriteLine("Container created!");
Container _container_client = _cosmos_client.GetContainer(_database_name, _container_name); // get container instance
// Query items
string _query = "SELECT * FROM c WHERE c.courseid='Course0002'"; // c is he course container
QueryDefinition _definition = new QueryDefinition(_query);
FeedIterator<Course> _iterator = _container_client.GetItemQueryIterator<Course>(_definition); // Iterates through items of course class
Console.WriteLine("Listing courses:");
while(_iterator.HasMoreResults)
{
FeedResponse<Course> _response = _iterator.ReadNextAsync().GetAwaiter().GetResult();
foreach(Course _course in _response)
{
Console.WriteLine($"Id is {_course.id}");
Console.WriteLine($"Course id is {_course.courseid}");
Console.WriteLine($"Course name is {_course.coursename}");
Console.WriteLine($"Course rating is {_course.rating}");
}
}
// Insert single item
Course _course = new Course()
{
id = "1",
courseid = "C00010",
coursename = "AZ-204",
rating = 4.5m
};
_container_client.CreateItemAsync<Course>(_course, new PartitionKey(_course.courseid)).GetAwaiter().GetResult(); // set the partition key for that item
Console.WriteLine("Item created!");
// Insert multiple items
List<Course> _lst = new List<Course>()
{
new Course() { id="1", courseid="Course0001", coursename = "AZ-204 Developing Azure solutions", rating = 4.5m },
new Course() { id="2", courseid="Course0002", coursename = "AZ-303 Architecting Azure solutions", rating = 4.6m },
new Course() { id="3", courseid="Course0003", coursename = "DP-203 Azure Data Engineer", rating = 4.7m },
new Course() { id="4", courseid="Course0004", coursename = "AZ-900 Azure Fundamentals", rating = 4.6m },
new Course() { id="5", courseid="Course0005", coursename = "AZ-104 Azure Administrator", rating = 4.5m }
};
List<Task> _tasks = new List<Task>(); // Task is C# class to represent a function to be executed (similar to a promise in node)
foreach (Course _course in _lst)
{
_tasks.Add(_container_client.CreateItemAsync<Course>(_course, new PartitionKey(_course.courseid)));
}
Task.WhenAll(_tasks).GetAwaiter().GetResult();
Console.WriteLine("Items created!");
// Update item
string _id = "2";
string _partition_key = "Course0002";
ItemResponse<Course> _response = _container_client.ReadItemAsync<Course>(_id, new PartitionKey(_partition_key)).GetAwaiter().GetResult();
Course _course = _response.Resource; // modify the item in memory
_course.rating = 4.8m;
_container_client.ReplaceItemAsync<Course>(_course, _id, new PartitionKey(_partition_key)).GetAwaiter().GetResult(); // update!
Console.WriteLine("Item updated!");
// Delete item
string _id = "2";
string _partition_key = "Course0002";
_container_client.DeleteItemAsync<Course>(_id, new PartitionKey(_partition_key)).GetAwaiter().GetResult();
Console.WriteLine("Item deleted!");
// Pause execution
Console.ReadKey();
}
}
}
Import data from json
class Data
{
public string id { get; set; }
public string Correlationid { get; set; }
public string Operationname { get; set; }
public string Status { get; set; }
public string Eventcategory { get; set; }
public string Level { get; set; }
}
FileStream fs = new FileStream(System.Environment.CurrentDirectory + @"/data/QueryResult.json", FileMode.Open, FileAccess.Read);
StreamReader reader = new StreamReader(fs);
JsonTextReader jsonReader = new JsonTextReader(reader);
CosmosClient cosmosClient = new CosmosClient(connectionString);
Container container = cosmosClient.GetContainer(databaseName, containerName);
while (jsonReader.Read())
{
if (jsonReader.TokenType==JsonToken.StartObject)
{
JObject _object = JObject.Load(jsonReader);
Activity activity = _object.ToObject<Activity>();
activity.id = Guid.NewGuid().ToString();
container.CreateItemAsync<Activity>(activity, new PartitionKey(activity.Operationname)).GetAwaiter().GetResult();
Console.WriteLine($"Adding item {activity.Correlationid}");
}
}
Console.WriteLine("Written data to Azure Cosmos DB");
Console.ReadKey();
Embedding (nesting) data
- When there are contained relationships between entities
- When there is
one-to-few
relationships - When the embedded data changes infrequently
- When the embedded data is queried frequently together
class Order
{
public string orderid { get; set; }
public int price { get; set; }
}
class Course
{
public string id { get; set; }
public string courseid { get; set; }
public string coursename { get; set; }
public decimal rating { get; set; }
public List<Order> orders { get; set; }
}
Course _course = new Course()
{
id = "1",
courseid = "C00010",
coursename = "AZ-204",
rating = 4.5m
orders = new List<Order>() {
new Order() { orderid = "O2", price = 50},
new Order() { orderid = "O3", price = 80}
} // embedding data
};
Referencing data
- When representing
one-to-many
relationships - When representing
many-to-many
relationships - When there are related data changes
Change feed
Listens
to changes in a container- Outputs list of items in the order they modified to a
destination
- Ideal to implement workflows
Stored Procedure
- A set of instructions to be stored at the
container
that can be invoked from the client - The stored procedure is configured at the cosmos db data explorer
- The stored procedure must be written in javascript
// Stored Procedure id: sample
function sample() {
const context = getContext();
const response = context.getResponse();
response.setBody('Executing a stored procedure');
}
function sample2() {
const context = getContext();
const request = context.getRequest();
const body = request.getBody();
if !("customertier" in body) {
body["customertier"] = 0;
}
request.setBody(body);
}
// Cosmos DB instance
CosmosClient cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions() { AllowBulkExecution=true})
// Container instance
Container containerClient = cosmosClient.GetContainer(databaseName, containerName)
// Execute stored procedure
string output = containerClient.Scripts.ExecuteStoredProcedureAsync<string>('sample', new PartitionKey(String.empty), null).GetAwaiter().GetResult();
// Write the result of the executed stored procedure
Console.WriteLine(output);
Triggers
- Set a
trigger
to execute a set of instructionsbefore
an action happens orafter
it has been completed - A trigger is defined at the
Items
level in the Data Explorer
function AddTimestamp() {
var context = getContext();
var request = context.getRequest();
var itemToCreate = request.getBody();
if (!('timestamp' in itemToCreate)) {
var ts = new Date();
itemToCreate['timestamp'] = ts.getTime();
}
request.setBody(itemToCreate);
}
// The trigger must be mentioned in the RequestOptions
containerClient.CreateItemAsync<Course>(_course, null, new ItemRequestOptions { PreTriggers = new List<string> {"AddTimestampTrigger"}}).GetAwaiter().GetResult();
Time to Live
- After time to live is reached, the item will be deleted automatically
- If there are no left-over request units (
RU
), the deletion of the item will be delayed TTL
can be set up atcontainer
oritem
level- It must be enabled on Data Explorer/Container/
Settings
-> Time To Live on (no default)
: to pick the ttl from the item itselfon
: to pick the ttl from a default value to be
{
"id": "1",
"name": "Maria",
"ttl": 20
}
Indexing
Indexing Policy
: Specify the fields to be indexedComposite index
: Necessary to perform "order by" queries with two or more properties
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
{
"path": "/city",
"order": "ascending"
},
{
"path": "/customername",
"order": "ascending"
}
]
}