Skip to content

Cosmos DB

  • Azure Cosmos DB is a fully managed document NoSQL database
  • CosmosDB automatically scales storage, automatically scale compute
  • https://hvitoi.documents.azure.com:443/
  • Data can be accessed by the Data Explorer

  • NoSQL characteristics

  • No need to define schemas

  • Querying data lead to performance overhead
  • No need to maintain relationships
  • NoSQL db types: key-value pair, document, graph

  • CosmosDB levels

  • Database account: the cosmos db account

  • Database: database
  • Container: like a table (E.g., Users)
  • Item: record

  • Cosmos APIs

  • Cosmos DB supports the following APIs to interact with its data

    1. Core (SQL)
    2. MongoDB
    3. Cassandra
    4. Azure Table
    5. Gramlin (graph)
  • Capacity mode

  • Provisioned throughput: charged based on the amount of storage (Request units). 400 RU/s and 5 GB storage in the free tier

  • Serverless

Consistency levels

  • Replication of data for redundancy
  • There is a leader partition and followers partitions in a single region
  • The data is then replicated across regions and the data might differ due to the delay on syncing
  • Consistency types
  • Strong: Guarantee that the data accross regions is equal! Loses performance
  • Bounded staleness: Reads can lag behind the writes by at most K versions of an item or by T time interval
  • Session (default): reads can be guaranteed for the same session
  • Consistent prefix: delay in the reads of the most recent data but you will never see out of order writes
  • Eventual: Wins on performance but loses on consistency

Partitioning

  • Every container has a partition key, which is the key in the item that tells to which partition this item belongs to
  • Once the partition key is defined, the data is stored automatically to the corret partition
  • Segregating the data slipt across physical partitions makes it easier to fetch data. It limits the scope of search for a particular item
  • Once you set the partition key, you can't change it!
  • The combination id + partitionKey uniquely identifies the item within a container
  • You have to define your partitions so that the Request Units are distributed evenly across the partitions. Do not create overloaded hotspot partitions!
  • On designing the partition, try to make it so that the queries fetch the data from only only partition
  • E.g., SELECT * FROM customer WHERE customer.city="New York" with customer.city as the partition key
  • Rules and good practices for choosing the partition key

  • Properties that do not change

  • You cannot change the partition key for a container
  • Ensure the property has a high range of values. Reads can be distributed across partitions
  • Your queries should use the partition key. Avoid cross partitions queries

  • Synthetic Partition Key

  • A field inserted into the item to be the partition key

  • The synthetic key value is usually composed of combinations of values of other fields E.g.:
{
  "year": "2020",
  "class": "A",
  "key": "2020-A"
}
  • You can use as a synthetic key
    • A combination of values
    • Append to a value a random suffix
    • Append to a value a hash prefix

Data Explorer

  • View, add, remove items
  • The id key is automatically assigned with an UUID
{
  "id": "3cc408f2-c3bd-4268-a6a6-bd7a75f0ba7e",
  "customerId": "C2",
  "customerName": "John",
  "orders": [
    {
      "orderId": "O1",
      "couse": "AZ-104",
      "price": "100"
    }
  ]
}

SQL API

-- c is the alias for the customer container
SELECT * from c

--
SELECT * from c.orders

-- nested query
SELECT o.orderId, o.course
FROM o IN course.orders
WHERE o.course='AZ-104'

-- join
SELECT li.id AS lineitemid, li.price
FROM Order o
JOIN li IN o.line_items
ORDER BY o.address.city ASC

Dotnet connectivity

  • Uses the NuGet package Microsoft.Azure.Cosmos
  • Get the primary connection string under Keys
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Azure.Cosmos.Table" Version="1.0.8" />
  </ItemGroup>

</Project>

CRUD operations

class Course
{
  public string id { get; set; }
  public string courseid { get; set; }
  public string coursename { get; set; }
  public decimal rating { get; set; }
  public List<Order> orders { get; set; }
}
using Microsoft.Azure.Cosmos;
using System;

namespace AzureCosmosDB
{
  class Program
  {
    private static readonly string _connection_string = "AccountEndpoint=https://hvitoi.documents.azure.com:443/;AccountKey={account-key};";
    private static readonly string _database_name = "appdb";
    private static readonly string _container_name = "course";
    private static readonly string _partition_key = "/coursename";

    static void Main(string[] args)
    {
      // Get Cosmost account instance
      CosmosClient _cosmos_client = new CosmosClient(_connection_string, new CosmosClientOptions() { AllowBulkExecution=true} ); // second parameter is the client options

      // Create database
      _cosmos_client.CreateDatabaseAsync(_database_name).GetAwaiter().GetResult();
      Console.WriteLine("Database created!");
      Database _database_client = _cosmos_client.GetDatabase(_database_name); // get database instance

      // Create a container
      _database_client.CreateContainerAsync(_container_name, _partition_key).GetAwaiter().GetResult(); // upon creation a partition key must be defined
      Console.WriteLine("Container created!");
      Container _container_client = _cosmos_client.GetContainer(_database_name, _container_name); // get container instance

      // Query items
      string _query = "SELECT * FROM c WHERE c.courseid='Course0002'"; // c is he course container
      QueryDefinition _definition = new QueryDefinition(_query);
      FeedIterator<Course> _iterator = _container_client.GetItemQueryIterator<Course>(_definition); // Iterates through items of course class
      Console.WriteLine("Listing courses:");
      while(_iterator.HasMoreResults)
      {
        FeedResponse<Course> _response = _iterator.ReadNextAsync().GetAwaiter().GetResult();
        foreach(Course _course in _response)
        {
          Console.WriteLine($"Id is {_course.id}");
          Console.WriteLine($"Course id is {_course.courseid}");
          Console.WriteLine($"Course name is {_course.coursename}");
          Console.WriteLine($"Course rating is {_course.rating}");
        }
      }

      // Insert single item
      Course _course = new Course()
      {
        id = "1",
        courseid = "C00010",
        coursename = "AZ-204",
        rating = 4.5m
      };
      _container_client.CreateItemAsync<Course>(_course, new PartitionKey(_course.courseid)).GetAwaiter().GetResult(); // set the partition key for that item
      Console.WriteLine("Item created!");

      // Insert multiple items
      List<Course> _lst = new List<Course>()
      {
        new Course() { id="1", courseid="Course0001", coursename = "AZ-204 Developing Azure solutions", rating = 4.5m },
        new Course() { id="2", courseid="Course0002", coursename = "AZ-303 Architecting Azure solutions", rating = 4.6m },
        new Course() { id="3", courseid="Course0003", coursename = "DP-203 Azure Data Engineer", rating = 4.7m },
        new Course() { id="4", courseid="Course0004", coursename = "AZ-900 Azure Fundamentals", rating = 4.6m },
        new Course() { id="5", courseid="Course0005", coursename = "AZ-104 Azure Administrator", rating = 4.5m }
      };
      List<Task> _tasks = new List<Task>(); // Task is C# class to represent a function to be executed (similar to a promise in node)
      foreach (Course _course in _lst)
      {
        _tasks.Add(_container_client.CreateItemAsync<Course>(_course, new PartitionKey(_course.courseid)));
      }
      Task.WhenAll(_tasks).GetAwaiter().GetResult();
      Console.WriteLine("Items created!");

      // Update item
      string _id = "2";
      string _partition_key = "Course0002";
      ItemResponse<Course> _response = _container_client.ReadItemAsync<Course>(_id, new PartitionKey(_partition_key)).GetAwaiter().GetResult();
      Course _course = _response.Resource; // modify the item in memory
      _course.rating = 4.8m;
      _container_client.ReplaceItemAsync<Course>(_course, _id, new PartitionKey(_partition_key)).GetAwaiter().GetResult(); // update!
      Console.WriteLine("Item updated!");

      // Delete item
      string _id = "2";
      string _partition_key = "Course0002";
      _container_client.DeleteItemAsync<Course>(_id, new PartitionKey(_partition_key)).GetAwaiter().GetResult();
      Console.WriteLine("Item deleted!");

      // Pause execution
      Console.ReadKey();
    }
  }
}

Import data from json

class Data
{
  public string id { get; set; }
  public string Correlationid { get; set; }
  public string Operationname { get; set; }
  public string Status { get; set; }
  public string Eventcategory { get; set; }
  public string Level { get; set; }
}
FileStream fs = new FileStream(System.Environment.CurrentDirectory +  @"/data/QueryResult.json", FileMode.Open, FileAccess.Read);
StreamReader reader = new StreamReader(fs);
JsonTextReader jsonReader = new JsonTextReader(reader);

CosmosClient cosmosClient = new CosmosClient(connectionString);
Container container = cosmosClient.GetContainer(databaseName, containerName);

while (jsonReader.Read())
{
  if (jsonReader.TokenType==JsonToken.StartObject)
  {
    JObject _object = JObject.Load(jsonReader);
    Activity activity = _object.ToObject<Activity>();
    activity.id = Guid.NewGuid().ToString();
    container.CreateItemAsync<Activity>(activity, new PartitionKey(activity.Operationname)).GetAwaiter().GetResult();
    Console.WriteLine($"Adding item {activity.Correlationid}");
  }
}

Console.WriteLine("Written data to Azure Cosmos DB");
Console.ReadKey();

Embedding (nesting) data

  • When there are contained relationships between entities
  • When there is one-to-few relationships
  • When the embedded data changes infrequently
  • When the embedded data is queried frequently together
class Order
{
  public string orderid { get; set; }
  public int price { get; set; }
}
class Course
{
  public string id { get; set; }
  public string courseid { get; set; }
  public string coursename { get; set; }
  public decimal rating { get; set; }
  public List<Order> orders { get; set; }
}
Course _course = new Course()
{
  id = "1",
  courseid = "C00010",
  coursename = "AZ-204",
  rating = 4.5m
  orders = new List<Order>() {
    new Order() { orderid = "O2", price = 50},
    new Order() { orderid = "O3", price = 80}
  } // embedding data
};

Referencing data

  • When representing one-to-many relationships
  • When representing many-to-many relationships
  • When there are related data changes

Change feed

  • Listens to changes in a container
  • Outputs list of items in the order they modified to a destination
  • Ideal to implement workflows

Stored Procedure

  • A set of instructions to be stored at the container that can be invoked from the client
  • The stored procedure is configured at the cosmos db data explorer
  • The stored procedure must be written in javascript
// Stored Procedure id: sample
function sample() {
  const context = getContext();
  const response = context.getResponse();
  response.setBody('Executing a stored procedure');
}
function sample2() {
  const context = getContext();
  const request = context.getRequest();

  const body = request.getBody();
  if !("customertier" in body) {
    body["customertier"] = 0;
  }

  request.setBody(body);
}
// Cosmos DB instance
CosmosClient cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions() { AllowBulkExecution=true})

// Container instance
Container containerClient = cosmosClient.GetContainer(databaseName, containerName)

// Execute stored procedure
string output = containerClient.Scripts.ExecuteStoredProcedureAsync<string>('sample', new PartitionKey(String.empty), null).GetAwaiter().GetResult();

// Write the result of the executed stored procedure
Console.WriteLine(output);

Triggers

  • Set a trigger to execute a set of instructions before an action happens or after it has been completed
  • A trigger is defined at the Items level in the Data Explorer
function AddTimestamp() {
  var context = getContext();
  var request = context.getRequest();
  var itemToCreate = request.getBody();

  if (!('timestamp' in itemToCreate)) {
    var ts = new Date();
    itemToCreate['timestamp'] = ts.getTime();
  }

  request.setBody(itemToCreate);
}
// The trigger must be mentioned in the RequestOptions
containerClient.CreateItemAsync<Course>(_course, null, new ItemRequestOptions { PreTriggers = new List<string> {"AddTimestampTrigger"}}).GetAwaiter().GetResult();

Time to Live

  • After time to live is reached, the item will be deleted automatically
  • If there are no left-over request units (RU), the deletion of the item will be delayed
  • TTL can be set up at container or item level
  • It must be enabled on Data Explorer/Container/Settings -> Time To Live
  • on (no default): to pick the ttl from the item itself
  • on: to pick the ttl from a default value to be
{
  "id": "1",
  "name": "Maria",
  "ttl": 20
}

Indexing

  • Indexing Policy: Specify the fields to be indexed
  • Composite index: Necessary to perform "order by" queries with two or more properties
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*"
    }
  ],
  "excludedPaths": [
    {
      "path": "/\"_etag\"/?"
    }
  ],
  "compositeIndexes": [
    {
      "path": "/city",
      "order": "ascending"
    },
    {
      "path": "/customername",
      "order": "ascending"
    }
  ]
}