January 7, 2019 | February 9, 2019 | Nataraj Srikantaiah

Things to know before starting with DynamoDB

DynamoDB is a fully managed NoSQL database offered by Amazon Web Services. While it works great for smaller scale applications, the limitations it poses for larger scale applications are not well understood. This post aims to help developers and operations teams understand the strengths and weaknesses of DynamoDB.

1. Data Modeling

DynamoDB supports a document-oriented data model. To create a table, we only need to define the primary key (a partition key, optionally combined with a sort key). Items can then be added to the table with a dynamic set of attributes, similar to MongoDB. Items in DynamoDB correspond to rows in an RDBMS, and attributes correspond to columns.

DynamoDB supports the following data types:
Scalar Data Types: Number, String, Binary, Boolean, Null
Collection Data Types: Set (of strings, numbers, or binary values), List, Map
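To make the data model concrete, here is a minimal sketch of how an item with a dynamic set of attributes maps onto DynamoDB's typed attribute values (the one- or two-letter type descriptors such as S, N, BOOL, SS, L, and M). The helper name to_attribute_value and the sample item are hypothetical and only for illustration; an AWS SDK would normally do this conversion for you.

```python
from typing import Any


def to_attribute_value(value: Any) -> dict:
    """Encode a Python value as a DynamoDB typed attribute value, e.g. {"S": "abc"}."""
    if value is None:
        return {"NULL": True}
    # bool must be checked before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are represented as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}  # kept as raw bytes here; JSON requests base64-encode it
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        raise TypeError("set members must all be strings, all numbers, or all bytes")
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical item: only the key attribute (user_id) would be declared on the table.
item = {
    "user_id": "u-1001",
    "age": 34,
    "active": True,
    "tags": {"admin", "beta"},
    "address": {"city": "Bengaluru", "zip": "560001"},
}
encoded = {name: to_attribute_value(v) for name, v in item.items()}
```

A second item in the same table could carry a completely different set of non-key attributes, which is the "dynamic set of attributes" described above.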

2. Operations Ease

As it's a managed service from Amazon, users are abstracted away from the underlying infrastructure and interact with the database only over a remote endpoint. There is no need to worry about operational concerns such as hardware, setup and configuration, replication, software patching, or cluster scaling, which makes it very easy to get started.

In fact, there is no way to access the underlying infrastructure components such as the instances or disks. What users do have to plan is throughput: DynamoDB tables in provisioned capacity mode require users to reserve read capacity units (RCUs) and write capacity units (WCUs) upfront. Users are charged by the hour for the throughput capacity reserved, whether or not these tables are receiving any reads or writes.
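As a rough illustration of what reserving capacity means, the sketch below estimates RCUs and WCUs from item size and request rate. It assumes the commonly documented unit sizes (one RCU for a strongly consistent read of up to 4 KB per second, half that for an eventually consistent read, and one WCU for a write of up to 1 KB per second). The function names and the hourly prices are hypothetical placeholders, not real AWS prices.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # assumed: one RCU covers a read of up to 4 KB
WRITE_UNIT_BYTES = 1024     # assumed: one WCU covers a write of up to 1 KB


def required_rcus(item_bytes: int, reads_per_second: int,
                  eventually_consistent: bool = False) -> int:
    """RCUs to reserve for a steady rate of single-item reads."""
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_second
    # Eventually consistent reads consume half the capacity.
    return math.ceil(total / 2) if eventually_consistent else total


def required_wcus(item_bytes: int, writes_per_second: int) -> int:
    """WCUs to reserve for a steady rate of single-item writes."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


def monthly_cost(rcus: int, wcus: int, rcu_hour_price: float,
                 wcu_hour_price: float, hours: int = 730) -> float:
    """Cost of the reserved capacity; note that actual traffic is not an input."""
    return hours * (rcus * rcu_hour_price + wcus * wcu_hour_price)


# Hypothetical workload: 6 KB items, 100 reads/s, 50 writes/s.
rcus = required_rcus(6 * 1024, 100)  # 2 units per read
wcus = required_wcus(6 * 1024, 50)   # 6 units per write
# Placeholder prices purely for illustration.
cost = monthly_cost(rcus, wcus, rcu_hour_price=0.0001, wcu_hour_price=0.0005)
```

Because monthly_cost depends only on what was reserved, an idle table costs exactly as much as a busy one in this model.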

3. Linear Scalability

DynamoDB supports automatic sharding and load balancing. This allows applications to transparently store ever-growing amounts of data. The linear scalability of DynamoDB is good for applications that need to handle growing datasets and IOPS requirements. However, this linear scalability comes at a steep cost beyond a certain point, as the Cost Effectiveness section explains.

4. Amazon Ecosystem Integration

DynamoDB is well integrated into the AWS ecosystem. This means that end users do not need to figure out how to perform various integrations by themselves. Below are a couple of examples of these integrations:

Data can easily and cost-effectively be backed up to S3
Security and access control are integrated with AWS IAM

5. Cost Effectiveness

DynamoDB’s pricing model can easily make it the single most expensive AWS service for a fast growing data set. Here are some reasons:

Higher provisioning to handle partitions

In DynamoDB, the total provisioned IOPS is evenly divided across all the partitions. Therefore, it is extremely important to choose a partition key that will evenly distribute reads and writes across these partitions.
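The effect of a poorly chosen partition key can be sketched with a small model. It assumes each partition receives an equal share of the provisioned units, as described above, and that demand beyond that share is throttled. The SHA-256 based partition_for function is only a stand-in for DynamoDB's internal hash function, and all names and numbers are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key to a partition (stand-in for DynamoDB's internal hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def throttled_per_second(demand_by_key: dict, partitions: int,
                         provisioned_units: int) -> float:
    """Units per second that exceed each partition's equal share of capacity."""
    share = provisioned_units / partitions
    load = Counter()
    for key, units in demand_by_key.items():
        load[partition_for(key, partitions)] += units
    return sum(max(0.0, units - share) for units in load.values())


# 1000 provisioned units over 10 partitions: each partition can serve 100.
# Hot key: all 500 units/s of demand target a single key (for example, today's date).
hot = {"2019-01-07": 500}
# Spread: the same 500 units/s are distributed over 500 distinct keys.
spread = {f"user-{i}": 1 for i in range(500)}

hot_throttled = throttled_per_second(hot, partitions=10, provisioned_units=1000)
spread_throttled = throttled_per_second(spread, partitions=10, provisioned_units=1000)
```

In this model the hot key loses 400 of its 500 units per second to throttling even though the table as a whole is provisioned for twice the total demand.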

Cost explodes for fast growing data sets

As data grows, so does the number of partitions, because DynamoDB scales out the data automatically (each partition holds a maximum of 10 GB). However, the total provisioned throughput for a table does not increase. Thus, the throughput available to each partition steadily decreases as the data grows, and more capacity has to be provisioned just to keep per-partition throughput constant.
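The arithmetic behind this can be sketched as follows. The model is deliberately simplified: it assumes the partition count is driven only by table size at 10 GB per partition, and that provisioned units are split evenly. The function names are hypothetical.

```python
import math

MAX_PARTITION_GB = 10  # per-partition size limit quoted in the text


def partitions_for_size(table_gb: float) -> int:
    """Partitions needed to hold the data, counting size only (a simplification)."""
    return max(1, math.ceil(table_gb / MAX_PARTITION_GB))


def units_per_partition(table_gb: float, provisioned_units: int) -> float:
    """Throughput available to each partition when units are split evenly."""
    return provisioned_units / partitions_for_size(table_gb)


# The same 1000 provisioned units, spread over a growing table.
for size_gb in (10, 100, 1000):
    print(size_gb, partitions_for_size(size_gb), units_per_partition(size_gb, 1000))
# 10 GB   ->   1 partition,  1000.0 units each
# 100 GB  ->  10 partitions,  100.0 units each
# 1000 GB -> 100 partitions,   10.0 units each
```

Under these assumptions, keeping 1000 units available to every partition of a 1000 GB table would mean provisioning 100,000 units for the table.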

Indexes will result in additional cost

Applications that want to query data on attributes that are not part of the primary key need to create secondary indexes. Local Secondary Indexes share the table's provisioned throughput and need no separately provisioned capacity, but Global Secondary Indexes require their own provisioned read and write capacity, which leads to additional cost.

Additional cost for caching tier

Applications that need lower latency should add a cache in front of the database to improve performance. The caching tier, DAX or ElastiCache, is an additional expense on top of the database tier.

The ideal workloads for DynamoDB should have the following characteristics:

Low write throughput.
Small and constant dataset size, with no unpredictable data growth.
Constant or predictable read throughput, with no sudden or unpredictable spikes.
Applications that can tolerate eventually consistent reads, the least expensive data access operation in DynamoDB.

Some guidelines if you are going to use DynamoDB:

Use GUIDs or unique attributes instead of incremental IDs.
Don’t try to normalize your tables.
Keep pre-computed values up to date on every write if you need to query them often; this is efficient with DynamoDB.
Don't try to keep many relationships across tables. You will end up querying multiple tables to retrieve the required attributes.
Design your tables, attributes, and indexes thinking of the nature of queries.
Think about item sizes and using indexes effectively when listing items to minimize throughput requirements.
Avoid using the DynamoDB Scan operation whenever possible.
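To see why item sizes and the Scan guideline matter, the sketch below compares the read capacity consumed by a Scan with that of a Query on a single partition key. It assumes capacity is charged on the total size of the items read, rounded up to 4 KB units, with eventually consistent reads costing half; it ignores pagination and per-request rounding. The table contents and function names are hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # assumed: one read unit per 4 KB read


def read_units(item_sizes, eventually_consistent: bool = True) -> float:
    """Read units consumed when reading items of the given sizes (in bytes)."""
    units = math.ceil(sum(item_sizes) / READ_UNIT_BYTES)
    return units / 2 if eventually_consistent else units


def scan_cost(table) -> float:
    """A Scan reads every item in the table, whatever the filter returns."""
    return read_units(size for _, size in table)


def query_cost(table, partition_key: str) -> float:
    """A Query reads only the items stored under one partition key."""
    return read_units(size for key, size in table if key == partition_key)


# Hypothetical table: 10,000 items of 2 KB each, spread over 100 customers.
table = [(f"customer-{i % 100}", 2048) for i in range(10_000)]

scan_units = scan_cost(table)                     # 2500.0
query_units = query_cost(table, "customer-42")    # 25.0
```

In this example, listing one customer's items with a Query consumes a hundredth of the capacity a Scan would, and halving the item size would halve both figures.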