A Brief Introduction to DynamoDB Single-Table Design.
Relational Database Management Systems (RDMS) have been the most prevalent method for storing data since its inception. Relational databases are powerful tools for managing data and are still the most widely used, and some of the most efficient data management systems in existence.
These database systems were conceptualised in a time where data storage was a costly and cumbersome process, and were subsequently designed to save as much as possible on storage space. When designing for a RDMS, certain standards and common practices are implemented, such as avoiding data redundancy and normalization.
However, in the 21st century, the cost of storing data per Gigabyte has decreased dramatically, and is set to continue decreasing in the future, whereas the need for faster and faster processing times have conversely increased.
So the question now arises, is there an alternative design for storing data, in which lightning-fast responses, rather than saving storage space, is the main priority?
Single-Table design aims to answer this question.
But before we get ahead of ourselves…
What is Single-Table Design?
Single-table design refers to a database design methodology that, in contrast to most other designs, stores all data in one physical table, instead of entity related tables. Storing data in this way may seem ludicrous when relational database thinking is applied, but considering DynamoDB’s design philosophy, the advantages become more apparent.
How would Single-Table Databases actually work?
Before any actual designing of the single-table can take place, we have to carefully choose our keywords that relate to storing data.
- PK:
Refers to the Partition Key in DynamoDB. This is the unique key defining the object, and is used in conjunction with the SK to form a Primary Key.
- SK:
Refers to the Sort Key in DynamoDB. This key needs to always be designed carefully, as it represents the most common access path for the record. When a Query is performed, the value of the SK is used to sort the data.
- GSI1PK/SK:
Refers to the Global Secondary Indexes within DynamoDB. As we will be storing multiple types of entities, we need to define a generic name to the GSI partition keys and sort keys. This will avoid having an excessive amount of GSI’s whilst still maintaining the advantages provided by them.
Additionally, some best practices need to be defined.
Firstly the DynamoDB “SCAN” operation should be avoided at all costs. This operation will fetch ALL data in the entire table and then filter through each record. Some resources go as far as to suggest that administrators should revoke all access to the DynamoDB Scan operation in systems that utilise single-table design to force developers to think in terms of DynamoDB queries when fetching data.
The next issue we have to consider is to store the entities in such a way that they can be easily distinguished. One method for storing data in this way, is to “tag” the data with meaningful predefined values.
For example:
If I would like to store the details of a book in the database I might choose a design like so:
PK | SK | title |
BOOK#1241-123-1245 | BOOK#1241-123-1245 | The Fellowship Of The Ring |
The prefix “BOOK#” is used to identify this entry as a “book” entity. This prefix is also repeated in the SK for read convenience.
Usually when a different entity, such as book authors, needs to be stored, we would store it in a different table. With this type of design only a different prefix is used. To separate the author entries from the book entries, we can use a new prefix “AUTHOR#”.
PK | SK | title | name |
BOOK#1241-123-1245 | BOOK#1241-123-1245 | The Fellowship Of The Ring | |
AUTHOR#3125-1235 | AUTHOR#3125-1235 | J.R.R. Tolkien |
Now that both the book and author have been created in the database, we can now consider how they can be linked together.
PK | SK | title | name | publishDate |
BOOK#1241-123-1245 | BOOK#1241-123-1245 | The Fellowship Of The Ring | ||
BOOK#1241-123-1245 | AUTHOR#3125-1235 | 1954-07-29 | ||
BOOK#9231-321-1425 | BOOK#9231-321-1425 | The Two Towers | ||
BOOK#9231-321-1425 | AUTHOR#3125-1235 | 1954-11-11 | ||
AUTHOR#3125-1235 | AUTHOR#3125-1235 | J.R.R. Tolkien |
As can be seen in the table above books and authors are linked using a separate entry into the database, that contains a “BOOK#” tag in the partition key, and a “AUTHOR#” tag in the sort key. This entry contains only the necessary information such as the book id, author id, and when the book was published.
What are some challenges associated with this design?
Single-Table design poses some unique challenges that need to be addressed throughout the design phase.
Readability should always be considered.
“A well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet.”
https://www.trek10.com/blog/dynamodb-single-table-relational-modeling/
Due to the way the data is structured within a single-table DynamoDB database, readability should always be a key focus. As the quote above emphasizes, the table will become very unreadable as the number of entities, and the complexity of access paths increase. Therefore as a DynamoDB designer, one should always consider the readability of the records within the table. Longer descriptive prefixes are generally more readable than shortened abbreviations, for example.
P#11231-123151
T#41231-123123
Is much less descriptive than:
PROPERTY#11231-123151
TENANT#41231-123123
Define Access Paths.
Another important aspect of Single-table design is that Access paths should be planned ahead of time. When careful access path planning is utilised, future redesigns of the table can be avoided. Redesigns are time-consuming and increase chances of logical fallacies and errors across different entities. Below is an example of how an access path can be planned for a single-table database using tagged entities.
As can be seen, when given a bookId, with the current design, it is possible to fetch a book’s details, as well as the authors that are associated with the specific book. Similar access paths need to be planned for other entities and business requirements before any actual design takes place.
Using GSI’s
Consider the type of operations that will be performed on the database. One possible scenario is to get all books sorted by name. In a conventional DynamoDB database structure, good practice is to create a Global Secondary Index on the book name. This will enable queries to be performed on the book names such that they can be returned alphabetically.
In single-table design, the approach is slightly different. Rather than creating an index like “NameIndex” for alphabetical names, we need a more generic index which can be reused for multiple entities. The name entirely depends on preference, but a commonly used naming scheme is to name them as follows:
GSI1PK: For the Global Secondary Index Partition Key.
GSI1SK: For the Global Secondary Index Sort Key.
The sequence can continue as required to allow for more GSI’s in the table, i.e. GSI2PK, GSI2SK…
Here is an example which expands on the earlier table by using GSI’s:
PK | SK | title | name | Publish
Date |
GSI1PK | GSI1SK |
BOOK#1241-123-1245 | BOOK#1241-123-1245 | The Fellowship Of The Ring | BOOK | BOOK#THE_FELLOWSHIP_OF_THE_RING | ||
BOOK#1241-123-1245 | AUTHOR#3125-1235 | 1954-07-29 | ||||
BOOK#9231-321-1425 | BOOK#9231-321-1425 | The Two Towers | BOOK | BOOK#THE_TWO_TOWERS | ||
BOOK#9231-321-1425 | AUTHOR#3125-1235 | 1954-11-11 | ||||
AUTHOR#3125-1235 | AUTHOR#3125-1235 | J.R.R. Tolkien | AUTHOR | AUTHOR#JRR_TOLKIEN |
Now that the GSI’s have been added, it is possible to fetch both the books and the authors alphabetically sorted by name. These GSI’s can be utilised to allow for all the needed access paths in most systems.
Conclusion
Single-table design as a concept is initially very off-putting to the uninitiated, and it may take quite some time to get used to this alternate way of thinking. This design, however, provides the benefit of dramatically increasing read speeds for data access paths by utilising the full potential of DynamoDB’s partitioning system.
Single-table design questions on a fundamental level how databases are designed. Whether this methodology will be disruptive, or just another niche design remains to be seen as general understanding and limitations are explored further by the community.
References
Below are some resources and references that provide more in depth information.
Alex DeBrie: Single-Table Design with DynamoDB.
Alex DeBrie: The What, Why, and When of Single-Table Design with DynamoDB
https://www.alexdebrie.com/posts/dynamodb-single-table/
Forrest Brazeal: From Relational DB to Single-Table Design
https://www.trek10.com/blog/dynamodb-single-table-relational-modeling/
Rick Houlihan: Fundamentals of Amazon DynamoDB Single-Table Design.