![]() ![]() However, for the big database we’d need a unique solution. We removed the existing child keys, and added the new key definition to the parent, which propagates to the children. For the smaller databases we modified the child and parent Primary Keys and could tolerate the lock duration for new writes.įor the large database, we could in fact tolerate the lock duration, so we went for the straightforward solution. This same table would need to be modified on about 10 production databases, ranging in size from small to large. How could we solve this problem? Solution and Roll Out We didn’t want to take planned downtime, we did want to modify the Primary Key definition, and we knew the modification would take a long time to run. ![]() No problem, we’d just modify the Primary Keys for all tables, right? Unfortunately, this type of modification locks the table for writes, meaning we’d have an extended period of data loss if we missed writes, meaning this was not an acceptable solution. We wanted a composite Primary Key covering the id and created_at columns on the parent. ![]() On a partition parent table, the Primary Key must include the partition key column, which created an inconsistency with the child we’d need to solve. In PostgreSQL table partitioning, the parent and child Primary Keys must match. While the solution was clear, applying the change had immediate problems. This inefficiency was spiking the costs on the data warehouse side, where we pay on a per query basis, making this a problem that needed to be fixed quickly!Īfter the team met, we decided the best course of action was to modify the Primary Key definition so that it existed on the parent. The data warehouse was attempting to identify the new and changed rows, but it had become inefficient to do so without a Primary Key on the parent table. What happened with the data pipeline? The ReckoningĪn engineer noticed an excessive amount of queries in the data warehouse since we’d partitioned the table. The need for this was outside the application, but for an important consumer, our data pipeline process that detects row modifications and copies them to our data warehouse. We found out later this was short sighted, and in fact we did have a need for a parent table primary key. We didn’t have a need for a Primary Key on the parent table. When we decided on the Primary Key and Unique Indexes structure for the table, the needs of the application and conventions of pgslice drove the decision. ![]() PostgreSQL allows the child table to have a Primary Key constraint with none defined on the parent, but not the other way around.įurther, PostgreSQL prevents adding a Primary Key on the parent that conflicts with the child. This met the needs of the application.Įach child partition Primary Key was on the id column only. In that design, we’d set up a PRIMARY KEY constraint on each child partition, but none on the parent. In the earlier post, we discussed why and how we conducted on online table partition conversion. Partition key column is created_at timestamp.Primary Key constraint on child partitions on id column.75+ partitions, table size is 500 GB and contains around 1 billion rows.New writes at a rate of around 10-15 rows/second.Table writes behavior is “Append Mostly”.Declarative partitioning with RANGE type.The context there will help explain the circumstances we were operating from. If you haven’t already read Part 1 of this series, please first read PostgreSQL Table Partitioning - Growing the Practice - Part 1 of 2 which describes why and how this table was converted to a partitioned table. This is a disruptive operation, so we had to use some tricks to pull this off. In Part 2 of this 2 part PostgreSQL □ Table Partitioning series, we’ll focus on how we modified the Primary Key online for a large partitioned table. By Andrew Atkinson, Sriram Rathinavelu, and Alesandro Norton. ![]()
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |