In PostgreSQL, table partitioning
2024
n PostgreSQL, partitioning means dividing a large table into smaller, more manageable pieces, called partitions, but they all act as part of a single logical table. These partitions are physically separate from each other on disk, which helps optimize performance for certain queries. However, from a user’s perspective, it appears as a single table when querying, so the basic SQL syntax remains mostly the same.
Key Points about PostgreSQL Partitioning:
- Physical Division:
- Each partition is a separate physical table under the hood, storing a subset of the data.
- Partitions share the same logical schema (column definitions) as the parent table.
- Data is routed to the correct partition automatically based on the partition key (e.g., a date, a region, etc.).
- Transparent to Queries:
- When querying a partitioned table, you query the parent table as if it’s a normal table. PostgreSQL automatically figures out which partitions to scan based on the query conditions.
- You generally do not need to modify your SQL syntax to account for partitioning. The database engine handles it for you.
Example of Querying a Partitioned Table:
Suppose we have a table sales
partitioned by sale_date
into yearly partitions.
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
sale_date DATE NOT NULL,
amount DECIMAL(10, 2)
) PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
f you want to query all sales in 2023, you don’t need to worry about the partitions in your SQL syntax. You can write a normal query, and PostgreSQL will handle partition selection (called partition pruning) for you:
SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31';
PostgreSQL will automatically prune (ignore) irrelevant partitions, scanning only the sales_2023
partition in this case.
What Happens Behind the Scenes:
- Partition Pruning: PostgreSQL will only scan the partitions relevant to the query, ignoring those that don’t match the partitioning key conditions. This improves query performance.
- In the example above, PostgreSQL will automatically only read the
sales_2023
partition if thesale_date
condition matches that partition.
- In the example above, PostgreSQL will automatically only read the
- Indexes: Indexes created on the parent table apply to all partitions. Each partition can have its own physical storage, and indexes can improve query performance within a partition.
Special Considerations for Partitioned Tables:
- Insert Queries:
- When inserting data, you insert into the parent table, and PostgreSQL automatically routes the data to the correct partition.
INSERT INTO sales (sale_date, amount) VALUES ('2023-07-15', 100.00);
The row will be stored in the sales_2023
partition based on the sale_date
.
Partition Key Constraints:
- Queries that use conditions based on the partition key (e.g.,
sale_date
in the example above) are optimized because PostgreSQL can prune irrelevant partitions. - Queries that don’t filter on the partition key may scan all partitions, leading to slower performance.
Maintenance:
- You may need to manage partitions manually, such as adding new partitions for upcoming time periods (e.g., for future years in a time-based partitioned table).
- If no matching partition exists for the data being inserted, you might get an error unless you define a default partition to catch such data.
Dropping Partitions:
- If you want to delete old data, you can detach or drop individual partitions without affecting the parent table or other partitions. This can be more efficient than running a
DELETE
query on the entire table.
Example of dropping a partition:
ALTER TABLE sales DETACH PARTITION sales_2023;
DROP TABLE sales_2023;
SQL Syntax and Partitioning:
- Querying: The same as with regular tables. No special SQL syntax is required to account for partitions.
- Inserting: Also the same, insert into the parent table, and PostgreSQL automatically routes the data to the correct partition.
- Indexes: Indexes can be created on the parent table and apply to the partitions.
Example of Partition Pruning:
Let’s say you have partitioned a table by date, with partitions for each year. When you run a query that involves a specific year, PostgreSQL will skip over partitions that don’t match that year, thanks to partition pruning.
Query:
SELECT * FROM sales WHERE sale_date = '2023-06-15';
Behind the scenes:
- PostgreSQL looks at the condition
sale_date = '2023-06-15'
. - It prunes all partitions except the
sales_2023
partition, as that’s the only partition where the condition might match. - It scans only the
sales_2023
partition, improving performance.
Conclusion:
- Partitioning in PostgreSQL physically divides a table into smaller partitions, but these partitions remain logically part of a single table.
- Querying partitioned tables doesn’t require special syntax; PostgreSQL handles partition selection automatically.
- Consider the partition key in your queries to take advantage of partition pruning and improve performance.