Partitioning tables
You can distribute portions of individual tables across a filesystem using partitions. Different rows of a table may be assigned to different physical partitions, which is called horizontal partitioning. MySQL does not have support for vertical partitioning, in which different columns of a table are assigned to different physical partitions.
There are many ways to partition a table:
RANGE: This type of partitioning assigns rows to partitions based on column values falling within a given range.LIST: Similar to partitioning byRANGE, except that the partition is selected based on columns matching one of a set of discrete values.HASH: With this type of partitioning, a partition is selected based on the value returned by a user-defined expression that operates on column values in rows to be inserted into the table. The function may consist of any expression valid in MySQL that yields a non-negative integer value.KEY: This type of partitioning is similar to partitioning byHASH, except that only one or more columns to be evaluated are supplied, and the MySQL server provides its own hashing function. These columns can contain other than integer values, since the hashing function supplied by MySQL guarantees an integer result regardless of the column data type.
A very common use of database partitioning is to segregate data by date.
Note that partitions work only for InnoDB tables, and foreign keys are not yet supported in conjunction with partitioning.
RANGE partitioning
If you want to partition the employees table based on emp_no and you want to keep 100,000 employees in one partition, you can create it like this:
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`address` varchar(100) DEFAULT NULL,
PRIMARY KEY (`emp_no`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY RANGE (emp_no)
(PARTITION p0 VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (300000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (400000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (500000) ENGINE = InnoDB);
If the employee number is above 500000, since there is no partition defined for them, the insert will fail with error. To avoid this, you have to regularly check and add partitions or create a MAXVALUE partition to catch all such exceptions:
DROP TABLE employees;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`address` varchar(100) DEFAULT NULL,
PRIMARY KEY (`emp_no`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY RANGE (emp_no)
(PARTITION p0 VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (300000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (400000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (500000) ENGINE = InnoDB,
PARTITION pmax VALUES LESS THAN MAXVALUE ENGINE = InnoDB
);
Partitioning in MySQL is widely used on date, datetime, or timestamp columns. If you want to store some events on the database, and all the queries are based out of a time range, you can use partitioning like this.
DROP TABLE employees;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`address` varchar(100) DEFAULT NULL,
PRIMARY KEY (`emp_no`,`hire_date`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY RANGE (YEAR(hire_date))
(PARTITION p1980 VALUES LESS THAN (1980) ENGINE = InnoDB,
PARTITION p1990 VALUES LESS THAN (1990) ENGINE = InnoDB,
PARTITION p2000 VALUES LESS THAN (2000) ENGINE = InnoDB,
PARTITION p2010 VALUES LESS THAN (2010) ENGINE = InnoDB,
PARTITION p2020 VALUES LESS THAN (2020) ENGINE = InnoDB,
PARTITION pmax VALUES LESS THAN MAXVALUE ENGINE = InnoDB
);
If you want to convert an existing table into a partitioned table, and if the partition key is not part of PRIMARY KEY, you need to drop PRIMARY KEY and add the partition key as part of PRIMARY KEY and all unique keys. You can do that as follows:
ALTER TABLE employees DROP PRIMARY KEY, ADD PRIMARY KEY(emp_no,hire_date);
RANGE COLUMNS partitioning
RANGE COLUMNS partitioning is similar to RANGE partitioning, but enables you to define partitions using ranges based on multiple column values. In addition, you can define the ranges using columns of types other than integer types. RANGE COLUMNS partitioning differs significantly from RANGE partitioning in the following ways:
RANGE COLUMNSdoes not accept expressions, only names of columnsRANGE COLUMNSaccepts a list of one or more columnsRANGE COLUMNSpartitioning columns are not restricted to integer columns; string,DATEandDATETIMEcolumns can also be used as partitioning columns
Instead of using the to_days() or year() function, you can directly use the column hire_date in RANGE COLUMNS:
ALTER TABLE employees
PARTITION BY RANGE COLUMNS (hire_date)
(PARTITION p0 VALUES LESS THAN ('1970-01-01'),
PARTITION p1 VALUES LESS THAN ('1980-01-01'),
PARTITION p2 VALUES LESS THAN ('1990-01-01'),
PARTITION p3 VALUES LESS THAN ('2000-01-01'),
PARTITION p4 VALUES LESS THAN ('2010-01-01'),
PARTITION p5 VALUES LESS THAN (MAXVALUE)
);
Or you can divide employees based on their last_name. This will not guarantee uniform distribution across the partitions:
ALTER TABLE employees
PARTITION BY RANGE COLUMNS (last_name)
(PARTITION p0 VALUES LESS THAN ('b'),
PARTITION p1 VALUES LESS THAN ('f'),
PARTITION p2 VALUES LESS THAN ('l'),
PARTITION p3 VALUES LESS THAN ('q'),
PARTITION p4 VALUES LESS THAN ('u'),
PARTITION p5 VALUES LESS THAN ('z')
);
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
Using RANGE COLUMNS, you can put multiple columns in the partition function:
CREATE TABLE range_columns_example (
a INT,
b INT,
c INT,
d INT,
e INT,
PRIMARY KEY(a, b, c)
)
PARTITION BY RANGE COLUMNS(a,b,c) (
PARTITION p0 VALUES LESS THAN (0,25,50),
PARTITION p1 VALUES LESS THAN (10,50,100),
PARTITION p2 VALUES LESS THAN (10,100,200),
PARTITION p3 VALUES LESS THAN (MAXVALUE,MAXVALUE,MAXVALUE)
);
If you insert values a=10, b=20, c=100, d=100, e=100, it goes to p1.
LIST and LIST COLUMNS partitioning
LIST partitioning is similar to RANGE partitioning, where each partition is defined and selected based on the membership of a column value in one of a set of value lists, rather than in one of a set of contiguous ranges of values.
If you wish to use data types other than integers, you can use LIST COLUMNS.
Unlike the case with RANGE partitioning, there is no catch-all such as MAXVALUE; all expected values for the partitioning expression should be covered in the PARTITION expression.
If you want to divide the customers with certain zip codes in a partition, you can use LIST partitions:
CREATE TABLE customer (
customer_id INT,
zipcode INT,
city varchar(100),
PRIMARY KEY (customer_id, zipcode)
)
PARTITION BY LIST(zipcode) (
PARTITION pnorth VALUES IN (560030, 560007, 560051, 560084),
PARTITION peast VALUES IN (560040, 560008, 560061, 560085),
PARTITION pwest VALUES IN (560050, 560009, 560062, 560086),
PARTITION pcentral VALUES IN (560060, 560010, 560063, 560087)
);
If you wish to use the columns directly, rather than integers, you can use LIST COLUMNS:
DROP TABLE customer;
CREATE TABLE customer (
customer_id INT,
zipcode INT,
city varchar(100),
PRIMARY KEY (customer_id, city)
)
PARTITION BY LIST COLUMNS(city) (
PARTITION pnorth VALUES IN ('city1','city2','city3'),
PARTITION peast VALUES IN ('city4','city5','city6'),
PARTITION pwest VALUES IN ('city7','city8','city9'),
PARTITION pcentral VALUES IN ('city10','city11','city12')
);
HASH and LINEAR HASH partitioning
Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions.
If you want to evenly distribute employees, instead of RANGE partitioning over YEAR(hire_date), you can use HASH of YEAR(hire_date) and specify the number of partitions. When PARTITION BY HASH is used, the storage engine determines which partition to use based on the modulus of the result of the expression.
DROP TABLE employees;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`address` varchar(100) DEFAULT NULL,
PRIMARY KEY (`emp_no`,`hire_date`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY HASH(YEAR(hire_date))
PARTITIONS 8;
The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value.
In LINEAR HASH partitioning, you can use the same syntax, except for adding a LINEAR keyword. Instead of a MODULUS operation, MySQL uses a powers-of-two algorithm for determining the partition. Refer to https://dev.mysql.com/doc/refman/8.0/en/partitioning-linear-hash.html for more details:
DROP TABLE employees;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`address` varchar(100) DEFAULT NULL,
PRIMARY KEY (`emp_no`,`hire_date`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY LINEAR HASH(YEAR(hire_date))
PARTITIONS 8;
KEY and LINEAR KEY partitioning
Partitioning by key is similar to partitioning by hash, except that, where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. This internal hashing function is based on the same algorithm as the PASSWORD() function.
DROP TABLE employees;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`address` varchar(100) DEFAULT NULL,
PRIMARY KEY (`emp_no`,`hire_date`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY KEY()
PARTITIONS 8;
Subpartitioning
You can further divide each partition into a partitioned table. This is called subpartitioning or composite partitioning:
DROP TABLE employees;
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`address` varchar(100) DEFAULT NULL,
PRIMARY KEY (`emp_no`,`hire_date`),
KEY `name` (`first_name`,`last_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
PARTITION BY RANGE( YEAR(hire_date) )
SUBPARTITION BY HASH(emp_no)
SUBPARTITIONS 4 (
PARTITION p0 VALUES LESS THAN (1990),
PARTITION p1 VALUES LESS THAN (2000),
PARTITION p2 VALUES LESS THAN (2010),
PARTITION p3 VALUES LESS THAN (2020),
PARTITION p4 VALUES LESS THAN MAXVALUE
);