The importance of Primary Keys in MySQL
Note: I have used MySQL and it’s storage engine InnoDB interchangeably.
A couple of days ago, someone at work asked me whether inserts are faster in a table when there’s a primary key
present in the table. My immediate answer was that they were faster. I had some idea about this from experience and previous reading but didn’t remember the reason why it was so.
For a business user unaware about InnoDB internals or InnoDB’s existence, it might seem perfectly intuitive to assume that inserts will be faster without a primary key
because a primary key
is an additional constraint on the table, i.e., MySQL has to check for uniqueness, nullability. When it comes to InnoDB, that is not the case. Also, a table without a primary key
will occupy more space than a table with a primary key
. See how.
Background
MySQL has various mutexes
for preventing multiple threads accessing the same resource at the same time. There is a mutex called dict_sys
which locks the data dictionary, i.e., the information_schema
. Ovais Tariq has written a very insightful blogpost on this.
When a primary key
is not defined on a table, MySQL generates row-ids
for every row that is inserted in that table. This, because InnoDB is an index-organized storage engine, it mandates a primary key
to store the contents of the table. If you don’t define a primary, MySQL will do this itself by generating a 6 byte auto increment primary key
which is hidden. It is referred by the name DB_ROW_ID
in the documentation. This only has to exist if a table doesn’t have a primary key
.
Now, when this row-id
is generated on every new insert, MySQL acquires dict_sys
mutex — and it becomes a system-wide contention point because what it is acquiring a mutex on is the data dictionary — which might be in use by a couple of other DDL and DML operations. This would mean that insert performance on a table might be affected by another operation taking place on another table. This contention would otherwise not exist if a primary key
is defined on the table.
To check how a similar insert workload behaves on a table with a defined primary key
and another table without a defined primary key
, I performed a simple mysqlslap
insert performance test on my local
. I have attached the scripts to perform these tests. Note: Doing this with rand()
is probably not ideal.
mysqlslap
— is a sysbench
like tool for MySQL that can be used for performance and load testing of individual tables — Read more about it here.
Result
The insert performance in this case was surprisingly not very different in both the cases — checked for 10 million records. This was with the concurrency of 4 threads
- With
primary key
, 1284.848 seconds - Without
primary key
, 1282.350 seconds
In an isolated environment and with only 4 thread, there were not many dict_sys
mutexes on information_schema
, hence, there was no significant difference between the insert performance. Although, with the same table configuration, these two tables occupy significantly different amount of space on disk — which comes from the fact that MySQL had to create row ids
for every record that was being inserted in the table without a defined primary key
.
- With
primary key
, 900 MB - Without
primary key
, 1.1 GB
I conducted five more tests with increasing concurrency and a consistent load of 100K records. See the difference in insert performance here.
With increasing threads, time taken to insert into tables without primary key
increases quite a lot because of the contention point added because of dict_sys
mutex.
Always use a primary key
!
P.S. — From Jeremy Cole’s blog about this
Given how much other code within InnoDB is protected by dict_sys->mutex I think it’s fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable.
References
- Ovais Tariq’s Original Blogpost about Scalability Issues
- Chapter 4 from High Performance MySQL — strongly recommended book for MySQL power users
- Tuning Primary Keys in InnoDB
- Wasting InnoDB Memory — not immediately related to this, but interesting nevertheless
- MySQL Server Team’s Blog about InnoDB’s Intrinsic Tables
- Community Discussion about using mysqlslap
- Importance of Primary Keys in MySQL Performance
- UUID vs incremental ID insert performance in MySQL — not immediately relevant
- Jeremy Cole’s blog on How InnoDB behaves without a Primary Key — must read
- Lack of Primary Key May Effectively Stop Slave — this needs a deeper dive
- Another blogpost about Replication Issues when Primary Keys are not present
- InnoDB Row Format Specification