June 7th 2020
I run Dev Spotlight – we create tech content for tech companies like Heroku, Rollbar, and more.
Introduction: The Postgres Connection Pool Problem
At the time, the decision to use processes was fine. But modern apps (microservices, for example) tend to require a lot of connections, and use and release those connections very quickly. So while the “no threads” decision was tenable ten years ago, today the inability for Postgres to scale is a serious issue. Luckily, this is a well known issue that developers have been facing and solving for years.
So what is the answer? Connection pools.
In this article, we’re going to look at connection pooling and options for connection pools on Postgres. Then, we’ll implement a PgBouncer connection pool on an app.
What Are Connection Pools?
Connection pools are a cache of open database connections that can be reused by clients. Using a pool mitigates strain on the database by reducing requests for new connections. Pools also increase performance of individual database calls, since no time is spent requesting and opening the connection).
From a high level, a connection pool works like this:
- The user requests a database connection.
- The pool checks for an existing and available cached connection.
- If a cached connection is found, it’s returned to the user.
- If a cached connection is not found, a new connection is created — as long as the pool adheres to the connection pool settings, such as the max pool size, max connections allowed to the database, max connections allowed per user, and so on.
- If creating a new connection violates any of those settings, the request is queued until a connection is available or a new one can be created.
Where to Create a Connection Pool
Now that we understand what a connection pool is, let’s look at one of the first decisions you’ll face when you implement a connection pool: Where does it go on your stack? You have several options: You can create your pool at the language level, on your client, as middleware, or as some hybrid of those choices. As with many technical decisions, the right choice often depends on your unique situation.
Here are some pros and cons to help you decide:
- Your pool runs locally wherever your code needs it using libraries created specifically for your language. Most languages include native — or add-on — connection pooling libraries (for example, JDBC with Java or Psycopg for Python).
- Low latency since the pool is on the same box as the requester
- Better security since the connections are constrained to one client
- No need to learn a new tool
- Difficult to monitor and control connections to the database since you can end up with multiple pools from multiple clients
- Optimized for your language, not necessarily for Postgres
- Your pool is separate from your code, but runs on the same machine as your client app.
- Low latency and better security, similar to language level
- Optimized for Postgres, not for your language
- Again, similar to language level, it can be difficult to monitor and control connections
- Your pool runs between the client and database, either on a stand-alone server, or on the same machine as your database.
- Flexible — database can be swapped out
- Optimized for Postgres, not for your language
- Centralized control of connections, which make it easier to monitor and control connections
- You’ve introduced a new layer, so also new latency
- Single point of failure for database calls across all clients
- Potential security issues since you’re sharing connections between layers
- Yet another layer to maintain
The ideal location of your pool will depend on your unique situation, technical needs, and personal strengths. However, in most cases, middleware is likely your best choice. For a modern app with many services, middleware gives you more control into, and visibility over, your connections.
Next, let’s dive into that most common use case: middleware connection pools on Postgres.
Postgres Middleware Connection Pool Options
- Basic connection pooling
- Also offers load balancing, replication, and other advanced features
- Basic connection pooling. That’s it.
- Lightweight and better performance
This is usually a pretty easy decision. Both options are established solutions, work well, and have a solid user base. The deciding factor is if you need just connection pooling, or more. Pgpool-II is a little slower and heavier, but has the advanced features such as load balancing.
On the other hand, if you just need a connection pool and nothing else, PgBouncer is your lightweight and fast solution.
Implementing a Middleware Connection Pool with PgBouncer
So let’s run through a sample deployment of PgBouncer on an existing app to see how it works. For this setup, I’m going to use Heroku so we can quickly get something deployed and working with minimal hassle. The process will be simple since Heroku is a PaaS provider and handles most of the DevOps steps for us.
Connection Pooling Mode
There are three different connection pooling modes available in PgBouncer. These modes determine exactly when a connection is returned to the connection pool. The modes are:
- Statement — the connection is returned to the pool as soon as the statement is executed (auto-commit is always on)
- Transaction — the connection is returned to the pool as soon as the transaction is complete (a commit or rollback is executed)
- Session — the connection is returned to the pool as soon as the user session is closed
There are quite a few pool settings for PgBouncer. For example, here are five common settings:
- pool_size — just like it sounds, the size of the pool. The default is 20. For Heroku server-side plans, the default is your plan’s connection limit times 0.5.
- reserve_pool_size — a reserve pool used in times of usage bursts
- max_db_connections — the max number of connections allowed to the database. For Heroku server-side plans, the default is your plan’s connection limit times 0.75.
- max_user_connections — the max number of connections allowed per user
- max_client_conn — the max number of incoming client connections allowed (among all users). For Heroku server-side plans, the default is 1000.
For our example, we’ll use transaction mode and the default pool settings as configured by Heroku on installation.
Deploying Our Example
If you haven’t used Heroku before but still want to follow along, don’t worry — the guide walks you through step-by-step. The only change you’ll need to make is in provisioning the database near the end of the setup.
$ heroku addons:create heroku-postgresql:standard-0
at the end of the URL) and see this page:
Great! If you followed that guide, you’re all set with a Node.js app connected to Postgres with a test table. Now let’s see how easy it is to start using a connection pool with PgBouncer. It’s just a few simple steps using the Heroku CLI:
1. Enable pooling. The below command creates both the pool and a connection pool URL (using the current database URL).
$ heroku pg:connection-pooling:attach DATABASE_URL — as DATABASE_CONNECTION_POOL
2. Change your configuration to use the pool URL instead of the database URL.
3. Commit your changes and restart.
That’s it! As a quick test, I made a nonsensical code change to the client to see what would happen. Here I loop and open 999 select statements to the pool:
URL twice. Let’s look at the stats after each run to see what happened. To see the stats, first use the commend
to find your database connection pool URL. Then issue the command:
$ psql postgres://username:[email protected]–192–168–1–1.compute-1.amazonaws.com:5433/pgbouncer
using your database connection pool URL and replacing the final component of the path with ‘pgbouncer’.
after the two runs. There are lots of queries running at the db level:
while the queries are running. There’s just one “active” connection being reused for all those queries:
Success! We’re up and running a PgBouncer connection pool on Postgres.
There are a several ‘gotchas’ to watch out for with PgBouncer and other connection pools. Here are some items to be aware of and resources to help you dig further.
Client vs. Server
Hopefully you now have an idea of why you need connection pooling with Postgres, what your implementation options are, and how a typical PgBouncer implementation looks.