Skip to content

Broken Postgres clients are released back into the pool, but should be removed #5112

Closed
@clarkdave

Description

Issue type:

[x] bug report

Database system/driver:

[x] postgres

It may impact other drivers if they have similar semantics/expectations as pg.Pool.

TypeORM version:

[x] latest

Explanation of the problem

Currently in TypeORM, if a client suffers an unrecoverable error - for example, if the underlying connection goes away, such as during a DB failover - there is no protection in place to stop that client being added back into the pg.Pool. This broken client will then be handed out in future even though it'll never be able to execute a successful query.

Although pg.Pool itself does listen for error events on clients within the pool - and actively removes any which do emit errors - it doesn't catch everything. It is considered the responsibility of the user of the pool to release known broken clients by calling client.release(true). The truthy argument tells the pool to destroy the connection instead of adding it back to the pool

https://node-postgres.com/api/pool#releaseCallback

The releaseCallback releases an acquired client back to the pool. If you pass a truthy value in the err position to the callback, instead of releasing the client to the pool, the pool will be instructed to disconnect and destroy this client, leaving a space within itself for a new client.

If a client's connection does break, it's very difficult to debug, as a handful of queries will begin failing with an error like Error: Client has encountered a connection error and is not queryable while others will continue executing fine.

There is further discussion about the impact of this in the node-postgres repo: brianc/node-postgres#1942

Steps to reproduce or a small repository showing the problem

  • Set pool size to 1
  • Run a small loop which repeatedly performs a query, e.g.
while (true) {
  try {
    await getManager().query('select pg_sleep(4)')
    console.log('success')
  } catch (err) {
    console.log('error', err)
  }

  console.log('done')
  await sleep(1000)
}
  • Wait for the query to run at least once. This should allocate a connection from the pool. Subsequent invocations of the query should use the same connection.
  • Kill the connection while query is running. On a Mac or Linux, this is easily done by finding and killing the process, e.g. ps aux | grep postgres | grep SELECT
  • The loop continue to throw errors, as the broken connection has been returned to the pool and continues to be handed out. This shouldn't happen: we should have told pg.Pool the connection is broken so it can be removed from the pool, so the next query should get a new, unbroken, connection

I believe the reason this hasn't been noticed before (at least not that I could see) is because it's really only likely to happen if the actual database connection breaks. The majority of QueryFailedErrors are caused by dodgy SQL etc, none of which will render a client unusable. And, usually, if your database is killing connections, you've got other problems to think about 😅

We only noticed it because we run PgBouncer in between TypeORM and our Postgres server. When we redeployed PgBouncer, it would kill some of the active client connections in the pool, but because pg.Pool never found out about it, those connections remained in the pool indefinitely, causing a steady stream of errors even though everything else was fine.

Fix

I have a working fix here: loyaltylion@6bd52e0

If this fix looks suitable, I'd be happy to create a PR to get this merged. It only applies to postgres, but could be extended to other drivers if we think they'd benefit.

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions