About PostgreSQL backups

Taking backups is an important administrative task that can have some disastrous consequences if it is not done right. The use of RAID configurations in your storage system, replication between nodes, clustering and trusting 100% that your SAN will be up ARE NOT backup strategies. These measures are necessary for HA (High availability) but do not replace the necessity of taking backups of our databases.

There are two different types of backup that can be use with PostgreSQL to implement a good backup and restore strategy. They are:

  • Physical backups
  • Logical backups

Regardless of the type of backup used to backup your databases, one needs a good backup and restore plan that takes into account intervals, retention policies and performance issues for a backup and the time needed to get a full restoration of a database.

Physical backups

This type of backup takes copies of the files where the PostgreSQL saves the databases. There are several techniques that can be used to take physical backups and we are not going to explain them here. Check Chapter 24. Backup and Restore of the PostgreSQL documentation for more information.

The important thing with physical backups is that some of these techniques together with continuous archiving of write ahead log (WAL) files can be used to implement PITR (Point in time recovery) backups and achieve a full disaster recovery solution.

There are several solutions that can be used for managing PITR backups, such as PITRTools, OmniPITR, and Barman.

Logical backups

PostgreSQL has two utilities, pg_dump and pg_dumpall, for taking logical backups of databases. They take a snapshot of a database at a given moment.

These utilities take consistent backups of a database or the whole cluster even if the databases are being used concurrently. At the same time pg_dump and pg_dumpall do not block other users accessing the database when backups are being taken.

Even though a backup or snapshot created with pg_dump or pg_dumpall can never guarantee a full disaster recovery of all data changed between the moment when the backup was taken and the moment of a future crash, they are still necessary if you need to archive versions of a database, move databases between PgSQL nodes and clone databases between production / pre-production and/or development servers.

Nevertheless, logical backups give us a great flexibility in several situations and are also an easy way of taking backups of databases not requiring PITR backups.

When taking a backup of a database we need the following information to be sure we can make a restoration that includes 100% of the data and definitions from the target database:

  1. Database schema
  2. Database data
  3. Roles owning objects in the database
  4. Roles with privileges on objects in the database
  5. Roles with privileges on the database or schemas
  6. Creation of all the roles owning something or with privileges
  7. Configuration parameters defined explicitly for a role
  8. Configuration parameters defined explicitly for the database

Unfortunately all this information cannot be obtained in a single execution for only one database. 1, 2, 3 and 4 can be obtained with pg_dump. 5, 7 and 8 can be obtained with a full pg_dumpall and 6 either with a pg_dumpall -r or a full pg_dumpall.

At the same time, pg_dumpall will return all this information for all databases in a cluster, not only the database one wants to take a backup of.

This is something that PostgreSQL will have to improve in the future so it gets easier to take a backup/snapshot of a database in a single execution.

In the meantime, PgBackMan takes care of all this and it delivers all the information needed to run a 100% restoration of a database when we define a backup in the system.