[PATCH] Back up and restore PostgreSQL databases with Shepherd

Status: Open
Participants: Giovanni Biscuolo, Ludovic Courtès, Marius Bakke
Owner: unassigned
Submitted by: Marius Bakke
Severity: normal

Marius Bakke wrote on 17 Jun 2022 23:14
(address . guix-patches@gnu.org)
87zgibuh5w.fsf@gnu.org
Hello Guix!

The attached patch adds backup and restore mechanisms to the PostgreSQL
Shepherd service. It looks like this (here with a db named 'mreg'):

$ sudo herd backup postgres mreg
$ sudo -u postgres psql -c 'drop database mreg' # whoops ...
DROP DATABASE
$ sudo herd list-backups postgres mreg
mreg@2022-06-16_21-55-07
mreg@2022-06-16_22-48-59
$ sudo herd restore postgres mreg@2022-06-16_22-48-59
$ sudo -u postgres psql mreg
mreg=#

Pretty cool, no? :-)

The restore command is "smart": if the database already exists, it
restores in a single transaction; otherwise, it will be created from
scratch (these scenarios require mutually exclusive options to
'pg_restore').
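
A minimal sketch of that selection in shell, assuming the options shown in
the patch excerpt quoted later in this thread (--clean --if-exists, plus
either --single-transaction or --create). The helper name restore_args and
its yes/no flag are hypothetical; the patch itself does this in Scheme
inside the Shepherd action.

```shell
# Sketch only: prints the pg_restore arguments, one per line.
# restore_args is a hypothetical helper, not part of the patch.
restore_args() {
    backup_file=$1   # e.g. /path/to/mreg@2022-06-16_22-48-59
    db_exists=$2     # "yes" or "no"; how to probe this is left open here
    # The database name is the part of the file name before the '@'.
    database=$(basename "$backup_file")
    database=${database%%@*}
    if [ "$db_exists" = yes ]; then
        # Existing database: restore into it atomically.
        printf '%s\n' "$backup_file" -d "$database" \
            --clean --if-exists --single-transaction
    else
        # Missing database: connect to 'postgres' and let pg_restore create it.
        printf '%s\n' "$backup_file" -d postgres \
            --clean --if-exists --create
    fi
}
```

The two branches differ only in the connection database and the last
option, because pg_restore rejects --create together with
--single-transaction.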

With this patch you can 'herd backup' each database, stop postgres,
_delete_ /var/lib/postgresql/data, reconfigure with a newer version, and
'herd restore' them again -- but you'll lose any role passwords (and
roles not declared by postgresql-role-service-type).

Not sure what to do about roles; maybe a backup-roles command?

There is no Scheme API yet, but it would be nice to define per-database
settings (e.g. --jobs or --format) in the configuration, and also to have
a scheduled backup service. These tasks are up for grabs. :-)

The quest here is to provide a smooth upgrade path for end users (and
eventually bump the old 'postgresql-10' service default).

Feedback and/or testing welcome!

Ludovic Courtès wrote on 22 Jun 2022 22:46
(name . Marius Bakke)(address . marius@gnu.org)(address . 56045@debbugs.gnu.org)
87v8ss1l5f.fsf@gnu.org
Hello!

Marius Bakke <marius@gnu.org> skribis:

> The attached patch adds backup and restore mechanisms to the PostgreSQL
> Shepherd service. It looks like this (here with a db named 'mreg'):
>
> $ sudo herd backup postgres mreg
> $ sudo -u postgres psql -c 'drop database mreg' # whoops ...
> DROP DATABASE
> $ sudo herd list-backups postgres mreg
> mreg@2022-06-16_21-55-07
> mreg@2022-06-16_22-48-59
> $ sudo herd restore postgres mreg@2022-06-16_22-48-59
> $ sudo -u postgres psql mreg
> mreg=#
>
> Pretty cool, no? :-)

Indeed! :-)

> With this patch you can 'herd backup' each database, stop postgres,
> _delete_ /var/lib/postgresql/data, reconfigure with a newer version, and
> 'herd restore' them again -- but you'll lose any role passwords (and
> roles not declared by postgresql-role-service-type).
>
> Not sure what to do about roles; maybe a backup-roles command?

No idea, we need input from PG practitioners!

> From edc8a2e5ae3c89b78fb837d4351f0ddfab8fe474 Mon Sep 17 00:00:00 2001
> From: Marius Bakke <marius@gnu.org>
> Date: Thu, 16 Jun 2022 22:46:01 +0200
> Subject: [PATCH] services: Shepherd can backup and restore PostgreSQL
> databases.
>
> * gnu/services/databases.scm (<postgresql-configuration>)[backup-directory]:
> New field.
> (postgresql-activation): Create it.
> (postgresql-backup-action, postgresql-list-backups-action,
> postgresql-restore-action): New variables.
> (postgresql-shepherd-service)[actions]: Register them.
> * gnu/tests/databases.scm (%postgresql-backup-directory): New variable.
> (run-postgresql-test): Trim unused module imports from existing tests. Add
> "insert test data", "backup database", "list backups", "drop database",
> "restore database", "update test data", "restore again", and "verify restore"
> tests.

Not being a database person, I’ll comment on the code:

> (match-lambda
> (($ <postgresql-configuration> postgresql port locale config-file
> - log-directory data-directory
> + log-directory data-directory backup-directory
> extension-packages)

Time to use ‘match-record’!

> +(define (postgresql-backup-action postgresql backup-directory)

Please add a docstring (and on other top-level procedures).

> + (procedure
> + #~(lambda* (pid #:optional database #:rest rest)
> + (use-modules (guix build utils)
> + (ice-9 match)
> + (srfi srfi-19))

Non-top-level ‘use-modules’ should be avoided; it’s not really supposed
to work. If you have these three modules in the ‘modules’ field of the
parent <shepherd-service> record, that’s enough (I know, it’s not pretty).

> + ;; Fork so we can drop privileges.
> + (match (primitive-fork)
> + (0
> + ;; Exit with a non-zero status code if an exception is thrown.
> + (dynamic-wind
> + (const #t)
> + (lambda ()
> + (setgid (passwd:gid user))
> + (setuid (passwd:uid user))
> + (umask #o027)
> + (format (current-output-port)
> + "postgres: creating backup ~a.~%"
> + (basename file-name))
> + (mkdir-p (dirname file-name))
> + (let* ((result (apply system* pg_dump database
> + "-f" file-name
> + options))
> + (exit-value (status:exit-val result)))

Would it work to use ‘fork+exec-command’ to do all this? It’d be great
if we could avoid the boilerplate.

> +(define (postgresql-list-backups-action backup-directory)

Docstring. :-)

> + (match (primitive-fork)
> + (0
> + (dynamic-wind
> + (const #t)
> + (lambda ()
> + (setgid (passwd:gid user))
> + (setuid (passwd:uid user))
> + (let* ((backup-file (string-append #$backup-directory
> + "/" file))
> + (database (match (string-split file #\@)
> + ((name date) name)))
> + (create? (not (database-exists? database)))
> + (options (list "--clean" "--if-exists"
> + (if create?
> + "--create"
> + "--single-transaction"))))
> + (format (current-output-port)
> + "postgres: restoring ~a.~%" file)
> + (let* ((result (apply system* pg_restore backup-file
> + "-d" (if create? "postgres" database)
> + options))

Same here: ‘fork+exec-command’?

Overall I find it nice and convenient, but I wonder how far we should go
with our services. After all, it’s just one way to make backups, there
are probably other ways, so should we have this particular method
hardwired?

Thanks,
Ludo’.
Ludovic Courtès wrote on 4 Aug 2022 11:10
control message for bug #56045
(address . control@debbugs.gnu.org)
8735ecpe5b.fsf@gnu.org
tags 56045 + moreinfo
quit
Giovanni Biscuolo wrote on 28 Feb 13:32 +0100
Re: [bug#56045] [PATCH] Back up and restore PostgreSQL databases with Shepherd
(address . 56045@debbugs.gnu.org)
87h6hs4vly.fsf@xelera.eu
Hello Marius and Ludovic,

Maybe I'm late to the party, sorry.

I'm interested in this patch and I'd like to test it and help as I can
to get it upstream. Marius, could you please address Ludovic's comments
and send an updated patch?

I also have a few comments/questions of mine...

Ludovic Courtès <ludo@gnu.org> writes:

> Marius Bakke <marius@gnu.org> skribis:
>
>> The attached patch adds backup and restore mechanisms to the PostgreSQL
>> Shepherd service. It looks like this (here with a db named 'mreg'):
>>
>> $ sudo herd backup postgres mreg

backup or... dump? :-)

Also: what about a dump/restore of all the databases in a cluster?

AFAIU something like this could be easily automated via an mcron job (or
by extending the service with fully automated dump management in the
future).
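
A rough shell sketch of the "all databases" idea, assuming the 'herd backup
postgres DATABASE' action from the patch is available. backup_each and the
HERD variable are hypothetical names used only for illustration.

```shell
# Sketch only: reads database names on stdin, one per line, and asks
# Shepherd to back up each one. Set HERD to override the command,
# e.g. HERD="sudo herd".
backup_each() {
    while IFS= read -r database; do
        # Skip empty lines in the input.
        [ -n "$database" ] || continue
        # Redirect stdin so the command cannot swallow the remaining names.
        ${HERD:-herd} backup postgres "$database" < /dev/null || return 1
    done
}

# Possible usage, assuming psql can connect as the postgres user:
#   sudo -u postgres psql -At -c \
#     'SELECT datname FROM pg_database WHERE NOT datistemplate' | backup_each
```

The same loop could be run from a scheduled job; how that job would be
declared in the service configuration is still an open question in this
thread.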

>> $ sudo -u postgres psql -c 'drop database mreg' # whoops ...
>> DROP DATABASE
>> $ sudo herd list-backups postgres mreg
>> mreg@2022-06-16_21-55-07
>> mreg@2022-06-16_22-48-59
>> $ sudo herd restore postgres mreg@2022-06-16_22-48-59
>> $ sudo -u postgres psql mreg
>> mreg=#
>>
>> Pretty cool, no? :-)
>
> Indeed! :-)

This would be simply fantastic.

IMO there should be a way to automatically delete old backups
(max-backup-files? max-retention-period?) when starting a new one, in
order not to fill the entire disk after some time.
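
A rough sketch of such pruning in shell, assuming backups are plain files
(not pg_dump's directory format) named DATABASE@YYYY-MM-DD_HH-MM-SS in a
single directory, as in the 'herd list-backups' output above.
prune_backups and its arguments are hypothetical; nothing like it exists in
the patch.

```shell
# Sketch only: keep the newest KEEP backups of DATABASE in DIR, delete the rest.
# Usage: prune_backups DIR DATABASE KEEP
prune_backups() {
    dir=$1; database=$2; keep=$3
    # The timestamp format sorts chronologically as plain text, so a
    # reverse sort lists the newest backups first.
    for f in "$dir/$database"@*; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
        rm -f -- "$f"
    done
}
```

A count-based limit is the simplest variant; a retention period would need
to parse the timestamp out of each file name instead.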

>> With this patch you can 'herd backup' each database, stop postgres,
>> _delete_ /var/lib/postgresql/data, reconfigure with a newer version,
>> and 'herd restore' them again

This would be a great workflow for upgrades. The backup/restore of the
databases (the state) could also be automated on the very first start
of the service: if PostgreSQL fails due to an incompatible database
version, make a backup using the previous PostgreSQL version (I guess
that can be easily found), restore it with the new version and then start
the service (with the new version)... no?

>> -- but you'll lose any role passwords (and
>> roles not declared by postgresql-role-service-type).
>>
>> Not sure what to do about roles; maybe a backup-roles command?

Ideally all roles should be declaratively managed (at least this is the
way I like it!), but passwords can be managed only imperatively AFAIU [1].

IMO a [dump|restore]-role command is also needed; something like:

pg_dumpall -U postgres -h localhost -p 5433 --clean --roles-only \
  --file=roles.sql

"--roles-only" or "--globals-only" (roles and tablespaces)?

AFAIU restoring roles.sql should be done /before/ the (re)creation of
roles declared by postgresql-role-service-type.

[...]

> Not being a database person, I’ll comment on the code:

Not being a Guile person, I'll not comment on the code :-)

[...]

> Overall I find it nice and convenient, but I wonder how far we should go
> with our services. After all, it’s just one way to make backups, there
> are probably other ways, so should we have this particular method
> hardwired?

Yes please :-)

Doing a pgSQL database dump (backup?) with pg_dump (that is hardwired
;-) ) is a _prerequisite_ for all other backup tools users may choose to
adopt: borgbackup/borgmatic, restic, rdiff-backup and so on.

Having an /integrated/ way to *dump* and restore database state is a
great functionality for a database service, IMO... now we can do it "by
hand" for sure, but doing this semi-declaratively (and one day maybe
fully declaratively) would be great.

In other words: for database [2] sysadmins, backup (dump) is _part_ of
the service :-D


Happy hacking! Gio'


[1] Actually I'd like to find a way to avoid this and manage roles
/only/ declaratively (actually _dropping_ all undeclared roles, to
avoid "old status stratification" problems)... but this is off-topic
here.

[2] All databases with a binary on-disk format that cannot be managed
like a simple file or directory, such as pgSQL, MySQL, openLDAP and so on.

--
Giovanni Biscuolo

Xelera IT Infrastructures