PostgreSQL + Cuirass Errors

Open. Submitted by Eric Brown.
Details
One participant
  • Eric Brown
Owner
unassigned
Severity
normal
Eric Brown wrote 4 days ago
(address . bug-guix@gnu.org)
87h7i6otsg.fsf@ericcbrown.com
Hello:
Executive Summary:
- Can't reinstall Cuirass and/or PostgreSQL
- Divide by 0 error reported by postgres when computing metrics
Details:
I am having issues reconfiguring Cuirass and PostgreSQL. I wonder if these are related to several issues in PostgreSQL, and they seem to occur when I reconfigure either Cuirass and/or postgres without Cuirass present, i.e. my "database server"

/etc/config.scm:
----------------
(define %cuirass-specs
  #~(list (specification
           (name "my-cbc")
           (build '(packages "cbc")))
          (specification
           (name "my-ipopt")
           (build '(packages "ipopt")))
          (specification
           (name "my-linux-libre")
           (build '(packages "linux-libre")))
          (specification
           (name "my-openblas-ilp64")
           (build '(packages "openblas-ilp64")))
          (specification
           (name "my-qtbase")
           (build '(packages "qtbase")))
          (specification
           (name "my-sylpheed")
           (build '(packages "sylpheed")))
          (specification
           (name "my-texlive")
           (build '(packages "texlive")))))

(service cuirass-service-type
         (cuirass-configuration
          (specifications %cuirass-specs)))



An example session trying to get cuirass re-installed:
1) Comment out Cuirass in /etc/config.scm and reconfigure
building /gnu/store/9nmk3q8nwk51wqanpw4a5agwak0yfhpj-upgrade-shepherd-services.scm.drv...
shepherd: Removing service 'cuirass-web'...
shepherd: Done.
shepherd: Removing service 'postgres-roles'...
shepherd: Done.
shepherd: Removing service 'cuirass'...
shepherd: Done.
shepherd: Removing service 'postgres'...
shepherd: Done.
shepherd: Service host-name has been started.
shepherd: Service user-homes has been started.
shepherd: Service sysctl has been started.
shepherd: Service host-name has been started.
shepherd: Service term-auto could not be started.
To complete the upgrade, run 'herd restart SERVICE' to stop,
upgrade, and restart each service that was not automatically restarted.
Run 'herd status' to view the list of services on your system
2) At shell:
# rm -rf /var/log/cuirass /var/log/cuirass.log* /var/log/cuirass.log /var/log/cuirass-web.log /var/cache/cuirass /var/lib/postgresql/data /var/lib/cuirass
3) Reboot
4) Check no files above are regenerated, e.g. by other services requiring postgresql (none found)
5) Re-enable Cuirass in /etc/config.scm, reconfigure: (frequently observed error at end of this item)
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... US/Central
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... sh: locale: command not found
2021-06-10 05:57:26.532 CDT [1370] WARNING: no usable system locales were found
ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
/gnu/store/jsa77nkqcvsck4ksvm2b8sccl174hai4-postgresql-10.17/bin/pg_ctl -D /var/lib/postgresql/data -l logfile start
The following derivation will be built: /gnu/store/bmzhdkki40d8y6d6n9a3gw4g70xmv824-install-bootloader.scm.drv
building /gnu/store/bmzhdkki40d8y6d6n9a3gw4g70xmv824-install-bootloader.scm.drv...
guix system: bootloader successfully installed on '/boot/efi'
shepherd: Service host-name has been started.
shepherd: Service user-homes has been started.
shepherd: Service sysctl has been started.
shepherd: Service host-name has been started.
shepherd: Service term-auto could not be started.
guix system: warning: exception caught while executing 'start' on service 'postgres':
Throw to key `%exception' with args `("#<&invoke-error program: \"/gnu/store/4x3h2096cvzvq65wv40a4acwdyks9ivc-pg_ctl-wrapper\" arguments: (\"start\") exit-status: 1 term-signal: #f stop-signal: #f>")'.
guix system: warning: some services could not be upgraded
hint: To allow changes to all the system services to take effect, you will need to reboot.
6) Reboot
7) telnet localhost 5432
telnet localhost 5432
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
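
The check in step 4 (that none of the removed files come back after a reboot) could be scripted roughly as follows. This is only a sketch: the function name leftover_paths is made up for illustration, and the path list is simply the one from step 2.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix or Cuirass): print every path
# given as an argument that currently exists, one per line.
leftover_paths() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
}
```

Calling it as `leftover_paths /var/log/cuirass /var/log/cuirass.log /var/log/cuirass-web.log /var/cache/cuirass /var/lib/postgresql/data /var/lib/cuirass` after the reboot in step 3 should print nothing if no other service regenerated those files.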
--------
I am also observing divide-by-zero errors reported by a PG process when computing metrics. Perhaps they are ignorable, but they seem to throw a Scheme "stack trace" that doesn't look good. I was unable to capture the specific message because I was thrashing to restart Cuirass and the DB.
I am able to reproduce this on several machines; this is my third attempt to install on a fresh machine and use it as I expect (ability to add/remove/reconfigure services, etc.).
This may be a red herring, but I can't help but feel that postgres is getting pulled in by other services as well, and that there may be a collision (e.g. PostgreSQL 10 and 13 both seem to get referenced). I have stripped this system back to (essentially) bare-bones.scm, and see that PostgreSQL is even referenced by the networkmanager package/service. (I am loath to revert to dhcp, since NetworkManager handles wireguard. :-( )
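
One way to check the suspicion that two PostgreSQL versions are referenced would be to filter a list of store paths for postgresql items. The sketch below assumes the paths come from something like `guix gc -R /run/current-system` (the requisites of the running system); the function name postgres_versions is hypothetical, and the pattern only recognizes names of the form postgresql-VERSION.

```shell
#!/bin/sh
# Hypothetical helper: read store paths on stdin and print the distinct
# postgresql-VERSION names found among them, one per line.
postgres_versions() {
    sed -n 's|^/gnu/store/[^-]*-\(postgresql-[0-9][0-9]*\(\.[0-9][0-9]*\)*\).*$|\1|p' | sort -u
}
```

Used as `guix gc -R /run/current-system | postgres_versions`, more than one line of output would suggest that several PostgreSQL versions are in the system closure; it would not by itself show that they collide at run time.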
Best regards,
Eric
PS: I would add that I have seen an error like:
guix system: warning: exception caught while executing 'start' on service 'postgres':
Throw to key `%exception' with args `("#<&invoke-error program: \"/gnu/store/4x3h2096cvzvq65wv40a4acwdyks9ivc-pg_ctl-wrapper\" arguments: (\"start\") exit-status: 1 term-signal: #f stop-signal: #f>")'.
in another context; it was for nginx, but a reboot fixed that and I can serve pages.