GDM, GNOME: Can't start desktop session after upgrade

  • Done
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Luis Felipe
Owner
unassigned
Submitted by
Luis Felipe
Severity
normal
L
L
Luis Felipe wrote on 17 May 2022 20:49
(name . bug-guix@gnu.org)(address . bug-guix@gnu.org)
-ERAqngkTp_rn5o9bawg5LQaV0bIQaYkKKbA55nOcyAHmZA88EsuHxKdIrt7k6pauSeGlZH47uE68OXt-xpG_z6yZPs_VCu1qLw_N3Yp9WY=@protonmail.com
I'm using the Guix system. After upgrading it to recent versions of Guix (e.g. guix 1110479), I can't access the GNOME desktop. The problem is also present when creating a virtual machine with the same configuration file used for the real machine (guix system vm config.scm).


STEPS TO REPRODUCE

I guess these:

1. guix pull --commit=1110479d2d4dc69fc66eadb4a6c24ca2f13afa93
2. guix package -m MANIFEST.scm
3. sudo guix system reconfigure CONFIG.scm
4. sudo reboot
5. Log in in GDM

(MANIFEST and CONFIG available on request if necessary)


EXPECTED RESULT

After a few seconds, the GNOME desktop shows up, and you can start working normally.


UNEXPECTED RESULT

In the real machine, only a black screen appears. After hours, there is no change. There is no mouse pointer or any indication that the desktop has loaded.

In the virtual machine, you see a black screen and the mouse pointer appearing sometimes and then disappearing. After a minute, you are taken back to the log in screen, which now behaves erratically: when you click on your user to log in again, it shows the username input box, and, once you move the pointer to enter your username, the view is changed back to the list of users. The same happens everytime you try to log in again.


ADDITIONAL INFORMATION

The most recent system generation where this problem does not occur for me is

~~~
Generation 81 Mar 24 2022 12:31:43
file name: /var/guix/profiles/system-81-link
canonical file name: /gnu/store/lwaf0xhp93k7l9yibxfyiwwm5rv5d52q-system
label: GNU with Linux-Libre 5.16.17
bootloader: grub-efi
root device: UUID: 3f651226-f53e-4944-8bf8-a0b8c28cfac5
kernel: /gnu/store/0bx2k3qslb9hw9cf52hl9xal39455maa-linux-libre-5.16.17/bzImage
channels:
guix:
branch: master
commit: ab98b51ef1304e975da127e3092d01dcc7f02657
configuration file: /gnu/store/64rz9n8hd1np1aljjiwvy2347abjv04w-configuration.scm
~~~

The oldest generation where the problem still occurs is

~~~
Generation 83 Apr 18 2022 13:37:16
file name: /var/guix/profiles/system-83-link
canonical file name: /gnu/store/y78s36kn6h4cnz50a11gi6zv662x2f95-system
label: GNU with Linux-Libre 5.16.20
bootloader: grub-efi
root device: UUID: 3f651226-f53e-4944-8bf8-a0b8c28cfac5
kernel: /gnu/store/ai4r464i09bay67f33qlmgxd7kdqjv5f-linux-libre-5.16.20/bzImage
channels:
guix:
branch: master
commit: 237d90a7808cfdced34b34595eba16632cbcb89e
configuration file: /gnu/store/7s2xwvwvsa6nv3mmn6g65vra4mlmlw0w-configuration.scm
~~~


WORKAROUND

Personally, I choose one of these options:

1. Use Sway as an emergency desktop (which is always in my system config file).
2. Roll back to a generation of the system that works (e.g. guix ab98b51e). This usually requires to roll back to a previous generation of the user profile as well to avoid other errors when logging in to GNOME.


SYSTEM INFORMATION

Guix system:

~~~
guix system describe
Generation 84 May 16 2022 22:04:49 (current)
file name: /var/guix/profiles/system-84-link
canonical file name: /gnu/store/hy2r1ff491v5rlscwz26880sx87a50la-system
label: GNU with Linux-Libre 5.17.7
bootloader: grub-efi
root device: UUID: 3f651226-f53e-4944-8bf8-a0b8c28cfac5
kernel: /gnu/store/0yl7hq7aca9dvlc514mj97f9f7s5vns4-linux-libre-5.17.7/bzImage
channels:
guix:
branch: master
commit: 1110479d2d4dc69fc66eadb4a6c24ca2f13afa93
configuration file: /gnu/store/7s2xwvwvsa6nv3mmn6g65vra4mlmlw0w-configuration.scm
~~~

Guix:

~~~
guix describe
Generation 56 May 15 2022 08:04:22 (current)
guix 1110479
branch: master
commit: 1110479d2d4dc69fc66eadb4a6c24ca2f13afa93
~~~



---
Luis Felipe López Acevedo
Attachment: signature.asc
L
L
Ludovic Courtès wrote on 18 May 2022 10:44
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)(address . 55488@debbugs.gnu.org)
87k0aj44dd.fsf@gnu.org
Hi Luis,

Luis Felipe <luis.felipe.la@protonmail.com> skribis:

Toggle quote (2 lines)
> I'm using the Guix system. After upgrading it to recent versions of Guix (e.g. guix 1110479), I can't access the GNOME desktop. The problem is also present when creating a virtual machine with the same configuration file used for the real machine (guix system vm config.scm).

This is a config that uses Xorg, not Wayland, right?

Toggle quote (4 lines)
> In the real machine, only a black screen appears. After hours, there is no change. There is no mouse pointer or any indication that the desktop has loaded.
>
> In the virtual machine, you see a black screen and the mouse pointer appearing sometimes and then disappearing. After a minute, you are taken back to the log in screen, which now behaves erratically: when you click on your user to log in again, it shows the username input box, and, once you move the pointer to enter your username, the view is changed back to the list of users. The same happens everytime you try to log in again.

What does ‘sudo herd status’ show at that point?

I wonder if it might be https://issues.guix.gnu.org/55488.

If not, could you provide a minimal config that reproduces the problem?

Thanks,
Ludo’.
L
L
Luis Felipe wrote on 18 May 2022 16:33
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 55488@debbugs.gnu.org)
PJAAn2A1jctnhINfTc2LnA55WhC2uGZOj0txj9UY6RlK0PXL16sBbqNTHK6nFPLVdJeGjjEvS9mq7mC5HC0oN4DnXkAYDXb6olwvgusQQ3Q=@protonmail.com
Hi Ludo,

On Wednesday, May 18th, 2022 at 8:44 AM, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (2 lines)
> This is a config that uses Xorg, not Wayland, right?

Right.

Toggle quote (10 lines)
> > In the real machine, only a black screen appears. After hours, there is no change. There is no mouse pointer or any indication that the desktop has loaded.
> >

> > In the virtual machine, you see a black screen and the mouse pointer appearing sometimes and then disappearing. After a minute, you are taken back to the log in screen, which now behaves erratically: when you click on your user to log in again, it shows the username input box, and, once you move the pointer to enter your username, the view is changed back to the list of users. The same happens everytime you try to log in again.
>

>

> What does ‘sudo herd status’ show at that point?

~~~
Iniciado (Started):
+ avahi-daemon
+ console-font-tty1
+ console-font-tty2
+ console-font-tty3
+ console-font-tty4
+ console-font-tty5
+ console-font-tty6
+ dbus-system
+ elogind
+ file-system-/boot/efi
+ file-system-/dev/pts
+ file-system-/dev/shm
+ file-system-/gnu/store
+ file-system-/run/systemd
+ file-system-/run/user
+ file-system-/sys/firmware/efi/efivars
+ file-system-/sys/fs/cgroup
+ file-system-/sys/fs/cgroup/blkio
+ file-system-/sys/fs/cgroup/cpu
+ file-system-/sys/fs/cgroup/cpuacct
+ file-system-/sys/fs/cgroup/cpuset
+ file-system-/sys/fs/cgroup/devices
+ file-system-/sys/fs/cgroup/elogind
+ file-system-/sys/fs/cgroup/freezer
+ file-system-/sys/fs/cgroup/memory
+ file-system-/sys/fs/cgroup/perf_event
+ file-system-/sys/fs/cgroup/pids
+ file-system-/sys/kernel/debug
+ file-system-/var/cache/fontconfig
+ file-systems
+ guix-daemon
+ loopback
+ mcron
+ networking
+ nscd
+ ntpd
+ root
+ root-file-system
+ swap-/dev/sda2
+ syslogd
+ term-tty1
+ term-tty2
+ term-tty3
+ term-tty4
+ term-tty5
+ term-tty6
+ udev
+ upower-daemon
+ urandom-seed
+ user-file-systems
+ user-processes
+ virtual-terminal
+ wpa-supplicant
+ xorg-server
Detenido (Stopped):
- term-console
Ejecución única (one-shot):
* host-name
* sysctl
* user-homes
~~~


Toggle quote (2 lines)
> I wonder if it might be https://issues.guix.gnu.org/55488.

Did you mean another bug number? 55488 is the one I reported :)


Toggle quote (2 lines)
> If not, could you provide a minimal config that reproduces the problem?

Here's one:

~~~
(use-modules (gnu)
(gnu packages dns))
(use-service-modules admin desktop networking ssh xorg)


;;; CONSTANTS
;;; =========

(define DESKTOP_USER_GROUPS
(list "wheel"
"netdev"
"audio"
"video"
"cdrom"
"kvm"))



;;; SYSTEM CONFIGURATION
;;; ====================

(operating-system
(locale "es_CO.utf8")
(timezone "America/Bogota")
(keyboard-layout
(keyboard-layout "latam" #:model "pc104" #:options '("ctrl:nocaps")))
(kernel-arguments (list "quiet" "pci=noaer"))
(bootloader
(bootloader-configuration
(bootloader grub-efi-bootloader)
(targets (list "/boot/efi"))
(keyboard-layout keyboard-layout)))
(swap-devices
(list (swap-space
(target "/dev/sda2"))))
(file-systems
(cons* (file-system
(mount-point "/")
(device
(uuid "3f651226-f53e-4944-8bf8-a0b8c28cfac5"
'ext4))
(type "ext4"))
(file-system
(mount-point "/boot/efi")
(device (uuid "A0CA-050E" 'fat32))
(type "vfat"))
%base-file-systems))
(host-name "gnu")
(users
(cons*
(user-account
(name "someone")
(comment "some person")
(group "users")
(home-directory "/home/someone")
(supplementary-groups DESKTOP_USER_GROUPS)
(password (crypt "InitialPassword!" "$6$abc")))
%base-user-accounts))
(packages
(cons*
;; Locales (all locales).
(specification->package "glibc-locales")
;; For HTTPS access.
(specification->package "nss-certs")
%base-packages))
(services
(cons*
(service gnome-desktop-service-type)
(set-xorg-configuration
(xorg-configuration
(keyboard-layout keyboard-layout)))
%desktop-services)))
~~~
Attachment: signature.asc
L
L
Ludovic Courtès wrote on 31 May 2022 11:29
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)(address . 55488@debbugs.gnu.org)
87ilpmyrr1.fsf@gnu.org
Hi Luis,

Luis Felipe <luis.felipe.la@protonmail.com> skribis:

Toggle quote (6 lines)
> On Wednesday, May 18th, 2022 at 8:44 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> This is a config that uses Xorg, not Wayland, right?
>
> Right.

[...]

Toggle quote (2 lines)
>> What does ‘sudo herd status’ show at that point?

Looks as if everything was started.

Could you share bits of /var/log/messages starting from the line that
reads “syslogd (GNU inetutils 2.0): restart” (the last boot)? This
could contain sensitive information so you can redact it or share it
privately if you prefer.

Toggle quote (4 lines)
>> I wonder if it might be https://issues.guix.gnu.org/55488.
>
> Did you mean another bug number? 55488 is the one I reported :)

Oops, I meant https://issues.guix.gnu.org/55444, which is now fixed.

Toggle quote (4 lines)
>> If not, could you provide a minimal config that reproduces the problem?
>
> Here's one:

It works fine in ‘guix system vm’. :-/
(Same for you?)

So it could be a stateful issue, such as stale files in /var/lib/gdm.
One way to test that hypothesis would be, for instance, to run:

sudo mv /var/lib/gdm/.cache{,.bak}
sudo herd restart xorg-server

Thanks,
Ludo’.
L
L
Luis Felipe wrote on 31 May 2022 22:50
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 55488@debbugs.gnu.org)
rE4NHRaBKx8nlN8aX2lnlejM_N3B5sHrw9ApwNOxqHn4aBBj1VYEePjWSo-culkO87jo-Oas6633UYKgoBh7utQak6QatzfPL6zXeno7lBw=@protonmail.com
Hi Ludo,

Since its been a while, lots of commits have gone into master and https://issues.guix.gnu.org/55444is solved, I decided to reconfigure the system, and try again. Unfortunately, I still can't start a GNOME session. I update the system information below and provide the data you asked for in the last message.

My current system generation that presents the same issue is this one:

~~~
Generation 86 May 31 2022 11:42:31 (current)
file name: /var/guix/profiles/system-86-link
canonical file name: /gnu/store/pfaj2d8w7vwg1ywrb7pib958angiys2r-system
label: GNU with Linux-Libre 5.17.11
bootloader: grub-efi
root device: UUID: 3f651226-f53e-4944-8bf8-a0b8c28cfac5
kernel: /gnu/store/w8py29cnikbg69jvxhxb33cp86vx4lxc-linux-libre-5.17.11/bzImage
channels:
guix:
branch: master
commit: d11d27e2e4a0e6d132127f5130533b0ce9d04481
configuration file: /gnu/store/7s2xwvwvsa6nv3mmn6g65vra4mlmlw0w-configuration.scm
~~~

Checking /var/log/messages for the restart after system reconfigure, I see several lines that have errors, one of which is CRITICAL:

~~~
...
May 31 12:23:57 localhost vmunix: [ 0.343582] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20211217/psargs-330)
May 31 12:23:57 localhost vmunix: [ 0.343589] ACPI Error: Aborting method \_SB.PR01._CPC due to previous error (AE_NOT_FOUND) (20211217/psparse-529)
May 31 12:23:57 localhost vmunix: [ 0.343616] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20211217/psargs-330)
May 31 12:23:58 localhost vmunix: [ 0.343619] ACPI Error: Aborting method \_SB.PR02._CPC due to previous error (AE_NOT_FOUND) (20211217/psparse-529)
May 31 12:23:37 localhost NetworkManager[380]: <info> [1654017817.5747] manager[0x12a9000]: monitoring kernel firmware directory '/lib/firmware'.
May 31 12:23:58 localhost vmunix: [ 0.343644] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20211217/psargs-330)
May 31 12:23:58 localhost vmunix: [ 0.343648] ACPI Error: Aborting method \_SB.PR03._CPC due to previous error (AE_NOT_FOUND) (20211217/psparse-529)
...
May 31 12:23:49 localhost gdm: Gdm: GdmLocalDisplayFactory: Failed to issue method call: GDBus.Error:org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
...
May 31 12:24:24 localhost vmunix: [ 24.524488] Error: Driver 'pcspkr' is already registered, aborting...
...
May 31 12:24:20 localhost gdm: Gdm: Failed to contact accountsservice: Error calling StartServiceByName for org.freedesktop.Accounts: Timeout was reached
...
May 31 12:24:31 localhost shepherd[1]: [dbus-daemon] ** (accounts-daemon:421): CRITICAL **: 12:24:28.105: error getting polkit authority: Error initializing authority: Error calling StartServiceByName for org.freedesktop.PolicyKit1: Timeout was reached
...
May 31 12:24:47 localhost gnome-session-binary[488]: WARNING: Failed to upload environment to systemd: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Name "org.freedesktop.systemd1" does not exist
...
May 31 12:25:00 localhost pulseaudio[552]: [pulseaudio] bluez5-util.c: GetManagedObjects() failed: org.freedesktop.DBus.Error.ServiceUnknown: The name org.bluez was not provided by any .service files
~~~

But I sent the whole log to you privately just in case.

After trying to log in to GNOME, running "sudo herd status" displays pretty much the same as before:

~~~
Iniciado:
+ avahi-daemon
+ console-font-tty1
+ console-font-tty2
+ console-font-tty3
+ console-font-tty4
+ console-font-tty5
+ console-font-tty6
+ dbus-system
+ elogind
+ file-system-/boot/efi
+ file-system-/dev/pts
+ file-system-/dev/shm
+ file-system-/gnu/store
+ file-system-/run/systemd
+ file-system-/run/user
+ file-system-/sys/firmware/efi/efivars
+ file-system-/sys/fs/cgroup
+ file-system-/sys/fs/cgroup/blkio
+ file-system-/sys/fs/cgroup/cpu
+ file-system-/sys/fs/cgroup/cpuacct
+ file-system-/sys/fs/cgroup/cpuset
+ file-system-/sys/fs/cgroup/devices
+ file-system-/sys/fs/cgroup/elogind
+ file-system-/sys/fs/cgroup/freezer
+ file-system-/sys/fs/cgroup/memory
+ file-system-/sys/fs/cgroup/perf_event
+ file-system-/sys/fs/cgroup/pids
+ file-system-/sys/kernel/debug
+ file-system-/var/cache/fontconfig
+ file-systems
+ guix-daemon
+ loopback
+ mcron
+ networking
+ nscd
+ ntpd
+ root
+ root-file-system
+ swap-/dev/sda2
+ syslogd
+ term-tty1
+ term-tty2
+ term-tty3
+ term-tty4
+ term-tty5
+ term-tty6
+ udev
+ upower-daemon
+ urandom-seed
+ user-file-systems
+ user-processes
+ virtual-terminal
+ wpa-supplicant
+ xorg-server
Detenido:
- term-console
Ejecución única (one-shot):
* host-name
* sysctl
* user-homes
~~~

Running "guix system vm config.scm", where config.scm is the same minimal config in comment 2 (https://issues.guix.gnu.org/55488#2),the problem is still there and it is pretty much the same as in the original's report VM session: I log in, but the desktop session never starts; I'm taken back to log in screen. The only difference is that this time the log in form does not present problems on subsequent tries.

Finally, moving files from /var/lib/gdm out of the way does not help. The problem persists.

For what it's worth, this is my machine information:

~~~
OS: Guix System x86_64
Host: Vostro 3470
Kernel: 5.17.11-gnu
Shell: bash 5.1.8
WM: Mutter
Terminal: alacritty
CPU: Intel i3-8100 (4) @ 3.600GHz
GPU: Intel 8th Gen Core Processor Gaussian Mixture Model
Memory: 1791MiB / 3761MiB
HDD: 1 TiB (Yes, Hard Disk Drive)
~~~
Attachment: signature.asc
L
L
Ludovic Courtès wrote on 1 Jun 2022 21:55
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)(address . 55488@debbugs.gnu.org)
87sfootayu.fsf@gnu.org
Hi Luis,

The parts related to elogind I see from the log are the following:

Toggle snippet (13 lines)
190:May 31 12:23:27 localhost shepherd[1]: Service elogind has been started.
220:May 31 12:23:14 localhost elogind[382]: New seat seat0.
264:May 31 12:23:15 localhost elogind[382]: Watching system buttons on /dev/input/event2 (Power Button)
280:May 31 12:23:16 localhost dbus-daemon[349]: [system] Activating service name='org.freedesktop.login1' requested by ':1.1' (uid=0 pid=383 comm="/gnu/store/qpaw2f734zlsq153fkn5afcv4k4fk63z-upower") (using servicehelper)
343:May 31 12:23:20 localhost elogind[382]: Watching system buttons on /dev/input/event1 (Power Button)
415:May 31 12:23:26 localhost elogind[382]: Watching system buttons on /dev/input/event0 (Sleep Button)
604:May 31 12:23:41 localhost dbus-daemon[349]: [system] Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
643:May 31 12:23:45 localhost elogind[382]: Watching system buttons on /dev/input/event6 (HUION Huion Tablet_HS611 Consumer Control)
684:May 31 12:23:49 localhost gdm: Gdm: GdmLocalDisplayFactory: Failed to issue method call: GDBus.Error:org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
752:May 31 12:24:18 localhost vmunix: [ 17.648359] shepherd[1]: Service file-system-/sys/fs/cgroup/elogind has been started.
959:May 31 12:24:33 localhost elogind[382]: New session c1 of user gdm.

So it has to do with elogind taking too long to start. Same with
polkit:

Toggle snippet (4 lines)
854:May 31 12:24:10 localhost polkitd[426]: Started polkitd version 0.120
925:May 31 12:24:31 localhost shepherd[1]: [dbus-daemon] ** (accounts-daemon:421): CRITICAL **: 12:24:28.105: error getting polkit authority: Error initializing authority: Error calling StartServiceByName for org.freedesktop.PolicyKit1: Timeout was reached

Now, something’s wrong with those timestamps, where shepherd’s message,
which came first, has a timestamp in the future.

Could it be https://issues.guix.gnu.org/55707? At boot time, if you
switch to tty12 (by pressing alt-f12) as soon as it’s available, do you
see messages that get printed pretty slowly, like 4–5 lines per second?

Sorry for not having clearer ideas, but maybe we have a lead…

Thanks,
Ludo’.
L
L
Luis Felipe wrote on 2 Jun 2022 21:44
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 55488@debbugs.gnu.org)
TSaXpZVm1bRDOoefNbGepAhg5X6Iho264kzHCZDljAGWQsqvG8JnajOQyQJVoYuHnWjX2nzhTlvERFdyQnuC-DPyTcUwJKKu3FxUo6YdNm0=@protonmail.com
Hello again,

On Wednesday, June 1st, 2022 at 19:55, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (3 lines)
> So it has to do with elogind taking too long to start. Same with
> polkit:

Timeouts are common here...


Toggle quote (11 lines)
> --8<---------------cut here---------------start------------->8---
>

> 854:May 31 12:24:10 localhost polkitd[426]: Started polkitd version 0.120
> 925:May 31 12:24:31 localhost shepherd[1]: [dbus-daemon] ** (accounts-daemon:421): CRITICAL **: 12:24:28.105: error getting polkit authority: Error initializing authority: Error calling StartServiceByName for org.freedesktop.PolicyKit1: Timeout was reached
> --8<---------------cut here---------------end--------------->8---
>

> Now, something’s wrong with those timestamps, where shepherd’s message,
> which came first, has a timestamp in the future.

Ooh...

Toggle quote (4 lines)
> Could it be https://issues.guix.gnu.org/55707?At boot time, if you
> switch to tty12 (by pressing alt-f12) as soon as it’s available, do you
> see messages that get printed pretty slowly, like 4–5 lines per second?

Maybe? I recorded a video. Image is low quality, but speed is accurate (I changed to tty12 at second 33):


At 01:57, it takes a while to start GDM session (that's common; always been like that, I think).


Toggle quote (2 lines)
> Sorry for not having clearer ideas, but maybe we have a lead…

No worries, thanks again for taking a look at it.
Attachment: signature.asc
L
L
Ludovic Courtès wrote on 2 Jun 2022 23:02
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)
87o7zalqxw.fsf@gnu.org
Hola,

Luis Felipe <luis.felipe.la@protonmail.com> skribis:

Toggle quote (8 lines)
>> Could it be https://issues.guix.gnu.org/55707?At boot time, if you
>> switch to tty12 (by pressing alt-f12) as soon as it’s available, do you
>> see messages that get printed pretty slowly, like 4–5 lines per second?
>
> Maybe? I recorded a video. Image is low quality, but speed is accurate (I changed to tty12 at second 33):
>
> https://luis-felipe.gitlab.io/media/2022/06/guix-system-ab98b51e-boot-TTY12.webm

Oh yes, that’s exactly what I observed on another machine recently.
I’ll trace to investigate further when I can access it.

In the meantime, could you try the following config, which disables
logging on tty12, to see if it boots quicker? The config changes is
twofold; first, after the ‘use-modules’ form of your OS config, add:

Toggle snippet (15 lines)
(define %syslog.conf
;; Custom syslog.conf without /dev/tty12 logging.
(plain-file "custom-syslog.conf"
(let loop ((lines (string-split (plain-file-content %default-syslog.conf)
#\newline))
(result '()))
(match lines
(()
(string-join (reverse result) "\n"))
((head . tail)
(if (string-contains head "/dev/tty12")
(loop tail result) ;drop this line
(loop tail (cons head result))))))))

Second, in the ‘services’ field of your config, replace
‘%desktop-services’ with:

Toggle snippet (6 lines)
(modify-services %desktop-services
(syslog-service-type
_ => (syslog-configuration
(config-file %syslog.conf))))

Dunno if it’s silly, but at least it’ll allow us to check whether it’s
just the framebuffer that’s slowing things down or if it’s something
else.

Thanks for your help!

Ludo’.
L
L
Luis Felipe wrote on 3 Jun 2022 01:51
(name . Ludovic Courtès)(address . ludo@gnu.org)
L6PGP_GXG5oaCpEsXGwEZrfD8cGc19YGAbpGZDDDvmjpgYbLQivsfbawzBf6eHebs_POoLS0kD22OeP7zUWvhMdzeobsSkGSH5PYjgqSEU8=@protonmail.com
Salut,

On Thursday, June 2nd, 2022 at 21:02, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (12 lines)
> In the meantime, could you try the following config, which disables
> logging on tty12, to see if it boots quicker? The config changes is
> twofold; first, after the ‘use-modules’ form of your OS config, add:
>

> [...]
>

> Dunno if it’s silly, but at least it’ll allow us to check whether it’s
> just the framebuffer that’s slowing things down or if it’s something
> else.

It seems nothing changed. I'm still unable to start a GNOME session, and booting time is about the same. I recorded a new video:

Attachment: signature.asc
L
L
Ludovic Courtès wrote on 4 Jun 2022 00:54
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)
87wndxfjcw.fsf@gnu.org
Hi,

Luis Felipe <luis.felipe.la@protonmail.com> skribis:

Toggle quote (2 lines)
> It seems nothing changed. I'm still unable to start a GNOME session, and booting time is about the same.

Indeed.

Here’s another debugging trick; would be great if you could try this:

Toggle snippet (9 lines)
(define strace-syslogd
(program-file "strace-syslogd"
#~(apply execl #$(file-append strace "/bin/strace")
"strace" ;argv[0]
"-f" "-Tt" "-o" "/syslogd.log" "-s" "80"
#$(file-append inetutils "/libexec/syslogd")
(cdr (command-line)))))

and then:

Toggle snippet (6 lines)
(modify-services %desktop-services
(syslog-service-type
_ => (syslog-configuration
(syslogd strace-syslogd))))

This creates a log file, /syslogd.log, which will allow us to see the
time it takes syslogd to read from /proc/kmsg and hopefully to determine
the origin of delays.

TIA!

Ludo’.
L
L
Ludovic Courtès wrote on 4 Jun 2022 19:07
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)
878rqcfjcn.fsf@gnu.org
Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (22 lines)
> Here’s another debugging trick; would be great if you could try this:
>
> (define strace-syslogd
> (program-file "strace-syslogd"
> #~(apply execl #$(file-append strace "/bin/strace")
> "strace" ;argv[0]
> "-f" "-Tt" "-o" "/syslogd.log" "-s" "80"
> #$(file-append inetutils "/libexec/syslogd")
> (cdr (command-line)))))
>
>
> and then:
>
> (modify-services %desktop-services
> (syslog-service-type
> _ => (syslog-configuration
> (syslogd strace-syslogd))))
>
> This creates a log file, /syslogd.log, which will allow us to see the
> time it takes syslogd to read from /proc/kmsg and hopefully to determine
> the origin of delays.

I tried this on a machine I have access to that exhibits this slowness,
and here’s what I get (excerpt that spans 2+ seconds of syslogd
activity):

Toggle snippet (104 lines)
328 18:46:13 openat(AT_FDCWD, "/dev/console", O_WRONLY|O_CREAT|O_APPEND, 0644) = 4 <0.000099>
328 18:46:13 openat(AT_FDCWD, "/var/log/messages", O_WRONLY|O_CREAT|O_APPEND, 0644) = 5 <0.000075>
328 18:46:13 ioctl(5, TCGETS, 0x7ffe23d6c930) = -1 ENOTTY (Inappropriate ioctl for device) <0.000261>
328 18:46:13 openat(AT_FDCWD, "/var/log/debug", O_WRONLY|O_CREAT|O_APPEND, 0644) = 6 <0.000201>
328 18:46:13 ioctl(6, TCGETS, 0x7ffe23d6c930) = -1 ENOTTY (Inappropriate ioctl for device) <0.000138>
328 18:46:13 openat(AT_FDCWD, "/dev/tty12", O_WRONLY|O_CREAT|O_APPEND, 0644) = 7 <0.001059>
328 18:46:13 ioctl(7, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000101>
328 18:46:13 openat(AT_FDCWD, "/var/log/secure", O_WRONLY|O_CREAT|O_APPEND, 0644) = 8 <0.000077>
328 18:46:13 ioctl(8, TCGETS, 0x7ffe23d6c930) = -1 ENOTTY (Inappropriate ioctl for device) <0.000039>
328 18:46:13 openat(AT_FDCWD, "/var/log/maillog", O_WRONLY|O_CREAT|O_APPEND, 0644) = 9 <0.000070>

[…]

328 18:46:13 read(3, "<5>[ 0.000000] Linux version 5.17.11-gnu (guix@guix) (gcc (GCC) 10.3.0, GNU l"..., 1024) = 981 <0.000083>
328 18:46:13 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000059>
322 18:46:13 +++ exited with 0 +++
328 18:46:13 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000049>
328 18:46:13 writev(7, [{iov_base="Jun 4 18:46:13", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Linux version 5.17.11-gnu (guix@guix) (gcc (GCC) 10.3.0, "..., iov_len=124}, {iov_base="\r\n", iov_len=2}], 6) = 152 <0.000086>
328 18:46:13 fsync(7) = -1 EINVAL (Invalid argument) <0.000036>
328 18:46:13 writev(6, [{iov_base="Jun 4 18:46:13", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Linux version 5.17.11-gnu (guix@guix) (gcc (GCC) 10.3.0, "..., iov_len=124}, {iov_base="\n", iov_len=1}], 6) = 151 <0.000063>
328 18:46:13 fsync(6) = 0 <0.380857>
328 18:46:13 writev(5, [{iov_base="Jun 4 18:46:13", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Linux version 5.17.11-gnu (guix@guix) (gcc (GCC) 10.3.0, "..., iov_len=124}, {iov_base="\n", iov_len=1}], 6) = 151 <0.000079>
328 18:46:13 fsync(5) = 0 <0.131763>
328 18:46:13 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000213>
328 18:46:13 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000160>
328 18:46:13 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000049>
328 18:46:13 writev(7, [{iov_base="Jun 4 18:46:13", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Command line: BOOT_IMAGE=/gnu/store/w8py29cnikbg69jvxhxb3"..., iov_len=314}, {iov_base="\r\n", iov_len=2}], 6) = 342 <0.000123>
328 18:46:13 fsync(7) = -1 EINVAL (Invalid argument) <0.000040>
328 18:46:13 writev(6, [{iov_base="Jun 4 18:46:13", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Command line: BOOT_IMAGE=/gnu/store/w8py29cnikbg69jvxhxb3"..., iov_len=314}, {iov_base="\n", iov_len=1}], 6) = 341 <0.000074>
328 18:46:13 fsync(6) = 0 <0.239999>
328 18:46:13 writev(5, [{iov_base="Jun 4 18:46:13", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Command line: BOOT_IMAGE=/gnu/store/w8py29cnikbg69jvxhxb3"..., iov_len=314}, {iov_base="\n", iov_len=1}], 6) = 341 <0.000156>
328 18:46:13 fsync(5) = 0 <0.099143>
328 18:46:14 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000128>
328 18:46:14 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000180>
328 18:46:14 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000100>
328 18:46:14 writev(7, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] KERNEL supported cpus:", iov_len=45}, {iov_base="\r\n", iov_len=2}], 6) = 73 <0.000189>
328 18:46:14 fsync(7) = -1 EINVAL (Invalid argument) <0.000039>
328 18:46:14 writev(6, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] KERNEL supported cpus:", iov_len=45}, {iov_base="\n", iov_len=1}], 6) = 72 <0.000097>
328 18:46:14 fsync(6) = 0 <0.198157>
328 18:46:14 writev(5, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] KERNEL supported cpus:", iov_len=45}, {iov_base="\n", iov_len=1}], 6) = 72 <0.000284>
328 18:46:14 fsync(5) = 0 <0.098527>
328 18:46:14 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000136>
328 18:46:14 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000133>
328 18:46:14 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000488>
328 18:46:14 writev(7, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Intel GenuineIntel", iov_len=43}, {iov_base="\r\n", iov_len=2}], 6) = 71 <0.000213>
328 18:46:14 fsync(7) = -1 EINVAL (Invalid argument) <0.000141>
328 18:46:14 writev(6, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Intel GenuineIntel", iov_len=43}, {iov_base="\n", iov_len=1}], 6) = 70 <0.000219>
328 18:46:14 fsync(6) = 0 <0.096564>
328 18:46:14 writev(5, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Intel GenuineIntel", iov_len=43}, {iov_base="\n", iov_len=1}], 6) = 70 <0.000215>
328 18:46:14 fsync(5) = 0 <0.125492>
328 18:46:14 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000128>
328 18:46:14 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000130>
328 18:46:14 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000252>
328 18:46:14 writev(7, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] AMD AuthenticAMD", iov_len=41}, {iov_base="\r\n", iov_len=2}], 6) = 69 <0.000368>
328 18:46:14 fsync(7) = -1 EINVAL (Invalid argument) <0.000130>
328 18:46:14 writev(6, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] AMD AuthenticAMD", iov_len=41}, {iov_base="\n", iov_len=1}], 6) = 68 <0.000402>
328 18:46:14 fsync(6) = 0 <0.112798>
328 18:46:14 writev(5, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] AMD AuthenticAMD", iov_len=41}, {iov_base="\n", iov_len=1}], 6) = 68 <0.000199>
328 18:46:14 fsync(5) = 0 <0.098952>
328 18:46:14 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000244>
328 18:46:14 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000128>
328 18:46:14 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000252>
328 18:46:14 writev(7, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Hygon HygonGenuine", iov_len=43}, {iov_base="\r\n", iov_len=2}], 6) = 71 <0.000375>
328 18:46:14 fsync(7) = -1 EINVAL (Invalid argument) <0.000039>
328 18:46:14 writev(6, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Hygon HygonGenuine", iov_len=43}, {iov_base="\n", iov_len=1}], 6) = 70 <0.000099>
328 18:46:14 fsync(6) = 0 <0.113492>
328 18:46:14 writev(5, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Hygon HygonGenuine", iov_len=43}, {iov_base="\n", iov_len=1}], 6) = 70 <0.000998>
328 18:46:14 fsync(5) = 0 <0.106071>
328 18:46:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000220>
328 18:46:15 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000115>
328 18:46:15 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000115>
328 18:46:15 writev(7, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Centaur CentaurHauls", iov_len=45}, {iov_base="\r\n", iov_len=2}], 6) = 73 <0.000262>
328 18:46:15 fsync(7) = -1 EINVAL (Invalid argument) <0.000048>
328 18:46:15 writev(6, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Centaur CentaurHauls", iov_len=45}, {iov_base="\n", iov_len=1}], 6) = 72 <0.000076>
328 18:46:15 fsync(6) = 0 <0.080533>
328 18:46:15 writev(5, [{iov_base="Jun 4 18:46:14", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] Centaur CentaurHauls", iov_len=45}, {iov_base="\n", iov_len=1}], 6) = 72 <0.000198>
328 18:46:15 fsync(5) = 0 <0.107653>
328 18:46:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000051>
328 18:46:15 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000055>
328 18:46:15 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000093>
328 18:46:15 writev(7, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] zhaoxin Shanghai ", iov_len=45}, {iov_base="\r\n", iov_len=2}], 6) = 73 <0.000197>
328 18:46:15 fsync(7) = -1 EINVAL (Invalid argument) <0.000042>
328 18:46:15 writev(6, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] zhaoxin Shanghai ", iov_len=45}, {iov_base="\n", iov_len=1}], 6) = 72 <0.000123>
328 18:46:15 fsync(6) = 0 <0.123659>
328 18:46:15 writev(5, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] zhaoxin Shanghai ", iov_len=45}, {iov_base="\n", iov_len=1}], 6) = 72 <0.000370>
328 18:46:15 fsync(5) = 0 <0.098383>
328 18:46:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000138>
328 18:46:15 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000128>
328 18:46:15 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000254>
328 18:46:15 writev(7, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE", iov_len=55}, {iov_base="\r\n", iov_len=2}], 6) = 83 <0.000383>
328 18:46:15 fsync(7) = -1 EINVAL (Invalid argument) <0.000135>
328 18:46:15 writev(6, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE", iov_len=55}, {iov_base="\n", iov_len=1}], 6) = 82 <0.000209>
328 18:46:15 fsync(6) = 0 <0.122001>
328 18:46:15 writev(5, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE", iov_len=55}, {iov_base="\n", iov_len=1}], 6) = 82 <0.000197>
328 18:46:15 fsync(5) = 0 <0.123772>
328 18:46:15 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000102>
328 18:46:15 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], [], 8) = 0 <0.000140>
328 18:46:15 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0 <0.000167>
328 18:46:15 writev(7, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] signal: max sigframe size: 1440", iov_len=54}, {iov_base="\r\n", iov_len=2}], 6) = 82 <0.000273>
328 18:46:15 fsync(7) = -1 EINVAL (Invalid argument) <0.000129>
328 18:46:15 writev(6, [{iov_base="Jun 4 18:46:15", iov_len=15}, {iov_base=" ", iov_len=1}, {iov_base="localhost", iov_len=9}, {iov_base=" ", iov_len=1}, {iov_base="vmunix: [ 0.000000] signal: max sigframe size: 1440", iov_len=54}, {iov_base="\n", iov_len=1}], 6) = 81 <0.000099>
328 18:46:15 fsync(6) = 0 <0.121488>

During that time span, syslogd makes no less than 19 ‘fsync’ calls (not
counting the EINVAL ones), each of which takes between 100ms and
400ms—no wonder it’s slow. (This machine has a low-grade spinning HDD.)

There are two things we can do:

1. Use ‘fdatasync’ rather than ‘fsync’;

2. Explicitly disable syncing for some of the log files by prepending
a hyphen right before the file name in syslogd.conf (info
"(inetutils) syslogd invocation").

I’ll give that a try and report back.

Ludo’.
L
L
Ludovic Courtès wrote on 4 Jun 2022 19:35
Re: bug#55707: syslogd logging kernel messages slowly?
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)
874k10fi1d.fsf_-_@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (2 lines)
> 1. Use ‘fdatasync’ rather than ‘fsync’;

Turns out ‘fdatasync’ is about as expensive:

Toggle snippet (13 lines)
1234:301 19:22:00 fdatasync(6) = 0 <0.398749>
1236:301 19:22:01 fdatasync(5) = 0 <0.198520>
1243:301 19:22:01 fdatasync(6) = 0 <0.231935>
1245:301 19:22:01 fdatasync(5) = 0 <0.183503>
1252:301 19:22:01 fdatasync(6) = 0 <0.230854>
1254:301 19:22:02 fdatasync(5) = 0 <0.248839>
1261:301 19:22:02 fdatasync(6) = 0 <0.181513>
1263:301 19:22:02 fdatasync(5) = 0 <0.141267>
1270:301 19:22:02 fdatasync(6) = 0 <0.131548>
1272:301 19:22:02 fdatasync(5) = 0 <0.124369>
1279:301 19:22:02 fdatasync(6) = 0 <0.121414>

Tested with:

Toggle snippet (11 lines)
(define inetutils/fdatasync
(package/inherit inetutils
(version (string-append (package-version inetutils) ".fdatasync"))
(source (origin
(inherit (package-source inetutils))
(modules '((guix build utils)))
(snippet
'(substitute* "src/syslogd.c"
(("fsync") "fdatasync")))))))

It still makes more sense than ‘fsync’ though.

Ludo’.
L
L
Ludovic Courtès wrote on 4 Jun 2022 21:30
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)
87sfokdy5r.fsf_-_@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (4 lines)
> 2. Explicitly disable syncing for some of the log files by prepending
> a hyphen right before the file name in syslogd.conf (info
> "(inetutils) syslogd invocation").

Tried this (in addition to ‘fdatasync’):

Toggle snippet (25 lines)
(define %default-syslog.conf
(plain-file "syslog.conf" "
# Log all error messages, authentication messages of
# level notice or higher and anything of level err or
# higher to the console.
# Don't log private authentication messages!
*.alert;auth.notice;authpriv.none -/dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none /var/log/messages

*.=debug -/var/log/debug

# Same, in a different place.
*.info;mail.none;authpriv.none -/dev/tty12

# The authpriv file has restricted access.
authpriv.* /var/log/secure

# Log all the mail messages in one place.
mail.* -/var/log/maillog
"))

In addition to the hyphens, notice that now only debugging info goes to
/var/log/debug (previously everything going to /var/log/messages would
also go to /var/log/debug, which is kinda silly).

This halves the number of ‘fdatasync’ calls and thus doubles the
throughput, but we’re still spending ~0.1s on ‘fdatasync’ for each line,
which is slow.

So the next solution is to not ‘fdatasync’ on /var/log/messages either:

Toggle snippet (27 lines)
(define %default-syslog.conf
(plain-file "syslog.conf" "
# Log all error messages, authentication messages of
# level notice or higher and anything of level err or
# higher to the console.
# Don't log private authentication messages!
*.alert;auth.notice;authpriv.none -/dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none -/var/log/messages

# Log \"debug\"-level entries and nothing else.
*.=debug -/var/log/debug

# Same, in a different place.
*.info;mail.none;authpriv.none -/dev/tty12

# The authpriv file has restricted access.
# 'fsync' the file after each line.
authpriv.* /var/log/secure

# Log all the mail messages in one place.
mail.* -/var/log/maillog
"))

With this, the system boots much more quickly.

I’m tempted to commit the above syslog.conf. From a look at the code
and doc, it seems that neither rsyslog nor systemd-journald does
anything like “fsync after each logged line”, so the config above sounds
like a reasonable default (at any rate, the current default is
definitely unreasonable on slow spinning disks).

I’m curious to see if it fixes your other boot-up issues, timeouts and
all, Luis.

Ludo’.
L
L
Luis Felipe wrote on 5 Jun 2022 15:19
Re: bug#55488: GDM, GNOME: Can't start desktop session after upgrade
(name . Ludovic Courtès)(address . ludo@gnu.org)
lqIwB62_rCWJQZnFFvEVMObQSr8a05iCinuAj7TTPt1NVNWXRWqIn6ombMUqcXPOjp1gazR9itgutgU_oW5X5YVHN6-PIUJ_4CfGPQAv9oA=@protonmail.com
Howdy,


On Friday, June 3rd, 2022 at 22:54, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (39 lines)
> Here’s another debugging trick; would be great if you could try this:
>

> --8<---------------cut here---------------start------------->8---
>

> (define strace-syslogd
> (program-file "strace-syslogd"
> #~(apply execl #$(file-append strace "/bin/strace")
> "strace" ;argv[0]
> "-f" "-Tt" "-o" "/syslogd.log" "-s" "80"
> #$(file-append inetutils "/libexec/syslogd")
> (cdr (command-line)))))
> --8<---------------cut here---------------end--------------->8---
>

>

> and then:
>

> --8<---------------cut here---------------start------------->8---
>

> (modify-services %desktop-services
> (syslog-service-type
> _ => (syslog-configuration
>

> (syslogd strace-syslogd))))
> --8<---------------cut here---------------end--------------->8---
>

>

> This creates a log file, /syslogd.log, which will allow us to see the
> time it takes syslogd to read from /proc/kmsg and hopefully to determine
> the origin of delays.

I see you already tried this on the other machine, but I gave this a try too just in case and sent the log to you privately. I'll try the other tricks next.
Attachment: signature.asc
L
L
Luis Felipe wrote on 5 Jun 2022 17:25
Re: bug#55707: syslogd logging kernel messages slowly?
(name . Ludovic Courtès)(address . ludo@gnu.org)
YDcBv9iInLBBpElUUnxpfLjIK1LXG1wb_A5c59GZ9fBoegUo0a3aQ8mA38hlKalWZx-yLgSQ-Yuc2bC_1VWE18eoPOv2qnDOtNC1dd_5UUY=@protonmail.com
On Saturday, June 4th, 2022 at 19:30, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (3 lines)
> I’m curious to see if it fixes your other boot-up issues, timeouts and
> all, Luis.

Do I simply put both definitions (inetutils/fdatasync and %default-syslog.conf) in my system config and reconfigure...? Because I tried that and nothing seems to change.

Or, once defined, do I have to use inetutils/fdatasync and %default-syslog.conf somewhere in my system packages...?
Attachment: signature.asc
L
L
Ludovic Courtès wrote on 6 Jun 2022 12:00
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)
87r142cdry.fsf@gnu.org
Hi,

Luis Felipe <luis.felipe.la@protonmail.com> skribis:

Toggle quote (9 lines)
> On Saturday, June 4th, 2022 at 19:30, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> I’m curious to see if it fixes your other boot-up issues, timeouts and
>> all, Luis.
>
> Do I simply put both definitions (inetutils/fdatasync and %default-syslog.conf) in my system config and reconfigure...? Because I tried that and nothing seems to change.
>
> Or, once defined, do I have to use inetutils/fdatasync and %default-syslog.conf somewhere in my system packages...?

Sorry, I wasn’t clear. Let’s forget about ‘inetutils/fdatasync’ because
it doesn’t have a measurable impact AFAICS.

In your config file, you first put:

Toggle snippet (27 lines)
(define %new-default-syslog.conf
(plain-file "syslog.conf" "
# Log all error messages, authentication messages of
# level notice or higher and anything of level err or
# higher to the console.
# Don't log private authentication messages!
*.alert;auth.notice;authpriv.none -/dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none -/var/log/messages

# Log \"debug\"-level entries and nothing else.
*.=debug -/var/log/debug

# Same, in a different place.
*.info;mail.none;authpriv.none -/dev/tty12

# The authpriv file has restricted access.
# 'fsync' the file after each line.
authpriv.* /var/log/secure

# Log all the mail messages in one place.
mail.* -/var/log/maillog
"))

Then, you replace ‘%desktop-services’ with:

Toggle snippet (6 lines)
(modify-services %desktop-services
(syslog-service-type
_ => (syslog-configuration
(config-file %new-default-syslog.conf))))

Thanks for testing!

Ludo’.
L
L
Luis Felipe wrote on 6 Jun 2022 18:31
(name . Ludovic Courtès)(address . ludo@gnu.org)
Fg9njIuG4A2FgdRgqnQTkiAs2y-P96sAEd-_lJ2_qjY6CioHRy3e074QMwDiUnlifipEfMfkN2T3zwXI28_2oQXh_6zjabCtsoNxglzWoNs=@protonmail.com
Hey,

On Monday, June 6th, 2022 at 10:00, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (39 lines)
> > > I’m curious to see if it fixes your other boot-up issues, timeouts and
> > > all, Luis.
> >

> > Do I simply put both definitions (inetutils/fdatasync and %default-syslog.conf) in my system config and reconfigure...? Because I tried that and nothing seems to change.
> >

> > Or, once defined, do I have to use inetutils/fdatasync and %default-syslog.conf somewhere in my system packages...?
>

> Sorry, I wasn’t clear. Let’s forget about ‘inetutils/fdatasync’ because
> it doesn’t have a measurable impact AFAICS.
>

> In your config file, you first put:
>

> --8<---------------cut here---------------start------------->8---
> (define %new-default-syslog.conf
> [...]
> --8<---------------cut here---------------end--------------->8---
>

>

> Then, you replace ‘%desktop-services’ with:
>

> --8<---------------cut here---------------start------------->8---
>

> (modify-services %desktop-services
> (syslog-service-type
> _ => (syslog-configuration
>

> (config-file %new-default-syslog.conf))))
> --8<---------------cut here---------------end--------------->8---

This does reduce the time it takes to get to GDM log in screen from GRUB. It used to be around 2:35 minutes, but now it takes around 1:25 minutes, which is nice.

Unfortunately, though, I still can't start a GNOME session. I logged in, and stared at a blackish screen for about 5 minutes. Nothing happened :)
Attachment: signature.asc
L
L
Luis Felipe wrote on 6 Jun 2022 20:11
Re: bug#55488: GDM, GNOME: Can't start desktop session after upgrade
(name . Ludovic Courtès)(address . ludo@gnu.org)
ak52RzsNYnyNcDjSO47mtUWBawvNYhxwdlQl_qoMIzg0CaaYn8If97e-jG_7yg5jeWnqS_-Sjdbz2w2su48alODaQWzeLoavKEwS6FUj1_U=@protonmail.com
It turns out the problem is related to user services. I just disabled my user services, rebooted, and GNOME session started.

I only have one user service defined in ~/.config/shepherd/init.scm as follows

~~~
;;; Definition of my user-level services.
(use-modules (shepherd service))


;;; SERVICE DEFINITIONS
;;; ===================

(define rygel
(make <service>
#:provides '(rygel)
#:docstring "Run the Rygel media server."
#:start (make-forkexec-constructor '("rygel"))
#:stop (make-kill-destructor)
#:respawn? #t))


;;; REGISTER SERVICES
;;; =================

(register-services rygel)


;;; START SERVICES
;;; ==============

(start rygel)

;;; Send shepherd to the background.
(action 'shepherd 'daemonize)
~~~

And I call "shepherd" in the ~/.profile file. This never posed a problem until my recent system generations from April to date, though, where I was never able to start a GNOME session. So I wonder what's going on now...

For what it's worth, the only error I see in ~/.config/shepherd/shepherd.log is this

2022-04-07 11:01:57 system-error("getpgid" "~A" ("No existe el proceso") (3))

"No existe el proceso" meaning, "The process doesn't exist". This error has occurred several times since 2021, though.
Attachment: signature.asc
L
L
Ludovic Courtès wrote on 6 Jun 2022 22:46
Re: bug#55707: syslogd logging kernel messages slowly?
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)
87k09tbjuw.fsf@gnu.org
Hi,

Luis Felipe <luis.felipe.la@protonmail.com> skribis:

Toggle quote (2 lines)
> This does reduce the time it takes to get to GDM log in screen from GRUB. It used to be around 2:35 minutes, but now it takes around 1:25 minutes, which is nice.

Alright, so commit 264ca9452fae827d6621b28b8972f4b1d68401a1 closes this
particular issue (#55707).

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 6 Jun 2022 22:50
(name . Luis Felipe)(address . luis.felipe.la@protonmail.com)(address . 55488@debbugs.gnu.org)
87fskhbjp1.fsf_-_@gnu.org
Hi,

Luis Felipe <luis.felipe.la@protonmail.com> skribis:

Toggle quote (2 lines)
> It turns out the problem is related to user services. I just disabled my user services, rebooted, and GNOME session started.

OK.

Toggle quote (2 lines)
> I only have one user service defined in ~/.config/shepherd/init.scm as follows

[...]

Toggle quote (8 lines)
> And I call "shepherd" in the ~/.profile file. This never posed a problem until my recent system generations from April to date, though, where I was never able to start a GNOME session. So I wonder what's going on now...
>
> For what it's worth, the only error I see in ~/.config/shepherd/shepherd.log is this
>
> 2022-04-07 11:01:57 system-error("getpgid" "~A" ("No existe el proceso") (3))
>
> "No existe el proceso" meaning, "The process doesn't exist". This error has occurred several times since 2021, though.

Hmm OK. I’m not sure what’s going on; are you sure ‘rygel’ is in $PATH
when shepherd starts?

Anyway, this looks like “not-a-bug” from the perspective of Guix.
WDYT?

BTW, I’d encourage you to write a service for Guix Home for Rygel and to
use Guix Home (*then* if you have problems that’ll be a bug from the
perspective of Guix :-)).

Thanks,
Ludo’.
L
L
Luis Felipe wrote on 7 Jun 2022 01:09
Re: bug#55488: GDM, GNOME: Can't start desktop session after upgrade
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 55488@debbugs.gnu.org)
7mEqscI-KOcKRkmKPh6-5mhhVtXXOGI7ThfkZwTXsIebUPWRZh9wZBHV_2Rmj6-xoyIrVN0EwaqfEtEm42kXPv2w6swYfBWj1MRkcX4Bou8=@protonmail.com
Hi,

On Monday, June 6th, 2022 at 20:50, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (3 lines)
> Hmm OK. I’m not sure what’s going on; are you sure ‘rygel’ is in $PATH
> when shepherd starts?

No, but I know rygel is in my user profile; I can start it from a terminal. And it works when I go back to a previous system generation...

Toggle quote (8 lines)
> Anyway, this looks like “not-a-bug” from the perspective of Guix.
> WDYT?
>

> BTW, I’d encourage you to write a service for Guix Home for Rygel and to
> use Guix Home (then if you have problems that’ll be a bug from the
> perspective of Guix :-)).

So yeah, I'm closing this bug then and I'll be starting rygel manually for now.

Thanks,
Attachment: signature.asc
L
Closed
?
Your comment

This issue is archived.

To comment on this conversation send an email to 55488@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 55488
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch