dbus-daemon fails to start within %pid-file-timeout when running from DVD

  • Done
  • quality assurance status badge
Details
5 participants
  • Bengt Richter
  • Danny Milosavljevic
  • Ludovic Courtès
  • Mathieu Othacehe
  • pelzflorian (Florian Pelz)
Owner
unassigned
Submitted by
pelzflorian (Florian Pelz)
Severity
important
P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 14:09
installer networking: Connman detects no technologies on Acer Aspire
(address . bug-guix@gnu.org)
20200412120918.bcl5yc37f4fq2gzr@pelzflorian.localdomain
On my Acer Aspire 5738PG the GUI installer for 1.1.0-rc2 fails at
automatic network setup. It tells me the install process required
Internet access. Manual setup of networking via “ip link set
enp0s29f7u1” and “dhclient enp0s29f7u1” works fine; I can ping
gnu.org. This is true for both a USB Ethernet adapter that works fine
with Guix on other computers and is true for built-in non-USB
Ethernet.

It appears the Guix installer uses Connman for network setup.

On the Acer Aspire the output of connmanctl is

Error getting peers: The name net.connman was not provided by any .service filesError getting technologies: The name net.connman was not provided by any .service files
connmanctl>

while on my Macbook the output of connmanctl is just a prompt

connmanctl>

What could be the reason? Note that I booted the Acer Aspire from DVD
making it even slower than usual.

Regards,
Florian
M
M
Mathieu Othacehe wrote on 12 Apr 2020 16:24
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
87wo6klr2u.fsf@gmail.com
Hello Florian,

Thanks for the analysis!

Toggle quote (3 lines)
> Error getting peers: The name net.connman was not provided by any .service filesError getting technologies: The name net.connman was not provided by any .service files
> connmanctl>

This is the kind of error you get when connmanctl cannot reach connmand
(its daemon). Could you try to see if its around or why it died, maybe
in the system logs?

Mathieu
P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 17:14
(name . Mathieu Othacehe)(address . m.othacehe@gmail.com)(address . 40572@debbugs.gnu.org)
20200412151454.m5xsma6ckltoqsb5@pelzflorian.localdomain
On Sun, Apr 12, 2020 at 04:24:41PM +0200, Mathieu Othacehe wrote:
Toggle quote (14 lines)
>
> Hello Florian,
>
> Thanks for the analysis!
>
> > Error getting peers: The name net.connman was not provided by any .service filesError getting technologies: The name net.connman was not provided by any .service files
> > connmanctl>
>
> This is the kind of error you get when connmanctl cannot reach connmand
> (its daemon). Could you try to see if its around or why it died, maybe
> in the system logs?
>
> Mathieu

My old and slow Uniwill has the same issue, so I look there. I wait a
medium time for the installer to start. I continue until the error.
I press <F3> and enter connmanctl and get the same error.

ps -Af | less

shows

/gnu/store/…-connman-1.38/sbin/connman-vpnd -n

running. It is the only process whose name contains connman.

I run “herd status” (output attached), which I should have checked
earlier. It shows networking and wpa-supplicant not running.

I also attach dmesg and /var/log/messages. But I do not see the
reason yet. I keep looking.

Regards,
Florian
Started:
+ console-font-tty1
+ console-font-tty2
+ console-font-tty3
+ console-font-tty4
+ console-font-tty5
+ console-font-tty6
+ dbus-system
+ file-system-/dev/pts
+ file-system-/dev/shm
+ file-system-/gnu/store
+ file-system-/tmp
+ file-systems
+ gpm
+ guix-daemon
+ loopback
+ nscd
+ root
+ root-file-system
+ syslogd
+ term-tty1
+ term-tty2
+ term-tty3
+ term-tty4
+ term-tty5
+ term-tty6
+ udev
+ user-file-systems
+ user-processes
+ virtual-terminal
Stopped:
- cow-store
- networking
- ssh-daemon
- wpa-supplicant
One-shot:
* host-name
* user-homes
* uvesafb
Attachment: dmesg
Attachment: varlogmessages
P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 17:41
(name . Mathieu Othacehe)(address . m.othacehe@gmail.com)(address . 40572@debbugs.gnu.org)
20200412154122.wcgk4fvgjcxhddnr@pelzflorian.localdomain
Manually running

herd start wpa-supplicant
herd start networking

makes the installer work.

I will look at old bug reports.
P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 18:34
(name . Mathieu Othacehe)(address . m.othacehe@gmail.com)(address . 40572@debbugs.gnu.org)
20200412163443.ua2zmki44ael3gso@pelzflorian.localdomain
On Sun, Apr 12, 2020 at 05:41:22PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (9 lines)
> Manually running
>
> herd start wpa-supplicant
> herd start networking
>
> makes the installer work.
>
> I will look at old bug reports.

I wonder if this could be related to
https://issues.guix.gnu.org/issue/35550. Maybe my move to ever more
ancient and slow computers from ebay booted via DVD resurrected the
issue. This is not what I hoped for at Easter.

Back then Ludo wrote
Toggle quote (14 lines)
> I’ve tried running:
>
> pkill wpa_supplicant
> while herd start wpa-supplicant ; do herd stop wpa-supplicant ; done
>
> and that always works (i.e., no “failed to start ‘wpa-supplicant’”
> error.)
>
> If I run in parallel this:
>
> while touch /var/run/wpa_supplicant.pid ; do : ; done
>
> then it occasionally “fails to start”, but not very often.

No wpa_supplicant is running. I run the above loops in two TTYs.
But I see no errors.

I will try make a new image with USB instead of DVD.

Regards,
Florian
D
D
Danny Milosavljevic wrote on 12 Apr 2020 18:52
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
20200412185234.189f5199@scratchpost.org
Hi Florian,

could you check

strace -s 0 -p 301

?

It should attach to the running connman-vpnd and you'll get which command it's
currently in. If it's idle, it should be some "accept" call.

If strace is not available, install package "strace" (if that's possible
without networking :P).

Toggle quote (2 lines)
>Can't load /gnu/store/vclzrvbxac8ipc8g1ncq5gjjj8gdvxw3-connman-1.38/lib/connman/plugins-vpn/wireguard.so: /gnu/store/vclzrvbxac8ipc8g1ncq5gjjj8gdvxw3-connman-1.38/lib/connman/plugins-vpn/wireguard.so: undefined symbol: __vpn_ipconfig_foreach

No idea how that can happen...

I've checked guix master connman and there,

$ objdump -t /gnu/store/k6iw9mhqjkhq7626a7sgfn0qpg9m73qh-connman-1.38/sbin/connman-vpnd |grep __vpn_ipconfig_foreach
0000000000418480 l F .text 000000000000006b __vpn_ipconfig_foreach

... so that symbol is indeed defined.

And wireguard.so has:
0000000000000000 *UND* 0000000000000000 __vpn_ipconfig_foreach

But that should be resolved by the loader since wireguard.so is loaded by
connman-vpnd--which has the definition of the symbol.

And I've checked the same from the installer iso image:

connman-vpnd:
0000000000418480 l F .text 000000000000006b __vpn_ipconfig_foreach

wireguard.so:
0000000000000000 *UND* 0000000000000000 __vpn_ipconfig_foreach

The small "l" means "local". That's probably not good.

I've also tried a small program of my own:

$ cat a1.c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

void foo(void) {
exit(1);
}
int main() {
if (dlopen("li.so", RTLD_NOW))
return 0;
else {
fprintf(stderr, "error: %s\n", dlerror());
return 1;
}
}

$ cat li.c
void foo(void);

void bar(void) {
foo();
}
$ gcc -fPIC -shared -o li.so li.c
$ gcc -o a1 -Wl,--export-dynamic a1.c -ldl
$ objdump -t a.out |grep foo
00000000004011a0 g F .text 000000000000000e foo
$ objdump -t li.so |grep foo
0000000000000000 *UND* 0000000000000000 foo
$ ./a.out && echo ok
ok

So it should work?!

Since you didn't try to use wireguard, it's probably unrelated--but maybe it's
a follow-up error to that one.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl6TR1IACgkQ5xo1VCww
uqWaZQf/cRbSMrzfx7iR7Z7cKy7XroMmQAYAYLcyhi4X8xvofUO0nB/WIZ966m5N
meAzqbvIAh/u6bGumeVh2WpPtaXc/rI+XOWhRKoQc0f++XQCdiv4hfY0DiIQBHk0
nkPurRwhqPX+DiLN0b3nr73ZReCT599/Zggmgi2TbT3QvZnlNE8tkSfOCR666jhX
1CAYiWtxW5QWuliR8X2hFD8wRseCAzMG5OnEh+2tVG0D/dJqdbGUyW2+rRNUcovE
GV/Dj9vBgnNMqfG8kF9znEDkun176viWXETgEkr4cOxnPdeCOQlFt1moYfkpdX/r
X5VfwYyixRkanzc2tBZBAxzRQ3MJ4A==
=+kW4
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 12 Apr 2020 18:56
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
20200412185656.5658765e@scratchpost.org
Connman uses:

$ cat src/connman.ver
{
global:
connman_*;
g_dbus_*;
local:
*;
};

So indeed __vpn_ipconfig_foreach is local.

I guess wireguard.so shouldn't try to use it then and it's a bug there?

It's called explicitly in vpn/plugins/wireguard.c get_ifname, so that's not
going to work, is it?
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl6TSFgACgkQ5xo1VCww
uqUmIgf/Y7mSBMyobq1tpwOVSQqv0TYWyenhnoI5Ic6VkXHh5oKzHgj7075eb6+x
9q3kHeUk7NdoSdSPvIOJU90nZZhA2SipieeHyxRSoqXAgzpi84LSY8wDqBfirBsy
lCUF3H4UyEwHIt5t4O9lAFljpFkab87t7g52JqeRfcCvRvRmcJ/bCGmE9gk2Kgbk
eW1vkZ5if9YozPzNnAzxdCpIvYfVdMyGJCa+CtKNxPHMVVdXGZaeLwZ1F51CaKVF
MXB8Rl+v/BqhrflrTz/IjJAY6YwoglKznMcNeazNrzqhhKwdQiCDrx/OHMJ5CzY8
oenIp+tYekLmhEvsltaJi5ZoC/BpxA==
=HJyB
-----END PGP SIGNATURE-----


P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 19:09
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200412170946.cpnyvk44dayi5nua@pelzflorian.localdomain
On Sun, Apr 12, 2020 at 06:52:34PM +0200, Danny Milosavljevic wrote:
Toggle quote (11 lines)
> Hi Florian,
>
> could you check
>
> strace -s 0 -p 301
>
> ?
>
> It should attach to the running connman-vpnd and you'll get which command it's
> currently in. If it's idle, it should be some "accept" call.

I rebooted from DVD. No wpa-supplicant or networking is running like
last time but now there is no connman-vpnd. Sorry. I believe this is
an issue but a different one.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 19:28
(name . Mathieu Othacehe)(address . m.othacehe@gmail.com)(address . 40572@debbugs.gnu.org)
20200412172829.wgkibk5evjzzve62@pelzflorian.localdomain
On Sun, Apr 12, 2020 at 06:34:43PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (2 lines)
> I will try make a new image with USB instead of DVD.

Via USB everything works on my Acer Aspire. It seems only DVDs have
the issue, maybe because they are so slow. My Uniwill never could
boot anythings but Windows from USB, so I can only try DVD there.

Regards,
Florian
M
M
Mathieu Othacehe wrote on 12 Apr 2020 19:37
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
87v9m4hafw.fsf@gmail.com
Toggle quote (6 lines)
> Apr 12 16:35:35 localhost vmunix: [ 48.124498] uvesafb: fb0: VESA VGA frame buffer device
> Apr 12 16:35:35 localhost vmunix: [ 48.129053] shepherd[1]: Service uvesafb has been started.
> Apr 12 16:35:40 localhost shepherd[1]: Service dbus-system could not be started.
> Apr 12 16:35:40 localhost shepherd[1]: Service networking depends on dbus-system.
> Apr 12 16:35:40 localhost shepherd[1]: Service networking could not be started.

I suspect something is going wrong here. The 'networking' service is
indeed not started, probably because 'dbus-system' could not be started.

What is reported by 'herd start dbus-system' command? You could also try
to start dbus-daemon by hand with the arguments provided in (gnu
services dbus).

Thanks,

Mathieu
P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 19:45
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)(address . 40572@debbugs.gnu.org)
20200412174553.gjeu6qheqec4ese2@pelzflorian.localdomain
On Sun, Apr 12, 2020 at 07:09:46PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (16 lines)
> On Sun, Apr 12, 2020 at 06:52:34PM +0200, Danny Milosavljevic wrote:
> > Hi Florian,
> >
> > could you check
> >
> > strace -s 0 -p 301
> >
> > ?
> >
> > It should attach to the running connman-vpnd and you'll get which command it's
> > currently in. If it's idle, it should be some "accept" call.
>
> I rebooted from DVD. No wpa-supplicant or networking is running like
> last time but now there is no connman-vpnd. Sorry. I believe this is
> an issue but a different one.

I rebooted from USB and everything works. Now I see no connman-vpnd
but instead connmand -n -r --noplugin=vpn

This does not make sense. Why is vpn disabled now but not before?

I attach ps -Af and /var/log/messages
Attachment: psaf
Attachment: usb-varlogmessages
P
P
pelzflorian (Florian Pelz) wrote on 12 Apr 2020 20:43
(name . Mathieu Othacehe)(address . m.othacehe@gmail.com)(address . 40572@debbugs.gnu.org)
20200412184329.itthwjjjwprhhhbg@pelzflorian.localdomain
On Sun, Apr 12, 2020 at 07:37:39PM +0200, Mathieu Othacehe wrote:
Toggle quote (11 lines)
>
> > Apr 12 16:35:35 localhost vmunix: [ 48.124498] uvesafb: fb0: VESA VGA frame buffer device
> > Apr 12 16:35:35 localhost vmunix: [ 48.129053] shepherd[1]: Service uvesafb has been started.
> > Apr 12 16:35:40 localhost shepherd[1]: Service dbus-system could not be started.
> > Apr 12 16:35:40 localhost shepherd[1]: Service networking depends on dbus-system.
> > Apr 12 16:35:40 localhost shepherd[1]: Service networking could not be started.
>
> I suspect something is going wrong here. The 'networking' service is
> indeed not started, probably because 'dbus-system' could not be started.
>

That may be the case. Note dbus-system was started later on, but
networking was not.

Toggle quote (8 lines)
> What is reported by 'herd start dbus-system' command? You could also try
> to start dbus-daemon by hand with the arguments provided in (gnu
> services dbus).
>
> Thanks,
>
> Mathieu

herd start dbus-system works flawlessly when the system has started.
I cannot try running “herd start dbus-system” earlier because I do not
have a login prompt yet. I will look at (gnu services dbus) later.

Regards,
Florian
D
D
Danny Milosavljevic wrote on 12 Apr 2020 20:47
20200412204725.263f5af3@scratchpost.org
I mean as long as we don't offer any VPN configuration in the installer we
could just disable VPN in connman-configuration (disable-vpn?) in the
installation image.

What doesn't get loaded can't break either :P
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl6TYlIACgkQ5xo1VCww
uqX7hQgAppwDxw05aAMlyj8EOJkIr6dXHvTXVJju3EQnxi9wMvlUDLJWauWE4u08
jao1YaX9yzRQ7mSSWMmNZFvuwi9MmnUbRcgTueTCKBLR1u5/GxVFN28CA9zTwhiO
/MXC8oUw/foCXxYkDDtDux/bkGwNDZ3zhobdK02sxlvqrbHkTBI6BbeMGtvpciLV
pUPoTPcrV5HNip+iluF5vMFi/7+VExI09Hzj+J71drGWMetwBph0HlytrZwtouv6
Hlws2xH2CtYL17pk6ORTADTriG/3HdgM+Z9j9JQTDLjOfW9q1ahmtDnohGvpw3wa
iB/OFgvTaX4I3i8HeL2vyQE5cvdybg==
=egiE
-----END PGP SIGNATURE-----


P
P
pelzflorian (Florian Pelz) wrote on 13 Apr 2020 09:36
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)(address . 40572@debbugs.gnu.org)
20200413073609.4767ispa3xfc5fhx@pelzflorian.localdomain
On Sun, Apr 12, 2020 at 08:47:46PM +0200, Danny Milosavljevic wrote:
Toggle quote (4 lines)
> I mean as long as we don't offer any VPN configuration in the installer we
> could just disable VPN in connman-configuration (disable-vpn?) in the
> installation image.

disable-vpn? is already there.

(service connman-service-type
(connman-configuration
(disable-vpn? #t)))

Regards,
Florian
D
D
Danny Milosavljevic wrote on 13 Apr 2020 13:20
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
20200413132039.6cf2a0ce@scratchpost.org
Maybe it was started by dbus activation of the connman service.

In that case, the flag corresponding to disable-vpn? won't be passed since we
manually pass it to the shepherd service only.

It could be that then it tries to start up vpn but that dies because of the
unknown symbol and then the connmand also dies maybe?

When then started up by the shepherd service later (when you manually invoke
herd start networking) it will get the flag just fine and start up just fine
without vpn.

In short, it could be a timing issue.

If you invoke connmanctl without having connmand running already then dbus
activation would actually invoke it automatically (without flag), if there's
a service file for "net.connman".

But the error message from your other computer suggests that there is no such
service file, so it shouldn't have.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl6USwgACgkQ5xo1VCww
uqXLSQf+KvTOyrdhjgwOB3JSEuKx0kNyzI564PK+SjCsUvdwNiGRawOe+dBXAkCz
KUQ/KKsJ5lJxwMP7RduwQSGpnK0rSHcGjo3f+9VGVkt1sdNOT/rFwl1qsT94zjl/
uLYFHHnyHewMLdqrLrd7funxGzkRLbH7TKdmgO75yIn2XyBp+uKPP6ZaiSnx8oLR
Qq+wxZs66ayfoq+FNdig24iMoQ1/nAHPILQIbLh6mCJpUeOEuZOawW5lirK7Rywg
u6hJhLdgOHJtjs2gUXFpjPapP2vj8mdt/rnKRpcn2sjUobpQpXvT/ouhw1T8CGbY
6MEBYrvXDm22spKiUKFQNbdt3MbZgQ==
=lkAU
-----END PGP SIGNATURE-----


P
P
pelzflorian (Florian Pelz) wrote on 13 Apr 2020 13:57
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200413115721.mjdrelwvrdvhm7zc@pelzflorian.localdomain
On Mon, Apr 13, 2020 at 12:42:20PM +0200, Ludovic Courtès wrote:
Toggle quote (21 lines)
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>
> > On Sun, Apr 12, 2020 at 07:37:39PM +0200, Mathieu Othacehe wrote:
> >>
> >> > Apr 12 16:35:35 localhost vmunix: [ 48.124498] uvesafb: fb0: VESA VGA frame buffer device
> >> > Apr 12 16:35:35 localhost vmunix: [ 48.129053] shepherd[1]: Service uvesafb has been started.
> >> > Apr 12 16:35:40 localhost shepherd[1]: Service dbus-system could not be started.
> >> > Apr 12 16:35:40 localhost shepherd[1]: Service networking depends on dbus-system.
> >> > Apr 12 16:35:40 localhost shepherd[1]: Service networking could not be started.
> >>
> >> I suspect something is going wrong here. The 'networking' service is
> >> indeed not started, probably because 'dbus-system' could not be started.
> >>
> >
> > That may be the case. Note dbus-system was started later on, but
> > networking was not.
>
> Did /var/log/messages have additional hints as to why dbus-system was
> not started?
>

Nothing. However, by now I burned a new image to DVD with the
attached patch. But I realize my mistake now; due to dbus-daemon
eventually starting after networking failed, the logs are lost.

I inserted the DVD in an external drive for my slightly faster Macbook
(but the DVD drive seems much slower) and (realizing unlike USB I
needed to add nomodeset) I got the same error. So the error occurs
when booting any computer from DVD, it seems.

I attach the unhelpful logs, even though I believe the dbus-daemon log
got overwritten by a subsequent successful start …

I do not know how to keep earlier logs. Maybe I need to pipe it in a
custom gexp?

Regards,
Florian
From a5063dd623a0090ec91ff3f7ce2ebc7cdf28e5d6 Mon Sep 17 00:00:00 2001
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Mon, 13 Apr 2020 08:01:57 +0200
Subject: [PATCH] [add dbus debug flags]

---
gnu/packages/glib.scm | 3 +++
gnu/services/dbus.scm | 5 ++++-
2 files changed, 7 insertions(+), 1 deletion(-)

Toggle diff (32 lines)
diff --git a/gnu/packages/glib.scm b/gnu/packages/glib.scm
index c98d8c18f5..227851f100 100644
--- a/gnu/packages/glib.scm
+++ b/gnu/packages/glib.scm
@@ -105,6 +105,9 @@
;; Install the session bus socket under /tmp.
"--with-session-socket-dir=/tmp"
+ ;; Debugging:
+ "--enable-verbose-mode"
+
;; Use /etc/dbus-1 for system-wide config.
;; Look for configuration file under
;; /etc/dbus-1. This is notably required by
diff --git a/gnu/services/dbus.scm b/gnu/services/dbus.scm
index 7b3c8100e2..1270883c4a 100644
--- a/gnu/services/dbus.scm
+++ b/gnu/services/dbus.scm
@@ -189,7 +189,10 @@ includes the @code{etc/dbus-1/system.d} directories of each package listed in
(requirement '(user-processes syslogd))
(start #~(make-forkexec-constructor
(list (string-append #$dbus "/bin/dbus-daemon")
- "--nofork" "--system" "--syslog-only")
+ "--nofork" "--system" "--syslog")
+ #:environment-variables
+ (cons "DBUS_VERBOSE=1" (default-environment-variables))
+ #:log-file "/var/log/dbus-daemon"
#:pid-file "/var/run/dbus/pid"))
(stop #~(make-kill-destructor)))))))
--
2.26.0
L
L
Ludovic Courtès wrote on 13 Apr 2020 15:18
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87wo6jjzh6.fsf@gnu.org
Hi Florian,

(+Cc: Mathieu, who’s probably more familiar with Connman.)

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (4 lines)
> Nothing. However, by now I burned a new image to DVD with the
> attached patch. But I realize my mistake now; due to dbus-daemon
> eventually starting after networking failed, the logs are lost.

I don’t think so, because currently shepherd opens #:log-file in append
mode. So I believe everything is there.

Nothing caught my eye in the log (but there’s a lot of stuff in there),
except (but I don’t know if it’s relevant):

229: 0x7f98cd08d400: 1586778311.080442 [connection.c(2626):bus_transaction_send_error_reply] Sending error reply org.freedesktop.DBus.Error.ServiceUnknown "The name net.connman was not provided by any .service files"

And indeed, there’s no ‘.service’ file for that:

Toggle snippet (4 lines)
$ find $(guix build connman) -name \*.service
/gnu/store/vclzrvbxac8ipc8g1ncq5gjjj8gdvxw3-connman-1.38/share/dbus-1/system-services/net.connman.vpn.service

The attach patch changes connman so that ‘net.connman.service’ is
installed.

Does it make any difference?

Thanks,
Ludo’.
Toggle diff (33 lines)
diff --git a/gnu/packages/connman.scm b/gnu/packages/connman.scm
index 5f17a0ea95..4a5c4e6d3c 100644
--- a/gnu/packages/connman.scm
+++ b/gnu/packages/connman.scm
@@ -4,6 +4,7 @@
;;; Copyright © 2017 Clément Lassieur <clement@lassieur.org>
;;; Copyright © 2017 Ricardo Wurmus <rekado@elephly.net>
;;; Copyright © 2017, 2018, 2019, 2020 Tobias Geerinckx-Rice <me@tobias.gr>
+;;; Copyright © 2020 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -50,8 +51,18 @@
(method url-fetch)
(uri (string-append "mirror://kernel.org/linux/network/connman/"
"connman-" version ".tar.xz"))
- (sha256
- (base32 "0awkqigvhwwxiapw0x6yd4whl465ka8a4al0v2pcqy9ggjlsqc6b"))))
+ (sha256
+ (base32 "0awkqigvhwwxiapw0x6yd4whl465ka8a4al0v2pcqy9ggjlsqc6b"))
+ (modules '((guix build utils)))
+ (snippet
+ '(begin
+ ;; Install the D-Bus 'net.connman.service' file. Note that we're
+ ;; patching something that is within an "if VPN", but that's OK
+ ;; since we build VPN support.
+ (substitute* "Makefile.in"
+ (("dbusservice_DATA =")
+ "dbusservice_DATA = src/net.connman.service "))
+ #t))))
(build-system gnu-build-system)
(arguments
`(#:configure-flags
L
L
Ludovic Courtès wrote on 13 Apr 2020 16:13
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87r1wriido.fsf@gnu.org
Hi Danny,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

Toggle quote (5 lines)
> Maybe it was started by dbus activation of the connman service.
>
> In that case, the flag corresponding to disable-vpn? won't be passed since we
> manually pass it to the shepherd service only.

Yes, that’s a problem. Perhaps we should generate a net.connman.service
file that takes ‘disable-vpn?’ into account.

Toggle quote (16 lines)
> It could be that then it tries to start up vpn but that dies because of the
> unknown symbol and then the connmand also dies maybe?
>
> When then started up by the shepherd service later (when you manually invoke
> herd start networking) it will get the flag just fine and start up just fine
> without vpn.
>
> In short, it could be a timing issue.
>
> If you invoke connmanctl without having connmand running already then dbus
> activation would actually invoke it automatically (without flag), if there's
> a service file for "net.connman".
>
> But the error message from your other computer suggests that there is no such
> service file, so it shouldn't have.

Right. The patch I sent to Florian adds ‘net.connman.service’. With
that file, connmand should at least be guaranteed to start when we need
it, if shepherd hasn’t started it yet.

But then, if connmand is started via D-Bus activation, the ‘networking’
service is eventually marked as failing to start because connmand is
already running.

So… not sure what’s best.

How frequent is this issue? Should we postpone a fix?

Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 13 Apr 2020 17:03
(name . Ludovic Courtès)(address . ludo@gnu.org)
20200413150347.kaapjhmdwbhkij7t@pelzflorian.localdomain
On Mon, Apr 13, 2020 at 03:18:29PM +0200, Ludovic Courtès wrote:
Toggle quote (5 lines)
> The attach patch changes connman so that ‘net.connman.service’ is
> installed.
>
> Does it make any difference?

Now with the patch DVD install from Macbook works fine. I will try
the same patched DVD on the other two old laptops now. Thank you all
of you, Danny, Matthieu, Ludo!

I cannot judge if the patch is right for master, but my issue here
affects DVD installation on at least my three machines here.

I do not know if the Wifi issue
is related. Half a month ago a friend could not connect to Wifi from
the installer on an HP laptop although Ethernet (maybe after a reboot)
worked fine. But I have no data and no access to that machine
currently, so I cannot try the connman patch there. I only have
Ethernet here.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 13 Apr 2020 17:52
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200413155202.rjmnp4v2vnrpq3vc@pelzflorian.localdomain
On Mon, Apr 13, 2020 at 05:03:47PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (10 lines)
> On Mon, Apr 13, 2020 at 03:18:29PM +0200, Ludovic Courtès wrote:
> > The attach patch changes connman so that ‘net.connman.service’ is
> > installed.
> >
> > Does it make any difference?
>
> Now with the patch DVD install from Macbook works fine. I will try
> the same patched DVD on the other two old laptops now. Thank you all
> of you, Danny, Matthieu, Ludo!

Success! You have cured DVD install on all my machines.

Both

connmand -n
connman-vpnd

are running according to ps -Af.

“herd status” says

Stopped:
- console-font-tty1
- cow-store
- gpm
- networking
- ssh-daemon
- wpa-supplicant

I attach all logs if you decide this issue is not done yet, because
actually disable-vpn? is set. Otherwise please push & mark as done.

The log from my patch to dbus-daemon says:

254: 0x7f1fcbc8c400: 1586791314.062311 [activation.c(424):update_desktop_file_entry] Name of "/etc/dbus-1/system-services/net.connman.service" is as expected
254: 0x7f1fcbc8c400: 1586791314.062355 [activation.c(519):update_desktop_file_entry] Added "net.connman" to list of services

and later

320: 0x7fb96195f400: 1586791657.019603 [connection.c(2626):bus_transaction_send_error_reply] Sending error reply org.freedesktop.DBus.Error.NameHasNoOwner "Could not get owner of name 'net.connman.vpn': no such name"

and later

320: 0x7fb96195f400: 1586791657.133488 [connection.c(2626):bus_transaction_send_error_reply] Sending error reply org.freedesktop.DBus.Error.ServiceUnknown "The name fi.w1.wpa_supplicant1 was not provided by any .service files"


Regards,
Florian
Started:
+ console-font-tty2
+ console-font-tty3
+ console-font-tty4
+ console-font-tty5
+ console-font-tty6
+ dbus-system
+ file-system-/dev/pts
+ file-system-/dev/shm
+ file-system-/gnu/store
+ file-system-/tmp
+ file-systems
+ guix-daemon
+ loopback
+ nscd
+ root
+ root-file-system
+ syslogd
+ term-tty1
+ term-tty2
+ term-tty3
+ term-tty4
+ term-tty5
+ term-tty6
+ udev
+ user-file-systems
+ user-processes
+ virtual-terminal
Stopped:
- console-font-tty1
- cow-store
- gpm
- networking
- ssh-daemon
- wpa-supplicant
One-shot:
* host-name
* user-homes
* uvesafb
L
L
Ludovic Courtès wrote on 13 Apr 2020 19:11
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
871rorgvk8.fsf@gnu.org
Hi Florian,

It seems to me that the connman.service patch papers over the actual
problem:

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (5 lines)
> Apr 13 17:21:29 localhost vmunix: [ 56.102520] shepherd[1]: Service uvesafb has been started.
> Apr 13 17:21:33 localhost shepherd[1]: Service dbus-system could not be started.
> Apr 13 17:21:33 localhost shepherd[1]: Service networking depends on dbus-system.
> Apr 13 17:21:33 localhost shepherd[1]: Service networking could not be started.

[...]

Toggle quote (6 lines)
> Apr 13 17:21:38 localhost shepherd[1]: Service dbus-system could not be started.
> Apr 13 17:21:43 localhost shepherd[1]: Service dbus-system could not be started.
> Apr 13 17:21:43 localhost shepherd[1]: Service wpa-supplicant depends on dbus-system.
> Apr 13 17:21:43 localhost shepherd[1]: Service wpa-supplicant could not be started.
> Apr 13 17:21:43 localhost shepherd[1]: Service loopback has been started.

[...]

Toggle quote (7 lines)
> Apr 13 17:21:55 localhost shepherd[1]: Service dbus-system could not be started.
> Apr 13 17:21:55 localhost shepherd[1]: Service term-tty1 depends on dbus-system.
> Apr 13 17:21:55 localhost shepherd[1]: Service term-tty1 could not be started.
> Apr 13 17:21:55 localhost shepherd[1]: Service console-font-tty1 depends on term-tty1.
> Apr 13 17:21:55 localhost shepherd[1]: Service console-font-tty1 could not be started.
> Apr 13 17:21:55 localhost shepherd[1]: Service host-name has been started.

[...]

Toggle quote (17 lines)
> Apr 13 17:22:20 localhost shepherd[1]: Service console-font-tty2 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service host-name has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service term-tty3 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service console-font-tty3 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service host-name has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service term-tty4 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service console-font-tty4 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service host-name has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service term-tty5 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service console-font-tty5 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service host-name has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service term-tty6 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service console-font-tty6 has been started.
> Apr 13 17:22:20 localhost shepherd[1]: Service guix-daemon has been started.
> Apr 13 17:22:21 localhost shepherd[1]: Service dbus-system has been started.
> Apr 13 17:22:21 localhost shepherd[1]: Service term-tty1 has been started.

In my VM, ‘dbus-system’ starts just fine the first time, but I also
noticed that ‘host-name’ is started before ‘dbus-system’.

So, could you move away the connman.service patch and try instead the
attached patch. It’ll create /dbus.trace.* files, which should allow us
to see how dbus-daemon stopped.

TIA!

Ludo’.
Toggle diff (30 lines)
diff --git a/gnu/services/dbus.scm b/gnu/services/dbus.scm
index 7b3c8100e2..a291ac402e 100644
--- a/gnu/services/dbus.scm
+++ b/gnu/services/dbus.scm
@@ -25,6 +25,7 @@
#:use-module ((gnu packages glib) #:select (dbus))
#:use-module (gnu packages polkit)
#:use-module (gnu packages admin)
+ #:use-module (gnu packages linux)
#:use-module (guix gexp)
#:use-module ((guix packages) #:select (package-name))
#:use-module (guix records)
@@ -186,9 +187,15 @@ includes the @code{etc/dbus-1/system.d} directories of each package listed in
(list (shepherd-service
(documentation "Run the D-Bus system daemon.")
(provision '(dbus-system))
- (requirement '(user-processes syslogd))
+ (requirement '(user-processes syslogd)) ;<- add 'host-name' and/or 'nscd'
(start #~(make-forkexec-constructor
- (list (string-append #$dbus "/bin/dbus-daemon")
+ (list #$(file-append strace "/bin/strace")
+ "-o"
+ (let ((t (gettimeofday)))
+ (format #f "/dbus.trace.~a.~a"
+ (car t) (cdr t)))
+ "-s" "500" "-f"
+ (string-append #$dbus "/bin/dbus-daemon")
"--nofork" "--system" "--syslog-only")
#:pid-file "/var/run/dbus/pid"))
(stop #~(make-kill-destructor)))))))
P
P
pelzflorian (Florian Pelz) wrote on 13 Apr 2020 20:14
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200413181415.6pgj63zcew3bkkma@pelzflorian.localdomain
On Mon, Apr 13, 2020 at 07:11:19PM +0200, Ludovic Courtès wrote:
Toggle quote (6 lines)
> So, could you move away the connman.service patch and try instead the
> attached patch. It’ll create /dbus.trace.* files, which should allow us
> to see how dbus-daemon stopped.
>
> TIA!

I will try ASAP now that I reinstalled. Expect results in 1 1/2 hours.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 13 Apr 2020 23:56
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200413215646.wduvdya6rqhr4h4r@pelzflorian.localdomain
On Mon, Apr 13, 2020 at 08:14:15PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (9 lines)
> On Mon, Apr 13, 2020 at 07:11:19PM +0200, Ludovic Courtès wrote:
> > So, could you move away the connman.service patch and try instead the
> > attached patch. It’ll create /dbus.trace.* files, which should allow us
> > to see how dbus-daemon stopped.
> >
> > TIA!
>
> I will try ASAP now that I reinstalled. Expect results in 1 1/2 hours.

Sorry it took so long. I had trouble getting GDM to work after using
guix from version-1.1.0 and needed GDM to run Brasero. Using the
older system-wide guix that came with the rc2 install helped. But the
trouble is possibly specific to my Macbook.

Here is the only trace file in /.

Regards,
Florian
L
L
Ludovic Courtès wrote on 14 Apr 2020 00:19
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
87ftd72flp.fsf@gnu.org
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (17 lines)
> On Mon, Apr 13, 2020 at 08:14:15PM +0200, pelzflorian (Florian Pelz) wrote:
>> On Mon, Apr 13, 2020 at 07:11:19PM +0200, Ludovic Courtès wrote:
>> > So, could you move away the connman.service patch and try instead the
>> > attached patch. It’ll create /dbus.trace.* files, which should allow us
>> > to see how dbus-daemon stopped.
>> >
>> > TIA!
>>
>> I will try ASAP now that I reinstalled. Expect results in 1 1/2 hours.
>
> Sorry it took so long. I had trouble getting GDM to work after using
> guix from version-1.1.0 and needed GDM to run Brasero. Using the
> older system-wide guix that came with the rc2 install helped. But the
> trouble is possibly specific to my Macbook.
>
> Here is the only trace file in /.

Apparently dbus-daemon started just fine right from the start. Problem
solved? :-) Or did connmand fail to start anyway?

If you still have the image around, could you try booting it several
time in the hope dbus-daemon will fail to start like you showed
previously?

Thanks a *lot* for all the testing!

I think we’ll delay the release for one more day so we can investigate
this and the graphics card issue.

Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 14 Apr 2020 00:24
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200413222403.spsdf6bx2j63tmrt@pelzflorian.localdomain
On Mon, Apr 13, 2020 at 07:11:19PM +0200, Ludovic Courtès wrote:
Toggle quote (3 lines)
> In my VM, ‘dbus-system’ starts just fine the first time, but I also
> noticed that ‘host-name’ is started before ‘dbus-system’.

Oh this time with the logs I just sent I had not done the installer
steps yet, so hostname had not been started. Sorry!

Actually now after choosing any hostname the installer starts again
from the beginning. But now “herd start wpa-supplicant” fails too.

Here are new logs after the installer steps.

(Could we go back to the paper-over patch?)

Regards,
Florian
Started:
+ console-font-tty1
+ console-font-tty2
+ console-font-tty3
+ console-font-tty4
+ console-font-tty5
+ console-font-tty6
+ dbus-system
+ file-system-/dev/pts
+ file-system-/dev/shm
+ file-system-/gnu/store
+ file-system-/tmp
+ file-systems
+ guix-daemon
+ loopback
+ nscd
+ root
+ root-file-system
+ syslogd
+ term-tty1
+ term-tty2
+ term-tty3
+ term-tty4
+ term-tty5
+ term-tty6
+ udev
+ user-file-systems
+ user-processes
+ virtual-terminal
Stopped:
- cow-store
- gpm
- networking
- ssh-daemon
- wpa-supplicant
One-shot:
* host-name
* user-homes
* uvesafb
Attachment: psaf
Attachment: varlogmessages
Attachment: dmesg
L
L
Ludovic Courtès wrote on 14 Apr 2020 00:25
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
8736972fc1.fsf@gnu.org
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (2 lines)
> Here is the only trace file in /.

Argh, I just realized that the patch I sent you would actually keep
using the same file name for strace, even when dbus-daemon restarts.
:-/ Apologies for the mistake.

The attached patch should do the right thing: it will append the
dbus-daemon PID to /dbus.trace.

If you could test it, that’d be great. I guess I’ll owe you a couple of
beverages of your choice when we meet in real life!

Ludo’.
Toggle diff (27 lines)
diff --git a/gnu/services/dbus.scm b/gnu/services/dbus.scm
index 7b3c8100e2..7f506a16c5 100644
--- a/gnu/services/dbus.scm
+++ b/gnu/services/dbus.scm
@@ -25,6 +25,7 @@
#:use-module ((gnu packages glib) #:select (dbus))
#:use-module (gnu packages polkit)
#:use-module (gnu packages admin)
+ #:use-module (gnu packages linux)
#:use-module (guix gexp)
#:use-module ((guix packages) #:select (package-name))
#:use-module (guix records)
@@ -186,9 +187,12 @@ includes the @code{etc/dbus-1/system.d} directories of each package listed in
(list (shepherd-service
(documentation "Run the D-Bus system daemon.")
(provision '(dbus-system))
- (requirement '(user-processes syslogd))
+ (requirement '(user-processes syslogd)) ;<- add 'host-name' and/or 'nscd'
(start #~(make-forkexec-constructor
- (list (string-append #$dbus "/bin/dbus-daemon")
+ (list #$(file-append strace "/bin/strace")
+ "-o" "/dbus.trace"
+ "-s" "500" "-ff"
+ (string-append #$dbus "/bin/dbus-daemon")
"--nofork" "--system" "--syslog-only")
#:pid-file "/var/run/dbus/pid"))
(stop #~(make-kill-destructor)))))))
P
P
pelzflorian (Florian Pelz) wrote on 14 Apr 2020 02:43
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200414004344.fg7yio3frp534jih@pelzflorian.localdomain
On Tue, Apr 14, 2020 at 12:25:34AM +0200, Ludovic Courtès wrote:
Toggle quote (7 lines)
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> If you could test it, that’d be great. I guess I’ll owe you a couple of
> beverages of your choice when we meet in real life!
>
> Ludo’.
>

The installer crashed again after entering a newly invented hostname
“a” (perhaps it was already in use from my previous attempt?). But
later my normal hostname “florianmacbook” worked and network
technologies failed. The hostname command displays “gnu” now.

Here are the log files. From among the dbus trace files, only
dbus.trace.301 is different after the installer crashed (see “diff
logs/dbus.trace.301 logs/after-network-failed/dbus.trace.301”).
Though you may be more interested in early trace logs/dbus.trace.228.
But I’m going to sleep now. ;)

Regards,
Florian
Attachment: logs.tar.xz
L
L
Ludovic Courtès wrote on 14 Apr 2020 11:03
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
87v9m2zbes.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (11 lines)
> The installer crashed again after entering a newly invented hostname
> “a” (perhaps it was already in use from my previous attempt?). But
> later my normal hostname “florianmacbook” worked and network
> technologies failed. The hostname command displays “gnu” now.
>
> Here are the log files. From among the dbus trace files, only
> dbus.trace.301 is different after the installer crashed (see “diff
> logs/dbus.trace.301 logs/after-network-failed/dbus.trace.301”).
> Though you may be more interested in early trace logs/dbus.trace.228.
> But I’m going to sleep now. ;)

Uh, well deserved. :-)

The logs show very well what happened. From /var/log/messages
(stripped):

Toggle snippet (23 lines)
Apr 14 01:52:21 localhost vmunix: [ 12.733898] random: dbus-uuidgen: uninitialized urandom read (12 bytes read)
Apr 14 01:52:21 localhost vmunix: [ 27.690871] shepherd[1]: Service root has been started.
Apr 14 01:52:26 localhost shepherd[1]: Service dbus-system could not be started.
Apr 14 01:52:26 localhost shepherd[1]: Service networking depends on dbus-system.
Apr 14 01:52:26 localhost shepherd[1]: Service networking could not be started.
Apr 14 01:52:31 localhost shepherd[1]: Service dbus-system could not be started.
Apr 14 01:52:36 localhost shepherd[1]: Service dbus-system could not be started.
Apr 14 01:52:36 localhost shepherd[1]: Service wpa-supplicant depends on dbus-system.
Apr 14 01:52:36 localhost shepherd[1]: Service wpa-supplicant could not be started.
Apr 14 01:52:36 localhost shepherd[1]: Service loopback has been started.
Apr 14 01:52:41 localhost /gnu/store/bfvr3brh7f9dqh26jf49767ypbanqycm-gpm-1.20.7/sbin/gpm[258]: *** info [daemon/startup.c(136)]:
Apr 14 01:52:41 localhost /gnu/store/bfvr3brh7f9dqh26jf49767ypbanqycm-gpm-1.20.7/sbin/gpm[258]: Started gpm successfully. Entered daemon mode.
Apr 14 01:52:41 localhost shepherd[1]: Service gpm could not be started.
Apr 14 01:52:43 localhost dbus-daemon[244]: Failed to start message bus: Failed to bind socket "/var/run/dbus/system_bus_socket": Address already in use
Apr 14 01:52:45 localhost vmunix: [ 78.947812] mc: Linux media interface: v0.10
Apr 14 01:52:46 localhost shepherd[1]: Service dbus-system could not be started.
Apr 14 01:52:46 localhost shepherd[1]: Service term-tty1 depends on dbus-system.
Apr 14 01:52:46 localhost shepherd[1]: Service term-tty1 could not be started.
Apr 14 01:52:50 localhost dbus-daemon[262]: Failed to start message bus: Failed to open "/var/run/dbus/pid": File exists
Apr 14 01:53:14 localhost shepherd[1]: Service dbus-system has been started.
Apr 14 01:53:14 localhost shepherd[1]: Service term-tty1 has been started.

That alone shows the problem: dbus-system was initially wrongfully
considered as “not started”, thus subsequent attempts to start it result
in EADDRINUSE. This is confirmed by strace logs:

228 -> starts fine
openat(AT_FDCWD, "/var/run/dbus/pid", O_WRONLY|O_CREAT|O_EXCL, 0644) = 5
fcntl(5, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fstat(5, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
write(5, "228\n", 4) = 4

244 ->
sendto(3, "<28>Apr 14 01:52:43 dbus-daemon[244]: Failed to start message bus: Failed to bind socket \"/var/run/dbus/system_bus_socket\": Address already in use", 146, MSG_NOSIGNAL, NULL, 0) = 146
exit_group(1) = ?

262 ->
sendto(3, "<28>Apr 14 01:52:50 dbus-daemon[262]: Failed to start message bus: Failed to open \"/var/run/dbus/pid\": File exists", 114, MSG_NOSIGNAL, NULL, 0) = 114
exit_group(1) = ?

301 -> starts fine (did 228 die in the meantime? go figure)
openat(AT_FDCWD, "/var/run/dbus/pid", O_WRONLY|O_CREAT|O_EXCL, 0644) = 5
fcntl(5, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fstat(5, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
write(5, "301\n", 4) = 4

Everything seems to be extremely slow on this machine (booting from an
actual DVD, right?). For example:

Toggle snippet (9 lines)
[ 27.690871] shepherd[1]: Service root has been started.
[ 37.063759] shepherd[1]: starting services...

[...]

[ 39.969589] shepherd[1]: Service host-name has been started.
[ 41.959013] shepherd[1]: Service user-homes has been started.

That’s 27s before shepherd is started, and another 10s before “starting
services” (the only thing that happens in between in shepherd.conf is
loading .go files for the services.)

My guess is that cold-cache I/O is very slow. A plausible scenario is
that loading ‘dbus-daemon’ the first time takes several seconds;
dbus-daemon has enough time to fork, but it does not produce its PID
file until after the 5s ‘%pid-file-timeout’ has timeout has expired.
Thus, shepherd marks it as “failed to start” but it’s actually running.

To confirm this hypothesis, we need to run “strace -t”, see below (sorry
for not thinking about doing it!). If you can try again with the patch
below, that’s awesome. Then we’ll compare the timestamps in
/var/log/messages and those in the strace log.

If that’s confirmed, we can work around it locally by passing:

#:pid-file-timeout 15

to ‘make-forkexec-constructor’ for dbus-daemon or, alternately, setting
‘%pid-file-timeout’ globally from shepherd.conf.

You were right that it relates to
discussion with Konrad about the best way to make this configurable.

Ludo’.
Toggle diff (28 lines)
diff --git a/gnu/services/dbus.scm b/gnu/services/dbus.scm
index 7b3c8100e2..c6733ffce3 100644
--- a/gnu/services/dbus.scm
+++ b/gnu/services/dbus.scm
@@ -25,6 +25,7 @@
#:use-module ((gnu packages glib) #:select (dbus))
#:use-module (gnu packages polkit)
#:use-module (gnu packages admin)
+ #:use-module (gnu packages linux)
#:use-module (guix gexp)
#:use-module ((guix packages) #:select (package-name))
#:use-module (guix records)
@@ -186,9 +187,13 @@ includes the @code{etc/dbus-1/system.d} directories of each package listed in
(list (shepherd-service
(documentation "Run the D-Bus system daemon.")
(provision '(dbus-system))
- (requirement '(user-processes syslogd))
+ (requirement '(user-processes syslogd)) ;<- add 'host-name' and/or 'nscd'
(start #~(make-forkexec-constructor
- (list (string-append #$dbus "/bin/dbus-daemon")
+ (list #$(file-append strace "/bin/strace")
+ "-o" "/dbus.trace"
+ "-s" "500" "-ff"
+ "-t"
+ (string-append #$dbus "/bin/dbus-daemon")
"--nofork" "--system" "--syslog-only")
#:pid-file "/var/run/dbus/pid"))
(stop #~(make-kill-destructor)))))))
L
L
Ludovic Courtès wrote on 14 Apr 2020 11:38
control message for bug #40572
(address . control@debbugs.gnu.org)
87tv1mz9sp.fsf@gnu.org
severity 40572 important
quit
P
P
pelzflorian (Florian Pelz) wrote on 14 Apr 2020 11:40
Re: bug#40572: installer networking: Connman detects no technologies on Acer Aspire
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200414094053.ylvhmq2yurtrr6ec@pelzflorian.localdomain
Thank you for debugging.

On Tue, Apr 14, 2020 at 11:03:55AM +0200, Ludovic Courtès wrote:
Toggle quote (3 lines)
> Everything seems to be extremely slow on this machine (booting from an
> actual DVD, right?).

Yes. USB boot has no issues (except I cannot boot from USB on my
Uniwill U50SI1).


Toggle quote (5 lines)
> To confirm this hypothesis, we need to run “strace -t”, see below (sorry
> for not thinking about doing it!). If you can try again with the patch
> below, that’s awesome. Then we’ll compare the timestamps in
> /var/log/messages and those in the strace log.

Will do. I will only send /var/log/messages and /dbus.* traces then.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 14 Apr 2020 14:00
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200414120034.igkvqez4clu7rr4o@pelzflorian.localdomain
On Tue, Apr 14, 2020 at 11:40:53AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (9 lines)
> On Tue, Apr 14, 2020 at 11:03:55AM +0200, Ludovic Courtès wrote:
> > To confirm this hypothesis, we need to run “strace -t”, see below (sorry
> > for not thinking about doing it!). If you can try again with the patch
> > below, that’s awesome. Then we’ll compare the timestamps in
> > /var/log/messages and those in the strace log.
>
> Will do. I will only send /var/log/messages and /dbus.* traces then.
>

Logs are attached.

Regards,
Florian
Attachment: newlogs.tar.xz
L
L
Ludovic Courtès wrote on 14 Apr 2020 16:36
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
87o8ruw2w3.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (12 lines)
> On Tue, Apr 14, 2020 at 11:40:53AM +0200, pelzflorian (Florian Pelz) wrote:
>> On Tue, Apr 14, 2020 at 11:03:55AM +0200, Ludovic Courtès wrote:
>> > To confirm this hypothesis, we need to run “strace -t”, see below (sorry
>> > for not thinking about doing it!). If you can try again with the patch
>> > below, that’s awesome. Then we’ll compare the timestamps in
>> > /var/log/messages and those in the strace log.
>>
>> Will do. I will only send /var/log/messages and /dbus.* traces then.
>>
>
> Logs are attached.

Woow, the first dbus-daemon process needs 20s from exec to PID file:

Toggle snippet (17 lines)
13:46:06 execve("/gnu/store/nlb24nrrrkn2h9vflnqigcf2hx58rvyx-dbus-1.12.16/bin/dbus-daemon", ["/gnu/store/nlb24nrrrkn2h9vflnqigcf2hx58rvyx-dbus-1.12.16/bin/dbus-daemon", "--nofork", "--system", "--syslog-only"], 0x7ffc94ca62c8 /* 1 var */) = 0
[…]
13:46:14 openat(AT_FDCWD, "/gnu/store/vclzrvbxac8ipc8g1ncq5gjjj8gdvxw3-connman-1.38/etc/dbus-1/system.d", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
13:46:16 fstat(4, {st_mode=S_IFDIR|0555, st_size=2048, ...}) = 0
[…]
13:46:19 openat(AT_FDCWD, "/gnu/store/mw57n9nj3y20bfm9ijcbpm16gpsik6sg-polkit-0.116/etc/dbus-1/system.d", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
13:46:21 fstat(4, {st_mode=S_IFDIR|0555, st_size=2048, ...}) = 0
[…]
13:46:21 bind(4, {sa_family=AF_UNIX, sun_path="/var/run/dbus/system_bus_socket"}, 33) = 0
[…]
13:46:26 openat(AT_FDCWD, "/var/run/dbus/pid", O_WRONLY|O_CREAT|O_EXCL, 0644) = 5
13:46:26 fcntl(5, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
13:46:26 fstat(5, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
13:46:26 write(5, "212\n", 4) = 4
13:46:26 close(5) = 0

Some of the openat(2) calls take no less than 2s (probably due to DVD
seek time + decompression time).

In parallel we see:

Toggle snippet (4 lines)
Apr 14 13:46:08 localhost shepherd[1]: Service dbus-system could not be started.
Apr 14 13:46:08 localhost shepherd[1]: Service networking depends on dbus-system.

This happens 2s after exec, meaning that it already took 3s to go from
“start dbus” to “process forked + exec’d”.

Could you build an ISO image with the patch below on top of
bd4c345ef7ddf3542662fe0872b06393b414a3fc and confirm that it works for
you? Run:

./pre-inst-env guix system disk-image \
--file-system-type=iso9660 gnu/system/install.scm

Thanks,
Ludo’.
Toggle diff (16 lines)
diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index bad089844d..0102ac3490 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -298,6 +298,11 @@ and return the resulting '.go' file."
(apply register-services
(map load-compiled '#$(map scm->go files)))))
+ ;; Booting off a DVD on a slow machine can make everything slow.
+ ;; Thus, increase the timeout compared to the default 5s in the
+ ;; Shepherd 0.7.0. See <https://bugs.gnu.org/40572>.
+ (set! (@@ (shepherd service) %pid-file-timeout) 30)
+
(format #t "starting services...~%")
(for-each (lambda (service)
;; In the Shepherd 0.3 the 'start' method can raise
P
P
pelzflorian (Florian Pelz) wrote on 14 Apr 2020 20:40
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200414184020.25i2mkzbmvkk6f5l@pelzflorian.localdomain
On Tue, Apr 14, 2020 at 04:36:12PM +0200, Ludovic Court�s wrote:
Toggle quote (4 lines)
> Could you build an ISO image with the patch below on top of
> bd4c345ef7ddf3542662fe0872b06393b414a3fc and confirm that it works for
> you?

Sorry it still fails despite /gnu/store/*shepherd.conf really
containing the set! of the timeout to 30. I attach /var/log/messages.

Toggle quote (4 lines)
> + ;; Booting off a DVD on a slow machine can make everything slow.
> + ;; Thus, increase the timeout compared to the default 5s in the
> + ;; Shepherd 0.7.0. See <https://bugs.gnu.org/40572>.

I believe this issue affects DVD on fast machines as well, although a
faster DVD drive would help slightly.

Regards,
Florian
Attachment: varlogmessages
L
L
Ludovic Courtès wrote on 14 Apr 2020 22:17
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572@debbugs.gnu.org)
87imi1vn42.fsf@gnu.org
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (8 lines)
> On Tue, Apr 14, 2020 at 04:36:12PM +0200, Ludovic Courtès wrote:
>> Could you build an ISO image with the patch below on top of
>> bd4c345ef7ddf3542662fe0872b06393b414a3fc and confirm that it works for
>> you?
>
> Sorry it still fails despite /gnu/store/*shepherd.conf really
> containing the set! of the timeout to 30. I attach /var/log/messages.

Silly me, I think it works if we set! before loading services.

Could you double-check with the patch below?

This is the last thing, and then all the lights are green!

Ludo’.
Toggle diff (18 lines)
diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index bad089844d..9906ae43c4 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -291,6 +291,13 @@ and return the resulting '.go' file."
(default-environment-variables
'("PATH=/run/current-system/profile/bin"))
+ ;; Booting off a DVD, especially on a slow machine, can make
+ ;; everything slow. Thus, increase the timeout compared to the
+ ;; default 5s in the Shepherd 0.7.0. See
+ ;; <https://bugs.gnu.org/40572>.
+ ;; XXX: Use something better when the next Shepherd is out.
+ (set! (@@ (shepherd service) %pid-file-timeout) 30)
+
;; Arrange to spawn a REPL if something goes wrong. This is better
;; than a kernel panic.
(call-with-error-handling
P
P
pelzflorian (Florian Pelz) wrote on 14 Apr 2020 23:30
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 40572@debbugs.gnu.org)
20200414213028.akgxoxjt7ptl3c62@pelzflorian.localdomain
On Tue, Apr 14, 2020 at 10:17:01PM +0200, Ludovic Courtès wrote:
Toggle quote (5 lines)
> Silly me, I think it works if we set! before loading services.
>
> Could you double-check with the patch below?
>

Yes!! I get past the network setup and herd status shows all
networking and wpa-supplicant as started. Great!

Toggle quote (5 lines)
> This is the last thing, and then all the lights are green!
>
> Ludo’.


Great! Keyhenge’s efivars issue is strange and Mathieu’s guile-parted
for NVMe and >1TiB support would have been nice, but I guess another
release will come and I cant test NVMe myself.

Regards,
Florian
L
L
Ludovic Courtès wrote on 14 Apr 2020 23:50
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 40572-done@debbugs.gnu.org)
87zhbdu483.fsf@gnu.org
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (9 lines)
> On Tue, Apr 14, 2020 at 10:17:01PM +0200, Ludovic Courtès wrote:
>> Silly me, I think it works if we set! before loading services.
>>
>> Could you double-check with the patch below?
>>
>
> Yes!! I get past the network setup and herd status shows all
> networking and wpa-supplicant as started. Great!

Yay, thank you!

Toggle quote (9 lines)
>> This is the last thing, and then all the lights are green!
>>
>> Ludo’.
>
>
> Great! Keyhenge’s efivars issue is strange and Mathieu’s guile-parted
> for NVMe and >1TiB support would have been nice, but I guess another
> release will come and I cant test NVMe myself.

Right, we’ll make another release eventually. :-)

Now to tag and start “make release” so it can run overnight…

Ludo’.
Closed
B
B
Bengt Richter wrote on 15 Apr 2020 00:49
(name . Ludovic Courtès)(address . ludo@gnu.org)
20200414224911.GA13215@LionPure
Hi Ludo, Florian,

On +2020-04-14 16:36:12 +0200, Ludovic Courtès wrote:
[...]
Toggle quote (1 lines)
> Woow, the first dbus-daemon process needs 20s from exec to PID file:
[...]

Toggle quote (1 lines)
> 13:46:26 openat(AT_FDCWD, "/var/run/dbus/pid", O_WRONLY|O_CREAT|O_EXCL, 0644) = 5
^^^
I usually only see "pid" as a name when I've forgotten to prefix a '$' -- but you do mean "pid" ?

[...]
--
Regards,
Bengt Richter
L
L
Ludovic Courtès wrote on 16 Apr 2020 23:04
control message for bug #40572
(address . control@debbugs.gnu.org)
87a73bf8g6.fsf@gnu.org
retitle 40572 dbus-daemon fails to start within %pid-file-timeout when running from DVD
quit
?
Your comment

This issue is archived.

To comment on this conversation send an email to 40572@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 40572
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch