Installer: wpa_supplicant fails to start

DoneSubmitted by Ludovic Courtès.
Details
6 participants
  • Danny Milosavljevic
  • Giovanni Biscuolo
  • Ludovic Courtès
  • pelzflorian (Florian Pelz)
  • Ricardo Wurmus
  • sirgazil
Owner
unassigned
Severity
important
L
L
Ludovic Courtès wrote on 3 May 2019 21:31
(name . Bug Guix)(address . bug-guix@gnu.org)
87sgtv8hcz.fsf@gnu.org
Hello,
Ricardo and sirgazil reported on IRC that the ‘wpa-supplicant’ would(sometimes?) fail to start for the in the installation image on theirlaptop.
As a result the ‘networking’ service (Connman) would not start either;thus, the installer would report that it failed to establish Internetaccess, and would just stop there.
Sirgazil posted /var/log/messages, which shows this:
Toggle snippet (44 lines)667 May 3 16:03:40 localhost vmunix: [ 9.164363] ieee80211 phy0: Atheros AR9565 Rev:2 mem=0xffffa88240e80000, irq=19668 May 3 16:03:40 localhost vmunix: [ 9.172514] ath9k 0000:02:00.0 wlp2s0: renamed from wlan0669 May 3 16:03:40 localhost vmunix: [ 9.199748] shepherd[1]: Service udev has been started.670 May 3 16:03:40 localhost vmunix: [ 9.200682] shepherd[1]: Service file-system-/tmp has been started.671 May 3 16:03:40 localhost vmunix: [ 9.201560] shepherd[1]: Service file-system-/dev/pts has been started.672 May 3 16:03:40 localhost vmunix: [ 9.202423] shepherd[1]: Service file-system-/dev/shm has been started.673 May 3 16:03:40 localhost vmunix: [ 9.203232] shepherd[1]: Service file-system-/gnu/store has been started.674 May 3 16:03:40 localhost vmunix: [ 9.203891] shepherd[1]: Service file-systems has been started.675 May 3 16:03:40 localhost vmunix: [ 9.207605] shepherd[1]: Service user-processes has been started.676 May 3 16:03:40 localhost vmunix: [ 9.208993] shepherd[1]: Service host-name has been started.677 May 3 16:03:40 localhost vmunix: [ 9.217925] shepherd[1]: Service user-homes has been started.678 May 3 16:03:40 localhost vmunix: [ 10.221211] shepherd[1]: Service dbus-system has been started.679 May 3 16:03:40 localhost vmunix: [ 10.222093] shepherd[1]: Service loopback has been started.680 May 3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant681 May 3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.682 May 3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.683 May 3 16:03:40 localhost shepherd[1]: Service networking could not be started.684 May 3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered685 May 3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant686 May 3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.687 May 3 16:03:45 localhost nscd: 401 monitoring file `/etc/hosts` (1)688 May 3 16:03:45 localhost nscd: 401 monitoring directory `/etc` (2)689 May 3 16:03:45 localhost nscd: 401 monitoring file `/etc/resolv.conf` (3)690 May 3 16:03:45 localhost nscd: 401 monitoring directory `/etc` (2)691 May 3 16:03:46 localhost shepherd[1]: Service nscd has been started.692 May 3 16:03:46 localhost /gnu/store/3dnww56dsrd6f9pf9p8ll3yawhfp50d8-gpm-1.20.7/sbin/gpm[409]: *** info [daemon/startup.c(136)]:693 May 3 16:03:46 localhost /gnu/store/3dnww56dsrd6f9pf9p8ll3yawhfp50d8-gpm-1.20.7/sbin/gpm[409]: Started gpm successfully. Entered daemon mode.694 May 3 16:03:46 localhost shepherd[1]: Service gpm has been started.695 May 3 16:03:46 localhost shepherd[1]: Service term-tty1 has been started.696 May 3 16:03:46 localhost shepherd[1]: Service console-font-tty1 has been started.697 May 3 16:03:46 localhost shepherd[1]: Service virtual-terminal has been started.698 May 3 16:03:46 localhost shepherd[1]: Service term-tty2 has been started.699 May 3 16:03:46 localhost shepherd[1]: Service console-font-tty2 has been started.700 May 3 16:03:46 localhost shepherd[1]: Service term-tty3 has been started.701 May 3 16:03:46 localhost shepherd[1]: Service console-font-tty3 has been started.702 May 3 16:03:46 localhost shepherd[1]: Service term-tty4 has been started.703 May 3 16:03:46 localhost shepherd[1]: Service console-font-tty4 has been started.704 May 3 16:03:46 localhost shepherd[1]: Service term-tty5 has been started.705 May 3 16:03:46 localhost shepherd[1]: Service console-font-tty5 has been started.706 May 3 16:03:46 localhost shepherd[1]: Service term-tty6 has been started.707 May 3 16:03:46 localhost shepherd[1]: Service console-font-tty6 has been started.708 May 3 16:03:46 localhost shepherd[1]: Service guix-daemon has been started.
Ludo’.
L
L
Ludovic Courtès wrote on 3 May 2019 22:51
(address . 35550@debbugs.gnu.org)
875zqr8dnw.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (10 lines)> 678 May 3 16:03:40 localhost vmunix: [ 10.221211] shepherd[1]: Service dbus-system has been started.> 679 May 3 16:03:40 localhost vmunix: [ 10.222093] shepherd[1]: Service loopback has been started.> 680 May 3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant> 681 May 3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.> 682 May 3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.> 683 May 3 16:03:40 localhost shepherd[1]: Service networking could not be started.> 684 May 3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered> 685 May 3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant> 686 May 3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.
My guess is that it goes like this:
1. shepherd starts ‘networking’, which triggers the start of ‘wpa-supplicant’ (PID 398), which immediately “fails”. Thus ‘networking’ is not started.
2. shepherd continues and starts ‘wpa-supplicant’ directly (PID 400). This time it fails for good; after 5 seconds, since the PID file didn’t show up, shepherd says again that it could not be started.
Indeed, by looking at shepherd.conf from:
guix gc -R $(guix system build gnu/system/install.scm) | grep shepherd.conf
one can see that ‘networking’ comes before ‘wpa-supplicant’ in the expression:
(for-each start '(… networking … wpa-supplicant …))
So why is ‘wpa-supplicant’ marked as failing to start on the firstattempt?
The only reason I can think of is if ‘read-pid-file’ from (shepherdservice) returns immediately and returns #f instead of a number. Thatcan actually happen if the PID file exists but is empty (or containsgarbage).
You would expect wpa_supplicant to create its PID file atomically: writeit under a different name, then rename(2)… but no:
Toggle snippet (23 lines)int os_daemonize(const char *pid_file){#if defined(__uClinux__) || defined(__sun__) return -1;#else /* defined(__uClinux__) || defined(__sun__) */ if (os_daemon(0, 0)) { perror("daemon"); return -1; }
if (pid_file) { FILE *f = fopen(pid_file, "w"); if (f) { fprintf(f, "%u\n", getpid()); fclose(f); } }
return -0;#endif /* defined(__uClinux__) || defined(__sun__) */}
So there is a possibility, albeit unlikely, for shepherd to see the PIDfile after it’s been open but before it’s been written to. (Thisproblem is not limited to the installer.)
I’m not 100% convinced that this is what’s happening there but that’sthe only lead I have. I’m surprised we haven’t seen other reportsbefore.
Thoughts?
Ludo’.
R
R
Ricardo Wurmus wrote on 3 May 2019 23:26
Re: Installer: wpa_supplicant fails to start
(name . Ludovic Courtès)(address . ludo@gnu.org)
877eb7grfn.fsf@elephly.net
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (4 lines)> Ricardo and sirgazil reported on IRC that the ‘wpa-supplicant’ would> (sometimes?) fail to start for the in the installation image on their> laptop.
What I observed was that the wpa-supplicant *service* would not startand starting it manually would fail because a wpa_supplicant *process*would already be running.
Upon killing the stray process the service could be started.
-- Ricardo
S
S
sirgazil wrote on 4 May 2019 05:26
Installer: wpa_supplicant fails to start
(name . 35550)(address . 35550@debbugs.gnu.org)
16a80e1461c.1292821a927951.2710363725930741694@zoho.com
In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).
Killing "wpa-supplicant" and then running "herd start networking" made it possible for the graphical installer to display the steps where you select Wireless or Wired, and then continue normally.
Attachment: file
L
L
Ludovic Courtès wrote on 4 May 2019 14:42
control message for bug #35550
(address . control@debbugs.gnu.org)
87sgtu4che.fsf@gnu.org
severity 35550 important
L
L
Ludovic Courtès wrote on 4 May 2019 14:49
Re: bug#35550: Installer: wpa_supplicant fails to start
(address . sirgazil@zoho.com)
87k1f64c5o.fsf@gnu.org
Hi,
sirgazil@zoho.com skribis:
Toggle quote (2 lines)> In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).
Could you check whether the sequence of events in /var/log/messagesalways looks like the snippet I showed?
Could you also:
1. Let the installation image boot;
2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’ service is marked as stopped (failed to start);
3. Show the contents /var/run/wpa_supplicant.pid, and check whether they correspond to the PID of a running ‘wpa_supplicant’ process.
?
Besides, do you have a Guix(SD) installation, recent or not, that useswpa_supplicant? If so, does it suffer from the same problem(‘wpa-supplicant’ service fails to start)?
Thanks,Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 4 May 2019 16:07
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190504140753.6wjgctw6ird7xj2b@pelzflorian.localdomain
On Sat, May 04, 2019 at 02:49:55PM +0200, Ludovic Courtès wrote:
Toggle quote (10 lines)> Hi,> > sirgazil@zoho.com skribis:> > > In my case, "wpa-supplicant" would *always* fail to start (I rebooted and tried the graphical installer around 4 times).> > Could you check whether the sequence of events in /var/log/messages> always looks like the snippet I showed?>
So for me it sometimes fails on two different machines. Actually itstarted failing four times in a row now that I test this.
When it fails, I also get
On Fri, May 03, 2019 at 09:31:40PM +0200, Ludovic Courtès wrote:
Toggle quote (18 lines)> 680 May 3 16:03:40 localhost wpa_supplicant[398]: Successfully initialized wpa_supplicant> 681 May 3 16:03:40 localhost shepherd[1]: Service wpa-supplicant could not be started.> 682 May 3 16:03:40 localhost shepherd[1]: Service networking depends on wpa-supplicant.> 683 May 3 16:03:40 localhost shepherd[1]: Service networking could not be started.> 684 May 3 16:03:40 localhost wpa_supplicant[400]: dbus: Could not request service name: already registered> 685 May 3 16:03:40 localhost wpa_supplicant[400]: Failed to initialize wpa_supplicant> 686 May 3 16:03:45 localhost shepherd[1]: Service wpa-supplicant could not be started.


> Could you also:> > 1. Let the installation image boot;> > 2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’> service is marked as stopped (failed to start);>
$ herd status wpa-supplicantStatus of wpa-supplicant: It is stopped. It is enabled. Provides (wpa-supplicant). Requires (user-processes dbus-system loopback syslogd). Conflicts with (). Will be respawned.


Toggle quote (4 lines)> 3. Show the contents /var/run/wpa_supplicant.pid, and check whether> they correspond to the PID of a running ‘wpa_supplicant’ process.>
There is no such file.
Regards,Florian
L
L
Ludovic Courtès wrote on 4 May 2019 17:28
(address . 35550@debbugs.gnu.org)
87a7g22q9n.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (3 lines)> You would expect wpa_supplicant to create its PID file atomically: write> it under a different name, then rename(2)… but no:
Shepherd commit 72631752149d000c8c98ae0cc66e0b0c2eda94ef changes‘read-pid-file’ to better deal with this.
Ludo’.
L
L
Ludovic Courtès wrote on 4 May 2019 17:32
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
877eb62q2g.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (25 lines)>> Could you also:>> >> 1. Let the installation image boot;>> >> 2. Confirm with ‘herd status wpa-supplicant’ that the ‘wpa-supplicant’>> service is marked as stopped (failed to start);>>>> $ herd status wpa-supplicant> Status of wpa-supplicant:> It is stopped.> It is enabled.> Provides (wpa-supplicant).> Requires (user-processes dbus-system loopback syslogd).> Conflicts with ().> Will be respawned.>>>>> 3. Show the contents /var/run/wpa_supplicant.pid, and check whether>> they correspond to the PID of a running ‘wpa_supplicant’ process.>>>> There is no such file.
At this point, could you also check:
3b. The output of “pgrep -fa wpa_supplicant”.
The PID file was probably removed by the second attempt to start the‘wpa-supplicant’ service.
Could you also try running (as root):
while herd restart wpa-supplicant; do : ; done
to see the frequency at which the service fails to start?
Thanks!
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 4 May 2019 17:47
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190504154743.i5sv6spf4jsbxruu@pelzflorian.localdomain
On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:
Toggle quote (5 lines)> At this point, could you also check:> > 3b. The output of “pgrep -fa wpa_supplicant”.>
root@gnu ~# pgrep -fa wpa_supplicant481 /gnu/store/ifqy2dr2hbqplv14k50dzprw3k2bz9m9-wpa-supplicant-2.7/sbin/wpa_supplicant -P/var/run/wpa_supplicant.pid -B -s -u

Toggle quote (10 lines)> The PID file was probably removed by the second attempt to start the> ‘wpa-supplicant’ service.> > Could you also try running (as root):> > while herd restart wpa-supplicant; do : ; done> > to see the frequency at which the service fails to start?>
It displays
Service wpa-supplicant is not running.Service wpa-supplicant could not be started.
after and again and again every approximately 5 seconds.
Regards,Florian
L
L
Ludovic Courtès wrote on 4 May 2019 23:14
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
878svl2a8a.fsf@gnu.org
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (2 lines)> On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:
[...]
Toggle quote (12 lines)>> Could you also try running (as root):>> >> while herd restart wpa-supplicant; do : ; done>> >> to see the frequency at which the service fails to start?>>>> It displays>> Service wpa-supplicant is not running.> Service wpa-supplicant could not be started.
Sorry, to be complete, I should have asked you to first kill the“dangling” wpa_supplicant process:
pkill -f wpa_supplicant pgrep -fa wpa_supplicant # make sure it’s really dead while herd restart wpa-supplicant; do : ; done
Could you try this?
TIA,Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 4 May 2019 23:34
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190504213402.2pdyieijtpejpi4p@pelzflorian.localdomain
On Sat, May 04, 2019 at 11:14:29PM +0200, Ludovic Courtès wrote:
Toggle quote (26 lines)> Hi,> > "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:> > > On Sat, May 04, 2019 at 05:32:23PM +0200, Ludovic Courtès wrote:> > [...]> > >> Could you also try running (as root):> >> > >> while herd restart wpa-supplicant; do : ; done> >> > >> to see the frequency at which the service fails to start?> >>> >> > It displays> >> > Service wpa-supplicant is not running.> > Service wpa-supplicant could not be started.> > Sorry, to be complete, I should have asked you to first kill the> “dangling” wpa_supplicant process:> > pkill -f wpa_supplicant> pgrep -fa wpa_supplicant # make sure it’s really dead
pgrep does not display anything.

Toggle quote (5 lines)> while herd restart wpa-supplicant; do : ; done> > Could you try this?>
Now it displays
Service wpa-supplicant has been stopped.Service wpa-supplicant has been started.
again and again incessantly, one immediately after the other with notime in between.
By the way, the network connection only failed sometimes when I testedGuix 1.0.0, but since I started using a Guix disk image from a morerecent Guix, wpa-supplicant apparently always fails. Strange.
Regards,Florian
L
L
Ludovic Courtès wrote on 5 May 2019 18:29
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87lfzkzwxx.fsf@gnu.org
Hello,
I’ve been able to test it on the bare metal as well and I confirm: Ibooted the install image a few times and the ‘wpa-supplicant’ servicewould be stopped every time, with a dangling ‘wpa_supplicant’ processrunning. Given that this seems to be systematic, I’m not sure my PIDfile atomicity hypothesis explains everything.
I’ve tried running:
pkill wpa_supplicant while herd start wpa-supplicant ; do herd stop wpa-supplicant ; done
and that always works (i.e., no “failed to start ‘wpa-supplicant’”error.)
If I run in parallel this:
while touch /var/run/wpa_supplicant.pid ; do : ; done
then it occasionally “fails to start”, but not very often.
To be continued…
Ludo’.
L
L
Ludovic Courtès wrote on 5 May 2019 22:26
(address . 35550@debbugs.gnu.org)
87zho0y7fq.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (3 lines)> You would expect wpa_supplicant to create its PID file atomically: write> it under a different name, then rename(2)… but no:
Regarding this PID file atomicity issue, a few other data points: Tor,syslogd (GNU Inetutils), sshd (OpenSSH), and dbus-daemon all do theexact same non-atomic PID file creation as wpa_supplicant.
Libdaemon (used by Avahi) does something more fancy: it acquires anexclusive write lock (fcntl with F_SETLKW) on the PID file right afteropening it, but since shepherd does not attempt to acquire locks on PIDfile, it has no effect.
systemd and GNU Pies don’t take any precautions while reading PIDfiles. The assumption seems to be that if the file exists, then it mustcontain a valid PID.
So perhaps the problem we see with ‘wpa_supplicant’ is elsewhere.
Ludo’.
L
L
Ludovic Courtès wrote on 6 May 2019 00:21
(address . 35550@debbugs.gnu.org)
87k1f4y23d.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (8 lines)> So why is ‘wpa-supplicant’ marked as failing to start on the first> attempt?>> The only reason I can think of is if ‘read-pid-file’ from (shepherd> service) returns immediately and returns #f instead of a number. That> can actually happen if the PID file exists but is empty (or contains> garbage).
I’ve produced an ISO with the patch below and ran it on the bare metalto get confirmation (too bad the bug doesn’t show in QEMU :-/). Indeed,‘read-pid-file’ for /var/run/wpa_supplicant.pid systematically reads theempty string the first time the ‘wpa-supplicant’ service is started.
(After that, if we kill the process and try to restart the service, theproblem doesn’t show up.)
Toggle diff (20 lines)diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scmindex 53437b6..e21492e 100644--- a/modules/shepherd/service.scm+++ b/modules/shepherd/service.scm@@ -717,9 +717,12 @@ otherwise return the number that was read (a PID)." (let loop () (catch 'system-error (lambda ()- (string->number- (string-trim-both- (call-with-input-file file get-string-all))))+ (define str+ (call-with-input-file file get-string-all))++ (local-output (l10n "read-pid-file ~s -> ~s")+ file str)+ (string->number (string-trim-both str))) (lambda args (let ((errno (system-error-errno args))) (if (= ENOENT errno)
With the second patch below, I confirm that the ‘wpa-supplicant’ startscorrectly. We can see in /var/log/messages that ‘read-pid-file’ firstreads the empty string from /var/run/wpa_supplicant.pid, then triesagain, and gets a valid PID on the second attempt.
It’s surprising that the timing is always like that, and only on thebare metal, but that’s the way it is.
It’d be great if you could do some testing with the patch below. Then Iguess we’ll push a Shepherd release with this fix.
I wonder if this could also explainhttps://issues.guix.info/issue/30993.
Thanks,Ludo’.
Toggle diff (63 lines)diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scmindex dfc3467bf8..e1dd248679 100644--- a/gnu/packages/admin.scm+++ b/gnu/packages/admin.scm@@ -188,7 +188,8 @@ and provides a \"top-like\" mode (monitoring).") version ".tar.gz")) (sha256 (base32- "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))))+ "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))+ (patches (search-patches "shepherd-debug.patch")))) (build-system gnu-build-system) (arguments '(#:configure-flags '("--localstatedir=/var")))diff --git a/gnu/packages/patches/shepherd-debug.patch b/gnu/packages/patches/shepherd-debug.patchnew file mode 100644index 0000000000..2fd97cc578--- /dev/null+++ b/gnu/packages/patches/shepherd-debug.patch@@ -0,0 +1,43 @@+diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm+index 53437b6..bef8f42 100644+--- a/modules/shepherd/service.scm++++ b/modules/shepherd/service.scm+@@ -715,21 +715,28 @@ number. Return #f if FILE was not created or does not contain a number;+ otherwise return the number that was read (a PID)."+ (define start (current-time))+ (let loop ()++ (define (retry)++ (and (< (current-time) (+ start max-delay))++ (begin++ ;; FILE does not exist yet, so wait and try again.++ ;; XXX: Ideally we would yield to the main event loop++ ;; and/or use inotify.++ (sleep 1)++ (loop))))+++ (catch 'system-error+ (lambda ()+- (string->number+- (string-trim-both+- (call-with-input-file file get-string-all))))++ (define str++ (call-with-input-file file get-string-all))++++ (local-output (l10n "read-pid-file ~s -> ~s")++ file str)++ (or (string->number (string-trim-both str))++ (retry)))+ (lambda args+ (let ((errno (system-error-errno args)))+ (if (= ENOENT errno)+- (and (< (current-time) (+ start max-delay))+- (begin+- ;; FILE does not exist yet, so wait and try again.+- ;; XXX: Ideally we would yield to the main event loop+- ;; and/or use inotify.+- (sleep 1)+- (loop)))++ (retry)+ (apply throw args)))))))+ + (define* (exec-command command
P
P
pelzflorian (Florian Pelz) wrote on 6 May 2019 21:47
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190506194741.qm7uha4fomsngdga@pelzflorian.localdomain
On Mon, May 06, 2019 at 12:21:26AM +0200, Ludovic Courtès wrote:
Toggle quote (4 lines)> It’d be great if you could do some testing with the patch below. Then I> guess we’ll push a Shepherd release with this fix.>
The second patch (the one for Guix is the only one I have tested)works fine on the system which always failed before since sometimeafter 1.0.
One time wpa-supplicant worked fine but it had an error establishingthe internet connection, but I am quite certain it was an unrelatedhardware issue.
Regards,Florian
D
D
Danny Milosavljevic wrote on 6 May 2019 22:00
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190506220030.4a608b37@scratchpost.org
Hi Ludo,
what happens when the loop reads the pid file when it contains just half of anumeral? It won't detect it, right?
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAlzQkl4ACgkQ5xo1VCwwuqXR2QgAkyT7VO/EanKhg81l0bDe7A0rLjtgoR83/+ijYjHXcqwnKYqsixD+Eh7j8sbxvvAW6rP0Th+ngbu02ZCxkn0vWmgQdpJGYMOXOUwCo010gK56w0S+LYW4PvYqj3ppUP+x+QZz0AUvuio/GIiMc67ede1PaFpNqKGJdkgqMCX4Gn04xUQwOOILi6tD8Sj8m6SAvhrmgKVt9+Qu7yjvR+Yd1k+CvZIVafcI9vOTG26P/xg7cnBxpwRne7wEiDhr+rOE0K6++sQf48hl798zeve850Rk1/LZj4EzeJNUJRU5utzPzu8OiuPSysO9KgkepMfrgi8d26kQ0WG0WG7pETICQQ===UahM-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 7 May 2019 09:53
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87r29aaeel.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (13 lines)> On Mon, May 06, 2019 at 12:21:26AM +0200, Ludovic Courtès wrote:>> It’d be great if you could do some testing with the patch below. Then I>> guess we’ll push a Shepherd release with this fix.>> >> The second patch (the one for Guix is the only one I have tested)> works fine on the system which always failed before since sometime> after 1.0.>> One time wpa-supplicant worked fine but it had an error establishing> the internet connection, but I am quite certain it was an unrelated> hardware issue.
Awesome, thanks for testing.
Note that I made similar changes in shepherd, which I’ll probablyrelease as 0.6.1 so we can use it in Guix 1.0.1:
https://git.savannah.gnu.org/cgit/shepherd.git/log
Ludo’.
L
L
Ludovic Courtès wrote on 7 May 2019 10:05
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87imumadvv.fsf@gnu.org
Hi Danny,
Danny Milosavljevic <dannym@scratchpost.org> skribis:
Toggle quote (3 lines)> what happens when the loop reads the pid file when it contains just half of a> numeral? It won't detect it, right?
Correct.
I’m proposing the addition below to be on the verrrry safe side.WDYT?
The weird thing, as I mentioned earlier, is that systemd and Pies do notprotect against truncated PID files, and I couldn’t find any“documentation” of the problem on the intertubes. For systemd it’smaybe less of a problem since services are started in a cgroup, I think.
Thanks,Ludo’.
Toggle diff (52 lines)diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scmindex b937609..ef27d88 100644--- a/modules/shepherd/service.scm+++ b/modules/shepherd/service.scm@@ -709,10 +709,16 @@ results." set when starting a service." (environ)) -(define* (read-pid-file file #:key (max-delay 5))+(define* (read-pid-file file #:key (max-delay 5)+ (validate-pid? #f)) "Wait for MAX-DELAY seconds for FILE to show up, and read its content as a number. Return #f if FILE was not created or does not contain a number;-otherwise return the number that was read (a PID)."+otherwise return the number that was read (a PID).++When VALIDATE-PID? is true, succeed if and only if the number that was read is+the PID of an existing process in the current PID namespace. This test cannot+be used if FILE might contain a PID from another PID namespace (i.e., the+daemon writing FILE is running in a separate PID namespace.)" (define start (current-time)) (let loop ()@@ -736,11 +742,13 @@ otherwise return the number that was read (a PID)." (try-again)) ((? integer? pid) ;; It's possible, though unlikely, that PID is not a valid PID, for- ;; instance because writes to FILE did not complete. However, we- ;; don't do (kill pid 0) because if the process lives in a separate- ;; PID namespace, then PID is probably invalid in our own- ;; namespace.- pid)))+ ;; instance because writes to FILE did not complete. When+ ;; VALIDATE-PID? is true, check that PID is valid in the current+ ;; PID namespace.+ (if (or (not validate-pid?)+ (catch-system-error (kill pid 0) #t))+ pid+ (try-again))))) (lambda args (let ((errno (system-error-errno args))) (if (= ENOENT errno)@@ -931,7 +939,8 @@ start." environment-variables))) (if pid-file (match (read-pid-file pid-file- #:max-delay pid-file-timeout)+ #:max-delay pid-file-timeout+ #:validate-pid? #t) (#f (catch-system-error (kill pid SIGTERM)) #f)
G
G
Giovanni Biscuolo wrote on 8 May 2019 13:32
Re: bug#35630: graphical install, no network device found (HP EliteDesk 800 G1)
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87v9yl6v10.fsf@roquette.mug.biscuolo.net
Hi Florian,
thank you for pointing me to the right bug report!
I checked the bug reports list via web [1] but did not linkwpa-supplicant to my issue since I did not realized that such an issuealso affects wired connections
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:
Toggle quote (5 lines)> I hope this is an instance of https://issues.guix.info/issue/35550,> that is “herd status wpa-supplicant” shows that wpa-supplicant is> stopped instead of running? I hope there will be a new release soon> where this is fixed.
yes, this is definitely an instance of #35550
I confirm I see the very same wpa_supplicant log messages in/var/log/messages as reported by Sirgazil and pelzflorian
the following suggested workaround (in a text console activated viaCTRL-ALT-F3) works for me:
Toggle snippet (4 lines)pkill wpa_supplicantherd start networking
after that I was able to restart the "Network interface" installer step(by choosing "Exit" and selecting it again in the proposed dialog) andwas able to select the wired interface
Thanks! Gio'

[1] I'm going to subscribe bug-guix :-)
-- Giovanni Biscuolo
Xelera IT Infrastructures
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEERcxjuFJYydVfNLI5030Op87MORIFAlzSvmsACgkQ030Op87MORKJHg//Vt8TSxusa37j3Y22Pw2xYySCrW/sf9NBwDzoxa7eJKJBxMHeFhEIbSL3vVe5WQ9eb+ZOeARacPBmWrR/JgLp1RTTtmgToUWmcRhT4gjB0NTt6H4nLonddSXZZ/2jG/w9yDwBdh6/74Y8kAVJvnld76eRpvLqmFbr3az5BOllXF5Zsk4tRft0SQ9tC1A+cC/Wh19MKtvwQ+guqua9khgeMuCp3IKY9SvpPi5w+WxGAS4NSUQc6i9JC2DvxSEbMIhCJwHBw3yoZesM1yojYgIb/kXP9cueUtuh7TBQW6cX1rjCTkRNoF5cegUX230BvqCr17DldU9Ew2VbRw2LwqQZRBqtvkxE29TwecrzHnbHZe1P4emIMHEusnOWVh87YuCfh6Oxz4o8EaYucHr8C8C3KaiHDyBj/yWmwcs+9sKXcLuIJOtqjrRdOxdLdI2S7ZTMuefkyGPcxILXvaAow+UFnNjzgdHlD98YfRLBxdwadX4S1u9EqeAT5rHamaIxHeO/fIvHp22MAG+wm/aGfndz8yvFIYEam4Xp/ADz1UGj+FVq0tUVrM1Rb02KsDYerjPuRL4PodVxpcC41fl/tn8z/17JoS7/Qmeim3I94JIU+k/hL0Y2u9C8s5SwfXnMCvLzQyBAJZtKUC65Nt+1CaFB1sblTucVuRQHi7d8rCpeaME==NmY7-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 8 May 2019 16:32
Re: bug#35550: Installer: wpa_supplicant fails to start
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
874l65vwyn.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (2 lines)> I’m proposing the addition below to be on the verrrry safe side.
Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.
Unless something goes wrong, I’ll make Shepherd commitfbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the comingdays (well, with an updated NEWS file.)
Thanks,Ludo’.
L
L
Ludovic Courtès wrote on 9 May 2019 00:25
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87imuktwg8.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (10 lines)> Ludovic Courtès <ludo@gnu.org> skribis:>>> I’m proposing the addition below to be on the verrrry safe side.>> Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.>> Unless something goes wrong, I’ll make Shepherd commit> fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming> days (well, with an updated NEWS file.)
If you want to re-test that wpa_supplicant does start correctly with thelatest Shepherd, you can do so by applying the patch below to Guix.
Ludo’.
Toggle diff (42 lines)diff --git a/gnu/packages/admin.scm b/gnu/packages/admin.scmindex dfc3467bf8..6cabea0fb9 100644--- a/gnu/packages/admin.scm+++ b/gnu/packages/admin.scm@@ -181,22 +181,30 @@ and provides a \"top-like\" mode (monitoring).") (define-public shepherd (package (name "shepherd")- (version "0.6.0")+ (version "0.6.1pre1") (source (origin- (method url-fetch)- (uri (string-append "mirror://gnu/shepherd/shepherd-"- version ".tar.gz"))+ (method git-fetch)+ (uri (git-reference+ (url "https://git.savannah.gnu.org/git/shepherd.git")+ (commit "fbb9c3fac745552eaf0f354bd0134cca7027bf17"))) (sha256 (base32- "1ys2w83vm62spr8bx38sccfdpy9fqmj7wfywm5k8ihsy2k61da2i"))))+ "0w7dx2i04qgykzhz4qir3rxnrdfza3q7kzh5z4mpmgjrp08pfnrn")))) (build-system gnu-build-system) (arguments- '(#:configure-flags '("--localstatedir=/var")))+ '(#:configure-flags '("--localstatedir=/var"+ "ac_cv_path_MSGMERGE=true"))) (native-inputs `(("pkg-config" ,pkg-config) ;; This is the Guile we use as a cross-compiler...- ("guile" ,guile-2.2)))+ ("guile" ,guile-2.2)++ ("texinfo" ,texinfo)+ ("help2man" ,help2man)+ ("gettext" ,gnu-gettext)+ ("autoconf" ,autoconf)+ ("automake" ,automake))) (inputs ;; ... and this is the one that appears in shebangs when cross-compiling. `(("guile" ,guile-2.2)
P
P
pelzflorian (Florian Pelz) wrote on 9 May 2019 08:46
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190509064613.cci7yd44toggezln@pelzflorian.localdomain
On Thu, May 09, 2019 at 12:25:59AM +0200, Ludovic Court�s wrote:
Toggle quote (4 lines)> If you want to re-test that wpa_supplicant does start correctly with the> latest Shepherd, you can do so by applying the patch below to Guix.>
Works fine too on Macbook and PC (using a non-ISO disk image).
Regards,Florian
L
L
Ludovic Courtès wrote on 9 May 2019 12:18
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87ef58exsr.fsf@gnu.org
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (7 lines)> On Thu, May 09, 2019 at 12:25:59AM +0200, Ludovic Courtès wrote:>> If you want to re-test that wpa_supplicant does start correctly with the>> latest Shepherd, you can do so by applying the patch below to Guix.>> >> Works fine too on Macbook and PC (using a non-ISO disk image).
Great, thanks for testing!
L
L
Ludovic Courtès wrote on 11 May 2019 20:13
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
875zqgx3ju.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (10 lines)> Ludovic Courtès <ludo@gnu.org> skribis:>>> I’m proposing the addition below to be on the verrrry safe side.>> Pushed as Shepherd commit 1ebea0a6e4c6ff11212eda348072acf9c379e7b2.>> Unless something goes wrong, I’ll make Shepherd commit> fbb9c3fac745552eaf0f354bd0134cca7027bf17 version 0.6.1 in the coming> days (well, with an updated NEWS file.)
Commit 3b8699f9c2a9f88d44a5320398b4c968c432429d upgrades the Shepherd to0.6.1, which fixes the PID file race condition we’ve been talking about.
Closing!
Ludo’.
Closed
?
Your comment

This issue is archived.

To comment on this conversation send email to 35550@debbugs.gnu.org