Shepherd fails to start in user session ~50% of the time

  • Open
Details
4 participants
  • Andrew Tropin
  • Ludovic Courtès
  • Tom Willemse
  • Zacchaeus Scheffer
Owner
unassigned
Submitted by
Tom Willemse
Severity
normal
Tom Willemse wrote on 16 Sep 2022 03:24
(address . bug-guix@gnu.org)
871qscaz6c.fsf@ryuslash.org
Hi Guix!

I've been using Guix on Arch Linux for a little while now. Ever since
I started using Guix Home on my laptop to start user-level services,
shepherd fails to start about 50% of the time when I boot the laptop.

My .xsession-errors says:

> shepherd: while opening socket '/run/user/1000/shepherd/socket': bind:
> Address already in use

and looking at my shepherd log:

> 2022-09-15 11:47:18 Service root has been started.
> 2022-09-15 11:47:18 Service root has been started.
> 2022-09-15 11:47:19 Starting services...
> 2022-09-15 11:47:19 Starting services...
> 2022-09-15 11:47:19 Exiting shepherd...
> 2022-09-15 11:47:19 Service dunst has been started.
> 2022-09-15 11:47:19 Service unclutter has been started.
> 2022-09-15 11:47:19 Service syncthing has been started.
> 2022-09-15 11:47:19 Service polybar has been started.
> 2022-09-15 11:47:19 Service cmst has been started.
> 2022-09-15 11:47:19 Service kdeconnect has been started.
> 2022-09-15 11:47:20 Service xbindkeys has been started.
> 2022-09-15 11:47:20 Service picom has been started.
> 2022-09-15 11:47:20 Service xmodmap has been started.
> 2022-09-15 11:47:20 Service redshift has been started.
> 2022-09-15 11:47:20 Exiting shepherd...
> 2022-09-15 11:47:20 Service syncthing has been stopped.
> 2022-09-15 11:47:20 Service xbindkeys has been stopped.
> 2022-09-15 11:47:20 Service redshift has been stopped.
> 2022-09-15 11:47:20 Service cmst has been stopped.
> 2022-09-15 11:47:20 Service kdeconnect has been stopped.
> 2022-09-15 11:47:20 Service polybar has been stopped.
> 2022-09-15 11:47:20 Service dunst has been stopped.
> 2022-09-15 11:47:20 Service picom has been stopped.
> 2022-09-15 11:47:20 Service unclutter has been stopped.
> 2022-09-15 11:47:20 Exiting.

It looks like it starts twice and then exits both, but I'm not sure why.
I'm guessing it's the ~/.guix-home/activate and
~/.guix-home/on-first-login that are trying to start it.
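
One way to confirm that two shepherd instances wrote to the same log is to count repeated messages. The sketch below is only an illustration: the names `count_messages` and `duplicated` are made up, the sample is a shortened copy of the log excerpt above, and it assumes every log line has the form `DATE TIME MESSAGE`.

```python
from collections import Counter

# Shortened copy of the log excerpt from the report.
SAMPLE_LOG = """
2022-09-15 11:47:18 Service root has been started.
2022-09-15 11:47:18 Service root has been started.
2022-09-15 11:47:19 Starting services...
2022-09-15 11:47:19 Starting services...
2022-09-15 11:47:19 Exiting shepherd...
2022-09-15 11:47:19 Service dunst has been started.
2022-09-15 11:47:20 Exiting shepherd...
2022-09-15 11:47:20 Service dunst has been stopped.
2022-09-15 11:47:20 Exiting.
"""


def count_messages(lines):
    """Count how often each message occurs, ignoring date and time."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(" ", 2)  # date, time, message
        if len(parts) == 3:
            counts[parts[2]] += 1
    return counts


def duplicated(counts):
    """Return the messages that were logged more than once."""
    return [message for message, n in counts.items() if n > 1]


if __name__ == "__main__":
    for message in duplicated(count_messages(SAMPLE_LOG.splitlines())):
        print(message)
```

Messages such as "Service root has been started." should appear once per shepherd process, so seeing them twice is consistent with two instances starting at nearly the same time.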

I'm not sure what other information I can provide you that will help, so
please let me know!


Cheers,

Tom
Ludovic Courtès wrote on 19 Oct 2022 18:21
(name . Andrew Tropin)(address . andrew@trop.in)
87y1tbhjhj.fsf@gnu.org
Hi,

Andrew, does the bug report below ring a bell?


(I haven’t hit that problem myself.)

Ludo’.

Tom Willemse <tom@ryuslash.org> skribis:

> Hi Guix!
>
> I've been using Guix on Archlinux for a little while now, and ever since
> I've started using Guix Home on my laptop to start up user-level
> services I've been having the issue that about 50% of the time when I
> boot my laptop shepherd fails to start.
>
> My .xsession-errors says:
>
>> shepherd: while opening socket '/run/user/1000/shepherd/socket': bind:
>> Address already in use
>
> and looking at my shepherd log:
>
>> 2022-09-15 11:47:18 Service root has been started.
>> 2022-09-15 11:47:18 Service root has been started.
>> 2022-09-15 11:47:19 Starting services...
>> 2022-09-15 11:47:19 Starting services...
>> 2022-09-15 11:47:19 Exiting shepherd...
>> 2022-09-15 11:47:19 Service dunst has been started.
>> 2022-09-15 11:47:19 Service unclutter has been started.
>> 2022-09-15 11:47:19 Service syncthing has been started.
>> 2022-09-15 11:47:19 Service polybar has been started.
>> 2022-09-15 11:47:19 Service cmst has been started.
>> 2022-09-15 11:47:19 Service kdeconnect has been started.
>> 2022-09-15 11:47:20 Service xbindkeys has been started.
>> 2022-09-15 11:47:20 Service picom has been started.
>> 2022-09-15 11:47:20 Service xmodmap has been started.
>> 2022-09-15 11:47:20 Service redshift has been started.
>> 2022-09-15 11:47:20 Exiting shepherd...
>> 2022-09-15 11:47:20 Service syncthing has been stopped.
>> 2022-09-15 11:47:20 Service xbindkeys has been stopped.
>> 2022-09-15 11:47:20 Service redshift has been stopped.
>> 2022-09-15 11:47:20 Service cmst has been stopped.
>> 2022-09-15 11:47:20 Service kdeconnect has been stopped.
>> 2022-09-15 11:47:20 Service polybar has been stopped.
>> 2022-09-15 11:47:20 Service dunst has been stopped.
>> 2022-09-15 11:47:20 Service picom has been stopped.
>> 2022-09-15 11:47:20 Service unclutter has been stopped.
>> 2022-09-15 11:47:20 Exiting.
>
> It looks like it starts twice and then exits both, but I'm not sure why.
> I'm guessing it's the ~/.guix-home/activate and
> ~/.guix-home/on-first-login that are trying to start it.
>
> I'm not sure what other information I can provide you that will help, so
> please let me know!
>
>
> Cheers,
>
> Tom
Andrew Tropin wrote on 20 Oct 2022 08:20
(address . 57844@debbugs.gnu.org)
87a65rf23o.fsf@trop.in
On 2022-10-19 18:21, Ludovic Courtès wrote:

> Hi,
>
> Andrew, does the bug report below ring a bell?
>

Yes. I don't remember whether I created a thread about it (probably not)
or just discussed it in some chat, but when shepherd stops it doesn't
clean up its socket file, so you can't start shepherd again until you
manually remove the socket.

I checked it just now:
herd stop root
shepherd # fails with Address already in use
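
The same error can be reproduced without shepherd. The following is a minimal sketch, not shepherd's code: it assumes Linux and a pathname Unix socket, and the helper names `bind_unix` and `demo_stale_socket` are made up for illustration. It binds a socket, closes it without unlinking the file, and then tries to bind the same path again.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path):
    """Bind a new AF_UNIX stream socket to `path` and return it."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def demo_stale_socket():
    """Return the errno raised when binding over a leftover socket file."""
    with tempfile.TemporaryDirectory() as runtime_dir:
        path = os.path.join(runtime_dir, "socket")
        first = bind_unix(path)
        first.close()  # the first "daemon" exits without unlinking the file
        try:
            bind_unix(path).close()
        except OSError as err:
            return err.errno
    return None


if __name__ == "__main__":
    code = demo_stale_socket()
    print(errno.errorcode.get(code, code))
```

Closing the socket does not remove the file, so the second bind fails even though nothing is listening any more.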

I found this out while experimenting with where I start shepherd
(https://issues.guix.gnu.org/57692). To inherit graphical environment
variables, I start it from the sway compositor rather than from the
login shell. If, in addition to the sway session, I log in on another
tty, elogind won't remove XDG_RUNTIME_DIR, so shepherd/socket is not
removed and shepherd fails to start after a sway restart.
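
A leftover socket file can be told apart from a live one by trying to connect to it. This is a rough sketch under the assumption of Linux behaviour for pathname Unix sockets; `socket_state` is a hypothetical helper, not part of shepherd or Guix Home.

```python
import os
import socket
import tempfile


def socket_state(path):
    """Classify `path` as 'absent', 'live' or 'stale'."""
    if not os.path.exists(path):
        return "absent"
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        # The file exists but no process is listening on it.
        return "stale"
    finally:
        probe.close()
    return "live"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as runtime_dir:
        path = os.path.join(runtime_dir, "socket")
        print(socket_state(path))
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        print(socket_state(path))
        server.close()  # the file stays behind
        print(socket_state(path))
```

A session startup script could use such a probe to remove a stale socket before launching shepherd. That would be a workaround on the user's side, not something the thread says Guix Home does.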

>
> (I haven’t hit that problem myself.)
>
> Ludo’.
>
> Tom Willemse <tom@ryuslash.org> skribis:
>
>> Hi Guix!
>>
>> I've been using Guix on Archlinux for a little while now, and ever since
>> I've started using Guix Home on my laptop to start up user-level
>> services I've been having the issue that about 50% of the time when I
>> boot my laptop shepherd fails to start.
>>
>> My .xsession-errors says:
>>
>>> shepherd: while opening socket '/run/user/1000/shepherd/socket': bind:
>>> Address already in use
>>
>> and looking at my shepherd log:
>>
>>> 2022-09-15 11:47:18 Service root has been started.
>>> 2022-09-15 11:47:18 Service root has been started.
>>> 2022-09-15 11:47:19 Starting services...
>>> 2022-09-15 11:47:19 Starting services...
>>> 2022-09-15 11:47:19 Exiting shepherd...
>>> 2022-09-15 11:47:19 Service dunst has been started.
>>> 2022-09-15 11:47:19 Service unclutter has been started.
>>> 2022-09-15 11:47:19 Service syncthing has been started.
>>> 2022-09-15 11:47:19 Service polybar has been started.
>>> 2022-09-15 11:47:19 Service cmst has been started.
>>> 2022-09-15 11:47:19 Service kdeconnect has been started.
>>> 2022-09-15 11:47:20 Service xbindkeys has been started.
>>> 2022-09-15 11:47:20 Service picom has been started.
>>> 2022-09-15 11:47:20 Service xmodmap has been started.
>>> 2022-09-15 11:47:20 Service redshift has been started.
>>> 2022-09-15 11:47:20 Exiting shepherd...
>>> 2022-09-15 11:47:20 Service syncthing has been stopped.
>>> 2022-09-15 11:47:20 Service xbindkeys has been stopped.
>>> 2022-09-15 11:47:20 Service redshift has been stopped.
>>> 2022-09-15 11:47:20 Service cmst has been stopped.
>>> 2022-09-15 11:47:20 Service kdeconnect has been stopped.
>>> 2022-09-15 11:47:20 Service polybar has been stopped.
>>> 2022-09-15 11:47:20 Service dunst has been stopped.
>>> 2022-09-15 11:47:20 Service picom has been stopped.
>>> 2022-09-15 11:47:20 Service unclutter has been stopped.
>>> 2022-09-15 11:47:20 Exiting.
>>
>> It looks like it starts twice and then exits both, but I'm not sure why.
>> I'm guessing it's the ~/.guix-home/activate and
>> ~/.guix-home/on-first-login that are trying to start it.

~/.guix-home/activate should be launched only by guix home reconfigure,
so it shouldn't be touched during session startup at all. Also, both
scripts have a condition that should prevent shepherd from starting if
the socket exists.

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/home/services/shepherd.scm?h=883fb8f41b08a8455f16c736a83fb1ae8a3df0a1#n131


Tom, can you show your startup scripts, please (like xsession or
whatever you use for starting graphical environment)? Sharing home
environment config can be useful as well. Do you use some display/login
manager?

>>
>> I'm not sure what other information I can provide you that will help, so
>> please let me know!
>>


--
Best regards,
Andrew Tropin

Ludovic Courtès wrote on 20 Oct 2022 23:44
(name . Andrew Tropin)(address . andrew@trop.in)
87mt9qb26c.fsf@gnu.org
Hi,

Andrew Tropin <andrew@trop.in> skribis:

> Yes, I don't remember if I created a thread on that (probably not) or
> just discussed it in some chat, but when shepherd stops it doesn't clean
> up its socket file, so you can't start shepherd again until manually
> remove socket.

Right, but in the context of Guix Home, it should be started only once
anyway, right?

Thanks,
Ludo’.
Andrew Tropin wrote on 21 Oct 2022 06:23
(name . Ludovic Courtès)(address . ludo@gnu.org)
87wn8terex.fsf@trop.in
On 2022-10-20 23:44, Ludovic Courtès wrote:

> Hi,
>
> Andrew Tropin <andrew@trop.in> skribis:
>
>> Yes, I don't remember if I created a thread on that (probably not) or
>> just discussed it in some chat, but when shepherd stops it doesn't clean
>> up its socket file, so you can't start shepherd again until manually
>> remove socket.
>
> Right, but in the context of Guix Home, it should be started only once
> anyway, right?

Right.

--
Best regards,
Andrew Tropin

Tom Willemse wrote on 30 Oct 2022 07:22
(address . 57844@debbugs.gnu.org)
875yg1rft9.fsf@ryuslash.org
Hey!

Thanks for getting back to me on this! :)

Andrew Tropin <andrew@trop.in> writes:

> On 2022-10-19 18:21, Ludovic Courtès wrote:
>
>>> It looks like it starts twice and then exits both, but I'm not sure why.
>>> I'm guessing it's the ~/.guix-home/activate and
>>> ~/.guix-home/on-first-login that are trying to start it.
>
> ~/.guix-home/activate should be launched only by guix home reconfigure,
> so it shouldn't be touched during startup of the session at all, also
> they both have a condition, which must prevent the start of shepherd if
> socket exists.

Yeah, I know that they both have a check for the pid file. I remember
finding your bug report and patch earlier and hoping that it would fix
my issues, although now I understand that it would never have helped.
I thought that they might still both try to start shepherd: since
neither the check nor the execution of shepherd is atomic, they might
both end up starting shepherd, one creating the pid file and starting,
the other failing to create it and crashing. But I couldn't think of why
the first one would then stop as well. Anyway, I guess that is
completely wrong if they never actually run at the same time :)
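
For reference, the usual way to close a check-then-start race like the one described above is to make the check itself atomic, for example with an advisory lock. The sketch below shows that general technique only. It is not how Guix Home or shepherd guard startup, and `try_acquire` and the lock file name are hypothetical.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path):
    """Return an fd holding an exclusive lock, or None if it is already held."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as runtime_dir:
        lock_path = os.path.join(runtime_dir, "shepherd.lock")
        first = try_acquire(lock_path)
        second = try_acquire(lock_path)
        print("first starter got the lock:", first is not None)
        print("second starter got the lock:", second is not None)
        if first is not None:
            os.close(first)  # closing the fd releases the lock
```

Only the starter that obtains the lock would go on to launch the daemon; the other one backs off instead of failing halfway through startup. The kernel drops the lock when the holder exits, so unlike a leftover socket or pid file it cannot become stale.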

> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/home/services/shepherd.scm?h=883fb8f41b08a8455f16c736a83fb1ae8a3df0a1#n131
>
> Tom, can you show your startup scripts, please (like xsession or
> whatever you use for starting graphical environment)? Sharing home
> environment config can be useful as well. Do you use some display/login
> manager?

My ~/.xinitrc is literally just:

exec /gnu/store/rl6f55i52lldzmbgg6z0ywr41ni4kbjg-herbstluftwm-0.9.4/bin/herbstluftwm --autostart /gnu/store/lqif0y90y7ipvf9aakdjisygnffk4999-autostart

I use the lightdm display manager installed with the Archlinux package
manager.

My home configuration on my laptop should be based on this:


It's spread across a few files, so if there's anything specific you want
to know please tell me, or if you need it in a different format I can
provide that too, I figured this would be the easiest way.

Thanks for your help!


Cheers,

Tom
Zacchaeus Scheffer wrote on 9 Jan 2023 18:17
(no subject)
(address . 57844@debbugs.gnu.org)
874jszhbjy.fsf@gmail.com
Hi all,

Not sure if it is relevant, but I have often had this problem, and it
has always been caused by an orphaned syncthing process. That is, the
user login session running my Guix Home services ends, but the syncthing
process is not terminated. Then I start a new login, it tries to start a
NEW syncthing service, and that gives this same error.

You can check this (next time syncthing fails to start) with:
ps -e | grep syncthing

Hope this helps,
Zacchae
Zacchaeus Scheffer wrote on 9 Jan 2023 19:17
Re:
(address . 57844@debbugs.gnu.org)
CAJejy7kWFE7XPK0dv+hL45K212n1A_YnJm5arAnx2os08-6A+A@mail.gmail.com
Oh wow, should have read closer. That's a shepherd socket, not a syncthing
socket. (happened across this thread searching syncthing)

Please Disregard,
Zacchae