[Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive

  • Done
Details
Participants: Ludovic Courtès
Owner: unassigned
Submitted by: Ludovic Courtès
Severity: normal
Ludovic Courtès wrote on 28 Nov 2023 10:09
[Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive
(address . bug-guix@gnu.org)
87zfyyutux.fsf@inria.fr
On the OverDrive (AArch64), ‘cuirass remote-worker’ (1.2.0-1.bdc1f9f) says:

starting 2 workers (parallelism: 1 cores) for server at 10.0.0.1

Instead, it should use two cores for each worker, since the machine has four:

ludo@dover ~$ guile -c '(use-modules (ice-9 threads)) (pk (current-processor-count))'

;;; (4)
ludo@dover ~$ guile -c '(use-modules (ice-9 threads)) (pk (ceiling-quotient (current-processor-count) 2))'

;;; (2)
ludo@dover ~$ nproc
4
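The arithmetic behind the expected value can be spelled out as below. This is a sketch that assumes the per-worker parallelism is the processor count divided by the number of workers, rounded up, as the ‘ceiling-quotient’ call above suggests; ‘cores-per-worker’ is a hypothetical name, not actual Cuirass code.

```scheme
;; Hypothetical helper (not from Cuirass): cores handed to each worker
;; when CPUS cores are shared among WORKERS workers, rounding up.
(define (cores-per-worker cpus workers)
  (ceiling-quotient cpus workers))

(display (cores-per-worker 4 2))   ; 4 cores, 2 workers: prints 2
(newline)
(display (cores-per-worker 1 2))   ; 1 visible core, 2 workers: prints 1
(newline)
```

With a processor count of 1, the same computation yields 1, which matches the ‘parallelism: 1 cores’ log line.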

Since ‘current-processor-count’ is implemented indirectly in terms of
‘sched_getaffinity’, this suggests that the process starts with a bogus
affinity mask. (Time passes…) That’s indeed the case:

ludo@dover ~$ sudo herd status cuirass-remote-worker
Status of cuirass-remote-worker:
It is started.
Running value is 21279.
It is enabled.
Provides (cuirass-remote-worker).
Requires (avahi-daemon guix-daemon networking).
Will be respawned.
ludo@dover ~$ guile -c '(pk (getaffinity 21279))'

;;; (#*1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)

Compare to the affinity mask on x86_64-linux-gnu:

ludo@guix-hpc3 ~$ sudo guile -c '(pk (getaffinity 1817))'

;;; (#*1111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)

Interestingly, the initial affinity mask differs between aarch64-linux-gnu
and x86_64-linux-gnu.
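The bit vectors printed above can be decoded into CPU indices with a few lines of Guile. This is a sketch using only ‘getaffinity’ and SRFI-1; ‘mask->cpus’ and ‘affinity-cpu-count’ are hypothetical helper names. The count it computes should normally agree with ‘current-processor-count’, since both derive from the same affinity mask.

```scheme
(use-modules (srfi srfi-1))   ; ‘count’, ‘filter-map’

;; Hypothetical helper: return the indices of the CPUs whose bit is
;; set in MASK, a bit vector such as the one returned by ‘getaffinity’.
(define (mask->cpus mask)
  (filter-map (lambda (bit index) (and bit index))
              (bitvector->list mask)
              (iota (bitvector-length mask))))

;; Hypothetical helper: number of CPUs the thread identified by PID
;; may run on (PID 0 denotes the calling thread).
(define (affinity-cpu-count pid)
  (count identity (bitvector->list (getaffinity pid))))

;; Inspect the calling thread's own mask.
(display (mask->cpus (getaffinity 0)))
(newline)
(display (affinity-cpu-count 0))
(newline)
```

Applied to the mask shown for PID 21279 on the OverDrive, whose only set bit is the first one, ‘mask->cpus’ would yield (0): a single CPU.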

Ludo’.
Ludovic Courtès wrote on 28 Nov 2023 16:28
Re: bug#67502: [Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive
(address . 67502-done@debbugs.gnu.org)
87cyvtsxql.fsf@gnu.org
Ludovic Courtès <ludovic.courtes@inria.fr> skribis:

> ludo@dover ~$ sudo herd status cuirass-remote-worker
> Status of cuirass-remote-worker:
> It is started.
> Running value is 21279.
> It is enabled.
> Provides (cuirass-remote-worker).
> Requires (avahi-daemon guix-daemon networking).
> Will be respawned.
> ludo@dover ~$ guile -c '(pk (getaffinity 21279))'
>
> ;;; (#*1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)

This was due to ‘run-fibers’ binding each of its threads to a single CPU
core. Thus, calling ‘getaffinity’ from within ‘run-fibers’ shows only one
CPU, and likewise ‘current-processor-count’ returns 1.
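The effect can be reproduced without Fibers. The sketch below does not use Fibers or Cuirass code; it assumes only Guile's ‘getaffinity’, ‘setaffinity’, and ‘current-processor-count’, and ‘first-cpu-only’ and ‘processor-counts’ are hypothetical helpers. It pins a fresh thread to one CPU, roughly as a pinned scheduler thread would be, and compares the count computed beforehand with the count computed inside the pinned thread.

```scheme
(use-modules (ice-9 threads)   ; ‘call-with-new-thread’, ‘join-thread’
             (srfi srfi-1))    ; ‘list-index’

;; Hypothetical helper: a bit vector of the same length as MASK in
;; which only the first CPU set in MASK remains set.
(define (first-cpu-only mask)
  (let* ((bits (bitvector->list mask))
         (cpu  (list-index identity bits)))
    (list->bitvector
     (map (lambda (i) (= i cpu)) (iota (length bits))))))

;; Hypothetical helper: return a pair (BEFORE . INSIDE), where BEFORE is
;; the processor count computed in the current thread and INSIDE is the
;; count seen by a new thread pinned to a single CPU.
(define (processor-counts)
  (let ((before (current-processor-count)))
    (join-thread
     (call-with-new-thread
      (lambda ()
        ;; PID 0 designates the calling thread, so only this new
        ;; thread is pinned; the current thread keeps its mask.
        (setaffinity 0 (first-cpu-only (getaffinity 0)))
        (cons before (current-processor-count)))))))

(display (processor-counts))
(newline)
```

On a four-core machine like the OverDrive, one would expect something like (4 . 1). Computing the count before entering the pinned context, as ‘before’ does here, is one way to sidestep the problem; this sketch does not claim to reproduce the actual fix.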

Fixed in Cuirass commit 87a6d6ea7ae79fdf487bbcfd44bb3dce2d7c6e82.

Ludo’.
Closed