System tests stuck in "shepherd[1]: waiting for udevd..."

  • Done
  • quality assurance status badge
Details
4 participants
  • Danny Milosavljevic
  • Ludovic Courtès
  • Mark H Weaver
  • zimoun
Owner
unassigned
Submitted by
Mark H Weaver
Severity
important
M
M
Mark H Weaver wrote on 13 Nov 2018 02:09
(address . bug-guix@gnu.org)
871s7pu6k1.fsf@netris.org
As I write this, there are two system test builds that have been stuck
for many hours, endlessly printing "shepherd[1]: waiting for udevd...":


They are both on i686-linux, and on the 'core-updates' branch, but with
two different build slaves (hydra.gnunet.org and guix.sjd.se).

I will now abort these builds, to free up build slots for other jobs.

Mark
M
M
Mark H Weaver wrote on 13 Nov 2018 21:51
(address . 33362@debbugs.gnu.org)
87wopgpunx.fsf@netris.org
Mark H Weaver <mhw@netris.org> writes:

Toggle quote (11 lines)
> As I write this, there are two system test builds that have been stuck
> for many hours, endlessly printing "shepherd[1]: waiting for udevd...":
>
> https://hydra.gnu.org/build/3153725
> https://hydra.gnu.org/build/3154365
>
> They are both on i686-linux, and on the 'core-updates' branch, but with
> two different build slaves (hydra.gnunet.org and guix.sjd.se).
>
> I will now abort these builds, to free up build slots for other jobs.

Today I found another Hydra build stuck, and aborted it. Moreover, I
cancelled all 'core-updates' jobs starting with "test.", to prevent more
build slots being held for long periods of time.

I expect that Berlin's build farm will also have several build slots
held in this way, but I'm not sure how to find out if it's happening,
or how to abort those jobs.

Mark
L
L
Ludovic Courtès wrote on 14 Nov 2018 11:14
control message for bug #33362
(address . control@debbugs.gnu.org)
871s7oklt1.fsf@gnu.org
severity 33362 important
L
L
Ludovic Courtès wrote on 20 Nov 2018 17:02
Re: bug#33362: System tests stuck in "shepherd[1]: waiting for udevd..."
(name . Mark H Weaver)(address . mhw@netris.org)(address . 33362@debbugs.gnu.org)
87zhu3ybws.fsf@gnu.org
Hi Mark,

Mark H Weaver <mhw@netris.org> skribis:

Toggle quote (3 lines)
> As I write this, there are two system test builds that have been stuck
> for many hours, endlessly printing "shepherd[1]: waiting for udevd...":

Basic system tests pass for me on current-ish ‘core-updates’, on x86_64:

Toggle snippet (11 lines)
$ git describe
v0.15.0-4036-g2d546858b1
$ make check-system TESTS='basic openssh mcron'
Compiling Scheme modules...
Running 3 system tests...
TOTAL: 3
PASS: /gnu/store/3zywc7y192p0n8l2cbm58vdkbpi3dchw-basic
PASS: /gnu/store/sirffl42jh6as45fy98frdfaglkgqif0-mcron
PASS: /gnu/store/pdw9jxq745w4833jx4hzfma6c48kwda8-openssh

Does the problem still show up for you?

Ludo’.
L
L
Ludovic Courtès wrote on 23 Nov 2018 18:25
control message for bug #33362
(address . control@debbugs.gnu.org)
87va4n3duu.fsf@gnu.org
tags 33362 moreinfo
M
M
Mark H Weaver wrote on 29 Nov 2018 23:58
Re: bug#33362: System tests stuck in "shepherd[1]: waiting for udevd..."
(address . 33362@debbugs.gnu.org)
87woovpk2v.fsf@netris.org
Mark H Weaver <mhw@netris.org> writes:

Toggle quote (17 lines)
> Mark H Weaver <mhw@netris.org> writes:
>
>> As I write this, there are two system test builds that have been stuck
>> for many hours, endlessly printing "shepherd[1]: waiting for udevd...":
>>
>> https://hydra.gnu.org/build/3153725
>> https://hydra.gnu.org/build/3154365
>>
>> They are both on i686-linux, and on the 'core-updates' branch, but with
>> two different build slaves (hydra.gnunet.org and guix.sjd.se).
>>
>> I will now abort these builds, to free up build slots for other jobs.
>
> Today I found another Hydra build stuck, and aborted it. Moreover, I
> cancelled all 'core-updates' jobs starting with "test.", to prevent more
> build slots being held for long periods of time.

This has now happened on the 'master' branch, on x86_64:


As I write this, this job has held 1 out of 3 build slots on x86_64 for
over 12 hours, waiting for udevd.

I forcefully killed the build.

Mark


Toggle snippet (303 lines)
@ build-started /gnu/store/30n23jvd99acs55k31frvl0lx1x3g5zf-bitlbee-test.drv - x86_64-linux /var/log/guix/drvs/30//n23jvd99acs55k31frvl0lx1x3g5zf-bitlbee-test.drv.bz2 513
^[c^[[?7l^[[2J^[[0mSeaBIOS (version rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org)


iPXE (http://ipxe.org) 00:03.0 C980 PCI2.10 PnP PMM+0FF91290+0FEF1290 C980
Press Ctrl-B to configure iPXE (PCI 00:03.0)...


Booting from ROM..^[c^[[?7l^[[2J[ 0.000000] Linux version 4.19.5-gnu (nixbld@) (gcc version 7.3.0 (GCC)) #1 SMP 1
[ 0.000000] Command line: console=ttyS0 --root=/dev/vda1 --system=/gnu/store/9avywswbm01qr2xin1s6s1fryp17rqpz-system --load=/gnu/store/9avywswbm01qr2xin1s6s1fryp17rqpz-system/boot panic=1
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000ffddfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000ffde000-0x000000000fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[ 0.000000] last_pfn = 0xffde max_arch_pfn = 0x400000000
[ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.000000] found SMP MP-table at [mem 0x000f5920-0x000f592f] mapped at [(____ptrval____)]
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] RAMDISK: [mem 0x0f4cf000-0x0ffcffff]
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000000F5730 000014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 0x000000000FFE157C 000030 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 0x000000000FFE1458 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 0x000000000FFE0040 001418 (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACS 0x000000000FFE0000 000040
[ 0.000000] ACPI: APIC 0x000000000FFE14CC 000078 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 0x000000000FFE1544 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000000ffddfff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x0ffda000-0x0ffddfff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000000001000-0x000000000ffddfff]
[ 0.000000] Normal empty
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x000000000ffddfff]
[ 0.000000] Reserved but unavailable: 132 pages
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000000ffddfff]
[ 0.000000] ACPI: PM-Timer IO Port: 0x608
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[ 0.000000] [mem 0x10000000-0xfffbffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] random: get_random_bytes called from start_kernel+0x99/0x537 with crng_init=0
[ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] percpu: Embedded 45 pages/cpu @(____ptrval____) s147288 r8192 d28840 u2097152
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64359
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: console=ttyS0 --root=/dev/vda1 --system=/gnu/store/9avywswbm01qr2xin1s6s1fryp17rqpz-system --load=/gnu/store/9avywswbm01qr2xin1s6s1fryp17rqpz-system/boot panic=1
[ 0.000000] Memory: 220076K/261616K available (12300K kernel code, 1575K rwdata, 3884K rodata, 1824K init, 2632K bss, 41540K reserved, 0K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] ftrace: allocating 39285 entries in 154 pages
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 16640, nr_irqs: 256, preallocated irqs: 16
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.000000] ACPI: Core revision 20180810
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[ 0.008000] APIC: Switch to symmetric I/O mode setup
[ 0.012000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.048000] tsc: Unable to calibrate against PIT
[ 0.048000] tsc: using HPET reference calibration
[ 0.048000] tsc: Detected 2792.855 MHz processor
[ 0.001990] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2841e2f9134, max_idle_ns: 440795219952 ns
[ 0.003887] Calibrating delay loop (skipped), value calculated using timer frequency.. 5585.71 BogoMIPS (lpj=11171420)
[ 0.007669] pid_max: default: 32768 minimum: 301
[ 0.009292] Security Framework initialized
[ 0.009651] Yama: becoming mindful.
[ 0.013219] AppArmor: AppArmor initialized
[ 0.016697] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.017834] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.018652] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.019069] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.039980] mce: CPU supports 10 MCE banks
[ 0.041659] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.041976] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.042516] Spectre V2 : Spectre mitigation: LFENCE not serializing, switching to generic retpoline
[ 0.043035] Spectre V2 : Mitigation: Full generic retpoline
[ 0.043339] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.043798] Speculative Store Bypass: Vulnerable
[ 0.548955] Freeing SMP alternatives memory: 28K
[ 0.590970] smpboot: CPU0: AMD QEMU Virtual CPU version 2.5+ (family: 0x6, model: 0x6, stepping: 0x3)
[ 0.596752] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.599826] rcu: Hierarchical SRCU implementation.
[ 0.608287] NMI watchdog: Perf NMI watchdog permanently disabled
[ 0.613279] smp: Bringing up secondary CPUs ...
[ 0.613728] smp: Brought up 1 node, 1 CPU
[ 0.614040] smpboot: Max logical packages: 1
[ 0.614395] smpboot: Total of 1 processors activated (5585.71 BogoMIPS)
[ 0.630298] devtmpfs: initialized
[ 0.635460] x86/mm: Memory block size: 128MB
[ 0.644658] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.645509] futex hash table entries: 256 (order: 2, 16384 bytes)
[ 0.649879] pinctrl core: initialized pinctrl subsystem
[ 0.659269] RTC time: 10:09:11, date: 11/29/18
[ 0.664985] NET: Registered protocol family 16
[ 0.669097] audit: initializing netlink subsys (disabled)
[ 0.675447] audit: type=2000 audit(1543486150.716:1): state=initialized audit_enabled=0 res=1
[ 0.676794] cpuidle: using governor ladder
[ 0.677189] cpuidle: using governor menu
[ 0.679203] ACPI: bus type PCI registered
[ 0.679581] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.682951] PCI: Using configuration type 1 for base access
[ 0.721243] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.730282] ACPI: Added _OSI(Module Device)
[ 0.730574] ACPI: Added _OSI(Processor Device)
[ 0.730920] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.731274] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.731702] ACPI: Added _OSI(Linux-Dell-Video)
[ 0.731965] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 0.756160] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 0.783269] ACPI: Interpreter enabled
[ 0.784934] ACPI: (supports S0 S3 S4 S5)
[ 0.785294] ACPI: Using IOAPIC for interrupt routing
[ 0.786266] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 0.789697] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 0.852693] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 0.853619] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[ 0.854339] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[ 0.855466] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 0.865564] acpiphp: Slot [3] registered
[ 0.866061] acpiphp: Slot [4] registered
[ 0.866487] acpiphp: Slot [5] registered
[ 0.867004] acpiphp: Slot [6] registered
[ 0.867446] acpiphp: Slot [7] registered
[ 0.867950] acpiphp: Slot [8] registered
[ 0.868414] acpiphp: Slot [9] registered
[ 0.868842] acpiphp: Slot [10] registered
[ 0.869297] acpiphp: Slot [11] registered
[ 0.869749] acpiphp: Slot [12] registered
[ 0.870172] acpiphp: Slot [13] registered
[ 0.870631] acpiphp: Slot [14] registered
[ 0.871244] acpiphp: Slot [15] registered
[ 0.871684] acpiphp: Slot [16] registered
[ 0.872149] acpiphp: Slot [17] registered
[ 0.872583] acpiphp: Slot [18] registered
[ 0.873008] acpiphp: Slot [19] registered
[ 0.873491] acpiphp: Slot [20] registered
[ 0.873972] acpiphp: Slot [21] registered
[ 0.874408] acpiphp: Slot [22] registered
[ 0.874880] acpiphp: Slot [23] registered
[ 0.875435] acpiphp: Slot [24] registered
[ 0.875867] acpiphp: Slot [25] registered
[ 0.876296] acpiphp: Slot [26] registered
[ 0.876749] acpiphp: Slot [27] registered
[ 0.877186] acpiphp: Slot [28] registered
[ 0.877719] acpiphp: Slot [29] registered
[ 0.878149] acpiphp: Slot [30] registered
[ 0.878580] acpiphp: Slot [31] registered
[ 0.879290] PCI host bridge to bus 0000:00
[ 0.879731] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 0.880135] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 0.880540] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[ 0.880960] pci_bus 0000:00: root bus resource [mem 0x10000000-0xfebfffff window]
[ 0.881400] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window]
[ 0.881998] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.898402] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7]
[ 0.899062] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6]
[ 0.899479] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177]
[ 0.899911] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376]
[ 0.902479] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI
[ 0.903096] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB
[ 0.999262] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[ 1.000854] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[ 1.002149] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[ 1.003542] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[ 1.004334] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[ 1.016401] pci 0000:00:02.0: vgaarb: setting as boot VGA device
[ 1.016902] pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[ 1.017508] pci 0000:00:02.0: vgaarb: bridge control possible
[ 1.017951] vgaarb: loaded
[ 1.023855] SCSI subsystem initialized
[ 1.025999] ACPI: bus type USB registered
[ 1.026856] usbcore: registered new interface driver usbfs
[ 1.027391] usbcore: registered new interface driver hub
[ 1.028088] usbcore: registered new device driver usb
[ 1.030143] EDAC MC: Ver: 3.0.0
[ 1.034569] PCI: Using ACPI for IRQ routing
[ 1.047072] NetLabel: Initializing
[ 1.047338] NetLabel: domain hash size = 128
[ 1.047597] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 1.050042] NetLabel: unlabeled traffic allowed by default
[ 1.055387] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 1.056087] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 1.056488] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 1.062065] clocksource: Switched to clocksource tsc-early
[ 1.270005] VFS: Disk quotas dquot_6.6.0
[ 1.270604] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.277314] AppArmor: AppArmor Filesystem Enabled
[ 1.278662] pnp: PnP ACPI init
[ 1.287379] pnp: PnP ACPI: found 6 devices
[ 1.323353] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 1.326579] NET: Registered protocol family 2
[ 1.334899] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes)
[ 1.336156] TCP established hash table entries: 2048 (order: 2, 16384 bytes)
[ 1.336797] TCP bind hash table entries: 2048 (order: 3, 32768 bytes)
[ 1.337313] TCP: Hash tables configured (established 2048 bind 2048)
[ 1.339653] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 1.340237] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 1.342698] NET: Registered protocol family 1
[ 1.343742] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 1.344229] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 1.344657] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 1.345351] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[ 1.354709] Trying to unpack rootfs image as initramfs...
[ 2.708916] Freeing initrd memory: 11268K
[ 2.711141] Scanning for low memory corruption every 60 seconds
[ 2.722225] Initialise system trusted keyrings
[ 2.724539] workingset: timestamp_bits=40 max_order=16 bucket_order=0
[ 2.742365] zbud: loaded
[ 2.753683] pstore: using deflate compression
[ 2.755963] Allocating IMA blacklist keyring.
[ 2.787798] Key type asymmetric registered
[ 2.788317] Asymmetric key parser 'x509' registered
[ 2.789209] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[ 2.791500] io scheduler noop registered
[ 2.791912] io scheduler deadline registered (default)
[ 2.793536] io scheduler cfq registered
[ 2.798310] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 2.804456] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 2.808795] ACPI: Power Button [PWRF]
[ 2.820362] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[ 2.845154] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 2.882637] Linux agpgart interface v0.103
[ 2.942331] brd: module loaded
[ 2.966458] loop: module loaded
[ 2.985023] scsi host0: ata_piix
[ 2.988709] scsi host1: ata_piix
[ 2.989888] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc100 irq 14
[ 2.990448] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc108 irq 15
[ 2.998922] libphy: Fixed MDIO Bus: probed
[ 2.999505] tun: Universal TUN/TAP device driver, 1.6
[ 3.001512] PPP generic driver version 2.4.2
[ 3.003308] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 3.003843] ehci-pci: EHCI PCI platform driver
[ 3.004440] ehci-platform: EHCI generic platform driver
[ 3.004953] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 3.005453] ohci-pci: OHCI PCI platform driver
[ 3.006120] ohci-platform: OHCI generic platform driver
[ 3.006690] uhci_hcd: USB Universal Host Controller Interface driver
[ 3.009539] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 3.014828] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 3.015641] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 3.018958] mousedev: PS/2 mouse device common for all mice
[ 3.024949] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[ 3.026863] rtc_cmos 00:00: RTC can wake from S4
[ 3.032658] rtc_cmos 00:00: registered as rtc0
[ 3.033973] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
[ 3.034644] i2c /dev entries driver
[ 3.036056] device-mapper: uevent: version 1.0.3
[ 3.042686] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
[ 3.044218] ledtrig-cpu: registered to indicate activity on CPUs
[ 3.054662] NET: Registered protocol family 10
[ 3.078608] Segment Routing with IPv6
[ 3.081437] NET: Registered protocol family 17
[ 3.082138] Key type dns_resolver registered
[ 3.089595] sched_clock: Marking stable (3137613189, -49029043)->(3206256657, -117672511)
[ 3.100722] registered taskstats version 1
[ 3.101059] Loading compiled-in X.509 certificates
[ 3.102658] zswap: loaded using pool lzo/zbud
[ 3.136596] Key type big_key registered
[ 3.148083] Key type trusted registered
[ 3.150698] Key type encrypted registered
[ 3.158853] AppArmor: AppArmor sha1 policy hashing enabled
[ 3.172714] ima: No TPM chip found, activating TPM-bypass!
[ 3.173294] ima: Allocated hash algorithm: sha1
[ 3.181142] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 3.192938] evm: Initialising EVM extended attributes:
[ 3.193283] evm: security.selinux
[ 3.193522] evm: security.SMACK64
[ 3.193727] evm: security.SMACK64EXEC
[ 3.193931] evm: security.SMACK64TRANSMUTE
[ 3.194153] evm: security.SMACK64MMAP
[ 3
This message was truncated. Download the full message here.
D
D
Danny Milosavljevic wrote on 30 Nov 2018 00:19
(name . Mark H Weaver)(address . mhw@netris.org)(address . 33362@debbugs.gnu.org)
20181130001908.45041457@scratchpost.org
Hmm, I wonder whether it might make sense to add "--debug" to udevd in gnu/services/base.scm in order to track down this problem.

The loop looks rather harmless and only checks whether the actual udev socket is available - which it must be for udevd to work.

It does the following right now:

(start #~(lambda ()
(define udevd
;; 'udevd' from eudev.
#$(file-append udev "/sbin/udevd")) ; TODO: Add ' "--debug"' here

(define (wait-for-udevd)
;; Wait until someone's listening on udevd's control
;; socket.
(let ((sock (socket AF_UNIX SOCK_SEQPACKET 0)))
(let try ()
(catch 'system-error
(lambda ()
(connect sock PF_UNIX "/run/udev/control")
(close-port sock))
(lambda args
(format #t "waiting for udevd...~%")
(usleep 500000)
(try))))))
[...]
(let ((pid (fork+exec-command (list udevd))))
;; Wait until udevd is up and running. This appears to
;; be needed so that the events triggered below are
;; actually handled.
(wait-for-udevd)
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAlwAc+wACgkQ5xo1VCww
uqW+owf+KhLwqkdHf09ZLJC6MdNUCgZKDFv3Ag9EfjWDPFNQLffv3UPdMTo6ausA
IjB2fWG1whJO16jQco6WyNoFJ2p86bKiNXeMGZOwJgF3w6e4aFaiO//mN4oBFaeK
cwY6WpPmfeh7jXU2+sy0HH45YQrA0m5Z2lkP5A7CRLHhXmEUcaQgUjfLHjcVDrL3
Bc3YT/iGdqGQQbgtDxrX8dNxQUbX3OyI9zu1dAhpXvoqk1/edwym0oR8dNJptxfb
LbyfDjUU0ig/LuGYiAFUJASPnhbCvlU63yt7esX+sKK3vkkPSqZq7g6PPgn9VuND
Ds1iq0/2rIii/YAt46wV4zpGJ9wUEA==
=63vA
-----END PGP SIGNATURE-----


M
M
M
Z
Z
zimoun wrote on 26 Nov 2021 02:34
Re: bug#33362: System tests stuck in "shepherd[1]: waiting for udevd..."
(name . Mark H Weaver)(address . mhw@netris.org)(address . 33362@debbugs.gnu.org)
86o867r97l.fsf@gmail.com
Hi Mark,

On Mon, 12 Nov 2018 at 20:09, Mark H Weaver <mhw@netris.org> wrote:

Toggle quote (11 lines)
> As I write this, there are two system test builds that have been stuck
> for many hours, endlessly printing "shepherd[1]: waiting for udevd...":
>
> https://hydra.gnu.org/build/3153725
> https://hydra.gnu.org/build/3154365
>
> They are both on i686-linux, and on the 'core-updates' branch, but with
> two different build slaves (hydra.gnunet.org and guix.sjd.se).
>
> I will now abort these builds, to free up build slots for other jobs.

I am doing bug triage and I hit this old one #33362 [1] from 2018. It
is about hydra which is down now, IIRC. Does it still make sense to
keep it open? Or can we close it?



Cheers,
simon
M
M
Mark H Weaver wrote on 27 Nov 2021 00:21
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 33362@debbugs.gnu.org)
87czmmpkov.fsf@netris.org
Hi Simon,

zimoun <zimon.toutoune@gmail.com> writes:

Toggle quote (16 lines)
> On Mon, 12 Nov 2018 at 20:09, Mark H Weaver <mhw@netris.org> wrote:
>
>> As I write this, there are two system test builds that have been stuck
>> for many hours, endlessly printing "shepherd[1]: waiting for udevd...":
>>
>> https://hydra.gnu.org/build/3153725
>> https://hydra.gnu.org/build/3154365
>>
>> They are both on i686-linux, and on the 'core-updates' branch, but with
>> two different build slaves (hydra.gnunet.org and guix.sjd.se).
>>
>> I will now abort these builds, to free up build slots for other jobs.
>
> I am doing bug triage and I hit this old one #33362 [1] from 2018. It
> is about hydra which is down now, IIRC.

I doubt that this bug was about Hydra. It guess that the bug was in our
system test derivations. As far as I know, the only relevance of Hydra
to this bug is that Hydra was our CI system at that time, and therefore
it was Hydra that brought this bug to my attention.

Toggle quote (2 lines)
> Does it still make sense to keep it open? Or can we close it?

If the bug hasn't occurred recently, then I agree it's okay to close it.

It would be good to check our modern Cuirass-based ci.guix.gnu.org to
find out whether this failure mode is still occurring in our system
tests.

I see that Cuirass's web interface has improved quite significantly in
the last couple of years, and I'm very grateful to those who've worked
on it. However, Cuirass still seems to be missing some important
functionality that Hydra had. Most notably, unless I missed something,
it seems to lack the ability to compare the results of two evaluations
and show the *differences* between those results, i.e. to enumerate the
newly failing jobs, the newly succeeding jobs, and the newly aborted
jobs.

Without that functionality, it's not easy for us to notice when a job
starts to fail, unless a user files a bug report. The total list of job
failures has always been too large to easily notice changes in that list
without assistance.

What do you think?

Thanks again for your work on this.

Regards,
Mark

--
Disinformation flourishes because many people care deeply about injustice
but very few check the facts. Ask me about https://stallmansupport.org.
Z
Z
zimoun wrote on 5 Jan 2022 00:21
(name . Mark H Weaver)(address . mhw@netris.org)
86k0ffaxyl.fsf@gmail.com
Hi Mark,

CC: Mathieu, maybe they have an insight for Cuirass.

On Fri, 26 Nov 2021 at 18:21, Mark H Weaver <mhw@netris.org> wrote:
Toggle quote (21 lines)
>> On Mon, 12 Nov 2018 at 20:09, Mark H Weaver <mhw@netris.org> wrote:
>>
>>> As I write this, there are two system test builds that have been stuck
>>> for many hours, endlessly printing "shepherd[1]: waiting for udevd...":
>>>
>>> https://hydra.gnu.org/build/3153725
>>> https://hydra.gnu.org/build/3154365
>>>
>>> They are both on i686-linux, and on the 'core-updates' branch, but with
>>> two different build slaves (hydra.gnunet.org and guix.sjd.se).
>>>
>>> I will now abort these builds, to free up build slots for other jobs.
>>
>> I am doing bug triage and I hit this old one #33362 [1] from 2018. It
>> is about hydra which is down now, IIRC.
>
> I doubt that this bug was about Hydra. It guess that the bug was in our
> system test derivations. As far as I know, the only relevance of Hydra
> to this bug is that Hydra was our CI system at that time, and therefore
> it was Hydra that brought this bug to my attention.

I agree, but I am not able to connect to the mentioned logs. Are the
links still working?


Toggle quote (22 lines)
>> Does it still make sense to keep it open? Or can we close it?
>
> If the bug hasn't occurred recently, then I agree it's okay to close it.
>
> It would be good to check our modern Cuirass-based ci.guix.gnu.org to
> find out whether this failure mode is still occurring in our system
> tests.
>
> I see that Cuirass's web interface has improved quite significantly in
> the last couple of years, and I'm very grateful to those who've worked
> on it. However, Cuirass still seems to be missing some important
> functionality that Hydra had. Most notably, unless I missed something,
> it seems to lack the ability to compare the results of two evaluations
> and show the *differences* between those results, i.e. to enumerate the
> newly failing jobs, the newly succeeding jobs, and the newly aborted
> jobs.
>
> Without that functionality, it's not easy for us to notice when a job
> starts to fail, unless a user files a bug report. The total list of job
> failures has always been too large to easily notice changes in that list
> without assistance.

I agree. Maybe Mathieu could comment more because this missing feature
is floating around. :-)


Cheers,
simon
Z
Z
zimoun wrote on 3 Feb 2022 03:37
(name . Mark H Weaver)(address . mhw@netris.org)
86fsp01xr3.fsf@gmail.com
Hi,

On Wed, 05 Jan 2022 at 00:21, zimoun <zimon.toutoune@gmail.com> wrote:
Toggle quote (51 lines)
> On Fri, 26 Nov 2021 at 18:21, Mark H Weaver <mhw@netris.org> wrote:
>>> On Mon, 12 Nov 2018 at 20:09, Mark H Weaver <mhw@netris.org> wrote:
>>>
>>>> As I write this, there are two system test builds that have been stuck
>>>> for many hours, endlessly printing "shepherd[1]: waiting for udevd...":
>>>>
>>>> https://hydra.gnu.org/build/3153725
>>>> https://hydra.gnu.org/build/3154365
>>>>
>>>> They are both on i686-linux, and on the 'core-updates' branch, but with
>>>> two different build slaves (hydra.gnunet.org and guix.sjd.se).
>>>>
>>>> I will now abort these builds, to free up build slots for other jobs.
>>>
>>> I am doing bug triage and I hit this old one #33362 [1] from 2018. It
>>> is about hydra which is down now, IIRC.
>>
>> I doubt that this bug was about Hydra. It guess that the bug was in our
>> system test derivations. As far as I know, the only relevance of Hydra
>> to this bug is that Hydra was our CI system at that time, and therefore
>> it was Hydra that brought this bug to my attention.
>
> I agree, but I am not able to connect to the mentioned logs. Are the
> links still working?
>
>
>>> Does it still make sense to keep it open? Or can we close it?
>>
>> If the bug hasn't occurred recently, then I agree it's okay to close it.
>>
>> It would be good to check our modern Cuirass-based ci.guix.gnu.org to
>> find out whether this failure mode is still occurring in our system
>> tests.
>>
>> I see that Cuirass's web interface has improved quite significantly in
>> the last couple of years, and I'm very grateful to those who've worked
>> on it. However, Cuirass still seems to be missing some important
>> functionality that Hydra had. Most notably, unless I missed something,
>> it seems to lack the ability to compare the results of two evaluations
>> and show the *differences* between those results, i.e. to enumerate the
>> newly failing jobs, the newly succeeding jobs, and the newly aborted
>> jobs.
>>
>> Without that functionality, it's not easy for us to notice when a job
>> starts to fail, unless a user files a bug report. The total list of job
>> failures has always been too large to easily notice changes in that list
>> without assistance.
>
> I agree. Maybe Mathieu could comment more because this missing feature
> is floating around. :-)

Since the initial report does not seem to have occurred recently and
because the feature is not related (or let open another report), I am
closing.

Feel free to reopen if I miss an important point.


Cheers,
simon
Closed
?