Evaluations do not complete since commit c30d117822a8ca26cd8c06c0a3974955bef68eac

  • Done
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
serious
L
L
Ludovic Courtès wrote on 3 Jul 2019 12:28
(address . bug-Guix@gnu.org)(name . Christopher Baines)(address . mail@cbaines.net)
87r2772yze.fsf@gnu.org
Hello,

Chris reported that evaluations “hang” since commit
c30d117822a8ca26cd8c06c0a3974955bef68eac, as can be seen at:


as well as:


shows that it hangs while computing i686 derivations:

Toggle snippet (9 lines)
[…]
debug: Starting getting inferior derivations
heap size: 1579.0 MiB
debug: Starting getting derivations for ((x86_64-linux . x86_64-linux))
debug: Finished getting derivations for ((x86_64-linux . x86_64-linux)), took 252 seconds
heap size: 1579.0 MiB
debug: Starting getting derivations for ((i686-linux . i686-linux))

If I attach to the ‘guix repl -t machine’ process started by the Cuirass
evaluation process, I see this:

Toggle snippet (22 lines)
#0 0x00007fc5248d5fb8 in _dl_update_slotinfo (req_modid=1) at ../elf/dl-tls.c:623
#1 0x00007fc5248d620c in update_get_addr (ti=0x7fc5248ab240) at ../elf/dl-tls.c:799
#2 0x00007fc5248da838 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#3 0x00007fc5247ddbe0 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#4 0x00007fc5247cda18 in array_compare () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#5 0x00007fc5247cd9c7 in array_compare () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#6 0x00007fc5247ce675 in scm_array_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#7 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#8 0x00007fc5247ddc82 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#9 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#10 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#11 0x00007fc5247ddc82 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#101 0x00007fc5247ddc82 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#102 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#103 0x00007fc5247cbb38 in scm_sloppy_assoc () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#104 0x00007fc5247f3a92 in scm_hash_fn_ref () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#105 0x00007fc524859715 in vm_regular_engine () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#106 0x00007fc52485cd6a in scm_call_n () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
#107 0x00007fc5247e0da7 in scm_primitive_eval () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1

and with debugging symbols:

Toggle snippet (8 lines)
#3 0x00007fc5247ddc23 in scm_equal_p (x=x@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile",
y=y@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile") at eq.c:290
#4 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation>>" = {...}, s2="#<struct <derivation>>" = {...}) at struct.c:693
#5 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation-input>>" = {...}, s2="#<struct <derivation-input>>" = {...}) at struct.c:693
#6 0x00007fc5247ddc82 in scm_equal_p (
x=(#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/jc5q4i55hrilibrmacs38skk3cpg2zyb-glibc-bootstrap-0" #f #f #f 3ade740>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/nbi0mzk1dw37nnnahw9akn3bwzwi4gdk-module-import-compiled" #f #f #f 3ae6e00>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0" #f #f #f 3af1b40>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz" sha256 #vu8(183 87 205 70 191 19 236 172 131 251 142 149 95 181 0 150 172 45 23 187 97 12 168 235 129 111 41 48 42 0 168 70) #f 3af10c0>)) () ("/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors" "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") "i686-linux" "builtin:download" () (("content-addressed-mirrors" . "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") ("impureEnvVars" . "http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS") ("mirrors" . "/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors") ("out" . "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz") ("preferLocalBuild" . "1") ("url" . "(\"https://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"ftp://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://www.fdn.fr/~lcourtes/software/guix/packages/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://flashner.co.il/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\")")) "/gnu/store/jagg4l53ar1m4hsgqlpisw9i4rq2whzd-guile-2.0.9.tar.xz.drv" 3af14e0> ("out") 3af1780>) ("/gnu/store/d32948vi9bs5mmbnvlczqcgwwdqvygiw-build-bootstrap-guile.sh" "/gnu/store/mzfkrxd4w8vqrmyrx169wj8wyw7r8i37-bash") "i686-linux" "/gnu/store/mzfkrxd4w8 […]

So this process is busy comparing deep structures, probably as a
consequence of 5cf4b26d52bcea382d98fb4becce89be9ee37b55.

Indeed, the load on berlin is at above 60 currently, with all these
evaluations processes consuming CPU…

To be continued…

Ludo’.
L
L
Ludovic Courtès wrote on 3 Jul 2019 12:36
control message for bug #36487
(address . control@debbugs.gnu.org)
87pnmr2ymc.fsf@gnu.org
severity 36487 serious
quit
L
L
Ludovic Courtès wrote on 3 Jul 2019 19:01
Re: bug#36487: Evaluations do not complete since commit c30d117822a8ca26cd8c06c0a3974955bef68eac
875zoj2gtp.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (18 lines)
> #104 0x00007fc5247f3a92 in scm_hash_fn_ref () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
> #105 0x00007fc524859715 in vm_regular_engine () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
> #106 0x00007fc52485cd6a in scm_call_n () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
> #107 0x00007fc5247e0da7 in scm_primitive_eval () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
>
>
> and with debugging symbols:
>
> #3 0x00007fc5247ddc23 in scm_equal_p (x=x@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile",
> y=y@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile") at eq.c:290
> #4 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation>>" = {...}, s2="#<struct <derivation>>" = {...}) at struct.c:693
> #5 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation-input>>" = {...}, s2="#<struct <derivation-input>>" = {...}) at struct.c:693
> #6 0x00007fc5247ddc82 in scm_equal_p (
> x=(#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/jc5q4i55hrilibrmacs38skk3cpg2zyb-glibc-bootstrap-0" #f #f #f 3ade740>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/nbi0mzk1dw37nnnahw9akn3bwzwi4gdk-module-import-compiled" #f #f #f 3ae6e00>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0" #f #f #f 3af1b40>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz" sha256 #vu8(183 87 205 70 191 19 236 172 131 251 142 149 95 181 0 150 172 45 23 187 97 12 168 235 129 111 41 48 42 0 168 70) #f 3af10c0>)) () ("/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors" "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") "i686-linux" "builtin:download" () (("content-addressed-mirrors" . "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") ("impureEnvVars" . "http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS") ("mirrors" . "/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors") ("out" . "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz") ("preferLocalBuild" . "1") ("url" . "(\"https://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"ftp://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://www.fdn.fr/~lcourtes/software/guix/packages/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://flashner.co.il/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\")")) "/gnu/store/jagg4l53ar1m4hsgqlpisw9i4rq2whzd-guile-2.0.9.tar.xz.drv" 3af14e0> ("out") 3af1780>) ("/gnu/store/d32948vi9bs5mmbnvlczqcgwwdqvygiw-build-bootstrap-guile.sh" "/gnu/store/mzfkrxd4w8vqrmyrx169wj8wyw7r8i37-bash") "i686-linux" "/gnu/store/mzfkrxd4w8 […]
>
> So this process is busy comparing deep structures, probably as a
> consequence of 5cf4b26d52bcea382d98fb4becce89be9ee37b55.

With some debugging, I found that the pathological behavior comes from
the memoization hash table of ‘derivation->bytevector’, specifically
after ‘invalidate-derivation-caches!’ has been called.

Namely, after ‘invalidate-derivation-caches!’ has been called, that hash
table has 24K buckets, including 72 buckets of size 6 or more. One
bucket has 18 entries, which are all alike:

#<derivation #f => /gnu/store/…-module-import-compiled 94283c0>

The ‘equal?’ calls come from those buckets, and each of them involves
traversing the whole <derivation> tree.

I’m looking for a workaround.

Ludo’.

PS: Below is some debugging code.
(use-modules (system foreign)
(ice-9 match))

;; typedef struct {
;; 0 scm_t_weak_entry **buckets; /* the data */
;; 8 scm_i_pthread_mutex_t lock; /* the lock */
;; 48 scm_t_weak_table_kind kind; /* what kind of table it is */
;; 56 unsigned long n_buckets; /* total number of buckets. */
;; 64 unsigned long n_items; /* number of items in table */
;; 72 unsigned long lower; /* when to shrink */
;; 80 unsigned long upper; /* when to grow */
;; 88 int size_index; /* index into hashtable_size */
;; 92 int min_size_index; /* minimum size_index */
;; 96 GC_word last_gc_no;
;; } scm_t_weak_table;

(define (weak-table-data table)
"Return an alist describing TABLE."
(unless (or (weak-key-hash-table? table)
(weak-value-hash-table? table))
(error "wrong type argument" table))

(let* ((address (object-address table))
(ptr (dereference-pointer
(make-pointer (+ address 8)))))
(match (parse-c-struct (make-pointer (+ (pointer-address ptr) 48))
(list int
unsigned-long unsigned-long
unsigned-long unsigned-long
int
int
unsigned-long))
((kind buckets items lower upper size-index min-size-index last-gc-no)
`((kind . ,kind) (items . ,items) (buckets . ,buckets)
(lower . ,lower) (upper . ,upper)
(size-index . ,size-index) (min-size-index . ,min-size-index)
(last-gc-no . ,last-gc-no))))))

;; typedef struct scm_t_hashtable {
;; unsigned long n_items; /* number of items in table */
;; unsigned long lower; /* when to shrink */
;; unsigned long upper; /* when to grow */
;; int size_index; /* index into hashtable_size */
;; int min_size_index; /* minimum size_index */
;; scm_t_hash_fn hash_fn; /* for rehashing after a GC. */
;; } scm_t_hashtable;

(define (hash-table-data table)
(unless (hash-table? table)
(error "wrong type argument" table))

(let* ((address (object-address table))
(ptr (dereference-pointer
(make-pointer (+ address (* 2 (sizeof '*)))))))
(match (parse-c-struct ptr
(list unsigned-long
unsigned-long unsigned-long
int
int))
((items lower upper size-index min-size-index)
`((items . ,items)
(lower . ,lower) (upper . ,upper)
(size-index . ,size-index) (min-size-index . ,min-size-index))))))

(define (hash-table-buckets table)
(unless (hash-table? table)
(error "wrong type argument" table))

(let* ((address (object-address table))
(ptr (dereference-pointer
(make-pointer (+ address (* 1 (sizeof '*)))))))
(pointer->scm ptr)))

(define (hash-table-bucket-sizes table)
(let ((buckets (hash-table-buckets table)))
(let loop ((index 0)
(result '()))
(if (< index (vector-length buckets))
(loop (+ 1 index)
(cons (length (vector-ref buckets index))
result))
(reverse result)))))
L
L
Ludovic Courtès wrote on 3 Jul 2019 20:10
(address . 36487@debbugs.gnu.org)(name . Christopher Baines)(address . mail@cbaines.net)
87o92b0z1e.fsf@gnu.org
I pushed a workaround as e79281be105b16153c375af5506db31fd1e32698,
though more work is still needed to fully understand what’s happening.

Note that you can test things like this:

Toggle snippet (18 lines)
$ ./pre-inst-env guile
GNU Guile 2.2.5
Copyright (C) 1995-2019 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> ,use(guix store)
scheme@(guile-user)> (use-modules (gnu ci) (ice-9 match))
scheme@(guile-user)> (define store (open-connection))
scheme@(guile-user)> (define jobs (hydra-jobs store '((systems "x86_64-linux" "i686-linux"))))
evaluating for 'x86_64-linux' (heap size: 35.0 MiB)...
random seed for tests: 1562154761
evaluating for 'i686-linux' (heap size: 467.0 MiB)...

Ludo’.
M
M
Mathieu Othacehe wrote on 6 Sep 2020 09:10
control message for bug #36487
(address . control@debbugs.gnu.org)
87a6y3xtw6.fsf@cervin.i-did-not-set--mail-host-address--so-tickle-me
close 36487
quit
?