Evaluations do not complete since commit c30d117822a8ca26cd8c06c0a3974955bef68eac

DoneSubmitted by Ludovic Courtès.
Details
2 participants
  • Ludovic Courtès
  • Mathieu Othacehe
Owner
unassigned
Severity
serious
L
L
Ludovic Courtès wrote on 3 Jul 2019 12:28
(address . bug-Guix@gnu.org)(name . Christopher Baines)(address . mail@cbaines.net)
87r2772yze.fsf@gnu.org
Hello,
Chris reported that evaluations “hang” since commitc30d117822a8ca26cd8c06c0a3974955bef68eac, as can be seen at:
https://prototype-guix-data-service.cbaines.net/branch/master
as well as:
https://berlin.guixsd.org/jobset/guix-master(ca. eval. 6171)
The log at https://prototype-guix-data-service.cbaines.net/job/2674shows that it hangs while computing i686 derivations:
Toggle snippet (9 lines)[…]debug: Starting getting inferior derivationsheap size: 1579.0 MiBdebug: Starting getting derivations for ((x86_64-linux . x86_64-linux))debug: Finished getting derivations for ((x86_64-linux . x86_64-linux)), took 252 secondsheap size: 1579.0 MiBdebug: Starting getting derivations for ((i686-linux . i686-linux))
If I attach to the ‘guix repl -t machine’ process started by the Cuirassevaluation process, I see this:
Toggle snippet (22 lines)#0 0x00007fc5248d5fb8 in _dl_update_slotinfo (req_modid=1) at ../elf/dl-tls.c:623#1 0x00007fc5248d620c in update_get_addr (ti=0x7fc5248ab240) at ../elf/dl-tls.c:799#2 0x00007fc5248da838 in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55#3 0x00007fc5247ddbe0 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#4 0x00007fc5247cda18 in array_compare () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#5 0x00007fc5247cd9c7 in array_compare () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#6 0x00007fc5247ce675 in scm_array_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#7 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#8 0x00007fc5247ddc82 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#9 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#10 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#11 0x00007fc5247ddc82 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#101 0x00007fc5247ddc82 in scm_equal_p () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#102 0x00007fc52484891b in scm_i_struct_equalp () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#103 0x00007fc5247cbb38 in scm_sloppy_assoc () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#104 0x00007fc5247f3a92 in scm_hash_fn_ref () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#105 0x00007fc524859715 in vm_regular_engine () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#106 0x00007fc52485cd6a in scm_call_n () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1#107 0x00007fc5247e0da7 in scm_primitive_eval () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1
and with debugging symbols:
Toggle snippet (8 lines)#3 0x00007fc5247ddc23 in scm_equal_p (x=x@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile", y=y@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile") at eq.c:290#4 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation>>" = {...}, s2="#<struct <derivation>>" = {...}) at struct.c:693#5 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation-input>>" = {...}, s2="#<struct <derivation-input>>" = {...}) at struct.c:693#6 0x00007fc5247ddc82 in scm_equal_p ( x=(#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/jc5q4i55hrilibrmacs38skk3cpg2zyb-glibc-bootstrap-0" #f #f #f 3ade740>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/nbi0mzk1dw37nnnahw9akn3bwzwi4gdk-module-import-compiled" #f #f #f 3ae6e00>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0" #f #f #f 3af1b40>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz" sha256 #vu8(183 87 205 70 191 19 236 172 131 251 142 149 95 181 0 150 172 45 23 187 97 12 168 235 129 111 41 48 42 0 168 70) #f 3af10c0>)) () ("/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors" "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") "i686-linux" "builtin:download" () (("content-addressed-mirrors" . "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") ("impureEnvVars" . "http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS") ("mirrors" . "/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors") ("out" . "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz") ("preferLocalBuild" . "1") ("url" . "(\"https://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"ftp://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://www.fdn.fr/~lcourtes/software/guix/packages/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://flashner.co.il/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\")")) "/gnu/store/jagg4l53ar1m4hsgqlpisw9i4rq2whzd-guile-2.0.9.tar.xz.drv" 3af14e0> ("out") 3af1780>) ("/gnu/store/d32948vi9bs5mmbnvlczqcgwwdqvygiw-build-bootstrap-guile.sh" "/gnu/store/mzfkrxd4w8vqrmyrx169wj8wyw7r8i37-bash") "i686-linux" "/gnu/store/mzfkrxd4w8 […]
So this process is busy comparing deep structures, probably as aconsequence of 5cf4b26d52bcea382d98fb4becce89be9ee37b55.
Indeed, the load on berlin is at above 60 currently, with all theseevaluations processes consuming CPU…
To be continued…
Ludo’.
L
L
Ludovic Courtès wrote on 3 Jul 2019 12:36
control message for bug #36487
(address . control@debbugs.gnu.org)
87pnmr2ymc.fsf@gnu.org
severity 36487 seriousquit
L
L
Ludovic Courtès wrote on 3 Jul 2019 19:01
Re: bug#36487: Evaluations do not complete since commit c30d117822a8ca26cd8c06c0a3974955bef68eac
875zoj2gtp.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (18 lines)> #104 0x00007fc5247f3a92 in scm_hash_fn_ref () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1> #105 0x00007fc524859715 in vm_regular_engine () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1> #106 0x00007fc52485cd6a in scm_call_n () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1> #107 0x00007fc5247e0da7 in scm_primitive_eval () from /gnu/store/r658y3cgpnf99nxjxqgjiaizx20ac4k0-guile-2.2.4/lib/libguile-2.2.so.1>>> and with debugging symbols:>> #3 0x00007fc5247ddc23 in scm_equal_p (x=x@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile", > y=y@entry="/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0/bin/guile") at eq.c:290> #4 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation>>" = {...}, s2="#<struct <derivation>>" = {...}) at struct.c:693> #5 0x00007fc52484891b in scm_i_struct_equalp (s1="#<struct <derivation-input>>" = {...}, s2="#<struct <derivation-input>>" = {...}) at struct.c:693> #6 0x00007fc5247ddc82 in scm_equal_p (> x=(#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/jc5q4i55hrilibrmacs38skk3cpg2zyb-glibc-bootstrap-0" #f #f #f 3ade740>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/nbi0mzk1dw37nnnahw9akn3bwzwi4gdk-module-import-compiled" #f #f #f 3ae6e00>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/djh3drjx3hnxlx1bsdnixdm3xjbg5v2c-guile-bootstrap-2.0" #f #f #f 3af1b40>)) (#<struct <derivation-input> #<struct <derivation> (("out" . #<struct <derivation-output> "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz" sha256 #vu8(183 87 205 70 191 19 236 172 131 251 142 149 95 181 0 150 172 45 23 187 97 12 168 235 129 111 41 48 42 0 168 70) #f 3af10c0>)) () ("/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors" "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") "i686-linux" "builtin:download" () (("content-addressed-mirrors" . "/gnu/store/vwyxp1dq4lb97n6b20w5cqxasy2dai79-content-addressed-mirrors") ("impureEnvVars" . "http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS") ("mirrors" . "/gnu/store/7f1alh9qj2h0wwy2220npgnmw6pbrkwx-mirrors") ("out" . "/gnu/store/dgncc5wmw8prxq09y71hqjc6g7rxqvvb-guile-2.0.9.tar.xz") ("preferLocalBuild" . "1") ("url" . "(\"https://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"ftp://alpha.gnu.org/gnu/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://www.fdn.fr/~lcourtes/software/guix/packages/i686-linux/20131110/guile-2.0.9.tar.xz\" \"http://flashner.co.il/guix/bootstrap/i686-linux/20131110/guile-2.0.9.tar.xz\")")) "/gnu/store/jagg4l53ar1m4hsgqlpisw9i4rq2whzd-guile-2.0.9.tar.xz.drv" 3af14e0> ("out") 3af1780>) ("/gnu/store/d32948vi9bs5mmbnvlczqcgwwdqvygiw-build-bootstrap-guile.sh" "/gnu/store/mzfkrxd4w8vqrmyrx169wj8wyw7r8i37-bash") "i686-linux" "/gnu/store/mzfkrxd4w8 […]>> So this process is busy comparing deep structures, probably as a> consequence of 5cf4b26d52bcea382d98fb4becce89be9ee37b55.
With some debugging, I found that the pathological behavior comes fromthe memoization hash table of ‘derivation->bytevector’, specificallyafter ‘invalidate-derivation-caches!’ has been called.
Namely, after ‘invalidate-derivation-caches!’ has been called, that hashtable has 24K buckets, including 72 buckets of size 6 or more. Onebucket has 18 entries, which are all alike:
#<derivation #f => /gnu/store/…-module-import-compiled 94283c0>
The ‘equal?’ calls come from those buckets, and each of them involvestraversing the whole <derivation> tree.
I’m looking for a workaround.
Ludo’.
PS: Below is some debugging code.
(use-modules (system foreign) (ice-9 match))
;; typedef struct {;; 0 scm_t_weak_entry **buckets; /* the data */;; 8 scm_i_pthread_mutex_t lock; /* the lock */;; 48 scm_t_weak_table_kind kind; /* what kind of table it is */;; 56 unsigned long n_buckets; /* total number of buckets. */;; 64 unsigned long n_items; /* number of items in table */;; 72 unsigned long lower; /* when to shrink */;; 80 unsigned long upper; /* when to grow */;; 88 int size_index; /* index into hashtable_size */;; 92 int min_size_index; /* minimum size_index */;; 96 GC_word last_gc_no;;; } scm_t_weak_table;
(define (weak-table-data table) "Return an alist describing TABLE." (unless (or (weak-key-hash-table? table) (weak-value-hash-table? table)) (error "wrong type argument" table))
(let* ((address (object-address table)) (ptr (dereference-pointer (make-pointer (+ address 8))))) (match (parse-c-struct (make-pointer (+ (pointer-address ptr) 48)) (list int unsigned-long unsigned-long unsigned-long unsigned-long int int unsigned-long)) ((kind buckets items lower upper size-index min-size-index last-gc-no) `((kind . ,kind) (items . ,items) (buckets . ,buckets) (lower . ,lower) (upper . ,upper) (size-index . ,size-index) (min-size-index . ,min-size-index) (last-gc-no . ,last-gc-no))))))
;; typedef struct scm_t_hashtable {;; unsigned long n_items; /* number of items in table */;; unsigned long lower; /* when to shrink */;; unsigned long upper; /* when to grow */;; int size_index; /* index into hashtable_size */;; int min_size_index; /* minimum size_index */;; scm_t_hash_fn hash_fn; /* for rehashing after a GC. */;; } scm_t_hashtable;
(define (hash-table-data table) (unless (hash-table? table) (error "wrong type argument" table))
(let* ((address (object-address table)) (ptr (dereference-pointer (make-pointer (+ address (* 2 (sizeof '*))))))) (match (parse-c-struct ptr (list unsigned-long unsigned-long unsigned-long int int)) ((items lower upper size-index min-size-index) `((items . ,items) (lower . ,lower) (upper . ,upper) (size-index . ,size-index) (min-size-index . ,min-size-index))))))
(define (hash-table-buckets table) (unless (hash-table? table) (error "wrong type argument" table))
(let* ((address (object-address table)) (ptr (dereference-pointer (make-pointer (+ address (* 1 (sizeof '*))))))) (pointer->scm ptr)))
(define (hash-table-bucket-sizes table) (let ((buckets (hash-table-buckets table))) (let loop ((index 0) (result '())) (if (< index (vector-length buckets)) (loop (+ 1 index) (cons (length (vector-ref buckets index)) result)) (reverse result)))))
L
L
Ludovic Courtès wrote on 3 Jul 2019 20:10
(address . 36487@debbugs.gnu.org)(name . Christopher Baines)(address . mail@cbaines.net)
87o92b0z1e.fsf@gnu.org
I pushed a workaround as e79281be105b16153c375af5506db31fd1e32698,though more work is still needed to fully understand what’s happening.
Note that you can test things like this:
Toggle snippet (18 lines)$ ./pre-inst-env guileGNU Guile 2.2.5Copyright (C) 1995-2019 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.This program is free software, and you are welcome to redistribute itunder certain conditions; type `,show c' for details.
Enter `,help' for help.scheme@(guile-user)> ,use(guix store)scheme@(guile-user)> (use-modules (gnu ci) (ice-9 match))scheme@(guile-user)> (define store (open-connection))scheme@(guile-user)> (define jobs (hydra-jobs store '((systems "x86_64-linux" "i686-linux"))))evaluating for 'x86_64-linux' (heap size: 35.0 MiB)...random seed for tests: 1562154761evaluating for 'i686-linux' (heap size: 467.0 MiB)...
Ludo’.
M
M
Mathieu Othacehe wrote on 6 Sep 09:10 +0200
control message for bug #36487
(address . control@debbugs.gnu.org)
87a6y3xtw6.fsf@cervin.i-did-not-set--mail-host-address--so-tickle-me
close 36487 quit
?