'guix system' failure due to substitute being temporarily 404 -> Implement retry attempts?

Open
Submitted by Jacob Hrbek.
Details
5 participants
  • Jacob Hrbek
  • Liliana Marie Prikler
  • Ludovic Courtès
  • Maxim Cournoyer
  • Maxime Devos
Owner: unassigned
Severity: important
Jacob Hrbek wrote on 30 Jun 10:19 +0200
(address . bug-guix@gnu.org)
16ae5abb-d585-5f3b-c6b2-c892a997ff35@rixotstudio.cz
It seems that the substitute server went down just as I was running
`guix system reconfigure /path/to/config.scm`, which caused the command
to fail. I therefore propose implementing retry attempts to make this
process more reliable in mission-critical environments.
substitute: updating substitutes from 'https://substitutes.domain.tld'...   0.0%Backtrace:
substitute: In ice-9/boot-9.scm:
substitute:   1752:10 17 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In unknown file:
substitute:           16 (apply-smob/0 #<thunk 7f93f45d22e0>)
substitute: In ice-9/boot-9.scm:
substitute:     724:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
substitute: In ice-9/eval.scm:
substitute:     619:8 14 (_ #(#(#<directory (guile-user) 7f93f45d7c80>)))
substitute: In guix/ui.scm:
substitute:    2230:7 13 (run-guix . _)
substitute:   2193:10 12 (run-guix-command _ . _)
substitute: In ice-9/boot-9.scm:
substitute:   1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
substitute:   1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In guix/scripts/substitute.scm:
substitute:    757:18  9 (_)
substitute:    348:26  8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
substitute: In guix/substitutes.scm:
substitute:    365:27  7 (lookup-narinfos/diverse _ _ #<procedure 7f93f229df40 …> …)
substitute:    322:31  6 (lookup-narinfos "https://substitutes.domain.tld" _ # _ …)
substitute:    245:26  5 (fetch-narinfos _ _ #:open-connection _ # _)
substitute: In ice-9/boot-9.scm:
substitute:   1685:16  4 (raise-exception _ #:continuable? _)
substitute:   1685:16  3 (raise-exception _ #:continuable? _)
substitute:   1780:13  2 (_ #<&compound-exception components: (#<&error> #<&orig…>)
substitute:   1685:16  1 (raise-exception _ #:continuable? _)
substitute:   1685:16  0 (raise-exception _ #:continuable? _)
substitute:
substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
substitute: In procedure write_wait_fd: unimplemented
guix system: error: `/gnu/store/5hzz6px2jp2kx8fnvfg80m6f6al3y190-guix-1.3.0-27.598f728/bin/guix substitute' died unexpectedly
--
-- Jacob Hrbek #StandWithUkraine
Liliana Marie Prikler wrote on 30 Jun 11:09 +0200
(address . control@debbugs.gnu.org)
9a2b7eee7eaeef53f86d9f14607cb040cb199586.camel@ist.tugraz.at
merge 56319 56320
thanks

On Thursday, 30.06.2022 at 08:19 +0000, Jacob Hrbek wrote:
> It seems that the substitute server went down just as I was running
> `guix system reconfigure /path/to/config.scm`, which caused the command
> to fail. I therefore propose implementing retry attempts to make this
> process more reliable in mission-critical environments.
While retrying and potentially falling back to local builds would be
nice improvements to the user experience, I seriously hope your
mission-critical software does not assume the reliability of anything
network-related. As a matter of fact, Guix already provides the only
guarantee any system can give in this circumstance: that your device
won't be bricked when it fails to download something from the internet.
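
Until retries exist in Guix itself, the behaviour requested in this report can be approximated from the outside with a wrapper. The following is only a minimal sketch in POSIX shell, not Guix code: the `retry` function, its arguments, and the doubling delay are hypothetical choices made for illustration. It helps only with transient failures and does nothing about the underlying write_wait_fd error.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix):
#   retry MAX_ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Sleeps DELAY seconds after the first failure and doubles the
# delay after every further failure.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while true; do
        "$@"
        status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts" >&2
            # Propagate the exit status of the last failed attempt.
            return "$status"
        fi
        echo "retry: attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

With this function defined, an invocation such as `retry 5 30 guix system reconfigure /path/to/config.scm` would try the command up to five times, waiting 30, 60, 120 and 240 seconds between attempts. The numbers are arbitrary and would need tuning for a real deployment.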

Cheers
Maxime Devos wrote on 1 Jul 16:22 +0200
1656685348683.68627@student.kuleuven.be
merge 56005 56320 56319
thanks

This doesn't look specific to Guix System to me; it appears to be the same issue (something about write_wait_fd) as
https://issues.guix.gnu.org/56005, so I am merging it into 56005.

Also, is the substitute server being down only a hypothesis for the error, not something known to have actually happened? Or did you notice by other means that the substitute server was down, and therefore consider it a plausible cause of the error?

In any case, I believe write_wait_fd: unimplemented is a bug whose root cause needs to be investigated and fixed, not something to ignore by accumulating work-arounds such as automatic retries.

Greetings,
Maxime.
Ludovic Courtès wrote on 3 Jul 22:56 +0200
control message for bug #56320
(address . control@debbugs.gnu.org)
878rp9zzil.fsf@gnu.org
severity 56320 important
quit
Maxim Cournoyer wrote on 13 Jul 14:47 +0200
control message for bug #56319
(address . control@debbugs.gnu.org)
878rox2n8y.fsf@gmail.com
close 56319
quit