mumi crashes

  • Open
Details
4 participants
  • Arun Isaac
  • Felix Lechner
  • Christopher Baines
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
Maxim Cournoyer wrote on 11 Sep 2023 03:28
(name . bug-guix)(address . bug-guix@gnu.org)(name . guix-sysadmin)(address . guix-sysadmin@gnu.org)
87leddv6p2.fsf@gmail.com
Hello,

Since today, Mumi was found crashing with the following errors in
/var/log/mumi.log:

2023-09-10 11:49:01 GET GC Warning: Repeated allocation of very large block (appr. size 4194304):
2023-09-10 11:49:01 May lead to memory leak and poor performance
2023-09-10 12:02:37 Throw to key `match-error' with args `("match" "no matching pattern" (HEAD "63641"))'.
2023-09-10 12:06:00 Uncaught exception in fiber ##f:
2023-09-10 12:06:00 In ice-9/boot-9.scm:
2023-09-10 12:06:00 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 12:06:00 In fibers/web/server.scm:
2023-09-10 12:06:00 211:21 4 (_)
2023-09-10 12:06:00 In ice-9/suspendable-ports.scm:
2023-09-10 12:06:00 83:4 3 (write-bytes #<closed: file 7f4efcabc8c0> #vu8(60 33 …) …)
2023-09-10 12:06:00 In unknown file:
2023-09-10 12:06:00 2 (port-write #<closed: file 7f4efcabc8c0> #vu8(60 33 # …) …)
2023-09-10 12:06:00 In ice-9/boot-9.scm:
2023-09-10 12:06:00 1685:16 1 (raise-exception _ #:continuable? _)
2023-09-10 12:06:00 1685:16 0 (raise-exception _ #:continuable? _)
2023-09-10 12:06:00 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 12:06:00 In procedure fport_write: Broken pipe
2023-09-10 12:15:36 /graphql

And also:

[...]
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07 Uncaught exception in fiber ##f:
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 22:23:07 In fibers/web/server.scm:
2023-09-10 22:23:07 211:21 4 (_)
2023-09-10 22:23:07 In ice-9/suspendable-ports.scm:
2023-09-10 22:23:07 83:4 3 (write-bytes #<closed: file 7f1caec72d90> #vu8(60 33 …) …)
2023-09-10 22:23:07 In unknown file:
2023-09-10 22:23:07 2 (port-write #<closed: file 7f1caec72d90> #vu8(60 33 # …) …)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07 1685:16 1 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 Uncaught exception in fiber ##f:
2023-09-10 22:23:07 1685:16 0 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 22:23:07 In fibers/web/server.scm:
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07 211:21 4 (_)
2023-09-10 22:23:07 In ice-9/suspendable-ports.scm:
2023-09-10 22:23:07 83:4 3 (write-bytes #<closed: file 7f1cb6303e00> #vu8(60 33 …) …)
2023-09-10 22:23:07 In unknown file:
2023-09-10 22:23:07 2 (port-write #<closed: file 7f1cb6303e00> #vu8(60 33 # …) …)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07 1685:16 1 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 1685:16 0 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07 Uncaught exception in fiber ##f:
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 22:23:07 In fibers/web/server.scm:
2023-09-10 22:23:07 211:21 4 (_)
2023-09-10 22:23:07 In ice-9/suspendable-ports.scm:
2023-09-10 22:23:07 83:4 3 (write-bytes #<closed: file 7f1cac7101c0> #vu8(60 33 …) …)
2023-09-10 22:23:07 In unknown file:
2023-09-10 22:23:07 2 (port-write #<closed: file 7f1cac7101c0> #vu8(60 33 # …) …)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07 1685:16 1 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 1685:16 0 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe

I've restarted the 'mumi' service and it's up and running again, for
now.

Attached is the complete mumi log for reference.
Attachment: mumi.log.gz
--
Thanks,
Maxim
Maxim Cournoyer wrote on 8 Oct 2023 05:13
(address . 65858@debbugs.gnu.org)(name . guix-sysadmin)(address . guix-sysadmin@gnu.org)
87jzrxhkmn.fsf@gmail.com
Hello,

Today, Mumi (version 0.0.5-4.2453a5a) was crashing again, with the
website returning 502 Bad Gateway errors; it seems to be the same error:

2023-10-07 11:48:52 In procedure fport_write: Broken pipe
2023-10-07 11:48:52 Uncaught exception in fiber ##f:
2023-10-07 11:48:52 In ice-9/boot-9.scm:
2023-10-07 11:48:52 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
2023-10-07 11:48:52 In fibers/web/server.scm:
2023-10-07 11:48:52 211:21 4 (_)
2023-10-07 11:48:52 In ice-9/suspendable-ports.scm:
2023-10-07 11:48:52 83:4 3 (write-bytes #<closed: file 7f55f4a47070> #vu8(60 33 …) …)
2023-10-07 11:48:52 In unknown file:
2023-10-07 11:48:52 2 (port-write #<closed: file 7f55f4a47070> #vu8(60 33 # …) …)
2023-10-07 11:48:52 In ice-9/boot-9.scm:
2023-10-07 11:48:52 1685:16 1 (raise-exception _ #:continuable? _)
2023-10-07 11:48:52 1685:16 0 (raise-exception _ #:continuable? _)
2023-10-07 11:48:52 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-10-07 11:48:52 In procedure fport_write: Broken pipe

That's from /var/log/mumi.log.1.gz. The worker is peaking at 1 to 4
CPUs; here's what strace looked like at some point:

rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f831b49a910, parent_tid=0x7f831b49a910, exit_signal=0, stack=0x7f831ac9a000, stack_size=0x7fff00, tls=0x7f831b49a640} => {parent_tid=[87808]}, 88) = 87808
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e3177c0, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f8319c97910, parent_tid=0x7f8319c97910, exit_signal=0, stack=0x7f8319497000, stack_size=0x7fff00, tls=0x7f8319c97640} => {parent_tid=[87809]}, 88) = 87809
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f70, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e317780, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f831ac99910, parent_tid=0x7f831ac99910, exit_signal=0, stack=0x7f831a499000, stack_size=0x7fff00, tls=0x7f831ac99640} => {parent_tid=[87810]}, 88) = 87810
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f831e317740, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8329190c80, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f8329190c80, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f831a498910, parent_tid=0x7f831a498910, exit_signal=0, stack=0x7f8319c98000, stack_size=0x7fff00, tls=0x7f831a498640} => {parent_tid=[87811]}, 88) = 87811
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f70, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f831e317700, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e317700, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8328b87f70, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e317780, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e3177c0, FUTEX_WAKE_PRIVATE, 1) = 0
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian", {st_mode=S_IFDIR|0755, st_size=144, ...}, 0) = 0
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian/iamchert", 0x7ffcd7e8ec60, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian/iamglass", {st_mode=S_IFREG|0644, st_size=155, ...}, 0) = 0
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian/postlist.glass", {st_mode=S_IFREG|0644, st_size=389939200, ...}, 0) = 0
openat(AT_FDCWD, "/var/mumi/db/mumi.xapian/flintlock", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 10
fcntl(10, F_OFD_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0

Restarting it resolved it for now.

--
Thanks,
Maxim
Arun Isaac wrote on 10 Oct 2023 01:02
(name . guix-sysadmin)(address . guix-sysadmin@gnu.org)
874jizcsci.fsf@systemreboot.net
Hi Maxim,

Thanks for monitoring mumi. I'm not sure why these mumi crashes are
happening. I can think of two leads to start working on.

1. I see many lines such as those below that warn of garbage collection
issues with "repeated allocations of large blocks". I am not sure, but
this sounds like some limitation of the guile garbage collector. I
suspect these warnings arise because the guile-email parser thrashes the
garbage collector too much. I could look into rewriting parts of
guile-email to be more careful with memory use.

2023-09-09 16:21:00 GC Warning: Repeated allocation of very large block (appr. size 16777216):
2023-09-09 16:21:00 May lead to memory leak and poor performance
2023-09-09 16:30:24 GC Warning: Repeated allocation of very large block (appr. size 16138240):
2023-09-09 16:30:24 May lead to memory leak and poor performance
2023-09-09 16:39:11 GC Warning: Repeated allocation of very large block (appr. size 16781312):
2023-09-09 16:39:11 May lead to memory leak and poor performance
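
These warnings come from libgc, which complains when very large blocks
are allocated repeatedly. A tiny, self-contained Guile loop like the
following (illustrative only, not mumi or guile-email code) can provoke
the same class of warning by allocating ~16 MiB blocks, matching the
sizes logged above:

(use-modules (rnrs bytevectors))

;; Allocate one hundred ~16 MiB bytevectors in a row; repeated large
;; allocations like this are what trigger the libgc warning seen above.
(let loop ((i 0) (total 0))
  (if (< i 100)
      (let ((bv (make-bytevector (* 16 1024 1024) 0)))
        (loop (+ i 1) (+ total (bytevector-length bv))))
      total))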

2. If there is a specific query that causes mumi to crash, I'd like to
find out. This could be a GraphQL query. Unfortunately the actual
GraphQL document lives in the query part of the URI, and we don't log
the query part. I can patch mumi to log the query part.
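
A minimal sketch of what such logging could look like, using Guile's
(web request) and (web uri) modules; the procedure name and log format
here are illustrative, not mumi's actual code:

(use-modules (web request) (web uri))

(define (log-request request)
  ;; Include the query string, which carries the GraphQL document.
  (let* ((uri (request-uri request))
         (query (uri-query uri)))
    (format (current-error-port) "~a ~a~a~%"
            (request-method request)
            (uri-path uri)
            (if query (string-append "?" query) ""))))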

WDYT?

Regards,
Arun
Maxim Cournoyer wrote on 10 Oct 2023 05:51
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87y1gb5e51.fsf@gmail.com
Hi Arun,

Arun Isaac <arunisaac@systemreboot.net> writes:

> Hi Maxim,
>
> Thanks for monitoring mumi. I'm not sure why these mumi crashes are
> happening. I can think of two leads to start working on.
>
> 1. I see many lines such as those below that warn of garbage collection
> issues with "repeated allocations of large blocks". I am not sure, but
> this sounds like some limitation of the guile garbage collector. I
> suspect these warnings arise because the guile-email parser thrashes the
> garbage collector too much. I could look into rewriting parts of
> guile-email to be more careful with memory use.
>
> 2023-09-09 16:21:00 GC Warning: Repeated allocation of very large block (appr. size 16777216):
> 2023-09-09 16:21:00 May lead to memory leak and poor performance
> 2023-09-09 16:30:24 GC Warning: Repeated allocation of very large block (appr. size 16138240):
> 2023-09-09 16:30:24 May lead to memory leak and poor performance
> 2023-09-09 16:39:11 GC Warning: Repeated allocation of very large block (appr. size 16781312):
> 2023-09-09 16:39:11 May lead to memory leak and poor performance

Perhaps that could be related, but unfortunately I have no experience
with this to corroborate it.

> 2. If there is a specific query that causes mumi to crash, I'd like to
> find out. This could be a GraphQL query. Unfortunately the actual
> GraphQL document lives in the query part of the URI, and we don't log
> the query part. I can patch mumi to log the query part.
>
> WDYT?

I think adding more logging to narrow down where the problem is
happening would be a fine course of action here!

--
Thanks,
Maxim
Arun Isaac wrote on 11 Oct 2023 00:57
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87edi22ii3.fsf@systemreboot.net
Hi Maxim,

I have made a number of changes to mumi and reconfigured berlin with the
latest mumi. Here is a quick summary of the main changes to mumi.

- We now log the complete URI and the response code for every request to
mumi.
- We now handle HEAD requests correctly. This should eliminate some of
the crashes we saw in the mumi log.
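
For reference, the earlier crash `no matching pattern (HEAD "63641")'
suggests a `match' form that only expected GET. A minimal,
self-contained sketch of the kind of fix described above (the route
names are illustrative, not mumi's actual dispatcher):

(use-modules (ice-9 match))

(define (route method number)
  (match (list method number)
    (('GET n)  `(full-response ,n))
    (('HEAD n) `(headers-only ,n))   ; previously fell through to a match-error
    (_         'not-found)))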

Regards,
Arun
Maxim Cournoyer wrote on 11 Oct 2023 03:29
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
877cnueyl8.fsf@gmail.com
Hi Arun,

Arun Isaac <arunisaac@systemreboot.net> writes:

> Hi Maxim,
>
> I have made a number of changes to mumi and reconfigured berlin with the
> latest mumi. Here is a quick summary of the main changes to mumi.
>
> - We now log the complete URI and the response code for every request to
> mumi.
> - We now handle HEAD requests correctly. This should eliminate some of
> the crashes we saw in the mumi log.

Thanks! Let's keep an eye on things.

--
Thanks,
Maxim
Maxim Cournoyer wrote on 24 Oct 2023 06:26
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
874jig63z6.fsf@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Arun,
>
> Arun Isaac <arunisaac@systemreboot.net> writes:
>
>> Hi Maxim,
>>
>> I have made a number of changes to mumi and reconfigured berlin with the
>> latest mumi. Here is a quick summary of the main changes to mumi.
>>
>> - We now log the complete URI and the response code for every request to
>> mumi.
>> - We now handle HEAD requests correctly. This should eliminate some of
>> the crashes we saw in the mumi log.
>
> Thanks! Let's keep an eye on things.

Here's a fresh crash (on berlin):

2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
2023-10-24 06:23:15 GET Uncaught exception in task:
2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong type argument in position 1 (expecting
open port): #<closed: file 7f2ac30077e0>
2023-10-24 06:23:15 In fibers.scm:
2023-10-24 06:23:15 172:8 6 (_)
2023-10-24 06:23:15 In ice-9/boot-9.scm:
2023-10-24 06:23:15 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
2023-10-24 06:23:15 In fibers/web/server.scm:
2023-10-24 06:23:15 214:25 4 (_)
2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
2023-10-24 06:23:15 83:4 3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
2023-10-24 06:23:15 In unknown file:
2023-10-24 06:23:15 2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
2023-10-24 06:23:15 In ice-9/boot-9.scm:
2023-10-24 06:23:15 1685:16 1 (raise-exception _ #:continuable? _)
2023-10-24 06:23:15 1685:16 0 (raise-exception _ #:continuable? _)
2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-10-24 06:23:15 In procedure fport_write: Broken pipe

--
Thanks,
Maxim
Christopher Baines wrote on 24 Oct 2023 09:13
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87v8aw32z7.fsf@cbaines.net
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hi Arun,
>>
>> Arun Isaac <arunisaac@systemreboot.net> writes:
>>
>>> Hi Maxim,
>>>
>>> I have made a number of changes to mumi and reconfigured berlin with the
>>> latest mumi. Here is a quick summary of the main changes to mumi.
>>>
>>> - We now log the complete URI and the response code for every request to
>>> mumi.
>>> - We now handle HEAD requests correctly. This should eliminate some of
>>> the crashes we saw in the mumi log.
>>
>> Thanks! Let's keep an eye on things.
>
> Here's a fresh crash (on berlin):
>
> 2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
> 2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
> 2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
> 2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
> 2023-10-24 06:23:15 GET Uncaught exception in task:
> 2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong type argument in position 1 (expecting
> open port): #<closed: file 7f2ac30077e0>
> 2023-10-24 06:23:15 In fibers.scm:
> 2023-10-24 06:23:15 172:8 6 (_)
> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
> 2023-10-24 06:23:15 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
> 2023-10-24 06:23:15 In fibers/web/server.scm:
> 2023-10-24 06:23:15 214:25 4 (_)
> 2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
> 2023-10-24 06:23:15 83:4 3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
> 2023-10-24 06:23:15 In unknown file:
> 2023-10-24 06:23:15 2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
> 2023-10-24 06:23:15 1685:16 1 (raise-exception _ #:continuable? _)
> 2023-10-24 06:23:15 1685:16 0 (raise-exception _ #:continuable? _)
> 2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> 2023-10-24 06:23:15 In procedure fport_write: Broken pipe

I think this is kind of expected. If NGinx hits the proxy_read_timeout
it'll return a 504 to the client and close the connection to Mumi I
think. I think what you're seeing here is Mumi trying to respond to a
request from NGinx that NGinx has closed.

Maxim Cournoyer wrote on 24 Oct 2023 14:47
(name . Christopher Baines)(address . mail@cbaines.net)
87wmvc4273.fsf@gmail.com
Hi Christopher,

Christopher Baines <mail@cbaines.net> writes:

[...]

>> Here's a fresh crash (on berlin):
>>
>> 2023-10-24 06:22:58 GET
>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>> 2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
>> 2023-10-24 06:22:58 GET
>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>> 2023-10-24 06:22:58 GET
>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>> 2023-10-24 06:23:15 GET Uncaught exception in task:
>> 2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong
>> type argument in position 1 (expecting
>> open port): #<closed: file 7f2ac30077e0>
>> 2023-10-24 06:23:15 In fibers.scm:
>> 2023-10-24 06:23:15 172:8 6 (_)
>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>> 2023-10-24 06:23:15 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
>> 2023-10-24 06:23:15 In fibers/web/server.scm:
>> 2023-10-24 06:23:15 214:25 4 (_)
>> 2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
>> 2023-10-24 06:23:15 83:4 3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
>> 2023-10-24 06:23:15 In unknown file:
>> 2023-10-24 06:23:15 2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>> 2023-10-24 06:23:15 1685:16 1 (raise-exception _ #:continuable? _)
>> 2023-10-24 06:23:15 1685:16 0 (raise-exception _ #:continuable? _)
>> 2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> 2023-10-24 06:23:15 In procedure fport_write: Broken pipe
>
> I think this is kind of expected. If NGinx hits the proxy_read_timeout
> it'll return a 504 to the client and close the connection to Mumi I
> think. I think what you're seeing here is Mumi trying to respond to a
> request from NGinx that NGinx has closed.

If it's expected, we should handle it and produce a useful warning
instead of crashing, right?
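
A minimal sketch of that idea in Guile (illustrative only, not mumi or
guile-fibers code): trap the EPIPE system-error raised when writing to a
socket the peer has already closed, and log a one-line warning instead
of letting the backtrace propagate:

(use-modules (ice-9 binary-ports))

(define (write-response-bytes port bv)
  (catch 'system-error
    (lambda () (put-bytevector port bv))
    (lambda args
      (if (= EPIPE (system-error-errno args))
          (format (current-error-port)
                  "warning: client closed the connection; dropping response~%")
          ;; Anything other than a broken pipe is still re-thrown.
          (apply throw args)))))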

Thanks for chiming in!

--
Thanks,
Maxim
Christopher Baines wrote on 24 Oct 2023 15:15
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87msw82lvu.fsf@cbaines.net
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Christopher,
>
> Christopher Baines <mail@cbaines.net> writes:
>
> [...]
>
>>> Here's a fresh crash (on berlin):
>>>
>>> 2023-10-24 06:22:58 GET
>>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
>>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>>> 2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
>>> 2023-10-24 06:22:58 GET
>>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
>>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>>> 2023-10-24 06:22:58 GET
>>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
>>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>>> 2023-10-24 06:23:15 GET Uncaught exception in task:
>>> 2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong
>>> type argument in position 1 (expecting
>>> open port): #<closed: file 7f2ac30077e0>
>>> 2023-10-24 06:23:15 In fibers.scm:
>>> 2023-10-24 06:23:15 172:8 6 (_)
>>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>>> 2023-10-24 06:23:15 1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
>>> 2023-10-24 06:23:15 In fibers/web/server.scm:
>>> 2023-10-24 06:23:15 214:25 4 (_)
>>> 2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
>>> 2023-10-24 06:23:15 83:4 3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
>>> 2023-10-24 06:23:15 In unknown file:
>>> 2023-10-24 06:23:15 2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
>>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>>> 2023-10-24 06:23:15 1685:16 1 (raise-exception _ #:continuable? _)
>>> 2023-10-24 06:23:15 1685:16 0 (raise-exception _ #:continuable? _)
>>> 2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>>> 2023-10-24 06:23:15 In procedure fport_write: Broken pipe
>>
>> I think this is kind of expected. If NGinx hits the proxy_read_timeout
>> it'll return a 504 to the client and close the connection to Mumi I
>> think. I think what you're seeing here is Mumi trying to respond to a
>> request from NGinx that NGinx has closed.
>
> If it's expected, we should handle it and produce a useful warning
> instead of crashing, right?

As you can see from the above stack trace, this doesn't involve Mumi;
it's more of an issue for the guile-fibers web server. But yes, it would
be nice not to have this clutter in the logs.

That being said, if there are ways of having the application internally
time out before NGinx times out, that can help to avoid this kind of
issue. Maybe that's not that easy for Mumi, though.
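
One possible shape for such an internal timeout, sketched with
guile-fibers primitives (call-with-deadline and the early-reply idea are
illustrative, not mumi code, and it must run inside run-fibers): do the
real work in a fiber and race it against a timer shorter than nginx's
proxy_read_timeout, so the application can still send an error page
before nginx closes the socket.

(use-modules (fibers) (fibers channels) (fibers timers)
             (fibers operations))

(define (call-with-deadline seconds thunk)
  ;; Return THUNK's value, or 'timed-out if SECONDS elapse first.
  (let ((reply (make-channel)))
    (spawn-fiber (lambda () (put-message reply (thunk))))
    (perform-operation
     (choice-operation
      (wrap-operation (get-operation reply) (lambda (value) value))
      (wrap-operation (sleep-operation seconds) (lambda () 'timed-out))))))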

I think I remember Ricardo doing some investigating and experimenting
with switching to non-suspendable ports when writing responses, and I
think that might have exhibited different behaviour.

It's probably worth trying to reproduce this in isolation, and
double-checking whether the ports implementation has any effect
(obviously suspendable ports are better, but this might be informative).

Arun Isaac wrote on 26 Oct 2023 02:16
877cnai6gq.fsf@systemreboot.net
Hi Chris and Maxim,

>> I think this is kind of expected. If NGinx hits the proxy_read_timeout
>> it'll return a 504 to the client and close the connection to Mumi I
>> think. I think what you're seeing here is Mumi trying to respond to a
>> request from NGinx that NGinx has closed.

Thanks for the tip. This indeed makes a lot of sense!

But I don't understand why a proxy read timeout happens. The requests
in question seem to be very simple ones that should be processed
quickly. Should we increase nginx's proxy_read_timeout?

> If it's expected, we should handle it and produce a useful warning
> instead of crashing, right?

A couple of weeks ago, I put in a mechanism to better catch and report
errors.


I'm not sure this is working correctly, though. I have used `guard` in the
commit above. Does `guard` catch all exceptions correctly, or should I
instead use `catch`?
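
For what it's worth, in Guile 3 a `guard' form with an `else' clause and
a `(catch #t ...)' both catch any exception; a `guard' whose clauses can
all decline (one without an `else') re-raises unmatched exceptions,
which would leave them uncaught in a fiber. A small, self-contained
illustration, assuming Guile 3 where `guard' is available by default
(not the mumi handler):

(define (try-guard thunk)
  (guard (exn (else (format #t "guard caught: ~s~%" exn) 'handled))
    (thunk)))

(define (try-catch thunk)
  (catch #t
    thunk
    (lambda (key . args)
      (format #t "catch caught: ~s ~s~%" key args)
      'handled)))

(try-guard (lambda () (error "boom")))   ; => handled
(try-catch (lambda () (error "boom")))   ; => handled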

Also, I believe the logs are written in parallel by fibers. This may
mangle the logs a bit, but I'm not sure we're actually seeing requests
rapidly enough for this to happen.

Regards,
Arun
Felix Lechner wrote on 8 Feb 19:09 +0100
(no subject)
(address . control@debbugs.gnu.org)
87h6ii6d58.fsf@lease-up.com
reassign 65858 mumi
reassign 65809 mumi
reassign 49295 mumi
reassign 64833 mumi
reassign 54204 mumi
reassign 68920 mumi
reassign 55765 mumi
reassign 65923 mumi
reassign 63507 mumi
reassign 60226 mumi
reassign 63936 mumi
reassign 60951 mumi
reassign 66027 mumi
thanks