ci.guix.gnu.org narinfos with excessive NarSize

Open. Submitted by Christopher Baines.
Details
3 participants
  • Ludovic Courtès
  • Christopher Baines
  • Mathieu Othacehe
Owner
unassigned
Severity
important
Christopher Baines wrote on 31 Jan 15:47 +0100
(address . bug-guix@gnu.org)
87zh0pf9ip.fsf@cbaines.net
I noticed through the Guix Data Service that some narinfo files from ci.guix.gnu.org have an excessive NarSize.
These are the three that I've found, but there could be more.

/gnu/store/qln574djfgl8h9glij9id8jips7nnrlw-flightgear-2018.3.5
NarSize: 18446744072099351584

/gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1
NarSize: 18446744071612438544

/gnu/store/wd9z64xpck56xzf52jwlpg8vb610b0ym-repeat-masker-4.1.1
NarSize: 18446744071612438544

There's additional information on IRC: http://logs.guix.gnu.org/guix/2021-01-31.log#152751
Cc'ing Mathieu in case this is related to the new offloading mechanism.
Ludovic Courtès wrote on 31 Jan 16:17 +0100
control message for bug #46212
(address . control@debbugs.gnu.org)
874kixp240.fsf@gnu.org
severity 46212 important
quit
Ludovic Courtès wrote on 31 Jan 16:20 +0100
Re: bug#46212: ci.guix.gnu.org narinfos with excessive NarSize
(name . Christopher Baines)(address . mail@cbaines.net)
87y2g9nneh.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:
> /gnu/store/qln574djfgl8h9glij9id8jips7nnrlw-flightgear-2018.3.5
> NarSize: 18446744072099351584
>
> /gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1
> NarSize: 18446744071612438544
>
> /gnu/store/wd9z64xpck56xzf52jwlpg8vb610b0ym-repeat-masker-4.1.1
> NarSize: 18446744071612438544
The key point here is that ‘narSize’ in the database is negative:
sqlite> select * from validpaths where path = '/gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1';
43262123|/gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1|sha256:33328e16d8d83dcf1a6e031598dbc517aff18e6c7ccd55f7894102bab55fcdb9|1611849907|/gnu/store/rr532q5fmwj1gafdgk6nhxg9khnbsw3z-repeat-masker-4.1.1.drv|-2097113072
The actual nar size in this case is just above 2³¹ so most likely we’re seeing a signed integer wrapping error.
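The arithmetic behind that observation can be checked with a short sketch. It assumes a single wrap, i.e. that the true size is below 2³², which matches "just above 2³¹" but is not confirmed by the thread; the helper names are hypothetical and not part of Guix.

```python
def wrap_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def as_uint64(n: int) -> int:
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value."""
    return n & 0xFFFFFFFFFFFFFFFF


def recover_size(wrapped: int) -> int:
    """Undo a single 32-bit wrap (assumption: the true size is below 2**32)."""
    return wrapped & 0xFFFFFFFF


stored = -2097113072             # 'narSize' as found in the database
print(recover_size(stored))      # 2197854224, just above 2**31 = 2147483648
print(as_uint64(stored))         # 18446744071612438544, the published NarSize
```

The second value matches the NarSize reported for repeat-masker-4.1.1, which is consistent with the negative database value being printed as an unsigned 64-bit integer.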
I believe this is a very recent regression (the registration data for the item above is Jan. 28th); we have older store items with a correct ‘narSize’, such as https://ci.guix.gnu.org/nm6w84c9zj3yiylal3dk1sqzxq11sjzw.narinfo.
Thoughts?
Ludo’.
Ludovic Courtès wrote on 1 Feb 10:15 +0100
(name . Christopher Baines)(address . mail@cbaines.net)
87mtwoi1wz.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
> The key point here is that ‘narSize’ in the database is negative:
With commit 13a7d2a538b00aa0a8cf9b999f1a4ff3e5959af9, ‘register-items’ &co. will now detect invalid nar size values early on.
Ludo’.
Christopher Baines wrote on 1 Feb 20:57 +0100
Re: ci.guix.gnu.org narinfos with excessive NarSize
(address . 46212@debbugs.gnu.org)
87r1lzftnx.fsf@cbaines.net
Christopher Baines <mail@cbaines.net> writes:
> I noticed through the Guix Data Service that some narinfo files from
> ci.guix.gnu.org have an excessive NarSize.
>
> These are the three that I've found, but there could be more.
>
>
> /gnu/store/qln574djfgl8h9glij9id8jips7nnrlw-flightgear-2018.3.5
> NarSize: 18446744072099351584
>
> /gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1
> NarSize: 18446744071612438544
>
> /gnu/store/wd9z64xpck56xzf52jwlpg8vb610b0ym-repeat-masker-4.1.1
> NarSize: 18446744071612438544
Guix gives the following error when it encounters one of these bad narinfos:
error: integer expected from stream
Ludovic Courtès wrote on 2 Feb 22:48 +0100
Re: bug#46212: ci.guix.gnu.org narinfos with excessive NarSize
(name . Christopher Baines)(address . mail@cbaines.net)(address . 46212@debbugs.gnu.org)
87eehycf9w.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:
> Christopher Baines <mail@cbaines.net> writes:
>
>> I noticed through the Guix Data Service that some narinfo files from
>> ci.guix.gnu.org have an excessive NarSize.
>>
>> These are the three that I've found, but there could be more.
>>
>>
>> /gnu/store/qln574djfgl8h9glij9id8jips7nnrlw-flightgear-2018.3.5
>> NarSize: 18446744072099351584
>>
>> /gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1
>> NarSize: 18446744071612438544
>>
>> /gnu/store/wd9z64xpck56xzf52jwlpg8vb610b0ym-repeat-masker-4.1.1
>> NarSize: 18446744071612438544
>
> Guix gives the following error when it encounters one of these bad
> narinfos:
>
> error: integer expected from stream
I guess ‘guix substitute --query’ reads the narinfo, passes the negative integer as is to the daemon (for ‘query-path-info’ RPCs), which rightfully complains.
It would be nice for ‘guix publish’ to not emit broken narinfos, and perhaps we can also add extra checks in the (guix narinfo) reader.
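As a rough illustration of the kind of extra check meant here, the following sketch rejects an out-of-range NarSize while reading a narinfo. It is not the (guix narinfo) code, which is written in Scheme; the function name and the upper bound of 2⁶³ − 1 are assumptions made for the example.

```python
MAX_NAR_SIZE = 2**63 - 1  # assumed upper bound: largest signed 64-bit value


def parse_nar_size(narinfo_text: str) -> int:
    """Return the NarSize field of a narinfo, rejecting implausible values."""
    for line in narinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "NarSize":
            size = int(value.strip())
            if not 0 <= size <= MAX_NAR_SIZE:
                raise ValueError(f"invalid NarSize in narinfo: {size}")
            return size
    raise ValueError("narinfo has no NarSize field")


bad = (
    "StorePath: /gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1\n"
    "NarSize: 18446744071612438544\n"
)
try:
    parse_nar_size(bad)
except ValueError as err:
    print(err)
```

With a check like this, the reader would fail with a clear message instead of handing a nonsensical value to the daemon.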
Ludo’.
Ludovic Courtès wrote on 19 Feb 16:11 +0100
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
87blcgm6st.fsf@gnu.org
Hi Mathieu,
Did you eventually find out where the negative size comes from?
https://issues.guix.gnu.org/46212
What should we do in your opinion with database entries that have a negative size?
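Whatever the answer, the affected entries first have to be found. A minimal sketch of such a query follows, run against an in-memory stand-in for the store database. The simplified table layout is an assumption based on the query output shown earlier in this thread, and the second store path is a made-up placeholder.

```python
import sqlite3


def negative_nar_sizes(conn: sqlite3.Connection) -> list:
    """List (path, narSize) for every entry whose recorded size is negative."""
    query = "SELECT path, narSize FROM ValidPaths WHERE narSize < 0 ORDER BY path"
    return conn.execute(query).fetchall()


# In-memory stand-in; the real database has more columns (hash, deriver, ...).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ValidPaths ("
    "id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL, narSize INTEGER)"
)
conn.executemany(
    "INSERT INTO ValidPaths (path, narSize) VALUES (?, ?)",
    [
        ("/gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1", -2097113072),
        ("/gnu/store/placeholder-item-with-valid-size", 12345),  # hypothetical entry
    ],
)

for path, size in negative_nar_sizes(conn):
    print(path, size)
```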
Thanks,
Ludo’.
Ludovic Courtès <ludo@gnu.org> skribis:
> Christopher Baines <mail@cbaines.net> skribis:
>
>> /gnu/store/qln574djfgl8h9glij9id8jips7nnrlw-flightgear-2018.3.5
>> NarSize: 18446744072099351584
>>
>> /gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1
>> NarSize: 18446744071612438544
>>
>> /gnu/store/wd9z64xpck56xzf52jwlpg8vb610b0ym-repeat-masker-4.1.1
>> NarSize: 18446744071612438544
>
> The key point here is that ‘narSize’ in the database is negative:
>
> sqlite> select * from validpaths where path = '/gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1';
> 43262123|/gnu/store/qhix6afvy2a6n7hlx4qgdns461p8kdnv-repeat-masker-4.1.1|sha256:33328e16d8d83dcf1a6e031598dbc517aff18e6c7ccd55f7894102bab55fcdb9|1611849907|/gnu/store/rr532q5fmwj1gafdgk6nhxg9khnbsw3z-repeat-masker-4.1.1.drv|-2097113072
>
> The actual nar size in this case is just above 2³¹ so most likely we’re
> seeing a signed integer wrapping error.
>
> I believe this is a very recent regression (the registration data for
> the item above is Jan. 28th); we have older store items with a correct
> ‘narSize’, such as
> <https://ci.guix.gnu.org/nm6w84c9zj3yiylal3dk1sqzxq11sjzw.narinfo>.
>
> Thoughts?
>
> Ludo’.
Mathieu Othacehe wrote on 22 Feb 09:59 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
87blcciike.fsf@gnu.org
Hey Ludo,
> Did you eventually find out where the negative size comes from?
>
> https://issues.guix.gnu.org/46212
>
> What should we do in your opinion with database entries that have a
> negative size?
I haven't looked closely at this problem yet. However, I fixed an issue with locales in the remote building mechanism that caused publish server crashes: https://lists.gnu.org/archive/html/bug-guix/2021-02/msg00231.html.
That's maybe somehow related.
Thanks,
Mathieu
Ludovic Courtès wrote on 22 Feb 14:03 +0100
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
87a6rwb6go.fsf@gnu.org
Hi,
Mathieu Othacehe <othacehe@gnu.org> skribis:
>> Did you eventually find out where the negative size comes from?
>>
>> https://issues.guix.gnu.org/46212
>>
>> What should we do in your opinion with database entries that have a
>> negative size?
>
> I didn't look closely to this problem yet. However, I fixed an issue
> with locales in the remote building mechanism that caused publish server
> crashes:
> https://lists.gnu.org/archive/html/bug-guix/2021-02/msg00231.html.
Hmm I don’t think so.
The bug here is likely due to 32-bit signed integer wrapping. That can only happen in C code, so to me possible culprits would be guile-simple-zmq or the layer above it (if there’s a binary protocol involved) or the postgresql interface. Only a vague intuition, though.
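To make that intuition concrete, here is a purely hypothetical sketch of how a binary protocol could produce such a value: one side writes the size as an unsigned 32-bit field and the other reads it back as signed. Nothing in this thread confirms that this is what happens; the function names are invented for the example.

```python
import struct


def send_size(size: int) -> bytes:
    """Hypothetical sender: encode the size as an unsigned 32-bit field."""
    return struct.pack("<I", size)


def receive_size(payload: bytes) -> int:
    """Hypothetical receiver: wrongly decode the same field as signed."""
    (size,) = struct.unpack("<i", payload)
    return size


print(receive_size(send_size(1000)))        # small sizes survive the round trip
print(receive_size(send_size(2197854224)))  # -2097113072, as seen in the database
```

Sizes below 2³¹ are unaffected by such a mismatch, which would explain why only a few large store items show the problem.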
Ludo’.
Mathieu Othacehe wrote 6 days ago
(name . Ludovic Courtès)(address . ludo@gnu.org)
87blc9oj70.fsf@gnu.org
Hey,
> The bug here is likely due to 32-bit signed integer wrapping. That can
> only happen in C code, so to me possible culprits would be
> guile-simple-zmq or the layer above it (if there’s a binary protocol
> involved) or the postgresql interface. Only a vague intuition, though.
Hmm, looks like you're right! There's a memory corruption in the remote-server process that's really hard to reproduce. I suspect the ZMQ library or its Guile bindings.
I'm trying to run the process under valgrind to identify the issue, without much success for now.
Thanks,
Mathieu