detect loops in module/package graph

  • Open
  • quality assurance status badge
Details
4 participants
  • Ludovic Courtès
  • Mark H Weaver
  • raingloom
  • zimoun
Owner
unassigned
Submitted by
raingloom
Severity
normal
R
R
raingloom wrote on 5 Oct 2021 02:58
(name . Guix Bugs)(address . bug-guix@gnu.org)
20211005025819.3f7756d7@riseup.net
I'll be short and blunt, currently it sucks big time whenever you have
a loop in your packages.
There is no indication other than Guix hanging forever without any
output.

I don't want to spend my time manually finding loops in graphs,
computers are better at that.

Sadly I don't know when I'll have time to implement this, so if someone
knows of a solution, they should not hesitate with sending a patch and
making all our lives easier.
M
M
Mark H Weaver wrote on 5 Oct 2021 09:59
87czojkilc.fsf@netris.org
Hi,

raingloom <raingloom@riseup.net> writes:
Toggle quote (3 lines)
> I'll be short and blunt, currently it sucks big time whenever you have
> a loop in your packages.

Agreed. I've been concerned about this problem since the early days of

Back in August 2014, there was a strongly connected component (SCC)
containing 51 package modules:

(acl algebra attr avahi base curl cyrus-sasl dejagnu docbook doxygen
fltk fontutils gcc gd gdb gettext ghostscript gl glib gnome gnupg
gnutls graphviz groff gsasl gstreamer gtk iso-codes lesstif
libcanberra linux lisp maths mit-krb5 mpi netpbm openldap pdf
pulseaudio rrdtool shishi ssh tcl tcsh texinfo texlive valgrind web
xiph xml xorg)

Since then, that SCC has grown to 277 package modules:

(acl admin aidc algebra apr aspell assembly astronomy attr audio
augeas authentication autogen autotools avahi backup base bash bdw-gc
bison bittorrent boost build-tools c calendar cdrom certs check cmake
code compression cook cpio cpp crates-graphics crates-gtk crates-io
cross-base crypto cryptsetup cups curl cyrus-sasl databases
datastructures dav debian dejagnu dictionaries disk django djvu dns
docbook docker documentation ebook ed elf emacs emacs-xyz enchant
enlightenment fabric-management fcitx file-systems finance firmware
flex fltk fonts fontutils freedesktop freeipmi ftp game-development
gawk gcc gd gdb geo gettext ghostscript gimp gl glib gnome gnome-xyz
gnu-doc gnunet gnupg gnuzilla golang gpodder gps graphics graphviz
groff groovy gsasl gstreamer gtk guile guile-xyz gv haskell
haskell-apps haskell-check haskell-crypto haskell-web haskell-xyz
hurd ibus icu4c image image-processing imagemagick inkscape iso-codes
java java-compression javascript jemalloc jupyter kde kde-frameworks
kde-internet kde-pim kde-plasma kerberos language less lesstif
libcanberra libedit libevent libffi libftdi libidn libphidget
libreoffice libunistring libusb linphone linux lirc lisp lisp-xyz
llvm logging lsof lua mail man markup maths matrix maven
maven-parent-pom mcrypt mes messaging mingw monitoring mono mp3 mpd
mpi multiprecision music nano ncurses netpbm nettle networking nfs
ninja node noweb nss ocaml ocr onc-rpc openbox opencl openldap
openstack package-management parallel password-utils patchutils
pciutils pcre pdf perl perl-check perl-compression perl-maths
perl-web photo php plotutils polkit popt pretty-print protobuf
pulseaudio python python-check python-compression python-crypto
python-science python-web python-xyz qt rails rdesktop rdf readline
rpc rrdtool rsync ruby rust rust-apps samba scanner scheme screen sdl
search security-token selinux serialization shells slang speech
sphinx spice sqlite ssh sssd suckless swig sync tcl telephony
terminals tex texinfo text-editors textutils time tls tor uml upnp
valgrind version-control video vim virtualization vpn vulkan w3m web
web-browsers webkit wget wine wordnet wxwidgets xdisorg xfig xiph xml
xorg)

There's also a second, much smaller SCC:

(bioconductor bioinformatics cran graph machine-learning statistics)

Toggle quote (7 lines)
> I don't want to spend my time manually finding loops in graphs,
> computers are better at that.
>
> Sadly I don't know when I'll have time to implement this, so if someone
> knows of a solution, they should not hesitate with sending a patch and
> making all our lives easier.

I've attached a script that I hacked up in 2014 to analyze the Guix
package module dependency graph. Note the (chdir "gnu/packages") in the
middle of it, so it must be loaded from the top directory of the Guix
source code, and the REPL will be in "gnu/packages" after loading it.
Here's an example of its use:

Toggle snippet (59 lines)
mhw@jojen ~/guix$ guile -l cycle-viewer.scm
Found the following non-trivial strongly-connected components:
(bioconductor
bioinformatics
cran
graph
statistics
machine-learning)

(autotools
perl
base
[… 272 lines elided …]
libftdi
perl-maths)

GNU Guile 3.0.2
Copyright (C) 1995-2020 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (length non-trivial-sccs)
$1 = 2
scheme@(guile-user)> (map length non-trivial-sccs)
$2 = (6 277)
scheme@(guile-user)> (first non-trivial-sccs)
$3 = (bioconductor bioinformatics cran graph statistics machine-learning)
scheme@(guile-user)> (second non-trivial-sccs)
$4 = (autotools perl base acl attr gettext check bash compression …)
scheme@(guile-user)> ,pp (edges-within (first non-trivial-sccs))
$5 = ((bioconductor . statistics)
(bioconductor . graph)
(bioconductor . cran)
(bioconductor . bioinformatics)
(bioinformatics . statistics)
(bioinformatics . machine-learning)
(bioinformatics . graph)
(bioinformatics . cran)
(bioinformatics . bioconductor)
(cran . statistics)
(cran . machine-learning)
(cran . graph)
(cran . bioinformatics)
(graph . statistics)
(graph . cran)
(graph . bioinformatics)
(graph . bioconductor)
(machine-learning . statistics)
(machine-learning . cran)
(statistics . machine-learning)
(statistics . cran))
scheme@(guile-user)> (with-output-to-file "/tmp/BIO-SCC.dot" (lambda () (write-scc-dot (first non-trivial-sccs))))
scheme@(guile-user)> (with-output-to-file "/tmp/MAIN-SCC.dot" (lambda () (write-scc-dot (second non-trivial-sccs))))
scheme@(guile-user)> ,quit

If someone would like to polish this into a more usable tool, and
possibly integrate it into Guix, please feel free.

Mark

--
Disinformation flourishes because many people care deeply about injustice
but very few check the facts. Ask me about https://stallmansupport.org.
M
M
Mark H Weaver wrote on 5 Oct 2021 10:03
87a6jnkie0.fsf@netris.org
Earlier, I wrote
Toggle quote (3 lines)
> I've attached a script that I hacked up in 2014 to analyze the Guix
> package module dependency graph.

Here's the script:
Attachment: cycle-viewer.scm
--
Disinformation flourishes because many people care deeply about injustice
but very few check the facts. Ask me about https://stallmansupport.org.
L
L
Ludovic Courtès wrote on 7 Oct 2021 15:28
(name . Mark H Weaver)(address . mhw@netris.org)
87o881c6b3.fsf@gnu.org
Hi Mark,

Mark H Weaver <mhw@netris.org> skribis:

Toggle quote (10 lines)
> raingloom <raingloom@riseup.net> writes:
>> I'll be short and blunt, currently it sucks big time whenever you have
>> a loop in your packages.
>
> Agreed. I've been concerned about this problem since the early days of
> Guix. See <https://bugs.gnu.org/18247>.
>
> Back in August 2014, there was a strongly connected component (SCC)
> containing 51 package modules:

Thanks for the analysis and for the updated patch!

Module cycles are something we allow and even rely on, so finding cycles
in itself is not necessarily helpful. What would help is finding cyclic
top-level references, which are those that cause problems, but that’s
another story.

WDYT?

Now, I’m not sure if raingloom was talking about these cycles, or rather
about cycles in the package dependency graph?

Chris Baines proposed a patch a while back to report those, though I
can’t find it anymore. IIRC, the difficulty was in making sure cycle
detection would not be too expensive, and in keeping a readable style.

Thanks,
Ludo’.
Z
Z
zimoun wrote on 11 Oct 2021 09:49
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ2RA3en1f2VirYFJakrHMashyE24J5wevq_pEmyd=-dDg@mail.gmail.com
Hi Ludo,

On Thu, 7 Oct 2021 at 15:34, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (19 lines)
> Mark H Weaver <mhw@netris.org> skribis:
> > raingloom <raingloom@riseup.net> writes:

> >> I'll be short and blunt, currently it sucks big time whenever you have
> >> a loop in your packages.
> >
> > Agreed. I've been concerned about this problem since the early days of
> > Guix. See <https://bugs.gnu.org/18247>.
> >
> > Back in August 2014, there was a strongly connected component (SCC)
> > containing 51 package modules:
>
> Thanks for the analysis and for the updated patch!
>
> Module cycles are something we allow and even rely on, so finding cycles
> in itself is not necessarily helpful. What would help is finding cyclic
> top-level references, which are those that cause problems, but that’s
> another story.

What Mark had implemented [1] works for any directed graph. What do
you mean by "top-level references"?


Toggle quote (3 lines)
> Now, I’m not sure if raingloom was talking about these cycles, or rather
> about cycles in the package dependency graph?

Probably. ;-)

But the way to detect cycle could be applied to any graph that Guix
uses. For instance, something totally irrelevant that I would never
think of: channels [2]. :-)


Toggle quote (4 lines)
> Chris Baines proposed a patch a while back to report those, though I
> can’t find it anymore. IIRC, the difficulty was in making sure cycle
> detection would not be too expensive, and in keeping a readable style.

From my memories about Graph Theory, the algorithm Mark is proposing
is an efficient way to detect cycles (cycle is strong connected
component). BTW, detect cycle is (almost) the same complexity as
topological sort; since cycles are obstacle for topological sort to
exist, somehow. I remember too the Chris's proposal but I do not
remember what they implemented.

I do not understand what you mean by "keeping a readable style".

All the best,
simon
L
L
Ludovic Courtès wrote on 12 Oct 2021 11:47
(name . zimoun)(address . zimon.toutoune@gmail.com)
87czoawp50.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (3 lines)
> What Mark had implemented [1] works for any directed graph. What do
> you mean by "top-level references"?

Reference to variables coming from one module of an SCC that appear at
the top level of another module in the SCC.

Toggle quote (7 lines)
>> Chris Baines proposed a patch a while back to report those, though I
>> can’t find it anymore. IIRC, the difficulty was in making sure cycle
>> detection would not be too expensive, and in keeping a readable style.
>
> From my memories about Graph Theory, the algorithm Mark is proposing
> is an efficient way to detect cycles

What I meant is that ‘package-derivation’ traverses the package graph,
so it’s a natural place to add cycle detection.

But since ‘package-derivation’ is so central, care must be taken about
the performance hit and about its readability.

Thanks,
Ludo’.
?