[PATCH 0/1] teams: Add packages stats script.

  • Open
Details
2 participants
  • Sharlatan Hellseher
  • Troy Figiel
Owner
unassigned
Submitted by
Sharlatan Hellseher
Severity
normal
Sharlatan Hellseher wrote on 8 Nov 22:31 +0100
(address . guix-patches@gnu.org)(name . Sharlatan Hellseher)(address . sharlatanus@gmail.com)
cover.1731100267.git.sharlatanus@gmail.com
Hi Guix!

While working on the python-team branch, I aimed to fix as many packages as
possible to prepare it for the upcoming merge into master. I ran into the fact
that there is no tooling (or maybe I do not know about it) providing
larger-scale stats for packages within a team's scope.

Some simple questions: Which N packages may be updated without triggering
larger-scale rebuilds? Which packages need to be updated on the team branch?
What affect ratio does a given package have? Which types of package
modification would trigger a rebuild of dependents (it was a surprise that the
order of inputs does...)?

So...

> ./pre-inst-env etc/teams-packages-stats.scm stats python

This will generate a list of all packages for the python team (defined for now
as any package whose build system is python or pyproject), with the following
comma-separated fields:

- module-file-name
- build-system-name
- package-name
- package-guix-version
- package-upstream-version
- all-inputs-count
- dependents-count
- affect-ratio
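
As a rough illustration of the last field: the script computes the affect
ratio as (* (/ dependents all-packages-count) 100.0), i.e. the dependents as a
percentage of all packages. The sketch below redoes that arithmetic in the
shell. The function name `affect_ratio` and the total of 30000 packages are
made up for the example; the real total is whatever the script counts at run
time.

```shell
# affect_ratio DEPENDENTS TOTAL
# Print DEPENDENTS as a percentage of TOTAL with three decimals.
# Hypothetical helper: it only mirrors the arithmetic used by the script.
affect_ratio() {
    LC_ALL=C awk -v d="$1" -v t="$2" 'BEGIN { printf "%.3f\n", d / t * 100 }'
}

# 150 dependents out of an assumed 30000 packages.
affect_ratio 150 30000   # prints 0.500
```

A package with no dependents gives 0.000, which matches the 0.0 seen for leaf
packages.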

The `column' command may be used to produce JSON for further analysis and
preparation.

e.g. some packages which have 0 impact if they are refreshed:
> column -s, -t 1731089576-python-team | sort -k8 -n | head -n20
deprecated-package               pyproject  python-language-server   1.11.0       nil  23  0  0.0
deprecated-package               python     beets-next               1.6.0        nil  31  0  0.0
deprecated-package               python     python-trytond-purchase  6.2.3        nil  21  0  0.0
gnu/packages/android.scm         python     fdroidserver             1.1.9        nil  21  0  0.0
gnu/packages/astronomy.scm       pyproject  ginga-qt5                5.1.0        nil  22  0  0.0
gnu/packages/astronomy.scm       pyproject  python-poliastro         0.17.0       nil  20  0  0.0
gnu/packages/bioinformatics.scm  pyproject  fanc                     0-1.354401e  nil  29  0  0.0
gnu/packages/bioinformatics.scm  pyproject  python-baltica           1.1.2        nil  27  0  0.0
gnu/packages/bioinformatics.scm  pyproject  python-episcanpy         0.4.0        nil  25  0  0.0
gnu/packages/bioinformatics.scm  pyproject  python-fanc              0.9.25       nil  26  0  0.0
gnu/packages/bioinformatics.scm  pyproject  python-hicexplorer       3.7.4        nil  27  0  0.0
gnu/packages/bioinformatics.scm  pyproject  python-liana-py          1.1.0        nil  25  0  0.0
gnu/packages/bioinformatics.scm  pyproject  python-metacells         0.9.4        nil  26  0  0.0
gnu/packages/bittorrent.scm      python     deluge                   2.1.1        nil  21  0  0.0
gnu/packages/bootloaders.scm     pyproject  patman                   2024.01      nil  20  0  0.0
gnu/packages/databases.scm       pyproject  datasette                1.0a7        nil  29  0  0.0
gnu/packages/databases.scm       python     python-pyarrow           0.16.0       nil  28  0  0.0
gnu/packages/finance.scm         python     electron-cash            4.4.1        nil  23  0  0.0
gnu/packages/genealogy.scm       python     gramps                   5.1.4        nil  23  0  0.0
gnu/packages/gnome.scm           python     terminator               2.1.4        nil  20  0  0.0
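
The same selection can also be made on the raw comma-separated output, before
`column' reflows it. A minimal sketch, assuming the eight-field layout listed
above and no commas inside field values; the function name `zero_impact` is
made up for the example:

```shell
# zero_impact FILE
# Print "all-inputs-count,package-name" for every row whose dependents-count
# (field 7) is 0, ordered by ascending all-inputs-count (field 6).
# Hypothetical helper; FILE is the saved output of the stats command.
zero_impact() {
    awk -F, '$7 == 0 { print $6 "," $3 }' "$1" | LC_ALL=C sort -t, -k1,1n
}

# Usage, with the file name from the example above:
#   zero_impact 1731089576-python-team | head -n20
```

Rows where the dependents count was not computed (printed as nil) are skipped,
since the comparison with 0 then fails.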

I'm not confident in my Guile Scheme ;-) any review points are welcome.

CC core team for wider spread within teams.

Sharlatan Hellseher (1):
etc: Add teams-packages-stats script.

etc/teams-packages-stats.scm | 218 +++++++++++++++++++++++++++++++++++
1 file changed, 218 insertions(+)
create mode 100755 etc/teams-packages-stats.scm


base-commit: 2a6d96425eea57dc6dd48a2bec16743046e32e06
--
2.46.0
Sharlatan Hellseher wrote on 8 Nov 22:32 +0100
[PATCH 1/1] etc: Add teams-packages-stats script.
(address . 74268@debbugs.gnu.org)(name . Sharlatan Hellseher)(address . sharlatanus@gmail.com)
bdd10b14ee47680b9840ed75d0658c6978efe3b6.1731100267.git.sharlatanus@gmail.com
This is a proposal for a helper script which aims to assist decision making
during cascade package refreshes within a team's scope.

* etc/teams-packages-stats.scm: New file.

Change-Id: I4af5ce1c3cbebed1793628229b29acba1f737c9d
---
etc/teams-packages-stats.scm | 218 +++++++++++++++++++++++++++++++++++
1 file changed, 218 insertions(+)
create mode 100755 etc/teams-packages-stats.scm

diff --git a/etc/teams-packages-stats.scm b/etc/teams-packages-stats.scm
new file mode 100755
index 0000000000..a95d913a79
--- /dev/null
+++ b/etc/teams-packages-stats.scm
@@ -0,0 +1,218 @@
+#!/bin/sh
+# -*- mode: scheme; -*-
+# Extra care is taken here to ensure this script can run in most environments,
+# since it is invoked by 'git send-email'.
+pre_inst_env_maybe=
+command -v guix > /dev/null || pre_inst_env_maybe=./pre-inst-env
+exec $pre_inst_env_maybe guix repl -- "$0" "$@"
+!#
+
+;;; GNU Guix --- Functional package management for GNU
+;;; Copyright © 2024 Sharlatan Hellseher <sharlatanus@gmail.com>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix. If not, see <http://www.gnu.org/licenses/>.
+
+;;; This file prints per-team package statistics: inputs count, dependents
+;;; count and affect ratio. The main purpose is to help plan cascade package
+;;; refreshes on team branches.
+
+;;; Commentary:
+
+;; This code defines helpers for cascade package refreshes within team scopes.
+;; The output may be piped to CLI commands such as awk or column to compile a
+;; data frame (e.g. JSON).
+;;
+;; ~$ column \
+;; --json \
+;; --table \
+;; --separator=, \
+;; --table-columns=module-file-name,build-system-name,package-name,\
+;; package-guix-version,package-upstream-version,all-inputs-count,\
+;; dependents-count,affect-ratio \
+;; <output> \
+;; > <output>.json
+;;
+;; TODO:
+;; - Implement manifests per team based on some gradual criteria
+;; - Add more controls via command-line options
+;; - Improve the performance of dependents calculation, it takes about 30min
+;; to provide a list for packages with python/pyproject build system
+;; - Add JSON and CSV output formats for further analysis
+
+
+;;; Code:
+
+(use-modules (git)
+             (gnu packages)
+             (guix build-system)
+             (guix diagnostics)
+             (guix discovery)
+             (guix gnupg)
+             (guix graph)
+             (guix hash)
+             (guix monads)
+             (guix packages)
+             (guix profiles)
+             (guix scripts graph)
+             (guix scripts)
+             (guix store)
+             (guix ui)
+             (guix upstream)
+             (guix utils)
+             (ice-9 format)
+             (ice-9 match)
+             (ice-9 rdelim)
+             (ice-9 regex)
+             (srfi srfi-1)
+             (srfi srfi-26)
+             (srfi srfi-37)
+             (srfi srfi-71)
+             (srfi srfi-9))
+
+(define* (packages-by-team #:key (team "all"))
+  "Return the list of packages belonging to TEAM according to team-specific
+criteria, falling back to all available packages."
+  (cond
+   ((string=? team "go")
+    (fold-packages
+     (lambda (package result)
+       (if (or (eq? (build-system-name (package-build-system package))
+                    (quote go))
+               ;; XXX: Add other checks, such as Go being in inputs*.
+               )
+           (cons package result) result)) (list)))
+   ((string=? team "python")
+    (fold-packages
+     (lambda (package result)
+       (if (or (eq? (build-system-name (package-build-system package))
+                    (quote pyproject))
+               (eq? (build-system-name (package-build-system package))
+                    (quote python)))
+           (cons package result) result)) (list)))
+   ((string=? team "ruby")
+    (fold-packages
+     (lambda (package result)
+       (if (or (eq? (build-system-name (package-build-system package))
+                    (quote ruby))
+               ;; XXX: Add other checks, such as Ruby being in inputs*.
+               )
+           (cons package result) result)) (list)))
+   (else
+    (fold-packages
+     (lambda (package result)
+       (if (package-superseded package)
+           result
+           (cons package result)))
+     '()
+     #:select? (const #true)))))
+
+(define (dependents-count package)
+  "Return the number of packages that need rebuilding when PACKAGE is updated."
+  (with-error-handling ;; XXX: Taken from guix scripts refresh
+    (with-store store
+      (run-with-store store
+        (mlet %store-monad ((edges
+                             (node-back-edges %bag-node-type
+                                              (package-closure (packages-by-team)))))
+          (let* ((dependents
+                  (node-transitive-edges (list package) edges)))
+            (return (length dependents))))))))
+
+(define* (stats team
+                #:key (build-systems '())
+                (check-dependents? #false)
+                (check-deprecated? #false)
+                (check-upstream-version? #false)
+                (dependents-threshold-ratio 0.001)
+                (inputs-threshold 0))
+  "Return detailed stats for the given TEAM packages which may help to make
+a decision during cascade updates.
+
+Parameters:
+- build-systems :: The optional list of build system names to select.
+
+- check-dependents? :: Whether to query the dependents count; it might take
+a long time for a long list of packages.
+
+- check-deprecated? :: Whether to show the deprecated packages.
+
+- check-upstream-version? :: Check for the latest available version on
+upstream.
+
+- dependents-threshold-ratio :: Print only packages whose dependents ratio
+(dependents/all-packages * 100.0) is greater than or equal to this threshold.
+
+- inputs-threshold :: The minimum number of inputs which a package needs to
+have.
+
+Returns:
+- module-file-name
+- build-system-name
+- package-name
+- package-guix-version
+- package-upstream-version
+- all-inputs-count
+- dependents-count
+- affect-ratio"
+  (let ((team-packages (packages-by-team #:team team))
+        (all-packages-count (length (packages-by-team))))
+    (map (lambda (package)
+           (let ((all-inputs-count
+                  (+ (length (package-inputs package))
+                     (length (package-native-inputs package))
+                     (length (package-propagated-inputs package))))
+                 (module-path
+                  (false-if-exception
+                   (location-file (package-definition-location package))))
+                 (build-system-name
+                  (build-system-name (package-build-system package))))
+             (if (>= all-inputs-count inputs-threshold)
+                 (let* ((dependents
+                         (if check-dependents?
+                             (dependents-count package)
+                             "nil"))
+                        (affect-ratio
+                         (if check-dependents?
+                             (* (/ dependents all-packages-count) 100.0)
+                             "nil")))
+                   (format #true "~{~a,~}~a~%"
+                           (list
+                            (if (string? module-path)
+                                module-path
+                                "deprecated-package")
+                            build-system-name
+                            (package-name package)
+                            (package-version package)
+                            (if check-upstream-version? "TBA" "nil")
+                            all-inputs-count
+                            dependents)
+                           affect-ratio)))))
+         team-packages)))
+
+(define (main . args)
+  (match args
+    (("stats" team-name)
+     (stats team-name #:check-dependents? #true))
+    (anything
+     (format (current-error-port)
+             "Usage: etc/teams-packages-stats.scm <command> [<args>]
+
+Commands:
+  stats <team-name>
+    get a list of packages belonging to the given <team-name> with basic
+    affect ratio, which may help to plan cascade packages refresh task.~%"))))
+
+(apply main (cdr (command-line)))
--
2.46.0
Troy Figiel wrote on 10 Nov 17:44 +0100
(name . Sharlatan Hellseher)(address . sharlatanus@gmail.com)(address . 74268@debbugs.gnu.org)
6e0a2f0b064817a1882eb81a3d977c7f806f4cc6.camel@troyfigiel.com
Hi Oleg,

I do not have too much to add, but just wanted to mention this solves a
pain point I had a while back. I thought it would be nice to have a
tool that in some sense does the opposite of `guix refresh -l' and
`guix refresh -T', showing me all packages with a given number of
dependencies. When you are new to updating packages in Guix, it can be
quite difficult to find some simple candidates.

Two other points that come to mind regarding your script:
- jsonl output probably gives you more flexibility in the future.
- I could imagine you might want to filter for module paths.
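
Both points can be tried out on the current comma-separated output without
touching the script. This is only a sketch of the idea, not part of the patch:
the function name `csv_to_jsonl` is made up, and it assumes that no field
contains a comma, a double quote or a backslash, and that the three numeric
fields really are numbers (i.e. the dependents check was enabled).

```shell
# csv_to_jsonl [FILE...]
# Turn each eight-field row into one JSON object per line (JSON Lines).
# Reads standard input when no FILE is given.
csv_to_jsonl() {
    awk -F, '{
        gsub(/^ +/, "", $8)   # drop padding in front of the ratio
        printf "{\"module-file-name\":\"%s\",\"build-system-name\":\"%s\",", $1, $2
        printf "\"package-name\":\"%s\",\"package-guix-version\":\"%s\",", $3, $4
        printf "\"package-upstream-version\":\"%s\",\"all-inputs-count\":%s,", $5, $6
        printf "\"dependents-count\":%s,\"affect-ratio\":%s}\n", $7, $8
    }' "$@"
}

# Usage: keep only one module path, then convert.
#   grep '^gnu/packages/astronomy.scm,' <output> | csv_to_jsonl
```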

As a sidenote, when I run your script, the final 0.0 is often
misaligned. It seems to have extra spaces.
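
The extra spaces presumably come from the `~8f' directive in the script's
format string, which prints the ratio in a fixed width of eight characters.
Until that is changed, the padding can be stripped afterwards. A minimal
sketch; the function name `trim_ratio` is made up, and the `printf' line only
imitates the padded field rather than running the script:

```shell
# trim_ratio
# Remove blanks between the last comma and the final field on each line.
trim_ratio() {
    sed 's/,[[:space:]]*\([^,]*\)$/,\1/'
}

# Imitate a row whose last field is padded to eight characters.
printf '%s,%8s\n' 'terminator,2.1.4,nil,20,0' '0.0' | trim_ratio
# prints terminator,2.1.4,nil,20,0,0.0
```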

In any case, I think it is a nice tool to have.

Best wishes,

Troy

