Ricardo Wurmus writes: > Next up: Seven of Nine, tertiary adjunct of unimatrix zero one: Ehe! I had to look up the reference; I'm not much of a Star Trek fan obviously :-P. >> From 37e499d5d5d5f690aa0a065c730e13f6a31dd30d Mon Sep 17 00:00:00 2001 >> From: Maxim Cournoyer >> Date: Thu, 28 Mar 2019 23:12:26 -0400 >> Subject: [PATCH 7/9] import: pypi: Include optional test inputs as >> native-inputs. >> >> * guix/import/pypi.scm (maybe-inputs): Add INPUT-TYPE argument, and use it. >> (test-section?): New predicate. >> (parse-requires.txt): Collect the optional test inputs, and return them as the >> second element of the returned list. > > AFAICT parse-requires.txt now returns a list of pairs, but used to > return a plain list before. Is this correct? Right, a list of two lists to be technically correct. >> (define (parse-requires.txt requires.txt) >> - "Given REQUIRES.TXT, a Setuptools requires.txt file, return a list of >> -requirement names." >> - ;; This is a very incomplete parser, which job is to select the non-optional >> - ;; dependencies and strip them out of any version information. >> + "Given REQUIRES.TXT, a Setuptools requires.txt file, return a pair of requirements. >> + >> +The first element of the pair contains the required dependencies while the >> +second the optional test dependencies. Note that currently, optional, >> +non-test dependencies are omitted since these can be difficult or expensive to >> +satisfy." >> + >> + ;; This is a very incomplete parser, which job is to read in the requirement >> + ;; specification lines, and strip them out of any version information. >> ;; Alternatively, we could implement a PEG parser with the (ice-9 peg) >> ;; library and the requirements grammar defined by PEP-0508 >> ;; (https://www.python.org/dev/peps/pep-0508/). > > Does it really return a pair? Or a list of pairs? Or is it a > two-element list of lists? The latter! I've fixed the docstring accordingly. >> (call-with-input-file requires.txt >> (lambda (port) >> - (let loop ((result '())) >> + (let loop ((required-deps '()) >> + (test-deps '()) >> + (inside-test-section? #f) >> + (optional? #f)) >> (let ((line (read-line port))) >> - ;; Stop when a section is encountered, as sections contains optional >> - ;; (extra) requirements. Non-optional requirements must appear >> - ;; before any section is defined. >> - (if (or (eof-object? line) (section-header? line)) >> + (if (eof-object? line) >> ;; Duplicates can occur, since the same requirement can be >> ;; listed multiple times with different conditional markers, e.g. >> ;; pytest >= 3 ; python_version >= "3.3" >> ;; pytest < 3 ; python_version < "3.3" >> - (reverse (delete-duplicates result)) >> + (map (compose reverse delete-duplicates) >> + (list required-deps test-deps)) > > Looks like a list of lists to me. “delete-duplicates” now won’t delete > a name that is in both “required-deps” as well as in “test-deps”. Is > this acceptable? It is acceptable, as this corner case cannot exist given the current code (a requirement can exist in either required-deps or test-deps, but never in both). It also doesn't make sense that a run time requirement would also be listed as a test requirement, so that corner case is not likely to exist in the future either. > Personally, I’m not a fan of using data structures for returning > multiple values, because we can simply return multiple values. I thought the Guile supported multiple values return value would be great here as well, but I've found that for this specific case here, a list of lists worked better, since the two lists contain requirements to be processed the same, which "map" can readily do (i.e. less ceremony is required). > Or we could have more than just strings. The meaning of these strings > is provided by the bin into which they are thrown — either > “required-deps” or “test-deps”. It could be an option to collect tagged > values instead and have the caller deal with filtering. Sounds neat, but I'd rather punt on this one for now. >> (define (parse-wheel-metadata metadata) >> - "Given METADATA, a Wheel metadata file, return a list of requirement names." >> + "Given METADATA, a Wheel metadata file, return a pair of requirements. >> + >> +The first element of the pair contains the required dependencies while the second the optional >> +test dependencies. Note that currently, optional, non-test dependencies are >> +omitted since these can be difficult or expensive to satisfy." >> ;; METADATA is a RFC-2822-like, header based file. > > This sounds like this is going to duplicate the previous procedures. Instead of duplicating the docstring, I'm now referring to that of PARSE-REQUIRES.TXT for PARSE-WHEEL-METADATA. The two procedures share the same interface, but implement a different parser, which consist of mostly a loop + conditional branches. IMHO, it's not worth, or even, desirable to try to merge those two parsers into one; I prefer to keep the logic of two distinct parsers separate. >> (define (requires-dist-header? line) >> ;; Return #t if the given LINE is a Requires-Dist header. >> - (regexp-match? (string-match "^Requires-Dist: " line))) >> + (string-match "^Requires-Dist: " line)) >> >> (define (requires-dist-value line) >> (string-drop line (string-length "Requires-Dist: "))) >> >> (define (extra? line) >> ;; Return #t if the given LINE is an "extra" requirement. >> - (regexp-match? (string-match "extra == " line))) >> + (string-match "extra == '(.*)'" line)) > > These hunks should be part of the previous patch where they were > introduced. (See my comments there about “regexp-match?”.) Done. >> + (define (test-requirement? line) >> + (let ((extra-label (match:substring (extra? line) 1))) >> + (and extra-label (test-section? extra-label)))) > > You can use “and=>” instead of binding a name: > > (and=> (match:substring (extra? line) 1) test-section?) Neat! I still don't have the reflex to use "and=>", thanks for reminding me about it! >> (call-with-input-file metadata >> (lambda (port) >> - (let loop ((requirements '())) >> + (let loop ((required-deps '()) >> + (test-deps '())) >> (let ((line (read-line port))) >> - ;; Stop at the first 'Provides-Extra' section: the non-optional >> - ;; requirements appear before the optional ones. >> (if (eof-object? line) >> - (reverse (delete-duplicates requirements)) >> + (map (compose reverse delete-duplicates) >> + (list required-deps test-deps)) >> (cond >> ((and (requires-dist-header? line) (not (extra? line))) >> (loop (cons (specification->requirement-name >> (requires-dist-value line)) >> - requirements))) >> + required-deps) >> + test-deps)) >> + ((and (requires-dist-header? line) (test-requirement? line)) >> + (loop required-deps >> + (cons (specification->requirement-name (requires-dist-value line)) >> + test-deps))) >> (else >> - (loop requirements))))))))) >> + (loop required-deps test-deps))))))))) ;skip line > > Could you refactor this so that the other parser can be reused? If not, > the same comments about “if” and “cond” and the use of pairs/lists > instead of “values” applies here. Done w.r.t. using only "cond" rather than "if" + "cond". As explained above, the docstring's body now refer to the documentation of PARSE-REQUIRES.TXT; otherwise I prefer to keep the parsers separate. >> (define (guess-requirements source-url wheel-url archive) >> - "Given SOURCE-URL, WHEEL-URL and a ARCHIVE of the package, return a list >> + "Given SOURCE-URL, WHEEL-URL and an ARCHIVE of the package, return a list >> of the required packages specified in the requirements.txt file. ARCHIVE will >> be extracted in a temporary directory." >> >> @@ -244,7 +289,10 @@ cannot determine package dependencies") (file-extension url)) >> (metadata (string-append dirname "/METADATA"))) >> (call-with-temporary-directory >> (lambda (dir) >> - (if (zero? (system* "unzip" "-q" wheel-archive "-d" dir metadata)) >> + (if (zero? >> + (parameterize ((current-error-port (%make-void-port "rw+")) >> + (current-output-port (%make-void-port "rw+"))) >> + (system* "unzip" wheel-archive "-d" dir metadata))) >> (parse-wheel-metadata (string-append dir "/" metadata)) >> (begin >> (warning >> @@ -283,32 +331,41 @@ cannot determine package dependencies") (file-extension url)) >> (warning >> (G_ "Failed to extract file: ~a from source.~%") >> requires.txt) >> - '()))))) >> - '()))) >> + (list '() '())))))) >> + (list '() '())))) > > I would like to see cosmetic changes like these three hunks in separate > commits. Done for the first two hunks; the last one isn't cosmetic; it changes the default return from an empty list to a list of two empty lists. >> (define (compute-inputs source-url wheel-url archive) >> - "Given the SOURCE-URL of an already downloaded ARCHIVE, return a list of >> -name/variable pairs describing the required inputs of this package. Also >> + "Given the SOURCE-URL and WHEEL-URL of an already downloaded ARCHIVE, return >> +a pair of lists, each consisting of a list of name/variable pairs, for the >> +propagated inputs and the native inputs, respectively. Also >> return the unaltered list of upstream dependency names." >> - (let ((dependencies >> - (remove (cut string=? "argparse" <>) >> - (guess-requirements source-url wheel-url archive)))) >> - (values (sort >> - (map (lambda (input) >> - (let ((guix-name (python->package-name input))) >> - (list guix-name (list 'unquote (string->symbol guix-name))))) >> - dependencies) >> - (lambda args >> - (match args >> - (((a _ ...) (b _ ...)) >> - (string-ci> - dependencies))) >> + >> + (define (strip-argparse deps) >> + (remove (cut string=? "argparse" <>) deps)) >> + >> + (define (requirement->package-name/sort deps) >> + (sort >> + (map (lambda (input) >> + (let ((guix-name (python->package-name input))) >> + (list guix-name (list 'unquote (string->symbol guix-name))))) >> + deps) >> + (lambda args >> + (match args >> + (((a _ ...) (b _ ...)) >> + (string-ci> + >> + (define process-requirements >> + (compose requirement->package-name/sort strip-argparse)) >> + >> + (let ((dependencies (guess-requirements source-url wheel-url archive))) >> + (values (map process-requirements dependencies) >> + (concatenate dependencies)))) > > Giving names to these processing steps is fine and improves clarity, but > I’m not so comfortable about returning ad-hoc pairs *and* multiple > values (like above). Using "values" is required by the API of the recursive importer. >> + (match guix-dependencies >> + ((required-inputs test-inputs) >> + (values >> + `(package >> + (name ,(python->package-name name)) >> + (version ,version) >> + (source (origin >> + (method url-fetch) >> + ;; Sometimes 'pypi-uri' doesn't quite work due to mixed >> + ;; cases in NAME, for instance, as is the case with >> + ;; "uwsgi". In that case, fall back to a full URL. >> + (uri (pypi-uri ,(string-downcase name) version)) >> + (sha256 >> + (base32 >> + ,(guix-hash-url temp))))) >> + (build-system python-build-system) >> + ,@(maybe-inputs required-inputs 'propagated-inputs) >> + ,@(maybe-inputs test-inputs 'native-inputs) >> + (home-page ,home-page) >> + (synopsis ,synopsis) >> + (description ,description) >> + (license ,(license->symbol license))) >> + ;; Flatten the nested lists and return the upstream >> + ;; dependencies. >> + upstream-dependencies)))))))) > > I don’t see anything being flattened here? Good catch! Seems a remnant of something that is now gone :-). Thanks for the review. Maxim