On +2020-02-15 21:01:36 +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
You may not need to parse the html fully if the part you need is
isolatable into delimited scopes that you can successively narrow.
For example, a while back I wanted a command I could type to get
the URL of the latest Linux kernel at kernel.org:
(oops, I see I didn't use $0 in the usage text -- should be .scm, not -scm)
I offer it below [1], with the thought that you could probably
modify (not to mention improve :-) it to get the timestamps you want.
Especially if you could get them to make the narrow context unique enough
that its delimiters can delimit it in one shot.
The page at kernel.org is apparently stable enough that this still works,
but YMMV until the snapshot page is similarly stable. (You could ask
HTH, or is useful in some way.
#!/usr/bin/bash
exec guile -e main -s "$0" "$@"
!#
;;;; stable-kernel.scm
;;;; goes to https://www.kernel.org/ to wget page, then
;;;; extracts name of latest stable release tarball to stdout
;;;;
(define (usage)
  (format (current-error-port)
          (string-join
           '(
             "Usage: stable-kernel-scm [ -h ]"
             " -h for this message"
             " (without args):"
             " go to https://www.kernel.org/ to wget page,"
             " extract URL of latest stable release tarball"
             " and write that URL to stdout."
             "")
           "\n")))
(use-modules (ice-9 format))
(use-modules (ice-9 rdelim))
(use-modules (ice-9 popen))
(use-modules (ice-9 textual-ports))
(use-modules (ice-9 and-let-star))
(use-modules (ice-9 regex))
(define (extract-delimited str s-beg s-end)
  (and-let*
      ((ix-beg (string-contains str s-beg))
       (ix-post-beg (+ ix-beg (string-length s-beg)))
       (ix-end (string-contains str s-end ix-post-beg)))
    (substring str ix-post-beg ix-end)))
(define kernel-url "https://www.kernel.org/")
(define (get-kern-name)
  (let* ((cmd-kern (string-append "wget -q -O - " kernel-url))
         (p-inp (open-input-pipe cmd-kern))
         (wgot-pinp-str (get-string-all p-inp))
         (extracted-table-releases
          (extract-delimited wgot-pinp-str
                             "<table id=\"releases\">"
                             "</table>"))
         (extracted-stable-tarball-anchor
          (extract-delimited extracted-table-releases
                             "<td>stable:</td>"
                             ">tarball<"))
         (extracted-stable-href
          (extract-delimited extracted-stable-tarball-anchor
                             "<a href=\""
                             "\"")))
    (begin
      extracted-stable-href)))
(define (main args)
  (begin
    (set! args (cdr args)) ;; always dump callee arg
    (if (not (pair? args))
        (set! args '("-do-default") ))
    ;; simple opts...
    (cond
     ((string-prefix? "-h" (car args))
      (begin (usage)
             (exit)))
     ((string-prefix? "-do-default" (car args))
      (let ((kern-name (get-kern-name)))
        (display kern-name)(newline)))
     (#t
      (begin
        (format (current-error-port) "Error: bad opt: '~a'\n" (car args))
        (usage)
        (exit))))))
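
For experimenting with the narrowing idea offline, here is a minimal sketch
that runs the same three delimiter steps against a made-up sample page
instead of the live kernel.org markup. The names extract-delimited*, narrow,
sample-page and stable-href-pairs are hypothetical and not part of the
script above. The only behavioural difference is an assumption on my part:
a failed step yields #f and later steps pass that #f along, rather than
calling string-contains on a non-string.

```scheme
(use-modules (ice-9 and-let-star)
             (srfi srfi-1))             ; fold

;; Like extract-delimited in the script, but returns #f when STR is not
;; a string (for example, when an earlier narrowing step found nothing).
(define (extract-delimited* str s-beg s-end)
  (and (string? str)
       (and-let* ((ix-beg (string-contains str s-beg))
                  (ix-post-beg (+ ix-beg (string-length s-beg)))
                  (ix-end (string-contains str s-end ix-post-beg)))
         (substring str ix-post-beg ix-end))))

;; Narrow STR through a list of (begin . end) delimiter pairs,
;; outermost scope first.
(define (narrow str delimiter-pairs)
  (fold (lambda (pair acc)
          (extract-delimited* acc (car pair) (cdr pair)))
        str
        delimiter-pairs))

;; Made-up sample input, NOT the real kernel.org page.  The first table
;; also contains "stable:", which is why the outer scope is narrowed first.
(define sample-page
  (string-append
   "<table id=\"other\"><tr><td>stable:</td>"
   "<td><a href=\"https://example.org/wrong.tar.xz\">tarball</a></td></tr></table>"
   "<table id=\"releases\">"
   "<tr><td>mainline:</td>"
   "<td><a href=\"https://example.org/mainline.tar.xz\">tarball</a></td></tr>"
   "<tr><td>stable:</td>"
   "<td><a href=\"https://example.org/stable.tar.xz\">tarball</a></td></tr>"
   "</table>"))

;; The same three delimiter pairs the script uses, as data.
(define stable-href-pairs
  '(("<table id=\"releases\">" . "</table>")
    ("<td>stable:</td>" . ">tarball<")
    ("<a href=\"" . "\"")))

(display (narrow sample-page stable-href-pairs))
(newline)
;; prints https://example.org/stable.tar.xz
```

Keeping the delimiter pairs in a list means that adapting this to another
page is mostly a matter of editing stable-href-pairs. If the page offers a
unique enough context, the list can shrink to a single pair.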