Hi,
our updater currently only supports FTP servers, but more and more
projects shut down their FTP service and provide HTTP(S) servers only
(e.g. the Linux kernel). For other projects, the main distribution point
has changed to HTTP and the mirrors still providing FTP are lagging
(e.g. KDE, see [1]).
A common case is to simply use Apache to serve the directories, but it
will deliver an HTML view of the directory contents (using mod_autoindex
[3]).
In [2] Ludo wrote:
So we need a way to list the latest releases somehow. If they publish
JSON, XML, or some other structured info format, that’s fine too. But
HTTP alone is not good: we’d have to infer the information from HTML
pages, which sounds fragile.
IMHO we cannot expect project and mirror sites to provide this
additional data. Most projects simply will not do so, since this would
require the server to generate some data files on the fly.
OTOH, I assume the delivered directory index pages to be well-formed
(X)HTML. Thus parsing the HTML should be quite simple: we only need to
pattern-match "<A>" tags or, if Guile has a decent XML/HTML parser, use
that to query the data.
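
To illustrate how little parsing is involved, here is a minimal sketch
of collecting the "<A>" targets. It is written in Python purely for
illustration (the updater itself would use Guile), and the names
AnchorCollector, extract_hrefs and SAMPLE_INDEX as well as the sample
page are made up for this example, not taken from any real server.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so "<A HREF=...>" is matched as well.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    """Return all link targets found in html_text."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Hypothetical index page, loosely modelled on an auto-index listing.
SAMPLE_INDEX = """\
<html><body><h1>Index of /pub/foo</h1>
<a href="?C=N;O=D">Name</a>
<a href="/pub/">Parent Directory</a>
<a href="v1.0/">v1.0/</a>
<A HREF="foo-1.0.tar.gz">foo-1.0.tar.gz</A>
</body></html>
"""

print(extract_hrefs(SAMPLE_INDEX))
```

The result still contains sort links and the parent directory, so the
raw list needs filtering before it is useful.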
Only relative links without a slash (except a trailing one) have to be
handled. Links with a trailing slash can be assumed to be directories.
(Since auto-index only works if the URL points to a directory and a
directory is marked by a trailing slash, we can assume the generated
links for directories will all have the trailing slash.) At least this
would be a good start, which could be refined if necessary.
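
The rule above could be sketched roughly as follows. Again this is
Python for illustration only, and classify_links is a hypothetical
helper name. The sketch assumes the link targets have already been
extracted as plain strings; it does not try to handle every odd case
(e.g. "mailto:" links would need an extra check).

```python
def classify_links(hrefs):
    """Split hrefs into (directories, files), keeping plain relative names."""
    directories, files = [], []
    for href in hrefs:
        # Skip empty values, sort links such as "?C=N;O=D", fragments,
        # and absolute URLs that carry a scheme.
        if not href or href[0] in "?#" or "://" in href:
            continue
        # Ignore a single trailing slash when looking for other slashes.
        name = href[:-1] if href.endswith("/") else href
        # A remaining slash means an absolute or nested path; "." and ".."
        # point at the current or parent directory.
        if not name or "/" in name or name in (".", ".."):
            continue
        if href.endswith("/"):
            directories.append(name)
        else:
            files.append(name)
    return directories, files


# Hypothetical link targets as they might appear in an index page.
links = ["?C=N;O=D", "/pub/", "v1.0/", "foo-1.0.tar.gz", "../", "sub/dir/"]
dirs, tarballs = classify_links(links)
print(dirs, tarballs)
```

Only "v1.0/" and "foo-1.0.tar.gz" survive here; the updater would then
descend into the directories and compare version numbers of the files.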
Please note that I'm not suggesting writing a general-purpose parser,
but aiming for auto-index HTML pages only.
Some things I already found out:
* Directory listings generated by mod_autoindex can be provided as a
simple list by passing the query parameter "F=0" in the URL [4].
There are other query parameters for sorting and pattern matching.
* nginx's "ngx_http_autoindex_module" [6] seems not to use query
parameters, but can be configured (on the server side) to provide
the content as XML or JSON. The "fancy_index" module [7] is
documented to "Allow choosing to sort elements", but [7] does not
state how, or whether, "fancy" can be switched off.
* lighttpd supports some of these options [5].
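
As a rough sketch of how the first two findings could be used (Python
for illustration only; plain_listing_url and parse_nginx_json are
hypothetical names): the first helper appends "F=0" to a directory URL,
the second reads an nginx JSON listing. I am assuming here that the
JSON listing is an array of objects with "name" and "type" fields,
where "type" is "directory" for subdirectories; this should be checked
against [6] before relying on it.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def plain_listing_url(directory_url):
    """Return directory_url with the mod_autoindex query "F=0" set."""
    parts = urlsplit(directory_url)
    # Replace any existing query string and drop the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "F=0", ""))


def parse_nginx_json(text):
    """Split an nginx JSON listing into (directories, files).

    Assumes entries look like {"name": ..., "type": "directory" | "file"}.
    """
    directories, files = [], []
    for entry in json.loads(text):
        if entry.get("type") == "directory":
            directories.append(entry["name"])
        else:
            files.append(entry["name"])
    return directories, files


# Hypothetical URL and listing, made up for this example.
print(plain_listing_url("https://example.org/pub/foo/"))
SAMPLE_JSON = (
    '[{"name": "v1.0", "type": "directory"},'
    ' {"name": "foo-1.0.tar.gz", "type": "file", "size": 1024}]'
)
print(parse_nginx_json(SAMPLE_JSON))
```

The JSON variant only helps where the server admin has enabled it, so
the HTML route would still be needed as the general fallback.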
--
Regards
Hartmut Goebel
| Hartmut Goebel | h.goebel@crazy-compilers.com |
| www.crazy-compilers.com | compilers which you thought are impossible |