sxml simple: sxml->xml mishandles namespaces?

  • Open
Details
4 participants
  • John Cowan
  • Ricardo Wurmus
  • tomas
  • Andy Wingo
Owner
unassigned
Submitted by
tomas
Severity
normal
(address . bug-guile@gnu.org)
20150415194714.GA30295@tuxteam.de
Hi,

I posted more details on guile-devel. Perhaps this was the wrong list?

When transforming SXML to XML, namespaces don't seem to be handled
properly:


#!/usr/bin/guile -s
!#
(use-modules (sxml simple))
;; An XML with two namespaces (one default)
(define the-svg "<svg xmlns='http://www.w3.org/2000/svg'
xmlns:xlink='http://www.w3.org/1999/xlink'>
<rect x='5' y='5' width='20' height='20'
stroke-width='2' stroke='purple' fill='yellow'
id='rect1' />
<rect x='30' y='5' width='20' height='20'
ry='5' rx='8' stroke-width='2' stroke='purple' fill='blue'
xlink:href='#rect1' />
</svg>")
;; Note how SXML handles QNames (just concatenating NS and
;; local-name with a colon):
(define the-sxml
(with-input-from-string the-svg xml->sxml))
(format #t "~A\n" the-sxml)
;; If we try to serialize this: kaboom!
(sxml->xml the-sxml)

The parsing into SXML goes well, the (format ...) outputs what
I'd expect. But the (sxml->xml ...) dies with:

ERROR: In procedure scm-error:
ERROR: Invalid QName: more than one colon http://www.w3.org/2000/svg:svg

The problem is that SXML used the concatenated (full) namespace with the
name as tag (and attribute) names for namespaced items. When serializing
to XML it should try to find abbreviations for those namespaces and issue
the corresponding namespace declarations.

Instead, sxml->xml tries to split the (namespace:name) combination
at the first colon and to check the name -- and fails miserably at
(namespace:name) combinations à la "http://www.w3.org/1999/xlink:href"
(procedure check-name). Since there are two colons, the name part
has now a colon.
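
To make the failure concrete, here is a minimal sketch of the two ways such a combined name can be split. The helper names split-at-first-colon and split-at-last-colon are hypothetical and are not part of (sxml simple); they only illustrate why a URI-based name cannot be cut at the first colon.

```scheme
;; Hypothetical helpers for illustration only; not part of (sxml simple).

;; Split NAME at its first colon, returning (head . tail).
(define (split-at-first-colon name)
  (let ((i (string-index name #\:)))
    (if i
        (cons (substring name 0 i) (substring name (+ i 1)))
        (cons #f name))))

;; Split NAME at its last colon, returning (namespace . local-name).
(define (split-at-last-colon name)
  (let ((i (string-rindex name #\:)))
    (if i
        (cons (substring name 0 i) (substring name (+ i 1)))
        (cons #f name))))

(define qname "http://www.w3.org/1999/xlink:href")

(split-at-first-colon qname)
;; => ("http" . "//www.w3.org/1999/xlink:href")   ; tail still has a colon

(split-at-last-colon qname)
;; => ("http://www.w3.org/1999/xlink" . "href")
```

Splitting at the last colon recovers the namespace and the local name, but the namespace part is still not a valid XML prefix, so a serializer would also have to map it to an abbreviation.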

There are more details at:


with a first attempt at a patch against guile (GNU Guile) 2.0.5-deb+1-3.
I'm more than willing to beat the patch into shape, but will possibly
need some guidance. Perhaps I'd need to sign papers with the FSF, which
I'd gladly do.

Regards
-- tomás
[PATCH] sxml->xml and namespaces: updated patch
(address . 20339@debbugs.gnu.org)
20150420074517.GA31087@tuxteam.de
Hi,

I've embellished my proposed patch a bit:

- use values resp. call-with-values instead of passing around
lists.

This was one thing I didn't like about my first patch candidate:
the namespace --> ns abbreviation lookup had two things to return:
the abbreviation, and whether this abbreviation was "new"
(for convenience in the form of a (namespace . abbreviation) pair).
Instead of returning a list, it now returns multiple values.

- patch is now against current stable instead of against "whatever
Debian stable packages", i.e. against

d680713 2015-04-03 16:35:54 +0200 Ludovic Courtès (stable-2.0) doc: Update libgc URL.

I'm still not sure whether this is the way to go (i.e. mixing the
abbreviation stuff into the serialization), or whether a pre-pass
(replacing namespaces by abbreviations and generating the namespace
declaration "attributes") would be the way to go.

Besides, I'd like to have some input on whether it'd be worth
following the usual convention and putting the namespace declarations
before regular attributes (forcing us to do two passes on a tag
node's attribute list). The generated XML looks pretty weird as
it is now.

What I'd still like to introduce is a "mapping preference" as an
optional argument supplied by the user, possibly per-node (like "I'd like
'http://www.w3.org/1999/xlink' to be abbreviated as 'xlink'" or
something like that). Other XML serializers offer that. I envision
this as a function; the library would fall back to generating the
abbreviation whenever the function returns #f.

The question on whether this patch (or whatever it evolves into)
has a chance of getting into Guile is still open: I'd have to
get my papers from the FSF in this case.

Inputs?


Ricardo Wurmus wrote on 21 Apr 2015 11:24
Re: bug#20339: sxml simple: sxml->xml mishandles namespaces?
(address . tomas@tuxteam.de)(address . 20339@debbugs.gnu.org)
87oamh25sc.fsf@mango.localdomain
Hi Tomás,

tomas@tuxteam.de writes:

> When transforming SXML to XML, namespaces don't seem to be handled
> properly:
>
[...]
>
> The problem is that SXML used the concatenated (full) namespace with the
> name as tag (and attribute) names for namespaced items. When serializing
> to XML it should try to find abbreviations for those namespaces and issue
> the corresponding namespace declarations.
>
> Instead, sxml->xml tries to split the (namespace:name) combination
> at the first colon and to check the name -- and fails miserably at
> (namespace:name) combinations à la "http://www.w3.org/1999/xlink:href"
> (procedure check-name). Since there are two colons, the name part
> has now a colon.

xml->sxml has an optional #:namespaces argument, where you can pass an
alist of keys to URLs to be used in the sxml output:

(let* ((ns '((svg . "http://www.w3.org/2000/svg")
(xlink . "http://www.w3.org/1999/xlink")))
(the-sxml (xml->sxml the-svg #:namespaces ns)))
(display the-sxml))

=> (*TOP*
(svg:svg
(svg:rect (@ (y 5)
(x 5)
(width 20)
(stroke-width 2)
(stroke purple)
(id rect1)
(height 20)
(fill yellow)))
(svg:rect (@ (xlink:href #rect1)
(y 5)
(x 30)
(width 20)
(stroke-width 2)
(stroke purple)
(ry 5)
(rx 8)
(height 20)
(fill blue)))))

Passing this to sxml->xml yields:

<svg:svg>
<svg:rect y="5" x="5"
width="20"
stroke-width="2"
stroke="purple"
id="rect1"
height="20"
fill="yellow" />
<svg:rect xlink:href="#rect1"
y="5" x="30"
width="20"
stroke-width="2"
stroke="purple"
ry="5" rx="8"
height="20"
fill="blue" />
</svg:svg>

Unfortunately, sxml->xml will not replace the namespace abbreviations,
nor will it add appropriate xmlns attributes, so "svg" and "xlink" are
devoid of any meaning.

Since xml->sxml accepts a namespace alist I suppose it would make sense
to extend sxml->xml to do the same.

~~ Ricardo
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20150421094438.GA22715@tuxteam.de
On Tue, Apr 21, 2015 at 11:24:03AM +0200, Ricardo Wurmus wrote:
> Hi Tomás,
>
> tomas@tuxteam.de writes:
>
> > When transforming SXML to XML, namespaces don't seem to be handled
> > properly:
> >
> [...]
> >
> > The problem is that SXML used the concatenated (full) namespace with the
> > name as tag (and attribute) names for namespaced items. When serializing
> > to XML it should try to find abbreviations for those namespaces and issue
> > the corresponding namespace declarations.
> >
> > Instead, sxml->xml tries to split the (namespace:name) combination
> > at the first colon and to check the name -- and fails miserably at
> > (namespace:name) combinations à la "http://www.w3.org/1999/xlink:href"
> > (procedure check-name). Since there are two colons, the name part
> > has now a colon.
>
> xml->sxml has an optional #:namespaces argument, where you can pass an
> alist of keys to URLs to be used in the sxml output:

Aha. Didn't know about this one, thanks. Yes, the problem is that SXML
loses the link to the "real" namespaces: the application around it has
to keep track of that.

> Passing this to sxml->xml yields:
>
> <svg:svg>
> <svg:rect y="5" x="5"
> width="20"
> stroke-width="2"
> stroke="purple"
> id="rect1"
> height="20"
> fill="yellow" />
> <svg:rect xlink:href="#rect1"
> y="5" x="30"
> width="20"
> stroke-width="2"
> stroke="purple"
> ry="5" rx="8"
> height="20"
> fill="blue" />
> </svg:svg>

Yes, this looks "nearly" right, except...

> Unfortunately, sxml->xml will not replace the namespace abbreviations,
> nor will it add appropriate xmlns attributes, so "svg" and "xlink" are
> devoid of any meaning.

exactly.

> Since xml->sxml accepts a namespace alist I suppose it would make sense
> to extend sxml->xml to do the same.

This is more or less what I do in my proposed patch (it's in the bugs
mailing list as 20339@debbugs.gnu.org). It passes around an alist of
(namespace . abbrev) associations (it's inverted wrt #:namespaces in
xml->sxml). Only that the abbreviations are "generated" as ns1, ns2
and so on (and the namespace declarations are woven into the attributes
list).
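
The lookup described here could be sketched roughly as follows. This is not the code from the proposed patch; lookup-abbreviation is a hypothetical name, and the "ns" plus counter naming scheme is an assumption based on the ns1, ns2 description above. It returns two values: the abbreviation, and either #f or the new (namespace . abbrev) pair that the caller should add to the alist and declare.

```scheme
;; Hypothetical sketch; not the code from the proposed patch.
;; KNOWN is an alist of (namespace-uri . abbreviation) pairs.
(define (lookup-abbreviation ns-uri known)
  (let ((hit (assoc ns-uri known)))
    (if hit
        ;; Already known: nothing new to declare.
        (values (cdr hit) #f)
        ;; Unknown: generate ns1, ns2, ... from the alist length.
        (let ((abbrev (string-append
                       "ns" (number->string (+ 1 (length known))))))
          (values abbrev (cons ns-uri abbrev))))))

(call-with-values
    (lambda ()
      (lookup-abbreviation "http://www.w3.org/1999/xlink"
                           '(("http://www.w3.org/2000/svg" . "ns1"))))
  (lambda (abbrev new-binding)
    (list abbrev new-binding)))
;; => ("ns2" ("http://www.w3.org/1999/xlink" . "ns2"))
```

Deriving the counter from the alist length is only safe while every abbreviation in the alist was generated the same way; a user-supplied prefix such as "ns2" could collide with it.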

So far no reply to my bug report, but this gives me the chance to
bikeshed my patch to death :-P

Thanks for looking into that -- and for prodding me into looking at
more sources :)

Regards
-- t
Ricardo Wurmus wrote on 22 Apr 2015 16:29
(address . tomas@tuxteam.de)(address . 20339@debbugs.gnu.org)
87fv7s1bjn.fsf@mango.localdomain
>> Since xml->sxml accepts a namespace alist I suppose it would make sense
>> to extend sxml->xml to do the same.

Attached is a minimal patch to extend "sxml->xml" such that it accepts an
optional keyword argument "namespaces" with an alist of prefixes to
URLs, analogous to "xml->sxml".

When the namespaces alist is provided, "xmlns:prefix=url" attributes are
prepended to the element's list of attributes.


;; Define SVG document with namespaces
(define the-svg "<svg xmlns='http://www.w3.org/2000/svg'
xmlns:xlink='http://www.w3.org/1999/xlink'>
<rect x='5' y='5' width='20' height='20'
stroke-width='2' stroke='purple' fill='yellow'
id='rect1' />
<rect x='30' y='5' width='20' height='20'
ry='5' rx='8' stroke-width='2' stroke='purple' fill='blue'
xlink:href='#rect1' />
</svg>")

;; Define alist of namespaces
(define ns '((svg . "http://www.w3.org/2000/svg")
(xlink . "http://www.w3.org/1999/xlink")))

;; Convert to SXML, abbreviate namespaces according to ns alist
(define the-sxml (xml->sxml the-svg #:namespaces ns))

;; Convert back to XML
(sxml->xml the-sxml #:namespaces ns)

=> <svg:svg xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink">
<svg:rect y="5" x="5"
width="20"
stroke-width="2"
stroke="purple"
id="rect1"
height="20"
fill="yellow" />
<svg:rect xlink:href="#rect1"
y="5" x="30"
width="20"
stroke-width="2"
stroke="purple"
ry="5" rx="8"
height="20"
fill="blue" />
</svg:svg>

Does this do what you want?

~~ Ricardo
From 81fa92ad0c5537c41419fa1e55c6130bf0558c9f Mon Sep 17 00:00:00 2001
From: rekado <rekado@elephly.net>
Date: Wed, 22 Apr 2015 13:09:27 +0200
Subject: [PATCH] Write XML namespaces when serializing.

* module/sxml/simple.scm (sxml->xml): Add optional keyword argument
"namespaces".
---
module/sxml/simple.scm | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 703ad91..8cc20dd 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -311,7 +311,8 @@ port."
(display str port)
(display "?>" port))
-(define* (sxml->xml tree #:optional (port (current-output-port)))
+(define* (sxml->xml tree #:optional (port (current-output-port)) #:key
+ (namespaces '()))
"Serialize the sxml tree @var{tree} as XML. The output will be written
to the current output port, unless the optional argument @var{port} is
present."
@@ -322,7 +323,7 @@ present."
(let ((tag (car tree)))
(case tag
((*TOP*)
- (sxml->xml (cdr tree) port))
+ (sxml->xml (cdr tree) port #:namespaces namespaces))
((*ENTITY*)
(if (and (list? (cdr tree)) (= (length (cdr tree)) 1))
(entity->xml (cadr tree) port)
@@ -335,10 +336,16 @@ present."
(let* ((elems (cdr tree))
(attrs (and (pair? elems) (pair? (car elems))
(eq? '@ (caar elems))
- (cdar elems))))
- (element->xml tag attrs (if attrs (cdr elems) elems) port)))))
+ (cdar elems)))
+ (xmlns (map (lambda (x)
+ (cons (symbol-append 'xmlns: (car x))
+ (cdr x)))
+ namespaces)))
+ (element->xml tag
+ (if attrs (append xmlns attrs) xmlns)
+ (if attrs (cdr elems) elems) port)))))
;; A nodelist.
- (for-each (lambda (x) (sxml->xml x port)) tree)))
+ (for-each (lambda (x) (sxml->xml x port #:namespaces namespaces)) tree)))
((string? tree)
(string->escaped-xml tree port))
((null? tree) *unspecified*)
--
2.1.0
(name . Ricardo Wurmus)(address . rekado@elephly.net)(address . 20339@debbugs.gnu.org)
20150423065714.GB19410@tuxteam.de
On Wed, Apr 22, 2015 at 04:29:32PM +0200, Ricardo Wurmus wrote:
> >> Since xml->sxml accepts a namespace alist I suppose it would make sense
> >> to extend sxml->xml to do the same.
>
> Attached is a minimal patch to extend "sxml->xml" such that it accepts an
> optional keyword argument "namespaces" with an alist of prefixes to
> URLs, analogous to "xml->sxml".

Thanks, I'll have a look at this this afternoon.

Your code is far prettier than mine, that's for sure :-)

What's yet missing (as far as I can read off the diff) is a way to
"dream up" an abbreviation when it's not in the namespaces alist.

Thanks again and regards
-- tomás
Ricardo Wurmus wrote on 23 Apr 2015 09:04
(address . tomas@tuxteam.de)(address . 20339@debbugs.gnu.org)
878udj1g1d.fsf@mango.localdomain
tomas@tuxteam.de writes:

> What's yet missing (as far as I can read off the diff) is a way to
> "dream up" an abbreviation when it's not in the namespaces alist.

True.

Ideally, this should work even without passing a namespaces alist at all
in both "xml->sxml" and "sxml->xml". The non-abbreviated namespaces
should not cause "sxml->xml" to fail.

Passing around a namespaces alist to both these procedures is the least
invasive approach I could think of, but I still think that it *should*
be made to work without explicitly declaring namespaces.
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20150423074034.GA20961@tuxteam.de
On Thu, Apr 23, 2015 at 09:04:46AM +0200, Ricardo Wurmus wrote:
>
> tomas@tuxteam.de writes:
>
> > What's yet missing (as far as I can read off the diff) is a way to
> > "dream up" an abbreviation when it's not in the namespaces alist.
>
> True.
>
> Ideally, this should work even without passing a namespaces alist at all
> in both "xml->sxml" and "sxml->xml". The non-abbreviated namespaces
> should not cause "sxml->xml" to fail.
>
> Passing around a namespaces alist to both these procedures is the least
> invasive approach I could think of, but I still think that it *should*
> be made to work without explicitly declaring namespaces.

I think a combination of our approaches could work: the only difference
(apart from the code elegance) is that my patch grows this alist on its
way down the tree as it encounters new namespaces. This meshes well with
the namespace declaration, which scopes recursively down the XML tree.

This afternoon, while I sit at the e-Lok waiting for the FSFE meeting,
is a very good moment for me to look into it. I'll report tonight :-)

Thanks & later (dayjob calling)
-- tomás
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20150425202509.GA3544@tuxteam.de
On Wed, Apr 22, 2015 at 04:29:32PM +0200, Ricardo Wurmus wrote:
> >> Since xml->sxml accepts a namespace alist I suppose it would make sense
> >> to extend sxml->xml to do the same.
>
> Attached is a minimal patch to extend "sxml->xml" such that it accepts an
> optional keyword argument "namespaces" with an alist of prefixes to
> URLs, analogous to "xml->sxml".

Thank you again for the patch. I applied it against 2.0.11, and can confirm
that it works as advertised :-)

I didn't see that xml->sxml has an optional parameter #:namespaces --
to be honest, I didn't expect it there.

So if one knows beforehand what namespaces are used in the XML in question,
it's possible to use the pair xml->sxml and sxml->xml this way (with your
patch, of course, because otherwise sxml->xml "forgets" to output the
relevant XML namespace declarations).

Reading Oleg Kiselyov's paper[1] again, I understand that SXML can, as does
XML, have namespace abbreviations (called user-ns-shortcut there). It's not
exactly the same thing, but somehow isomorphic. One might use the XML's
abbreviations in the SXML representation, of course.

The problem with this approach is that you have to carry the
namespace associations "out-of-band", and that you have to know which
namespaces to expect before parsing the XML.

A (more cosmetic) problem is that all namespace declarations are "moved"
to the top-level, because the SXML keeps no "memory" of which node the
namespace declarations were attached to in the original XML.

In [1], there is a mechanism for stashing namespace mappings in the
"attributes list" (strictly in the annotations, which are optionally
tacked to the tail of the attributes list), under the tag *NAMESPACES*.

Anyway -- what would be a good way forward here?

I could imagine taking note of the namespace abbreviations in the
*NAMESPACES* list (while xml->sxml) and issuing the corresponding
declarations in sxml->xml.

Makes sense?

Regards


-- tomás
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20150426102810.GB5922@tuxteam.de
On Sat, Apr 25, 2015 at 10:25:09PM +0200, tomas@tuxteam.de wrote:

[...]

> Reading again Oleg Kiselyov's paper[1] I understand that SXML can, as does
> XML have namespace abbreviations (called there user-ns-shortcut). It's not
> exactly the same thing, but somehow isomorphic. One might use the XML's
> abbreviations in the SXML representation, of course.

I take that back: as far as I understand the paper, the (SXML-side) shortcuts
are global to the document, whereas the (XML-side) abbreviations are
subtree-scoped (i.e. valid for the whole subtree of the element where the
declaration is attached; I don't know ATM whether shadowing is allowed, but
I'll look that up).

So there *is* a subtle difference between "user-ns-shortcut" (the one
you were manipulating with #:namespaces) and the XML "namespace abbreviation"
(the official jargon is "namespace prefix").

Regards


-- tomás
(address . 20339@debbugs.gnu.org)
20160713132403.GA2349@tuxteam.de
On Thu, Jun 23, 2016 at 09:32:16PM +0200, Andy Wingo wrote:
> See thread here as well:
> http://thread.gmane.org/gmane.lisp.guile.devel/17709
>
> I like Ricardo's patch but have some comments here:
> http://article.gmane.org/gmane.lisp.guile.devel/18384

(sorry for cc'ing both of you, but I don't know whether you are
subscribed to the bug. Two copies seemed more polite than none).

Sorry folks for not coming back earlier. Real Life and things.

Since I'm going to be off the 'net for one month starting next Friday,
I thought I'll write a short note.

I'll be back the 15th of August and am really willing to do whatever
it takes to bring this forward. OTOH, if any of you decides to pick
it up, I'm sure the results will be better :-)

Referring to Oleg Kiselyov's paper [1], there are actually three
things involved:

- the namespace. This is an XML thing and will typically be
a URI (I don't quite remember whether it *must* be a
URI, but that's irrelevant). It may contain nasty characters
(to XML: it isn't an XML "Name", and potentially to Scheme:
there may be parentheses and things in there, so some
Schemes won't make a symbol of that; Guile doesn't mind)

- the namespace prefix. Again, an XML thing, basically giving
a non-nasty abbreviation for the namespace, to stick it to
the Name, making a "QName". The association prefix -> namespace
is scoped to a node and its descendants, and can be shadowed
at some node below

- the namespace-id, an SXML thing. In [1], this is typically
the namespace, but Oleg Kiselyov made provisions in [1] for a
similar "abbreviation" (the user-ns-shortcut in [1], page 3),
whose mapping can be attached to any node via the
pseudo-attribute *NAMESPACES* [2], which can also carry the
original (XML) namespace prefix.

As far as I understand the paper, most of the time this
namespace-id will be identical to the URI, but it is this
that will be prefixed to the tag name symbols in the
SXML representation.
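
As an illustration of the third item, here is a sketch of what a node carrying a *NAMESPACES* annotation might look like, following my reading of the SXML grammar in [1], where each association has the shape (namespace-id "URI" original-prefix). The helper element-namespaces is hypothetical, and nothing here implies that the current sxml->xml in (sxml simple) accepts such annotations.

```scheme
;; An element whose attribute list ends with an annotation holding
;; namespace associations of the form (namespace-id "URI" prefix).
;; The layout follows my reading of the SXML grammar; treat it as a sketch.
(define annotated-rect
  '(svg:rect
    (@ (xlink:href "#rect1")
       (@ (*NAMESPACES*
           (svg "http://www.w3.org/2000/svg" svg)
           (xlink "http://www.w3.org/1999/xlink" xlink))))))

;; Hypothetical helper: return the namespace associations attached to
;; ELEM, or the empty list if there are none.
(define (element-namespaces elem)
  (let* ((attrs (and (pair? (cdr elem))
                     (pair? (cadr elem))
                     (eq? '@ (car (cadr elem)))
                     (cdr (cadr elem))))
         (annotations (and attrs (assq '@ attrs)))
         (namespaces (and annotations
                          (assq '*NAMESPACES* (cdr annotations)))))
    (if namespaces (cdr namespaces) '())))

(element-namespaces annotated-rect)
;; => ((svg "http://www.w3.org/2000/svg" svg)
;;     (xlink "http://www.w3.org/1999/xlink" xlink))
```

A serializer could use such a list to emit the xmlns declarations on exactly the node that carries the annotation, which is what would keep the round trip close to the identity.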

What Ricardo's patch does is to conflate namespace prefix and
namespace-id and provide a mapping (namespace-id aka prefix) ->
namespace. This is actually quite elegant, since we don't need
the distinction between (XML) prefix and (SXML) namespace-id.

I think that we can, at least as (sxml simple) is concerned,
ignore this distinction.

What is missing? From my point of view:

- At xml->sxml time, the user doesn't know which namespaces
are in the xml. So it would be nice if the XML parser
could provide that.

- It would be super-nice if the XML parser could put that
into the same nodes it found it, as described in [1]
(i.e. in the (*NAMESPACES* ...) pseudo-attribute).
This way we wouldn't have a global mapping, but one
that resembles the original XML, even with the same
prefixes. Less surprises overall. The round trip
xml -> sxml -> xml would be (nearly) the identity.

With Ricardo's patch it would lump all the namespace
declarations up in the top node, which formally is
correct, but might scare XML people a bit :-)

- At sxml->xml time there should be a way to somehow
generate prefixes for "new" namespaces. I don't know
at the moment how this would work, that depends on
how the user is supposed to insert new nodes in the
SXML. Does she specify the namespace? Both prefix
(aka namespace-id, under my current assumption) *and*
namespace? (note that the namespace-id/prefix alone
wouldn't be sufficient).

Sorry for this wall of text. I hope it makes some sense.

Regards

[2] Actually, I'm cheating here: the thing is part of an
"annotations" part, which according to the grammar comes
*last*, after all the attributes. But it looks a bit
like an attribute, with a strange name and a more
complex value.

-- tomás
(address . 20339@debbugs.gnu.org)
20160713180854.GA12635@tuxteam.de
On Wed, Jul 13, 2016 at 03:24:03PM +0200, tomas@tuxteam.de wrote:

[...]

> What is missing? From my point of view:
>
> - At xml->sxml time, the user doesn't know which namespaces
> are in the xml. So it would be nice if the XML parser
> could provide that.
>
> - It would be super-nice if the XML parser could put that
> into the same nodes it found it, as described in [1]
> (i.e. in the (*NAMESPACES* ...) pseudo-attribute).
> This way we wouldn't have a global mapping, but one
> that resembles the original XML, even with the same
> prefixes. Less surprises overall. The round trip
> xml -> sxml -> xml would be (nearly) the identity.
>
> With Ricardo's patch it would lump all the namespace
> declarations up in the top node, which formally is
> correct, but might scare XML people a bit :-)
>
> - At sxml->xml time there should be a way to somehow
> generate prefixes for "new" namespaces. I don't know
> at the moment how this would work, that depends on
> how the user is supposed to insert new nodes in the
> SXML. Does she specify the namespace? Both prefix
> (aka namespace-id, under my current assumption) *and*
> namespace? (note that the namespace-id/prefix alone
> wouldn't be sufficient).

Argh. First post, then think, sorry.

Actually, ditch the last point. I think it would be OK
to make the user responsible for keeping the *NAMESPACES*
pseudo-attribute up-to-date whenever she adds nodes
with new namespaces to the SXML.

regards


-- tomás
Andy Wingo wrote on 14 Jul 2016 12:10
(address . tomas@tuxteam.de)
87furc1qeu.fsf@pobox.com
Hi :)

On Wed 13 Jul 2016 15:24, tomas@tuxteam.de writes:

> Referring to Oleg Kiselyov's paper [1], there are actually three
> things involved:

This summary is helpful, thanks.
> What is missing? From my point of view:
>
> - At xml->sxml time, the user doesn't know which namespaces
> are in the xml. So it would be nice if the XML parser
> could provide that.

For some documents you do know, of course.

And for larger perspective, I think that SSAX gives you all the tools
you need to build specialist and very flexible XML parsers. So to an
extent solving the general problem isn't necessary -- we can always
point people to SSAX. But that's a bit rude ;) so if there are common
patterns we should try to capture them in xml->sxml. I see this bug as
being a search for those patterns, but without the requirement of
solving the problem in its most general form.

> - It would be super-nice if the XML parser could put that
> into the same nodes it found it, as described in [1]
> (i.e. in the (*NAMESPACES* ...) pseudo-attribute).
> This way we wouldn't have a global mapping, but one
> that resembles the original XML, even with the same
> prefixes. Less surprises overall. The round trip
> xml -> sxml -> xml would be (nearly) the identity.
>
> With Ricardo's patch it would lump all the namespace
> declarations up in the top node, which formally is
> correct, but might scare XML people a bit :-)

ACK.

> - At sxml->xml time there should be a way to somehow
> generate prefixes for "new" namespaces. I don't know
> at the moment how this would work, that depends on
> how the user is supposed to insert new nodes in the
> SXML. Does she specify the namespace? Both prefix
> (aka namespace-id, under my current assumption) *and*
> namespace? (note that the namespace-id/prefix alone
> wouldn't be sufficient).

ACK.

What do you think the next step is? I am happy to wait FWIW, dunno if
Ricardo has any feelings here.

Enjoy your holiday :)

Andy
(name . Andy Wingo)(address . wingo@pobox.com)
20160714102631.GB5611@tuxteam.de
On Thu, Jul 14, 2016 at 12:10:17PM +0200, Andy Wingo wrote:
> Hi :)
>
> On Wed 13 Jul 2016 15:24, tomas@tuxteam.de writes:
>
> > Referring to Oleg Kiselyov's paper [1], there are actually three
> > things involved:
>
> This summary is helpful, thanks.
> > What is missing? From my point of view:
> >
> > - At xml->sxml time, the user doesn't know which namespaces
> > are in the xml. So it would be nice if the XML parser
> > could provide that.
>
> For some documents you do know, of course.
>
> And for larger perspective, I think that SSAX gives you all the tools
> you need to build specialist and very flexible XML parsers. So to an
> extent solving the general problem isn't necessary -- we can always
> point people to SSAX. But that's a bit rude ;) so if there are common
> patterns we should try to capture them in xml->sxml. I see this bug as
> being a search for those patterns, but without the requirement of
> solving the problem in its most general form.

It's (sxml simple), after all. I too hesitate to stuff too much into
it. For me, a documented "no, we don't do namespaces" would be one
valid pattern.

> > - It would be super-nice if the XML parser could put that
> > into the same nodes it found it [...]

> ACK.
>
> > - At sxml->xml time there should be a way to somehow
> > generate prefixes [...]

> ACK.
>
> What do you think the next step is? I am happy to wait FWIW, dunno if
> Ricardo has any feelings here.

We meet this afternoon anyway. On my side, I'd be happy to try
something along the sketched lines when I'm back. If someone
who cares beats me at it, I'd be as happy.

> Enjoy your holiday :)

Looking forward to it. BTW: if I understood properly the area you're
living in, we'll cycle past you (somewhat to the West) on our
way to the north.

Regards
-- tomás
Ricardo Wurmus wrote on 4 Feb 2019 21:44
(name . Andy Wingo)(address . wingo@pobox.com)
87a7jbi8rx.fsf@elephly.net
Hello!

I just looked at this again and I think I came up with something useful.
Here’s some context:

Andy Wingo <wingo@pobox.com> writes:

> Hi :)
>
> On Wed 13 Jul 2016 15:24, tomas@tuxteam.de writes:
>
>> Referring to Oleg Kiselyov's paper [1], there are actually three
>> things involved:
>
> This summary is helpful, thanks.
>> What is missing? From my point of view:
>>
>> - At xml->sxml time, the user doesn't know which namespaces
>> are in the xml. So it would be nice if the XML parser
>> could provide that.
>
> For some documents you do know, of course.
>
> And for larger perspective, I think that SSAX gives you all the tools
> you need to build specialist and very flexible XML parsers. So to an
> extent solving the general problem isn't necessary -- we can always
> point people to SSAX. But that's a bit rude ;) so if there are common
> patterns we should try to capture them in xml->sxml. I see this bug as
> being a search for those patterns, but without the requirement of
> solving the problem in its most general form.
>
>> - It would be super-nice if the XML parser could put that
>> into the same nodes it found it, as described in [1]
>> (i.e. in the (*NAMESPACES* ...) pseudo-attribute).
>> This way we wouldn't have a global mapping, but one
>> that resembles the original XML, even with the same
>> prefixes. Less surprises overall. The round trip
>> xml -> sxml -> xml would be (nearly) the identity.
>>
>> With Ricardo's patch it would lump all the namespace
>> declarations up in the top node, which formally is
>> correct, but might scare XML people a bit :-)
>
> ACK.
>
>> - At sxml->xml time there should be a way to somehow
>> generate prefixes for "new" namespaces. I don't know
>> at the moment how this would work, that depends on
>> how the user is supposed to insert new nodes in the
>> SXML. Does she specify the namespace? Both prefix
>> (aka namespace-id, under my current assumption) *and*
>> namespace? (note that the namespace-id/prefix alone
>> wouldn't be sufficient).
>
> ACK.
>
> What do you think the next step is? I am happy to wait FWIW, dunno if
> Ricardo has any feelings here.

Attached is a patch that does the requested things. The parser
procedures like FINISH-ELEMENT have access to all the namespaces, so
I changed the FINISH-ELEMENT procedure to return the list of namespaces
in addition to its SXML tree return value.

I changed name->sxml to use only the namespace aliases / abbreviations
instead of the namespace URIs. (This is not very efficient because we
need to traverse the list of namespaces every time. Maybe we could
memoize this. On the other hand, the length of the namespaces list may
not be large enough to affect performance too much.)
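
The lookup described above can be sketched in isolation. This is only an
illustration: it assumes that each entry of the namespaces list has the
shape (abbrev uri . rest), as the match pattern in the patch suggests.
The names uri->abbrev, qualify and sample-namespaces are hypothetical and
not part of (sxml simple). Unlike the patch, the sketch falls back to the
bare local part when no abbreviation is found, so symbol-append is never
handed #f.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

(define svg-uri (string->symbol "http://www.w3.org/2000/svg"))
(define xlink-uri (string->symbol "http://www.w3.org/1999/xlink"))

;; Hypothetical namespaces list; the entry shape (abbrev uri . rest)
;; is an assumption taken from the patch's match pattern.
(define sample-namespaces
  (list (cons* 'svg svg-uri svg-uri)
        (cons* 'xlink xlink-uri xlink-uri)))

(define (uri->abbrev uri namespaces)
  ;; Return the first abbreviation bound to URI, or #f if there is none.
  (any (match-lambda
         ((abbrev ns-uri . _) (and (eq? ns-uri uri) abbrev))
         (_ #f))
       namespaces))

(define (qualify uri local-part namespaces)
  ;; Build abbrev:local-part, or return LOCAL-PART unchanged when URI
  ;; has no known abbreviation.
  (let ((abbrev (uri->abbrev uri namespaces)))
    (if abbrev
        (symbol-append abbrev (string->symbol ":") local-part)
        local-part)))

(display (qualify xlink-uri 'href sample-namespaces))
(newline)
```

The list is walked once per name, which matches the cost concern raised
above; a hash table keyed on the URI symbol would be one way to memoize it.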

In the end we get both namespace list and SXML tree from running the
parser. Before wrapping this up in *TOP* we generate xmlns attributes
for all abbreviations and “patch” the first proper element’s attribute
list (i.e. we skip over a *PI* element if it exists).

The result is an SXML tree that begins with namespace declarations,
mapping abbreviations to URIs. Within the SXML tree we’re only using
abbreviations, so there are no more invalid characters when converting
SXML to a string.

I would be happy if you could test this as I’m not 100% confident that
this is correct. Here are questions I wasn’t able to answer
conclusively:

* Is the value for “namespaces” that’s passed in to the
FINISH-ELEMENT procedure always the same?

* Will the second return value of the final call to FINISH-ELEMENT
really always be the complete list of *all* namespaces that have been
encountered?

* Are there valid XML documents for which the match patterns to inject
namespace declarations would not apply? (e.g. documents with a PI
element and two separate XML trees)

--
Ricardo
From 83ee9de18a0ecaa237eb73e1b75d0b21e3e8d321 Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <rekado@elephly.net>
Date: Mon, 4 Feb 2019 21:39:06 +0100
Subject: [PATCH] sxml: xml->sxml: Record and use namespace abbreviations.

* module/sxml/simple.scm (xml->sxml): Add namespace declarations to the
attribute list of the first XML element.
[name->sxml]: Accept namespaces argument to look up abbreviation.
Return name with abbreviation prefix.
[parser]: Let FINISH-ELEMENT procedure return namespaces in addition to
SXML tree.
---
module/sxml/simple.scm | 50 +++++++++++++++++++++++++++++++++---------
1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 703ad9137..52dd9af12 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -1,7 +1,8 @@
;;;; (sxml simple) -- a simple interface to the SSAX parser
;;;;
-;;;; Copyright (C) 2009, 2010, 2013 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009, 2010, 2013, 2019 Free Software Foundation, Inc.
;;;; Modified 2004 by Andy Wingo <wingo at pobox dot com>.
+;;;; Modified 2019 by Ricardo Wurmus <rekado@elephly.net>.
;;;; Originally written by Oleg Kiselyov <oleg at pobox dot com> as SXML-to-HTML.scm.
;;;;
;;;; This library is free software; you can redistribute it and/or
@@ -30,6 +31,7 @@
#:use-module (sxml ssax)
#:use-module (sxml transform)
#:use-module (ice-9 match)
+ #:use-module (srfi srfi-1)
#:use-module (srfi srfi-13)
#:export (xml->sxml sxml->xml sxml->string))
@@ -123,10 +125,15 @@ port."
(acons '*DEFAULT* default-entity-handler entities)
entities))
- (define (name->sxml name)
+ (define (name->sxml name namespaces)
(match name
((prefix . local-part)
- (symbol-append prefix (string->symbol ":") local-part))
+ (let ((abbrev (and=> (find (match-lambda
+ ((abbrev uri . rest)
+ (and (eq? uri prefix) abbrev)))
+ namespaces)
+ first)))
+ (symbol-append abbrev (string->symbol ":") local-part)))
(_ name)))
(define (doctype-continuation seed)
@@ -152,14 +159,16 @@ port."
(ssax:reverse-collect-str seed)))
(attrs (attlist-fold
(lambda (attr accum)
- (cons (list (name->sxml (car attr)) (cdr attr))
+ (cons (list (name->sxml (car attr) namespaces)
+ (cdr attr))
accum))
'() attributes)))
- (acons (name->sxml elem-gi)
- (if (null? attrs)
- seed
- (cons (cons '@ attrs) seed))
- parent-seed)))
+ (values (acons (name->sxml elem-gi namespaces)
+ (if (null? attrs)
+ seed
+ (cons (cons '@ attrs) seed))
+ parent-seed)
+ namespaces)))
CHAR-DATA-HANDLER ; fhere
(lambda (string1 string2 seed)
@@ -212,7 +221,28 @@ port."
(let* ((port (if (string? string-or-port)
(open-input-string string-or-port)
string-or-port))
- (elements (reverse (parser port '()))))
+ (elements (call-with-values
+ (lambda () (parser port '()))
+ (lambda (elements namespaces)
+ ;; Generate namespace declarations mapping
+ ;; abbreviations to URLs.
+ (let ((ns-declarations
+ (filter-map (match-lambda
+ (('*DEFAULT* . _) #f)
+ ((abbrev uri . _)
+ (list (symbol-append 'xmlns: abbrev)
+ (symbol->string uri))))
+ namespaces)))
+ ;; Inject namespace declarations into the first
+ ;; proper element.
+ (match (reverse elements)
+ (((and pi-elem ('*PI* . _))
+ (tag ('@ . attrs) . children))
+ `(,pi-elem (,tag (@ ,@ns-declarations ,@attrs)
+ ,@children)))
+ (((tag ('@ . attrs) . children))
+ `(,tag (@ ,@ns-declarations ,@attrs)
+ ,@children))))))))
`(*TOP* ,@elements)))
(define check-name
--
2.20.1
John Cowan wrote on 4 Feb 2019 23:55
(name . Ricardo Wurmus)(address . rekado@elephly.net)
CAD2gp_ScjmURZ7yTFronxyR9r4P4P2L91mXNHguXpZG86chdVA@mail.gmail.com
On Mon, Feb 4, 2019 at 3:45 PM Ricardo Wurmus <rekado@elephly.net> wrote:

> I changed name->sxml to use only the namespace aliases / abbreviations
> instead of the namespace URIs.


The trouble with that is that XML namespaces are lexically scoped, like
Scheme local variables. It is perfectly valid to map a prefix to more
than one URL, as long as the namespace declarations are in either
disjoint or nested elements. So you don't know what the absolute name
of the element or attribute is from just the prefix and the local part.

Furthermore, it is also legal to define more than one prefix for
the same URL, in which case names using either prefix are normally
treated as equivalent (however, you can't have elements like
<a:foo>...</b:foo>
even if a and b map to the same namespace).
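
Both cases can be tried against the unpatched (sxml simple), which
qualifies tags with the namespace URI. The following is a minimal sketch;
the documents and the urn:one / urn:two namespace names are made up for
illustration. In the first document the prefix "a" is rebound on the inner
element, in the second two prefixes share one URI.

```scheme
(use-modules (sxml simple))

;; The prefix "a" is bound to urn:one on the outer element and rebound
;; to urn:two on the inner one.
(define nested
  "<a:x xmlns:a='urn:one'><a:x xmlns:a='urn:two'/></a:x>")

;; Two different prefixes bound to the same URI.
(define aliased
  "<r xmlns:a='urn:one' xmlns:b='urn:one'><a:x/><b:x/></r>")

(define nested-sxml (xml->sxml nested))
(define aliased-sxml (xml->sxml aliased))

(write nested-sxml)
(newline)
(write aliased-sxml)
(newline)
```

With URI-qualified tags the two a:x elements of the first document should
come out under different names, and a:x and b:x of the second under the
same name, which is exactly the information a prefix-only name loses.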

> * Is the value for “namespaces” that’s passed in to the
> FINISH-ELEMENT procedure always the same?
>
> * Will the second return value of the final call to FINISH-ELEMENT
> really always be the complete list of *all* namespaces that have been
> encountered?

Definitely not, only the namespaces that are currently in scope.

> * Are there valid XML documents for which the match patterns to inject
> namespace declarations would not apply? (e.g. documents with a PI
> element and two separate XML trees)

That's not well-formed: you can only have a single element tree per XML
document, although you can have any number of PIs, comments, and
whitespace (which is normally ignored) before and after.
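
Given that rule, locating the single element tree in a parsed document
amounts to skipping everything at top level that is not an element. A
minimal sketch, assuming the usual (*TOP* node ...) shape of an SXML
document; root-element is a hypothetical helper, not part of
(sxml simple):

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

(define (root-element top)
  ;; TOP is an SXML document of the form (*TOP* node ...).
  ;; Return the first top-level node that is an element, skipping
  ;; processing instructions, comments, annotations and strings.
  (match top
    (('*TOP* . nodes)
     (find (match-lambda
             (((or '*PI* '*COMMENT* '@) . _) #f)
             (((? symbol?) . _) #t)
             (_ #f))
           nodes))))

(write (root-element
        '(*TOP* (*PI* xml "version='1.0'")
                (svg (@ (id "a")) (rect)))))
(newline)
```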

--
John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org
If I have seen farther than others, it is because I was looking through a
spyglass with my one good eye, with a parrot standing on my shoulder. --"Y"
Ricardo Wurmus wrote on 5 Feb 2019 10:12
(name . John Cowan)(address . cowan@ccil.org)
874l9iiopl.fsf@elephly.net
Hi John,

> The trouble with that is that XML namespaces are lexically scoped, like
> Scheme local variables. It is perfectly valid to map a prefix to more
> than one URL, as long as the namespace declarations are in either
> disjoint or nested elements. So you don't know what the absolute name
> of the element or attribute is from just the prefix and the local part.
>
> Furthermore, it is also legal to define more than one prefix for
> the same URL, in which case names using either prefix are normally
> treated as equivalent (however, you can't have elements like
> <a:foo>...</b:foo>
> even if a and b map to the same namespace).
>
>> * Is the value for “namespaces” that’s passed in to the
>> FINISH-ELEMENT procedure always the same?
>>
>> * Will the second return value of the final call to FINISH-ELEMENT
>> really always be the complete list of *all* namespaces that have been
>> encountered?
>>
>
> Definitely not, only the namespaces that are currently in scope.

Thanks for the clarifications!

In that case we could have FINISH-ELEMENT add all namespace declarations
that are in scope to the current node that is about to be returned. It
would be a little verbose, but more correct.

What do you think?

--
Ricardo
Ricardo Wurmus wrote on 5 Feb 2019 13:57
(name . John Cowan)(address . cowan@ccil.org)(address . 20339@debbugs.gnu.org)
87r2cmgzq0.fsf@elephly.net
Ricardo Wurmus <rekado@elephly.net> writes:

> In that case we could have FINISH-ELEMENT add all namespace declarations
> that are in scope to the current node that is about to be returned. It
> would be a little verbose, but more correct.

Like this:
From d44c702718baea4c4557d12ca8dd7dab724c7fb6 Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <rekado@elephly.net>
Date: Mon, 4 Feb 2019 21:39:06 +0100
Subject: [PATCH] sxml: xml->sxml: Record and use namespace abbreviations.

* module/sxml/simple.scm (xml->sxml)
[name->sxml]: Accept namespaces argument to look up abbreviation.
Return name with abbreviation prefix.
[parser]: Let FINISH-ELEMENT procedure return namespaces in addition to
the SXML tree's attributes.
---
module/sxml/simple.scm | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 703ad9137..2bb332c83 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -1,7 +1,8 @@
;;;; (sxml simple) -- a simple interface to the SSAX parser
;;;;
-;;;; Copyright (C) 2009, 2010, 2013 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009, 2010, 2013, 2019 Free Software Foundation, Inc.
;;;; Modified 2004 by Andy Wingo <wingo at pobox dot com>.
+;;;; Modified 2019 by Ricardo Wurmus <rekado@elephly.net>.
;;;; Originally written by Oleg Kiselyov <oleg at pobox dot com> as SXML-to-HTML.scm.
;;;;
;;;; This library is free software; you can redistribute it and/or
@@ -30,6 +31,7 @@
#:use-module (sxml ssax)
#:use-module (sxml transform)
#:use-module (ice-9 match)
+ #:use-module (srfi srfi-1)
#:use-module (srfi srfi-13)
#:export (xml->sxml sxml->xml sxml->string))
@@ -123,10 +125,15 @@ port."
(acons '*DEFAULT* default-entity-handler entities)
entities))
- (define (name->sxml name)
+ (define (name->sxml name namespaces)
(match name
((prefix . local-part)
- (symbol-append prefix (string->symbol ":") local-part))
+ (let ((abbrev (and=> (find (match-lambda
+ ((abbrev uri . rest)
+ (and (eq? uri prefix) abbrev)))
+ namespaces)
+ first)))
+ (symbol-append abbrev (string->symbol ":") local-part)))
(_ name)))
(define (doctype-continuation seed)
@@ -150,12 +157,21 @@ port."
(let ((seed (if trim-whitespace?
(ssax:reverse-collect-str-drop-ws seed)
(ssax:reverse-collect-str seed)))
- (attrs (attlist-fold
- (lambda (attr accum)
- (cons (list (name->sxml (car attr)) (cdr attr))
- accum))
- '() attributes)))
- (acons (name->sxml elem-gi)
+ (attrs (append
+ ;; Namespace declarations
+ (filter-map (match-lambda
+ (('*DEFAULT* . _) #f)
+ ((abbrev uri . _)
+ (list (symbol-append 'xmlns: abbrev)
+ (symbol->string uri))))
+ namespaces)
+ (attlist-fold
+ (lambda (attr accum)
+ (cons (list (name->sxml (car attr) namespaces)
+ (cdr attr))
+ accum))
+ '() attributes))))
+ (acons (name->sxml elem-gi namespaces)
(if (null? attrs)
seed
(cons (cons '@ attrs) seed))
--
2.20.1
It’s quite verbose because it doesn’t check if a namespace declaration
is the same in a parent.

--
Ricardo
T
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20190212095602.GD13448@tuxteam.de
On Mon, Feb 04, 2019 at 09:44:02PM +0100, Ricardo Wurmus wrote:
> Hello!
>
> I just looked at this again and I think I came up with something useful.
> Here’s some context:

[...]

> Attached is a patch that does the requested things. The parser
> procedures like FINISH-ELEMENT have access to all the namespaces, so
> I changed the FINISH-ELEMENT procedure to return the list of namespaces
> in addition to its SXML tree return value.

It's great that you picked this up, I'm excited :-)

I have lost touch with Guile a bit as of late. But I'm preparing
some tooling to give your patches a whirl; in the meantime a couple
of comments from the peanut gallery:

As John has noted, the namespace mappings (i.e. the prefix -> namespace
URI binding) are kind of lexically scoped (I'd call it subtree scoped,
but structurally it is the same). While parsing is "easy" (assuming
well-formed XML), serializing is not unambiguous. In a way, the library
might want to be prepared to take hints from the application (as far
as the XML is to be read by humans, there might be "better" and "worse"
serializations).

It may take me a couple of days to come up to speed.

Thanks a lot & cheers
-- t
Ricardo Wurmus wrote on 12 Feb 2019 21:30
(address . tomas@tuxteam.de)
87wom4iwc3.fsf@elephly.net
tomas@tuxteam.de writes:

> As John has noted, the namespace mappings (i.e. the prefix -> namespace
> URI binding) are kind of lexically scoped (I'd call it subtree scoped,
> but structurally it is the same). While parsing is "easy" (assuming
> well-formed XML), serializing is not unambiguous.

The “fup” handler of the parser visits every element and has a list of
namespaces that are in scope at this point. Its purpose is to return
the SXML representation of that element. At this point we can record
the namespaces as attributes. (That’s what the patch does.)

When baking XML from SXML we don’t need to do anything special — we only
need to convert everything to text, including the recorded namespace
attributes. This isn’t pretty SXML (nor is it pretty XML), but it
appears to be correct as none of the namespace information is lost.

To get a better serialized representation the parser needs to do a
better job of identifying “new” namespaces.

> In a way, the library might want to be prepared to take hints from the
> application (as far as the XML is to be read by humans, there might be
> "better" and "worse" serializations).

The XML produced when this patch is applied will not be pretty. To
generate minimal/pretty XML knowledge of the parent elements’ namespaces
is required — knowledge that the parser’s “fup” handler does not have.

We could try to alter the parser so that it not only passes the list of
namespaces that are currently in scope, but also a list of namespaces
that are in scope for the parent node. This would allow us to determine
the list of *new* namespaces that absolutely must be declared for the
current node. If there are no new namespaces we can simply ignore them
and produce minimal SXML (and thus minimal XML later when the SXML is
serialized).
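
The comparison of the two scopes could look roughly like this. It is a
sketch under stated assumptions: bindings are modelled as plain
(prefix . uri) pairs rather than in the parser's own representation, and
new-namespaces is a hypothetical helper that the parser does not provide.

```scheme
(use-modules (srfi srfi-1))

;; Bindings are modelled as (prefix . uri) pairs; this shape is an
;; assumption made for the sketch.
(define (new-namespaces current parent)
  ;; Keep the bindings of CURRENT that PARENT does not already contain.
  ;; A prefix rebound to a different URI counts as new.
  (remove (lambda (binding) (member binding parent)) current))

(define parent-scope
  '((svg . "http://www.w3.org/2000/svg")))

(define current-scope
  '((xlink . "http://www.w3.org/1999/xlink")
    (svg . "http://www.w3.org/2000/svg")))

(write (new-namespaces current-scope parent-scope))
(newline)
```

Only the bindings returned here would need an xmlns attribute on the
current node; an empty result means the node can be emitted without any
namespace declarations.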

--
Ricardo
tomas wrote on 8 Apr 2019 14:14
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20190408121403.GA781@tuxteam.de
Attachment: file
tomas wrote on 3 May 2019 12:46
Re: bug#20339: Taking a step back (was: sxml simple: sxml->xml mishandles namespaces?)
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20190503104627.GE31083@tuxteam.de
Hi,

after mulling over it for a while, I think it's time to take
a step back and think a bit about where we'd like to go with
this.

Note that I'm ignoring technical details (the fact that the
SXML, and thus the XML serialization now has namespace declarations
everywhere down the path instead of just at the corresponding root
node, and the thing with the default namespaces, as noted in [1],
seem to me "fixable" technical details).

Your patch, Ricardo, takes a new approach wrt. the SXML resulting
from an XML parse: the full tag names (the QNAMEs, in XML parlance)
are now composed of <prefix>:<name> (mimicking the XML) instead
of <namespace uri>:<name>, as the former (sxml simple) used to
do. This has upsides and downsides.

I'll call your approach the "prefix" approach (since it uses the
prefixes to qualify the tag names) and the approach followed
by (sxml simple) up to now the "URI" approach, which has
the full namespace URI qualifying the name.

In the URI approach, a qualified tag name would look like


whereas in the prefix approach, it'd look like

"myns:root"

plus the knowledge somewhere that the prefix "myns" stands for


Upsides of the prefix approach:

+ it mimics more closely the XML syntax. Since that
is what the XML folks see, that follows the "principle
of least astonishment" (aka POLA)
+ it is forced to keep the prefix -> namespace associations
(it would be semantically incomplete if not, since what
counts semantically is the namespace URI)

Downsides

- it contradicts current documentation
"All namespaces in the XML document must be declared, via
xmlns attributes. SXML elements built from non-default
namespaces will have their tags prefixed with their
URI. Users can specify custom prefixes for certain
namespaces with the #:namespaces keyword argument to
xml->sxml." [2]

This can be changed, of course :-)
But perhaps someone is already relying on it?

- working on the resulting SXML becomes harder, because
to compare two qualified names, we'd have to resolve
the namespace associations.

Upsides of the URI approach

+ it is what the documentation says
+ it follows more closely the XML semantics (the namespace
prefix in itself is irrelevant after all). As a corollary,
working on the SXML becomes easier: a comparison of two
qualified names becomes a simple string comparison, etc.

I think that is why (sxml simple)'s original design followed
this path.
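
To make the difference concrete: under the prefix approach, two names
have to be expanded against the bindings in scope before they can be
compared, whereas URI-qualified names compare directly. A minimal sketch,
with a hypothetical helper expand-qname and made-up bindings that map
prefix strings to URI strings:

```scheme
(define (expand-qname qname bindings)
  ;; QNAME is a symbol such as a:x; BINDINGS is an alist mapping prefix
  ;; strings to namespace URI strings.  Return "uri:local" as a string,
  ;; the plain name when QNAME has no prefix, or #f when the prefix is
  ;; not bound.
  (let* ((str (symbol->string qname))
         (colon (string-index str #\:)))
    (if colon
        (let ((uri (assoc-ref bindings (substring str 0 colon))))
          (and uri
               (string-append uri ":" (substring str (+ colon 1)))))
        str)))

;; Different prefixes, same namespace: equal only after expansion.
(define same?
  (equal? (expand-qname 'a:x '(("a" . "urn:one")))
          (expand-qname 'b:x '(("b" . "urn:one")))))

(write same?)
(newline)
```

Every such comparison needs the bindings that were in scope where each
name occurred, which is the extra bookkeeping the prefix approach imposes
on code that works on the SXML.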

Downsides

Well, negate the "prefix approach" upsides :-)

Let me just say that there seem to be precedents for the
prefix approach out there on the 'net: the Wikipedia
article [3] (yes, there's a Wikipedia article on that!) follows
the prefix approach. This nice blog post [4] does too.

I think I'll stop here. My fingers itch with some hacking,
but I think we should pause and ponder before hacking.

Perhaps we should take this to guile-devel? OTOH, if someone
knows The Way Forward (TM), I'm willing to hack in this
direction.

Cheers & thanks

[1] Message ID <20190408121403.GA781@tuxteam.de>

-- tomás