sxml simple: sxml->xml mishandles namespaces?

Open

Details

4 participants

John Cowan
Ricardo Wurmus
tomas
Andy Wingo

Owner: unassigned

Submitted by: tomas

Severity: normal

tomas wrote on 15 Apr 2015 21:47

Recipients:(address . bug-guile@gnu.org)

Message-ID:20150415194714.GA30295@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

Hi,

I posted more details on guile-devel. Perhaps this was the wrong list?

When transforming SXML to XML, namespaces don't seem to be handled

properly:

#!/usr/bin/guile -s
!#

(use-modules (sxml simple))

;; An XML document with two namespaces (one of them the default)
(define the-svg "<svg xmlns='http://www.w3.org/2000/svg'
                      xmlns:xlink='http://www.w3.org/1999/xlink'>
  <rect x='5' y='5' width='20' height='20'
        stroke-width='2' stroke='purple' fill='yellow'
        id='rect1' />
  <rect x='30' y='5' width='20' height='20'
        ry='5' rx='8' stroke-width='2' stroke='purple' fill='blue'
        xlink:href='#rect1' />
</svg>")

;; Note how SXML handles QNames (just concatenating NS and
;; local-name with a colon):
(define the-sxml
  (with-input-from-string the-svg xml->sxml))

(format #t "~A\n" the-sxml)

;; If we try to serialize this: kaboom!
(sxml->xml the-sxml)

The parsing into SXML goes well, the (format ...) outputs what

I'd expect. But the (sxml->xml ...) dies with:

ERROR: In procedure scm-error:

ERROR: Invalid QName: more than one colon http://www.w3.org/2000/svg:svg

The problem is that SXML used the concatenated (full) namespace with the

name as tag (and attribute) names for namespaced items. When serializing

to XML it should try to find abbreviations for those namespaces and issue

the corresponding namespace declarations.

Instead, sxml->xml tries to split the (namespace:name) combination

at the first colon and to check the name -- and fails miserably at

(namespace:name) combinations à la "http://www.w3.org/1999/xlink:href"

(procedure check-name). Since there are two colons, the name part

has now a colon.
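
To make the failure concrete, here is a minimal sketch of the splitting problem. It is not the code in (sxml simple): the helpers split-at-first-colon and split-at-last-colon are hypothetical, and the sketch assumes the tag symbol has simply been converted to a string.

```scheme
;; Hypothetical helpers, for illustration only; not part of (sxml simple).
(define (split-at-first-colon str)
  ;; Returns (prefix . local-name), or (#f . str) if there is no colon.
  (let ((i (string-index str #\:)))
    (if i
        (cons (substring str 0 i) (substring str (+ i 1)))
        (cons #f str))))

(define (split-at-last-colon str)
  ;; Same contract, but splits at the rightmost colon.
  (let ((i (string-rindex str #\:)))
    (if i
        (cons (substring str 0 i) (substring str (+ i 1)))
        (cons #f str))))

(define qname "http://www.w3.org/1999/xlink:href")

(split-at-first-colon qname)
;; => ("http" . "//www.w3.org/1999/xlink:href")
;; The "local name" still contains a colon, so a name check rejects it.

(split-at-last-colon qname)
;; => ("http://www.w3.org/1999/xlink" . "href")
```

Splitting at the last colon separates the namespace from the local name, but the namespace part is still not a legal XML prefix, which is why an abbreviation is needed in any case.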

There are more details at:

http://lists.gnu.org/archive/html/guile-devel/2015-04/msg00000.html

with a first attempt at a patch against guile (GNU Guile) 2.0.5-deb+1-3.

I'm more than willing to beat the patch into shape, but will possibly

need some guidance. Perhaps I'd need to sign papers with the FSF, which

I'd gladly do.

Regards

- -- tomás

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlUuwEIACgkQBcgs9XrR2kbJWQCfQ/ALFQrf0crOK47SbaOlJlMv

MwAAn3fxDBWOhgNF0L7E35k0skol2T0V

=FIId

-----END PGP SIGNATURE-----

tomas wrote on 20 Apr 2015 09:45

[PATCH] sxml->xml and namespaces: updated patch

Recipients:(address . 20339@debbugs.gnu.org)

Message-ID:20150420074517.GA31087@tuxteam.de

Hi,

I've embellished my proposed patch a bit:

- use values resp. call-with-values instead of passing around

lists.

This was one thing I didn't like about my first patch candidate:

the namespace --> ns abbreviation lookup had two things to return,

for one the abbreviation, and whether this abbreviation was "new"

(for convenience in the form of a (namespace . abbreviation) pair).

Instead of returning a list, now it returns multiple values.

- patch is now against current stable instead of against "whatever

Debian stable packages", i.e. against

d680713 2015-04-03 16:35:54 +0200 Ludovic Courtès (stable-2.0) doc: Update libgc URL.

I'm still not sure whether this is the way to go (i.e. mixing the

abbreviation stuff into the serialization), or whether a pre-pass

(replacing namespaces by abbreviations and generating the namespace

declaration "attributes") would be the way to go.

Besides, I'd like to have some input on whether it'd be worth

following the usual convention and putting the namespace declarations

before regular attributes (forcing us to do two passes on a tag

node's attribute list). The generated XML looks pretty weird as

is now.

What I'd still like to introduce is a "mapping preference" as an

optional argument by the user, possibly per-node (like "I'd like

'http://www.w3.org/1999/xlink' to be abbreviated as 'xlink'", or

something like that). Other XML serializers offer that. I envision

this as a function, the library would fall back to generate the

abbreviation whenever the function returns #f.
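
The lookup described above could be sketched roughly as follows. This is not the attached patch: lookup-abbreviation and my-preference are hypothetical names, and the sketch assumes the mapping is an alist of (namespace . abbreviation) string pairs.

```scheme
;; Hypothetical sketch, not the attached patch.
;; ns-alist: list of (namespace . abbreviation) pairs, both strings.
;; prefer:   procedure mapping a namespace to an abbreviation,
;;           or returning #f for "no preference".
(define (lookup-abbreviation namespace ns-alist prefer)
  ;; Returns two values: the abbreviation, and #t if it is new.
  (let ((known (assoc namespace ns-alist)))
    (if known
        (values (cdr known) #f)
        (values (or (prefer namespace)
                    ;; Fall back to a generated name: ns1, ns2, ...
                    (string-append
                     "ns"
                     (number->string (+ 1 (length ns-alist)))))
                #t))))

;; A user-supplied preference that only knows about xlink.
(define (my-preference namespace)
  (and (string=? namespace "http://www.w3.org/1999/xlink")
       "xlink"))

(call-with-values
    (lambda ()
      (lookup-abbreviation "http://www.w3.org/1999/xlink" '() my-preference))
  list)
;; => ("xlink" #t)

(call-with-values
    (lambda ()
      (lookup-abbreviation "http://www.w3.org/2000/svg" '() my-preference))
  list)
;; => ("ns1" #t)
```

The sketch does not check whether a preferred or generated abbreviation is already bound to a different namespace; a real implementation would have to.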

The question on whether this patch (or whatever it evolves into)

has a chance of getting into Guile is still open: I'd have to

get my papers from the FSF in this case.

Inputs?

Attachment: abbreviate-and-declare-namespaces.patch

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlU0rowACgkQBcgs9XrR2kZRwACffTrZx5cCTIr7pMETu2kLbqvZ

H8kAnAq9DYpMgKjL7sRpox496i/QN7Dl

=Yxx8

-----END PGP SIGNATURE-----

Ricardo Wurmus wrote on 21 Apr 2015 11:24

Re: bug#20339: sxml simple: sxml->xml mishandles namespaces?

Recipients:(address . tomas@tuxteam.de)(address . 20339@debbugs.gnu.org)

Message-ID:87oamh25sc.fsf@mango.localdomain

Hi Tomás,

tomas@tuxteam.de writes:

> When transforming SXML to XML, namespaces don't seem to be handled

> properly:

[...]

>
> The problem is that SXML used the concatenated (full) namespace with the
> name as tag (and attribute) names for namespaced items. When serializing
> to XML it should try to find abbreviations for those namespaces and issue
> the corresponding namespace declarations.
>
> Instead, sxml->xml tries to split the (namespace:name) combination
> at the first colon and to check the name -- and fails miserably at
> (namespace:name) combinations à la "http://www.w3.org/1999/xlink:href"
> (procedure check-name). Since there are two colons, the name part
> has now a colon.

xml->sxml has an optional #:namespaces argument, where you can pass an
alist of keys to URLs to be used in the sxml output:

   (let* ((ns '((svg . "http://www.w3.org/2000/svg")
                (xlink . "http://www.w3.org/1999/xlink")))
          (the-sxml (xml->sxml the-svg #:namespaces ns)))
     (display the-sxml))

=> (*TOP*
     (svg:svg
       (svg:rect (@ (y 5)
                    (x 5)
                    (width 20)
                    (stroke-width 2)
                    (stroke purple)
                    (id rect1)
                    (height 20)
                    (fill yellow)))
       (svg:rect (@ (xlink:href #rect1)
                    (y 5)
                    (x 30)
                    (width 20)
                    (stroke-width 2)
                    (stroke purple)
                    (ry 5)
                    (rx 8)
                    (height 20)
                    (fill blue)))))

Passing this to sxml->xml yields:

  <svg:svg>
    <svg:rect y="5" x="5"
              width="20"
              stroke-width="2"
              stroke="purple"
              id="rect1"
              height="20"
              fill="yellow" />
    <svg:rect xlink:href="#rect1"
              y="5" x="30"
              width="20"
              stroke-width="2"
              stroke="purple"
              ry="5" rx="8"
              height="20"
              fill="blue" />
  </svg:svg>

Unfortunately, sxml->xml will not replace the namespace abbreviations,
nor will it add appropriate xmlns attributes, so "svg" and "xlink" are
devoid of any meaning.

Since xml->sxml accepts a namespace alist I suppose it would make sense
to extend sxml->xml to do the same.
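
Until then, one possible workaround is to add the declarations by hand as ordinary attributes on the root element before serializing. This is only a sketch: the document below is a cut-down, hand-written version of the example, not the output of xml->sxml.

```scheme
(use-modules (sxml simple))

;; Hand-written SXML with explicit xmlns:* attributes on the root element.
(define doc
  '(svg:svg (@ (xmlns:svg "http://www.w3.org/2000/svg")
               (xmlns:xlink "http://www.w3.org/1999/xlink"))
     (svg:rect (@ (id "rect1") (x "5") (y "5")))
     (svg:rect (@ (xlink:href "#rect1") (x "30") (y "5")))))

;; Each name contains at most one colon, so serialization succeeds
;; and the prefixes are declared in the output.
(sxml->xml doc)
```

The application still has to keep track of the prefix-to-URL mapping itself.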

~~ Ricardo

tomas wrote on 21 Apr 2015 11:44

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:20150421094438.GA22715@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Tue, Apr 21, 2015 at 11:24:03AM +0200, Ricardo Wurmus wrote:

> Hi Tomás,
> 
> tomas@tuxteam.de writes:
> 
> > When transforming SXML to XML, namespaces don't seem to be handled
> > properly:
> >
> [...]
> >
> > The problem is that SXML used the concatenated (full) namespace with the
> > name as tag (and attribute) names for namespaced items. When serializing
> > to XML it should try to find abbreviations for those namespaces and issue
> > the corresponding namespace declarations.
> >
> > Instead, sxml->xml tries to split the (namespace:name) combination
> > at the first colon and to check the name -- and fails miserably at
> > (namespace:name) combinations à la "http://www.w3.org/1999/xlink:href"
> > (procedure check-name). Since there are two colons, the name part
> > has now a colon.
> 
> xml->sxml has an optional #:namespaces argument, where you can pass an
> alist of keys to URLs to be used in the sxml output:

Aha. Didn't know about this one, thanks. Yes, the problem is that SXML

loses the link to the "real" namespaces: the application around it has

to keep track of that.

> Passing this to sxml->xml yields:
> 
>   <svg:svg>
>     <svg:rect y="5" x="5"
>               width="20"
>               stroke-width="2"
>               stroke="purple"
>               id="rect1"
>               height="20"
>               fill="yellow" />
>     <svg:rect xlink:href="#rect1"
>               y="5" x="30"
>               width="20"
>               stroke-width="2"
>               stroke="purple"
>               ry="5" rx="8"
>               height="20"
>               fill="blue" />
>   </svg:svg>

Yes, this looks "nearly" right, except...

> Unfortunately, sxml->xml will not replace the namespace abbreviations,

> nor will it add appropriate xmlns attributes, so "svg" and "xlink" are

> devoid of any meaning.

exactly.

> Since xml->sxml accepts a namespace alist I suppose it would make sense

> to extend sxml->xml to do the same.

This is more or less what I do in my proposed patch (it's in the bugs
mailing list as 20339@debbugs.gnu.org). It passes around an alist of
(namespace . abbrev) associations (it's inverted wrt #:namespaces in
xml->sxml). Only that the abbreviations are "generated" as ns1, ns2
and so on (and the namespace declarations are woven into the attributes
list).

So far no reply to my bug report, but this gives me the chance to
bikeshed my patch to death :-P

Thanks for looking into that -- and for prodding me into looking at
more sources :)

Regards
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlU2HAYACgkQBcgs9XrR2kYq+gCfexhJ5qFyN4QmIf4TfddPqyfT
434An3BSVKtyovRJdg8MGHzAY8I0/NTD
=O9Kj
-----END PGP SIGNATURE-----

Ricardo Wurmus wrote on 22 Apr 2015 16:29

Recipients:(address . tomas@tuxteam.de)(address . 20339@debbugs.gnu.org)

Message-ID:87fv7s1bjn.fsf@mango.localdomain

>> Since xml->sxml accepts a namespace alist I suppose it would make sense

>> to extend sxml->xml to do the same.

Attached is a minimal patch to extend "sxml->xml" such that it accepts an
optional keyword argument "namespaces" with an alist of prefixes to
URLs, analogous to "xml->sxml".

When the namespaces alist is provided, "xmlns:prefix=url" attributes are
prepended to the element's list of attributes.


    ;; Define SVG document with namespaces
    (define the-svg "<svg xmlns='http://www.w3.org/2000/svg'
       xmlns:xlink='http://www.w3.org/1999/xlink'>
    <rect x='5' y='5' width='20' height='20'
          stroke-width='2' stroke='purple' fill='yellow'
          id='rect1' />
    <rect x='30' y='5' width='20' height='20'
          ry='5' rx='8' stroke-width='2' stroke='purple' fill='blue'
          xlink:href='#rect1' />
    </svg>")

    ;; Define alist of namespaces
    (define ns '((svg . "http://www.w3.org/2000/svg")
                 (xlink . "http://www.w3.org/1999/xlink")))

    ;; Convert to SXML, abbreviate namespaces according to ns alist
    (define the-sxml (xml->sxml the-svg #:namespaces ns))

    ;; Convert back to XML
    (sxml->xml the-sxml #:namespaces ns)

    => <svg:svg xmlns:svg="http://www.w3.org/2000/svg"
                xmlns:xlink="http://www.w3.org/1999/xlink">
         <svg:rect y="5" x="5"
                   width="20"
                   stroke-width="2"
                   stroke="purple"
                   id="rect1"
                   height="20"
                   fill="yellow" />
         <svg:rect xlink:href="#rect1"
                   y="5" x="30"
                   width="20"
                   stroke-width="2"
                   stroke="purple"
                   ry="5" rx="8"
                   height="20"
                   fill="blue" />
       </svg:svg>

Does this do what you want?

~~ Ricardo

From 81fa92ad0c5537c41419fa1e55c6130bf0558c9f Mon Sep 17 00:00:00 2001
From: rekado <rekado@elephly.net>
Date: Wed, 22 Apr 2015 13:09:27 +0200
Subject: [PATCH] Write XML namespaces when serializing.

* module/sxml/simple.scm (sxml->xml): Add optional keyword argument
  "namespaces".
---
 module/sxml/simple.scm | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 703ad91..8cc20dd 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -311,7 +311,8 @@ port."
   (display str port)
   (display "?>" port))
 
-(define* (sxml->xml tree #:optional (port (current-output-port)))
+(define* (sxml->xml tree #:optional (port (current-output-port)) #:key
+                    (namespaces '()))
   "Serialize the sxml tree @var{tree} as XML. The output will be written
 to the current output port, unless the optional argument @var{port} is
 present."
@@ -322,7 +323,7 @@ present."
         (let ((tag (car tree)))
           (case tag
             ((*TOP*)
-             (sxml->xml (cdr tree) port))
+             (sxml->xml (cdr tree) port #:namespaces namespaces))
             ((*ENTITY*)
              (if (and (list? (cdr tree)) (= (length (cdr tree)) 1))
                  (entity->xml (cadr tree) port)
@@ -335,10 +336,16 @@ present."
              (let* ((elems (cdr tree))
                     (attrs (and (pair? elems) (pair? (car elems))
                                 (eq? '@ (caar elems))
-                                (cdar elems))))
-               (element->xml tag attrs (if attrs (cdr elems) elems) port)))))
+                                (cdar elems)))
+                    (xmlns (map (lambda (x)
+                                  (cons (symbol-append 'xmlns: (car x))
+                                        (cdr x)))
+                                namespaces)))
+               (element->xml tag
+                             (if attrs (append xmlns attrs) xmlns)
+                             (if attrs (cdr elems) elems) port)))))
         ;; A nodelist.
-        (for-each (lambda (x) (sxml->xml x port)) tree)))
+        (for-each (lambda (x) (sxml->xml x port #:namespaces namespaces)) tree)))
    ((string? tree)
     (string->escaped-xml tree port))
    ((null? tree) *unspecified*)
-- 
2.1.0

tomas wrote on 23 Apr 2015 08:57

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)(address . 20339@debbugs.gnu.org)

Message-ID:20150423065714.GB19410@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Wed, Apr 22, 2015 at 04:29:32PM +0200, Ricardo Wurmus wrote:

> >> Since xml->sxml accepts a namespace alist I suppose it would make sense

> >> to extend sxml->xml to do the same.

> Attached is a minimal patch to extend "sxml->xml" such that it accepts an

> optional keyword argument "namespaces" with an alist of prefixes to

> URLs, analogous to "xml->sxml".

Thanks, I'll have a look at this this afternoon.

Your code is far prettier than mine, that's for sure :-)

What's yet missing (as far as I can read off the diff) is a way to

"dream up" an abbreviation when it's not in the namespaces alist.

Thanks again and regards

- -- tomás

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlU4l8oACgkQBcgs9XrR2kb7SwCeNO0Z+RJZy6VUeQotm3+qX5rd

nXMAn2QeowgVnEj+9Zh3gMIBZW99Y3bx

=BrEt

-----END PGP SIGNATURE-----

Ricardo Wurmus wrote on 23 Apr 2015 09:04

Recipients:(address . tomas@tuxteam.de)(address . 20339@debbugs.gnu.org)

Message-ID:878udj1g1d.fsf@mango.localdomain

tomas@tuxteam.de writes:

> What's yet missing (as far as I can read off the diff) is a way to

> "dream up" an abbreviation when it's not in the namespaces alist.

True.

Ideally, this should work even without passing a namespaces alist at all

in both "xml->sxml" and "sxml->xml". The non-abbreviated namespaces

should not cause "sxml->xml" to fail.

Passing around a namespaces alist to both these procedures is the least

invasive approach I could think of, but I still think that it *should*

be made to work without explicitly declaring namespaces.

tomas wrote on 23 Apr 2015 09:40

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:20150423074034.GA20961@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Thu, Apr 23, 2015 at 09:04:46AM +0200, Ricardo Wurmus wrote:

>
> tomas@tuxteam.de writes:
> 
> > What's yet missing (as far as I can read off the diff) is a way to
> > "dream up" an abbreviation when it's not in the namespaces alist.
> 
> True.
> 
> Ideally, this should work even without passing a namespaces alist at all
> in both "xml->sxml" and "sxml->xml".  The non-abbreviated namespaces
> should not cause "sxml->xml" to fail.
> 
> Passing around a namespaces alist to both these procedures is the least
> invasive approach I could think of, but I still think that it *should*
> be made to work without explicitly declaring namespaces.

I think a combination of our approaches could work: the only difference
(apart from the code elegance) is that my patch grows this alist on its
way down the tree as it encounters new namespaces. This meshes well with
the namespace declaration, which scopes recursively down the XML tree.
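
A minimal sketch of why a functionally extended alist matches that scoping. It is not taken from either patch: extend-env is a hypothetical helper, the example.org URLs are made up, and the alist here maps prefix to namespace.

```scheme
;; Hypothetical sketch: an environment of (prefix . uri) bindings that is
;; extended, never mutated, while walking down the tree.
(define (extend-env env prefix uri)
  (acons prefix uri env))

(define root-env  (extend-env '() 'a "http://example.org/a"))
;; A child element declares a further prefix and redeclares 'a.
(define child-env (extend-env (extend-env root-env 'b "http://example.org/b")
                              'a "http://example.org/a2"))

(assq-ref child-env 'a)  ; => "http://example.org/a2"  (inner binding wins)
(assq-ref child-env 'b)  ; => "http://example.org/b"
(assq-ref root-env 'a)   ; => "http://example.org/a"   (outer scope untouched)
(assq-ref root-env 'b)   ; => #f                       (not visible outside)
```

Because each recursive call receives its own extended alist, bindings introduced for one subtree are invisible to its siblings and ancestors, just as with XML namespace declarations.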

This afternoon, while I sit at the e-Lok waiting for the FSFE meeting,
is a very good moment for me to look into it. I'll report tonight :-)

Thanks & later (dayjob calling)
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlU4ofIACgkQBcgs9XrR2kaFNwCfWzPunxHiiDJIJean02rx7pMT
92IAn2IGYW01Cx7aJt32MLRDQYuY9FbP
=owfk
-----END PGP SIGNATURE-----

tomas wrote on 25 Apr 2015 22:25

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:20150425202509.GA3544@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Wed, Apr 22, 2015 at 04:29:32PM +0200, Ricardo Wurmus wrote:

> >> Since xml->sxml accepts a namespace alist I suppose it would make sense

> >> to extend sxml->xml to do the same.

> Attached is a minimal patch to extend "sxml->xml" such that it accepts an

> optional keyword argument "namespaces" with an alist of prefixes to

> URLs, analogous to "xml->sxml".

Thank you again for the patch. I applied it against 2.0.11, and can confirm

that it works as advertised :-)

I didn't see that xml->sxml has an optional parameter #:namespaces --

to be honest, I didn't expect it there.

So if one knows beforehand what namespaces are used in the XML in question,

it's possible to use the pair xml->sxml and sxml->xml this way (with your

patch, of course, because otherwise sxml->xml "forgets" to output the

relevant XML namespace declarations).

Reading again Oleg Kiselyov's paper[1] I understand that SXML can, as does

XML, have namespace abbreviations (called there user-ns-shortcut). It's not

exactly the same thing, but somehow isomorphic. One might use the XML's

abbreviations in the SXML representation, of course.

The problem with this approach is that you have to carry the

namespace associations "out-of-band", and that you have to know which

namespaces to expect before parsing the XML.

A (more cosmetic) problem is that all namespace declarations are "moved"

to the top-level, because the SXML keeps no "memory" of which node the

namespace declarations were attached to in the original XML.

In [1], there is a mechanism for stashing namespace mappings in the

"attributes list" (strictly, in the annotations, which are optionally

tacked to the tail of the attributes list), under the tag *NAMESPACES*.

Anyway -- what would be a good way forward here?

I could imagine taking note of the namespace abbreviations in the

*NAMESPACES* list (while xml->sxml) and issuing the corresponding

declarations in sxml->xml.
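
As a rough illustration of what such an annotated node might look like, here is a sketch based on my reading of the annotation grammar in [1]. Neither the layout nor the helper element-namespaces is something (sxml simple) produces or understands today.

```scheme
;; Hypothetical SXML fragment: the annotations are a nested (@ ...) list at
;; the tail of the attribute list, holding a (*NAMESPACES* ...) entry with
;; (namespace-id "URI") associations.
(define annotated
  '(svg:svg
    (@ (@ (*NAMESPACES*
           (svg "http://www.w3.org/2000/svg")
           (xlink "http://www.w3.org/1999/xlink"))))
    (svg:rect (@ (xlink:href "#rect1")))))

;; Hypothetical helper: return the namespace associations attached to an
;; element, or '() if the element carries none.
(define (element-namespaces elem)
  (let* ((attrs (and (pair? (cdr elem))
                     (pair? (cadr elem))
                     (eq? '@ (car (cadr elem)))
                     (cdr (cadr elem))))
         (annotations (and attrs (assq '@ attrs)))
         (namespaces (and annotations
                          (assq '*NAMESPACES* (cdr annotations)))))
    (if namespaces (cdr namespaces) '())))

(element-namespaces annotated)
;; => ((svg "http://www.w3.org/2000/svg")
;;     (xlink "http://www.w3.org/1999/xlink"))
```

A serializer could turn each association found this way into an xmlns:prefix declaration on that same element, which would keep the declarations where the original XML had them.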

Makes sense?

Regards

[1] http://okmij.org/ftp/papers/SXML-paper.pdf

- -- tomás

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlU7+CUACgkQBcgs9XrR2kaSxACfdljxbGyVNILgombB3jYWjeOq

1zwAn2RzIEHcJbJIlIMRkaEAIjNFcH7M

=MSYu

-----END PGP SIGNATURE-----

tomas wrote on 26 Apr 2015 12:28

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:20150426102810.GB5922@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Sat, Apr 25, 2015 at 10:25:09PM +0200, tomas@tuxteam.de wrote:

[...]

> Reading again Oleg Kiselyov's paper[1] I understand that SXML can, as does

> XML, have namespace abbreviations (called there user-ns-shortcut). It's not

> exactly the same thing, but somehow isomorphic. One might use the XML's

> abbreviations in the SXML representation, of course.

I take that back: as far as I understand the paper, the (SXML-side) shortcuts

are global to the document, whereas the (XML-side) abbreviations are subtree-

scoped (i.e. for the whole subtree of the element where the declaration

is attached. I don't know ATM whether shadowing is allowed, but I'll look that

up).

So there *is* a subtle difference between "user-ns-shortcut" (the one

you were manipulating with #:namespaces) and the XML "namespace abbreviation"

(the official jargon is "namespace prefix").

Regards

[1] http://okmij.org/ftp/papers/SXML-paper.pdf

- -- tomás

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlU8vboACgkQBcgs9XrR2kadlACeI+p4W8N/dJ49cGBypYNEP/ta

l6MAn3exlNUpj6Z4cYG0Dcb1ltyuQQBB

=x74j

-----END PGP SIGNATURE-----

Andy Wingo wrote on 23 Jun 2016 21:32

Recipients:(address . tomas@tuxteam.de)(address . 20339@debbugs.gnu.org)

Message-ID:87y45vln0f.fsf@pobox.com

See thread here as well:

http://thread.gmane.org/gmane.lisp.guile.devel/17709

I like Ricardo's patch but have some comments here:

http://article.gmane.org/gmane.lisp.guile.devel/18384

Andy

tomas wrote on 13 Jul 2016 15:24

Recipients:(address . 20339@debbugs.gnu.org)

Message-ID:20160713132403.GA2349@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Thu, Jun 23, 2016 at 09:32:16PM +0200, Andy Wingo wrote:

> See thread here as well:

> http://thread.gmane.org/gmane.lisp.guile.devel/17709

> I like Ricardo's patch but have some comments here:

> http://article.gmane.org/gmane.lisp.guile.devel/18384

(sorry for cc'ing both of you, but I don't know whether you are

subscribed to the bug. Two copies seemed more polite than none).

Sorry folks for not coming back earlier. Real Life and things.

Since I'm going to be off the 'net for one month starting next Friday,

I thought I'll write a short note.

I'll be back the 15th of August and am really willing to do whatever

it takes to bring this forward. OTOH, if any of you decides to pick

it up, I'm sure the results will be better :-)

Referring to Oleg Kiselyov's paper [1], there are actually three

things involved:

- the namespace. This is an XML thing and will typically be

a URI (I don't quite remember whether it *must* be a

URI, but that's irrelevant). It may contain nasty characters

(to XML: it isn't an XML "Name", and potentially to Scheme:

there may be parentheses and things in there, so some

Schemes won't make a symbol of that; Guile doesn't mind).

- the namespace prefix. Again, an XML thing, basically giving

a non-nasty abbreviation for the namespace, to stick it to

the Name, making a "QName". The association prefix -> namespace

is scoped to a node and its descendants, and can be shadowed

at some node below

- the namespace-id, an SXML thing. In [1], this is typically

the namespace, but Oleg Kiselyov made provisions in [1] for a

similar "abbreviation" (the user-ns-shortcut in [1], page 3),

whose mapping can be attached to any node via the

pseudo-attribute *NAMESPACES* [2], which can also carry the

original (XML) namespace prefix.

As far as I understand the paper, most of the time this

namespace-id will be identical to the URI, but it is this

that will be prefixed to the tag name symbols in the

SXML representation.

What Ricardo's patch does is to conflate namespace prefix and

namespace-id and provide a mapping (namespace-id aka prefix) ->

namespace. This is actually quite elegant, since we don't need

the distinction between (XML) prefix and (SXML) namespace-id.

I think that we can, at least as (sxml simple) is concerned,

ignore this distinction.

What is missing? From my point of view:

- At xml->sxml time, the user doesn't know which namespaces

are in the xml. So it would be nice if the XML parser

could provide that.

- It would be super-nice if the XML parser could put that

into the same nodes it found it, as described in [1]

(i.e. in the (*NAMESPACES* ...) pseudo-attribute).

This way we wouldn't have a global mapping, but one

that resembles the original XML, even with the same

prefixes. Less surprises overall. The round trip

xml -> sxml -> xml would be (nearly) the identity.

With Ricardo's patch it would lump all the namespace

declarations up in the top node, which formally is

correct, but might scare XML people a bit :-)

- At sxml->xml time there should be a way to somehow

generate prefixes for "new" namespaces. I don't know

at the moment how this would work, that depends on

how the user is supposed to insert new nodes in the

SXML. Does she specify the namespace? Both prefix

(aka namespace-id, under my current assumption) *and*

namespace? (note that the namespace-id/prefix alone

wouldn't be sufficient).

Sorry for this wall of text. I hope it makes some sense.

Regards

[1] http://okmij.org/ftp/papers/SXML-paper.pdf

[2] Actually, I'm cheating here: the thing is part of an

"annotations" part, which according to the grammar comes

*last*, after all the attributes. But it looks a bit

like an attribute, with a strange name and a more

complex value.

- -- tomás

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAleGQPMACgkQBcgs9XrR2kaMfgCeKbA4pWFrCZoxofDF4n9utgnZ

IzYAn1gozFwBLPd/rmNkZvJYDTJ9cIvr

=etJd

-----END PGP SIGNATURE-----

tomas wrote on 13 Jul 2016 20:08

Recipients:(address . 20339@debbugs.gnu.org)

Message-ID:20160713180854.GA12635@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Wed, Jul 13, 2016 at 03:24:03PM +0200, tomas@tuxteam.de wrote:

[...]

> What is missing? From my point of view:
> 
>  - At xml->sxml time, the user doesn't know which namespaces
>    are in the xml. So it would be nice if the XML parser
>    could provide that.
> 
>  - It would be super-nice if the XML parser could put that
>    into the same nodes it found it, as described in [1]
>    (i.e. in the (*NAMESPACES* ...) pseudo-attribute).
>    This way we wouldn't have a global mapping, but one
>    that resembles the original XML, even with the same
>    prefixes. Less surprises overall. The round trip
>    xml -> sxml -> xml would be (nearly) the identity.
> 
>    With Ricardo's patch it would lump all the namespace
>    declarations up in the top node, which formally is
>    correct, but might scare XML people a bit :-)
> 
>  - At sxml->xml time there should be a way to somehow
>    generate prefixes for "new" namespaces. I don't know
>    at the moment how this would work, that depends on
>    how the user is supposed to insert new nodes in the
>    SXML. Does she specify the namespace? Both prefix
>    (aka namespace-id, under my current assumption) *and*
>    namespace? (note that the namespace-id/prefix alone
>    wouldn't be sufficient).

Argh. First post, then think, sorry.

Actually ditch the last point. I think it would be OK

to make the user responsible for keeping the *NAMESPACES*

pseudo-attribute up-to-date whenever she adds nodes

with new namespaces to the SXML.

regards

[1] http://okmij.org/ftp/papers/SXML-paper.pdf

- -- tomás

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAleGg7YACgkQBcgs9XrR2kY7hACdG5drjpPVlzB4wW6sXhuRKliv

h3cAnAmHC5RxiEc6RXi0tu5U3yF4YYbx

=7uGa

-----END PGP SIGNATURE-----

Andy Wingo wrote on 14 Jul 2016 12:10

Recipients:(address . tomas@tuxteam.de)

Message-ID:87furc1qeu.fsf@pobox.com

Hi :)

On Wed 13 Jul 2016 15:24, tomas@tuxteam.de writes:

> Referring to Oleg Kiselyov's paper [1], there are actually three

> things involved:

This summary is helpful, thanks.

> What is missing? From my point of view:

> - At xml->sxml time, the user doesn't know which namespaces

> are in the xml. So it would be nice if the XML parser

> could provide that.

For some documents you do know, of course.

And for larger perspective, I think that SSAX gives you all the tools

you need to build specialist and very flexible XML parsers. So to an

extent solving the general problem isn't necessary -- we can always

point people to SSAX. But that's a bit rude ;) so if there are common

patterns we should try to capture them in xml->sxml. I see this bug as

being a search for those patterns, but without the requirement of

solving the problem in its most general form.

>  - It would be super-nice if the XML parser could put that
>    into the same nodes it found it, as described in [1]
>    (i.e. in the (*NAMESPACES* ...) pseudo-attribute).
>    This way we wouldn't have a global mapping, but one
>    that resembles the original XML, even with the same
>    prefixes. Less surprises overall. The round trip
>    xml -> sxml -> xml would be (nearly) the identity.
>
>    With Ricardo's patch it would lump all the namespace
>    declarations up in the top node, which formally is
>    correct, but might scare XML people a bit :-)

ACK.

> - At sxml->xml time there should be a way to somehow

> generate prefixes for "new" namespaces. I don't know

> at the moment how this would work, that depends on

> how the user is supposed to insert new nodes in the

> SXML. Does she specify the namespace? Both prefix

> (aka namespace-id, under my current assumption) *and*

> namespace? (note that the namespace-id/prefix alone

> wouldn't be sufficient).

ACK.

What do you think the next step is? I am happy to wait FWIW, dunno if

Ricardo has any feelings here.

Enjoy your holiday :)

Andy

tomas wrote on 14 Jul 2016 12:26

Recipients:(name . Andy Wingo)(address . wingo@pobox.com)

Message-ID:20160714102631.GB5611@tuxteam.de

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

On Thu, Jul 14, 2016 at 12:10:17PM +0200, Andy Wingo wrote:

> Hi :)
> 
> On Wed 13 Jul 2016 15:24, tomas@tuxteam.de writes:
> 
> > Referring to Oleg Kiselyov's paper [1], there are actually three
> > things involved:
> 
> This summary is helpful, thanks.
> > What is missing? From my point of view:
> >
> >  - At xml->sxml time, the user doesn't know which namespaces
> >    are in the xml. So it would be nice if the XML parser
> >    could provide that.
> 
> For some documents you do know, of course.
> 
> And for larger perspective, I think that SSAX gives you all the tools
> you need to build specialist and very flexible XML parsers.  So to an
> extent solving the general problem isn't necessary -- we can always
> point people to SSAX.  But that's a bit rude ;) so if there are common
> patterns we should try to capture them in xml->sxml.  I see this bug as
> being a search for those patterns, but without the requirement of
> solving the problem in its most general form.

It's (sxml simple), after all. I too hesitate to stuff too much into
it. For me, a documented "no, we don't do namespaces" would be one
valid pattern.

> >  - It would be super-nice if the XML parser could put that
> >    into the same nodes it found it [...]
>
> ACK.
>
> >  - At sxml->xml time there should be a way to somehow
> >    generate prefixes [...]
>
> ACK.
>
> What do you think the next step is? I am happy to wait FWIW, dunno if
> Ricardo has any feelings here.

We meet this afternoon anyway. On my side, I'd be happy to try
something along the sketched lines when I'm back. If someone
who cares beats me to it, I'd be just as happy.

> Enjoy your holiday :)

Looking forward to it. BTW: if I understood properly the area you're
living in, we'll cycle past you (somewhat to the West) on our
way to the north.

Regards
-- tomás

Ricardo Wurmus wrote on 4 Feb 2019 21:44

Recipients:(name . Andy Wingo)(address . wingo@pobox.com)

Message-ID:87a7jbi8rx.fsf@elephly.net

Hello!

I just looked at this again and I think I came up with something useful.

Here’s some context:

Andy Wingo <wingo@pobox.com> writes:

> Hi :)
>
> On Wed 13 Jul 2016 15:24, tomas@tuxteam.de writes:
>
>> Referring to Oleg Kiselyov's paper [1], there are actually three
>> things involved:
>
> This summary is helpful, thanks.
>> What is missing? From my point of view:
>>
>>  - At xml->sxml time, the user doesn't know which namespaces
>>    are in the xml. So it would be nice if the XML parser
>>    could provide that.
>
> For some documents you do know, of course.
>
> And for larger perspective, I think that SSAX gives you all the tools
> you need to build specialist and very flexible XML parsers.  So to an
> extent solving the general problem isn't necessary -- we can always
> point people to SSAX.  But that's a bit rude ;) so if there are common
> patterns we should try to capture them in xml->sxml.  I see this bug as
> being a search for those patterns, but without the requirement of
> solving the problem in its most general form.
>
>>  - It would be super-nice if the XML parser could put that
>>    into the same nodes it found it, as described in [1]
>>    (i.e. in the (*NAMESPACES* ...) pseudo-attribute).
>>    This way we wouldn't have a global mapping, but one
>>    that resembles the original XML, even with the same
>>    prefixes. Less surprises overall. The round trip
>>    xml -> sxml -> xml would be (nearly) the identity.
>>
>>    With Ricardo's patch it would lump all the namespace
>>    declarations up in the top node, which formally is
>>    correct, but might scare XML people a bit :-)
>
> ACK.
>
>>  - At sxml->xml time there should be a way to somehow
>>    generate prefixes for "new" namespaces. I don't know
>>    at the moment how this would work, that depends on
>>    how the user is supposed to insert new nodes in the
>>    SXML. Does she specify the namespace? Both prefix
>>    (aka namespace-id, under my current assumption) *and*
>>    namespace? (note that the namespace-id/prefix alone
>>    wouldn't be sufficient).
>
> ACK.
>
> What do you think the next step is?  I am happy to wait FWIW, dunno if
> Ricardo has any feelings here.

Attached is a patch that does the requested things. The parser
procedures like FINISH-ELEMENT have access to all the namespaces, so
I changed the FINISH-ELEMENT procedure to return the list of namespaces
in addition to its SXML tree return value.

I changed name->sxml to use only the namespace aliases / abbreviations
instead of the namespace URIs. (This is not very efficient because we
need to traverse the list of namespaces every time. Maybe we could
memoize this. On the other hand, the length of the namespaces list may
not be large enough to affect performance too much.)

In the end we get both namespace list and SXML tree from running the
parser. Before wrapping this up in *TOP* we generate xmlns attributes
for all abbreviations and “patch” the first proper element’s attribute
list (i.e. we skip over a *PI* element if it exists).

The result is an SXML tree that begins with namespace declarations,
mapping abbreviations to URIs. Within the SXML tree we’re only using
abbreviations, so there are no more invalid characters when converting
SXML to a string.

I would be happy if you could test this as I’m not 100% confident that
this is correct. Here are questions I wasn’t able to answer
conclusively:

* Is the value for “namespaces” that’s passed in to the
  FINISH-ELEMENT procedure always the same?

* Will the second return value of the final call to FINISH-ELEMENT
  really always be the complete list of *all* namespaces that have been
  encountered?

* Are there valid XML documents for which the match patterns to inject
  namespace declarations would not apply? (e.g. documents with a PI
  element and two separate XML trees)

Ricardo

From 83ee9de18a0ecaa237eb73e1b75d0b21e3e8d321 Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <rekado@elephly.net>
Date: Mon, 4 Feb 2019 21:39:06 +0100
Subject: [PATCH] sxml: xml->sxml: Record and use namespace abbreviations.

* module/sxml/simple.scm (xml->sxml): Add namespace declarations to the
attribute list of the first XML element.
[name->sxml]: Accept namespaces argument to look up abbreviation.
Return name with abbreviation prefix.
[parser]: Let FINISH-ELEMENT procedure return namespaces in addition to
SXML tree.
---
 module/sxml/simple.scm | 50 +++++++++++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 703ad9137..52dd9af12 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -1,7 +1,8 @@
 ;;;; (sxml simple) -- a simple interface to the SSAX parser
 ;;;;
-;;;; 	Copyright (C) 2009, 2010, 2013  Free Software Foundation, Inc.
+;;;; 	Copyright (C) 2009, 2010, 2013, 2019  Free Software Foundation, Inc.
 ;;;;    Modified 2004 by Andy Wingo <wingo at pobox dot com>.
+;;;;    Modified 2019 by Ricardo Wurmus <rekado@elephly.net>.
 ;;;;    Originally written by Oleg Kiselyov <oleg at pobox dot com> as SXML-to-HTML.scm.
 ;;;; 
 ;;;; This library is free software; you can redistribute it and/or
@@ -30,6 +31,7 @@
   #:use-module (sxml ssax)
   #:use-module (sxml transform)
   #:use-module (ice-9 match)
+  #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-13)
   #:export (xml->sxml sxml->xml sxml->string))
 
@@ -123,10 +125,15 @@ port."
         (acons '*DEFAULT* default-entity-handler entities)
         entities))
 
-  (define (name->sxml name)
+  (define (name->sxml name namespaces)
     (match name
       ((prefix . local-part)
-       (symbol-append prefix (string->symbol ":") local-part))
+       (let ((abbrev (and=> (find (match-lambda
+                                    ((abbrev uri . rest)
+                                     (and (eq? uri prefix) abbrev)))
+                                  namespaces)
+                            first)))
+         (symbol-append abbrev (string->symbol ":") local-part)))
       (_ name)))
 
   (define (doctype-continuation seed)
@@ -152,14 +159,16 @@ port."
                        (ssax:reverse-collect-str seed)))
              (attrs (attlist-fold
                      (lambda (attr accum)
-                       (cons (list (name->sxml (car attr)) (cdr attr))
+                       (cons (list (name->sxml (car attr) namespaces)
+                                   (cdr attr))
                              accum))
                      '() attributes)))
-         (acons (name->sxml elem-gi)
-                (if (null? attrs)
-                    seed
-                    (cons (cons '@ attrs) seed))
-                parent-seed)))
+         (values (acons (name->sxml elem-gi namespaces)
+                        (if (null? attrs)
+                            seed
+                            (cons (cons '@ attrs) seed))
+                        parent-seed)
+                 namespaces)))
 
      CHAR-DATA-HANDLER ; fhere
      (lambda (string1 string2 seed)
@@ -212,7 +221,28 @@ port."
   (let* ((port (if (string? string-or-port)
                    (open-input-string string-or-port)
                    string-or-port))
-         (elements (reverse (parser port '()))))
+         (elements (call-with-values
+                       (lambda () (parser port '()))
+                     (lambda (elements namespaces)
+                       ;; Generate namespace declarations mapping
+                       ;; abbreviations to URLs.
+                       (let ((ns-declarations
+                              (filter-map (match-lambda
+                                            (('*DEFAULT* . _) #f)
+                                            ((abbrev uri . _)
+                                             (list (symbol-append 'xmlns: abbrev)
+                                                   (symbol->string uri))))
+                                          namespaces)))
+                         ;; Inject namespace declarations into the first
+                         ;; proper element.
+                         (match (reverse elements)
+                           (((and pi-elem ('*PI* . _))
+                             (tag ('@ . attrs) . children))
+                            `(,pi-elem (,tag (@ ,@ns-declarations ,attrs)
+                                             ,@children)))
+                           (((tag ('@ . attrs) . children))
+                            `(,tag (@ ,@ns-declarations ,attrs)
+                                   ,@children))))))))
     `(*TOP* ,@elements)))
 
 (define check-name
-- 
2.20.1

John Cowan wrote on 4 Feb 2019 23:55

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:CAD2gp_ScjmURZ7yTFronxyR9r4P4P2L91mXNHguXpZG86chdVA@mail.gmail.com

On Mon, Feb 4, 2019 at 3:45 PM Ricardo Wurmus <rekado@elephly.net> wrote:

> I changed name->sxml to use only the namespace aliases / abbreviations
> instead of the namespace URIs.

The trouble with that is that XML namespaces are lexically scoped, like
Scheme local variables.  It is perfectly valid to map a prefix to more
than one URL, as long as the namespace declarations are in either
disjoint or nested elements.  So you don't know what the absolute name
of the element or attribute is from just the prefix and the local part.

Furthermore, it is also legal to define more than one prefix for
the same URL, in which case names using either prefix are normally
treated as equivalent (however, you can't have elements like
<a:foo>...</b:foo> even if a and b map to the same namespace).
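
To make the scoping point concrete, here is a minimal sketch. It is not
taken from the patch: RESOLVE is a hypothetical helper, and the in-scope
bindings are simplified to an alist of (prefix . uri) pairs with the
innermost declaration first, which is an assumption rather than the
representation SSAX actually uses.

```scheme
;; Sketch only.  RESOLVE is a hypothetical helper, not part of
;; (sxml simple) or SSAX.  Bindings are (prefix . uri) pairs,
;; innermost declaration first, so assq finds the nearest one.
(define (resolve prefix bindings)
  (assq-ref bindings prefix))

;; Models this document, where the prefix p is rebound in a nested element:
;;
;;   <p:a xmlns:p="http://example.org/one">
;;     <p:b xmlns:p="http://example.org/two"/>
;;   </p:a>
(define outer-scope '((p . "http://example.org/one")))
(define inner-scope (acons 'p "http://example.org/two" outer-scope))

(resolve 'p outer-scope)   ; => "http://example.org/one"
(resolve 'p inner-scope)   ; => "http://example.org/two"
```

The same prefix yields two different URIs depending on where the lookup
happens, so a single document-wide prefix table cannot be right in
general.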

> * Is the value for “namespaces” that’s passed in to the
>   FINISH-ELEMENT procedure always the same?
>
> * Will the second return value of the final call to FINISH-ELEMENT
>   really always be the complete list of *all* namespaces that have been
>   encountered?

Definitely not, only the namespaces that are currently in scope.

> * Are there valid XML documents for which the match patterns to inject
>   namespace declarations would not apply? (e.g. documents with a PI
>   element and two separate XML trees)

That's not well-formed: you can only have a single element tree per XML
document, although you can have any number of PIs, comments, and
whitespace (which is normally ignored) before and after.

-- 
John Cowan          http://vrici.lojban.org/~cowan       cowan@ccil.org
If I have seen farther than others, it is because I was looking through a
spyglass with my one good eye, with a parrot standing on my shoulder. --"Y"

Attachment: file

Ricardo Wurmus wrote on 5 Feb 2019 10:12

Recipients:(name . John Cowan)(address . cowan@ccil.org)

Message-ID:874l9iiopl.fsf@elephly.net

Hi John,

> The trouble with that is that XML namespaces are lexically scoped, like
> Scheme local variables.  It is perfectly valid to map a prefix to more
> than one URL, as long as the namespace declarations are in either
> disjoint or nested elements.  So you don't know what the absolute name
> of the element or attribute is from just the prefix and the local part.
>
> Furthermore, it is also legal to define more than one prefix for
> the same URL, in which case names using either prefix are normally
> treated as equivalent (however, you can't have elements like
> <a:foo>...</b:foo>
> even if a and b map to the same namespace).
>
>> * Is the value for “namespaces” that’s passed in to the
>>   FINISH-ELEMENT procedure always the same?
>>
>> * Will the second return value of the final call to FINISH-ELEMENT
>>   really always be the complete list of *all* namespaces that have been
>>   encountered?
>>
>
> Definitely not, only the namespaces that are currently in scope.

Thanks for the clarifications!

In that case we could have FINISH-ELEMENT add all namespace declarations
that are in scope to the current node that is about to be returned. It
would be a little verbose, but more correct.

What do you think?

Ricardo

Ricardo Wurmus wrote on 5 Feb 2019 13:57

Recipients:(name . John Cowan)(address . cowan@ccil.org)(address . 20339@debbugs.gnu.org)

Message-ID:87r2cmgzq0.fsf@elephly.net

Ricardo Wurmus <rekado@elephly.net> writes:

> In that case we could have FINISH-ELEMENT add all namespace declarations
> that are in scope to the current node that is about to be returned. It
> would be a little verbose, but more correct.

Like this:

From d44c702718baea4c4557d12ca8dd7dab724c7fb6 Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <rekado@elephly.net>
Date: Mon, 4 Feb 2019 21:39:06 +0100
Subject: [PATCH] sxml: xml->sxml: Record and use namespace abbreviations.

* module/sxml/simple.scm (xml->sxml)
[name->sxml]: Accept namespaces argument to look up abbreviation.
Return name with abbreviation prefix.
[parser]: Let FINISH-ELEMENT procedure return namespaces in addition to
the SXML tree's attributes.
---
 module/sxml/simple.scm | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm
index 703ad9137..2bb332c83 100644
--- a/module/sxml/simple.scm
+++ b/module/sxml/simple.scm
@@ -1,7 +1,8 @@
 ;;;; (sxml simple) -- a simple interface to the SSAX parser
 ;;;;
-;;;; 	Copyright (C) 2009, 2010, 2013  Free Software Foundation, Inc.
+;;;; 	Copyright (C) 2009, 2010, 2013, 2019  Free Software Foundation, Inc.
 ;;;;    Modified 2004 by Andy Wingo <wingo at pobox dot com>.
+;;;;    Modified 2019 by Ricardo Wurmus <rekado@elephly.net>.
 ;;;;    Originally written by Oleg Kiselyov <oleg at pobox dot com> as SXML-to-HTML.scm.
 ;;;; 
 ;;;; This library is free software; you can redistribute it and/or
@@ -30,6 +31,7 @@
   #:use-module (sxml ssax)
   #:use-module (sxml transform)
   #:use-module (ice-9 match)
+  #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-13)
   #:export (xml->sxml sxml->xml sxml->string))
 
@@ -123,10 +125,15 @@ port."
         (acons '*DEFAULT* default-entity-handler entities)
         entities))
 
-  (define (name->sxml name)
+  (define (name->sxml name namespaces)
     (match name
       ((prefix . local-part)
-       (symbol-append prefix (string->symbol ":") local-part))
+       (let ((abbrev (and=> (find (match-lambda
+                                    ((abbrev uri . rest)
+                                     (and (eq? uri prefix) abbrev)))
+                                  namespaces)
+                            first)))
+         (symbol-append abbrev (string->symbol ":") local-part)))
       (_ name)))
 
   (define (doctype-continuation seed)
@@ -150,12 +157,21 @@ port."
        (let ((seed (if trim-whitespace?
                        (ssax:reverse-collect-str-drop-ws seed)
                        (ssax:reverse-collect-str seed)))
-             (attrs (attlist-fold
-                     (lambda (attr accum)
-                       (cons (list (name->sxml (car attr)) (cdr attr))
-                             accum))
-                     '() attributes)))
-         (acons (name->sxml elem-gi)
+             (attrs (append
+                     ;; Namespace declarations
+                     (filter-map (match-lambda
+                                   (('*DEFAULT* . _) #f)
+                                   ((abbrev uri . _)
+                                    (list (symbol-append 'xmlns: abbrev)
+                                          (symbol->string uri))))
+                                 namespaces)
+                     (attlist-fold
+                      (lambda (attr accum)
+                        (cons (list (name->sxml (car attr) namespaces)
+                                    (cdr attr))
+                              accum))
+                      '() attributes))))
+         (acons (name->sxml elem-gi namespaces)
                 (if (null? attrs)
                     seed
                     (cons (cons '@ attrs) seed))
-- 
2.20.1

It’s quite verbose because it doesn’t check if a namespace declaration
is the same in a parent.

Ricardo

tomas wrote on 12 Feb 2019 10:56

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:20190212095602.GD13448@tuxteam.de

On Mon, Feb 04, 2019 at 09:44:02PM +0100, Ricardo Wurmus wrote:

> Hello!
>
> I just looked at this again and I think I came up with something useful.
>
> Here’s some context:

[...]

> Attached is a patch that does the requested things. The parser
> procedures like FINISH-ELEMENT have access to all the namespaces, so
> I changed the FINISH-ELEMENT procedure to return the list of namespaces
> in addition to its SXML tree return value.

It's great that you picked this up, I'm excited :-)

I have lost a bit of contact with Guile as of late. But I'm preparing
some tooling to give your patches a whirl; in the meantime a couple
of comments from the peanut gallery:

As John has noted, the namespace mappings (i.e. the prefix -> namespace
URI binding) are kind of lexically scoped (I'd call it subtree scoped,
but structurally it is the same). While parsing is "easy" (assuming
well-formed XML), serializing is not unambiguous. In a way, the library
might want to be prepared to take hints from the application (as far
as the XML is to be read by humans, there might be "better" and "worse"
serializations).

It may take me a couple of days to come up to speed.

Thanks a lot & cheers

-- t

Ricardo Wurmus wrote on 12 Feb 2019 21:30

Recipients:(address . tomas@tuxteam.de)

Message-ID:87wom4iwc3.fsf@elephly.net

tomas@tuxteam.de writes:

> As John has noted, the namespace mappings (i.e. the prefix -> namespace
> URI binding) are kind of lexically scoped (I'd call it subtree scoped,
> but structurally it is the same). While parsing is "easy" (assuming
> well-formed XML), serializing is not unambiguous.

The “fup” handler of the parser visits every element and has a list of
namespaces that are in scope at this point. Its purpose is to return
the SXML representation of that element. At this point we can record
the namespaces as attributes. (That’s what the patch does.)

When baking XML from SXML we don’t need to do anything special: we only
need to convert everything to text, including the recorded namespace
attributes. This isn’t pretty SXML (nor is it pretty XML), but it
appears to be correct as none of the namespace information is lost.

To get a better serialized representation the parser needs to do a
better job of identifying “new” namespaces.

> In a way, the library might want to be prepared to take hints from the
> application (as far as the XML is to be read by humans, there might be
> "better" and "worse" serializations).

The XML produced when this patch is applied will not be pretty.  To
generate minimal/pretty XML knowledge of the parent elements’ namespaces
is required — knowledge that the parser’s “fup” handler does not have.

We could try to alter the parser so that it not only passes the list of
namespaces that are currently in scope, but also a list of namespaces
that are in scope for the parent node.  This would allow us to determine
the list of *new* namespaces that absolutely must be declared for the
current node.  If there are no new namespaces we can simply ignore them
and produce minimal SXML (and thus minimal XML later when the SXML is
serialized).
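
A rough sketch of that “new namespaces” step, under stated assumptions:
NEW-NAMESPACES is a hypothetical helper (it is not in the patch, in
(sxml simple), or in SSAX), and bindings are simplified to
(prefix . uri) pairs with the innermost declaration first, rather than
the entries the parser really passes around.

```scheme
(use-modules (srfi srfi-1))

;; Sketch only.  Return the bindings that must be declared on the
;; current node: those whose effective URI for a prefix differs from
;; what the parent already has in scope.
(define (new-namespaces current parent)
  (filter (lambda (binding)
            (not (equal? (cdr binding)
                         (assq-ref parent (car binding)))))
          ;; Keep only the innermost (first) binding of each prefix.
          (delete-duplicates current
                             (lambda (a b) (eq? (car a) (car b))))))

(define parent-scope '((svg . "http://www.w3.org/2000/svg")))
(define child-scope
  (acons 'xlink "http://www.w3.org/1999/xlink" parent-scope))

(new-namespaces child-scope parent-scope)
;; => ((xlink . "http://www.w3.org/1999/xlink"))

(new-namespaces parent-scope parent-scope)
;; => ()
```

Comparing effective bindings per prefix, instead of plain list
membership, also covers an element that rebinds a prefix back to a URI
that an outer ancestor had used.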

--
Ricardo

tomas wrote on 8 Apr 2019 14:14

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:20190408121403.GA781@tuxteam.de

Attachment: file

tomas wrote on 3 May 2019 12:46

Re: bug#20339: Taking a step back (was: sxml simple: sxml->xml mishandles namespaces?)

Recipients:(name . Ricardo Wurmus)(address . rekado@elephly.net)

Message-ID:20190503104627.GE31083@tuxteam.de

Hi,

after mulling over it for a while, I think it's time to take
a step back and think a bit about where we'd like to go with
this.

Note that I'm ignoring technical details (the fact that the
SXML, and thus the XML serialization now has namespace declarations
everywhere down the path instead of just at the corresponding root
node, and the thing with the default namespaces, as noted in [1],
seem to me "fixable" technical details).

Your patch, Ricardo, takes a new approach wrt. the SXML resulting
from an XML parse: the full tag names (the QNAMEs, in XML parlance)
are now composed of <prefix>:<name> (mimicking the XML) instead
of <namespace uri>:<name>, as the former (sxml simple) used to
do. This has upsides and downsides.

I'll call your approach the "prefix" approach (as having the
prefixes to qualify the tag names) and the approach followed
by (sxml simple) up to now the "URI" approach, which has
the full namespace URI qualifying the name.

In the URI approach, a qualified tag name would look like

  "http://example.org/namespaces/myns:node"

whereas in the prefix approach, it'd look like

  "myns:node"

plus the knowledge somewhere that the prefix "myns" stands for

  myns -> http://example.org/namespaces/myns

Upsides of the prefix approach:

 + it mimics more closely the XML syntax. Since that
   is what the XML folks see, that follows the "principle
   of least astonishment" (aka POLA)

 + it is forced to keep the prefix -> namespace associations
   (it would be semantically incomplete if not, since what
   counts semantically is the namespace URI)

Downsides

 - it contradicts current documentation

     "All namespaces in the XML document must be declared, via
     xmlns attributes. SXML elements built from non-default
     namespaces will have their tags prefixed with their
     URI. Users can specify custom prefixes for certain
     namespaces with the #:namespaces keyword argument to
     xml->sxml." [2]

   This can be changed, of course :-)
   But perhaps someone is already relying on it?

 - working on the resulting SXML becomes harder, because
   to compare two qualified names, we'd have to resolve
   the namespace associations.

Upsides of the URI approach

 + it is what the documentation says

 + it follows more closely the XML semantics (the namespace
   prefix in itself is irrelevant after all). As a corollary,
   working on the SXML becomes easier: a comparison of two
   qualified names becomes a simple string comparison, etc.

   I think that is why (sxml simple)'s original design followed
   this path.

Downsides

Well, negate the "prefix approach" upsides :-)
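
To illustrate the comparison cost, a small sketch. EXPAND-NAME and
SAME-NAME? are hypothetical helpers, not part of (sxml simple), and the
bindings are assumed to be an alist of (prefix-string . uri-string)
pairs in scope at the node in question.

```scheme
;; URI approach: the tag symbol already carries the URI, so two
;; names are the same exactly when the symbols are eq?.
(define uri-name-1
  (string->symbol "http://example.org/namespaces/myns:node"))
(define uri-name-2
  (string->symbol "http://example.org/namespaces/myns:node"))
(eq? uri-name-1 uri-name-2)   ; => #t

;; Prefix approach: expand the prefix through the bindings first.
;; Returns #f when the prefix is not bound.
(define (expand-name qname bindings)
  (let* ((str (symbol->string qname))
         (colon (string-index str #\:)))
    (if colon
        (let ((uri (assoc-ref bindings (substring str 0 colon))))
          (and uri
               (string->symbol
                (string-append uri ":" (substring str (+ colon 1))))))
        qname)))

(define (same-name? qname1 bindings1 qname2 bindings2)
  (let ((n1 (expand-name qname1 bindings1))
        (n2 (expand-name qname2 bindings2)))
    (and n1 n2 (eq? n1 n2))))

;; Different prefixes, same namespace: the names are equivalent.
(same-name? 'a:node '(("a" . "http://example.org/namespaces/myns"))
            'b:node '(("b" . "http://example.org/namespaces/myns")))
;; => #t
```

With prefixes, every comparison needs the bindings of both nodes at
hand; with URIs the symbols alone are enough.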

Let me just say that there seem to be precedents for the
prefix approach out there in the 'net: the Wikipedia
article [3] (yes, there's a Wikipedia article on that!) follows
the prefix approach. This nice blog post [4] too.

I think I'll stop here. My fingers itch to do some hacking,
but I think we should pause and ponder before hacking.

Perhaps we should take this to guile-devel? OTOH, if someone
knows The Way Forward (TM), I'm willing to hack in this
direction.

Cheers & thanks

[1] Message ID <20190408121403.GA781@tuxteam.de>
    http://lists.gnu.org/archive/html/bug-guile/2019-04/msg00001.html
[2] https://www.gnu.org/software/guile/manual/guile.html#SXML
[3] https://en.wikipedia.org/wiki/SXML
[4] https://www.more-magic.net/posts/lispy-dsl-sxml.html

-- tomás
