
I would want to experiment with the root structure of XML.

Make it possible to have multiple root elements. One problem is that however big an XML document is, it has to be closed at the end, so it ends up as one atomic element; you can't correctly parse an XML document only partially.



XSLT 3.0 allows streaming for large XML files. See e.g. https://www.saxonica.com/html/documentation10/sourcedocs/str...


Isn't that what SAX (event based) and reader (pull based) XML parsing is for? Those allow you to incrementally/partially parse XML.
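A minimal sketch of the event-based (SAX) style in Python, using the standard library (the ItemCounter handler and `<item>` elements are made up for illustration):

```python
import io
import xml.sax

# The handler sees each element as it streams past, so processing can
# start before the document is fully read (or even fully written).
class ItemCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1

handler = ItemCounter()
xml.sax.parse(io.BytesIO(b"<root><item/><item/><item/></root>"), handler)
print(handler.count)  # 3
```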


Well... there is nothing stopping you from partially parsing an XML document. What you can't do is validate it. Which is the same for any other file format: you can't be sure a file is fully valid without fully parsing the whole thing.

However, the only order in which you can partially parse an XML file is linear order, which clashes badly with the fact that it's a tree-based format. Depending on how tree-like your schema is, this might be a massive hindrance.

This flaw isn't unique to XML. All text encoded file formats share this characteristic, and can only be parsed linearly. If you move across to the world of binary file formats, it's extremely common for them to have indexes of offsets so a parser can navigate a tree-like structure in tree order without having to fully parse it, along with other types of non-linear data structures.
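As an illustration of the offset-index idea, here's a toy, entirely made-up binary layout in Python: a record count, then an offset table, then fixed-size records. A reader can jump straight to any record without scanning the ones before it:

```python
import io
import struct

# Toy layout, purely illustrative: [count][offset table][records].
records = [b"alpha---", b"beta----", b"gamma---"]  # 8 bytes each
header_size = 4 + 4 * len(records)
offsets = [header_size + 8 * i for i in range(len(records))]

buf = io.BytesIO()
buf.write(struct.pack("<I", len(records)))   # record count
for off in offsets:
    buf.write(struct.pack("<I", off))        # offset index
for rec in records:
    buf.write(rec)

# Random access: read record 2's offset from the index and seek
# straight to it, skipping the earlier records entirely.
buf.seek(4 + 4 * 2)
(target,) = struct.unpack("<I", buf.read(4))
buf.seek(target)
rec = buf.read(8)
print(rec)  # b'gamma---'
```

Real formats of course add checksums, variable-length records and so on, but the seek-via-index principle is the same.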


> All text encoded file formats share this characteristic, and can only be parsed linearly. If you move across to the world of binary file formats, it's extremely common for them to have indexes of offsets so a parser can navigate a tree-like structure in tree order without having to fully parse it, along with other types of non-linear data structures.

You don't always have the luxury of randomly accessing a file (obvious example: shell pipelines with a producer and consumer exchanging lots of temporary data), so taking advantage of indexing might require saving a temporary file and stalling processing until the file is ready.

Personally, I'm used to parsing large XML files with event-based APIs and throwing away data aggressively, keeping in memory only one unfinished element of interest, the stack of its ancestors, and my collected data (instead of a DOM for the whole document).
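One way to sketch that style in Python is ElementTree's iterparse with aggressive clear() calls (the `<log>`/`<record>` schema here is invented):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: a large document with many <record> children.
xml_bytes = (
    b"<log>"
    + b"".join(b"<record n='%d'/>" % i for i in range(1000))
    + b"</log>"
)

total = 0
# iterparse yields each element as its end tag arrives; clearing the
# element afterwards keeps memory bounded regardless of document size.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "record":
        total += 1
        elem.clear()  # discard children/text we no longer need
print(total)  # 1000
```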


It's super common for XML parsers to support partial documents.


and, AIUI, XMPP is based entirely around that very idea so it's not even a niche concept (well, aside from XMPP being arguably a niche protocol)


What do you mean by "partially parse"?


If you imagine a log file that gets lines added to it now and then, it's actually quite fiddly to do that with XML because of the closing outer tag.
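A quick Python sketch of the problem (the `<log>`/`<entry>` tags are invented): a log file mid-append hasn't had its closing tag written yet, so a naive whole-document parse fails outright:

```python
import xml.etree.ElementTree as ET

# A writer is still appending entries; </log> hasn't been emitted yet.
partial = "<log><entry>started</entry><entry>ready</entry>"
try:
    ET.fromstring(partial)
    status = "parsed"
except ET.ParseError:
    status = "still open"
print(status)  # still open
```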


Well, I see the awkwardness of emitting a log like that, but it's not any worse than emitting a single JSON array and waiting for that closing ']'.

Both can be emitted and parsed in a streaming fashion, though I wouldn't say that either XML or JSON is suitable for logs. Maybe NDJSON, but that's more of a hack around this limitation.


> but I wouldn't say that either XML or JSON is suitable for logs

Of course. Better use a binary format for this.

https://systemd.io/JOURNAL_FILE_FORMAT/

:-)


For JSON it's common to have multiple records per file. Then it's just one object per line, for example.


Yes, that's NDJSON, as in newline delimited JSON.

http://ndjson.org/
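A tiny Python sketch of the format (the log records here are made up): one complete JSON object per line, so each line parses on its own and an appender just writes a new line, with no trailing ']' to maintain:

```python
import json

records = [
    {"level": "info", "msg": "started"},
    {"level": "warning", "msg": "disk slow"},
]

# Emit: one JSON object per line.
lines = "\n".join(json.dumps(r) for r in records)

# Consume: each line is parsed independently, even mid-stream.
parsed = [json.loads(line) for line in lines.splitlines()]
print(parsed == records)  # True
```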



