Replace sax with @rgrove/parse-xml
Closes GH-3.

This commit switches from `sax`, a lax parser, to `@rgrove/parse-xml`, a strict
and fast parser that follows the XML spec.
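The core of such a switch is a tree-to-tree mapping. The following is a minimal sketch, not the code in `lib/index.js` (whose diff is not rendered on this page). It assumes parse-xml nodes are plain objects shaped like `{type: 'element', name, attributes, children}`, `{type: 'text', text}`, `{type: 'cdata', text}`, `{type: 'comment', content}` and `{type: 'pi', name, content}` under a `{type: 'document', children}` root, and maps them onto xast `root`, `element`, `text`, `cdata`, `comment` and `instruction` nodes. `toXast` and `convertChildren` are hypothetical names, and positional info is left out.

```javascript
// Hypothetical sketch: convert a parse-xml-style tree into xast.
// ASSUMPTION: input node shapes are as described above; positional info
// (which the real utility attaches) is omitted for brevity.
function toXast(node) {
  switch (node.type) {
    case 'document':
      return {type: 'root', children: convertChildren(node.children)}
    case 'element':
      return {
        type: 'element',
        name: node.name,
        attributes: {...node.attributes},
        children: convertChildren(node.children)
      }
    case 'text':
      return {type: 'text', value: node.text}
    case 'cdata':
      return {type: 'cdata', value: node.text}
    case 'comment':
      return {type: 'comment', value: node.content}
    case 'pi':
      // Processing instructions become xast `instruction` nodes.
      return {type: 'instruction', name: node.name, value: node.content}
    default:
      // Anything else (for example an XML declaration) is dropped here.
      return undefined
  }
}

// Convert a list of children, skipping node kinds the sketch does not handle.
function convertChildren(children) {
  const result = []
  for (const child of children) {
    const converted = toXast(child)
    if (converted !== undefined) result.push(converted)
  }
  return result
}

// Usage on a hand-written input object:
const tree = toXast({
  type: 'document',
  children: [
    {type: 'pi', name: 'xml-stylesheet', content: 'href="a.css"'},
    {
      type: 'element',
      name: 'a',
      attributes: {b: 'c'},
      children: [{type: 'text', text: 'd'}]
    }
  ]
})
console.log(JSON.stringify(tree))
```

Switching on `type` keeps the mapping explicit, so node kinds the sketch does not know about are dropped instead of leaking into the tree in a non-xast shape.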

This means that invalid XML is now rejected, so it’s a breaking change. For
example, all documents now require a root element, doctypes must be written in
uppercase (`<!DOCTYPE`), and CDATA outside the root element is no longer
allowed.
It also means that whitespace around the root element is no longer present in
the tree: XML does not treat it as part of the document’s content.
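For code that compared trees from the previous, `sax`-based version against the new output, one way to line them up is to drop whitespace-only text nodes that sit directly under the root. This is a hypothetical helper (`stripRootWhitespace` is not part of the package), assuming xast trees are plain objects; it only looks at the root’s direct children, keeps whitespace inside elements, and leaves positions untouched.

```javascript
// Hypothetical helper, not part of xast-util-from-xml.
// XML whitespace is space, tab, carriage return, and line feed.
const xmlWhitespace = /^[ \t\r\n]*$/

// Return a copy of `root` without whitespace-only text children at the top
// level; whitespace nested inside elements is kept as-is.
function stripRootWhitespace(root) {
  return {
    ...root,
    children: root.children.filter(
      (child) => !(child.type === 'text' && xmlWhitespace.test(child.value))
    )
  }
}

// Usage: a tree shaped like the old output for `<a>\n</a>\n`.
const oldTree = {
  type: 'root',
  children: [
    {
      type: 'element',
      name: 'a',
      attributes: {},
      children: [{type: 'text', value: '\n'}]
    },
    {type: 'text', value: '\n'}
  ]
}
const newTree = stripRootWhitespace(oldTree)
console.log(newTree.children.length) // 1
```

The helper returns a new root object rather than mutating its input, so the original tree stays available for comparison.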

On the upside, positional info is now much better, processing instructions
are supported, and the size is much smaller.
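Positional info in xast uses unist points with a 1-based `line` and `column` plus an `offset`, so a parser that reports character offsets needs a conversion step; `vfile-location`, which stays in the dependency list, provides that kind of conversion. The helper below is a hypothetical stand-in (`createToPoint` is not from either package) that shows the arithmetic, assuming 0-based offsets and `\n` line endings.

```javascript
// Hypothetical helper: map 0-based character offsets to unist-style points
// ({line, column, offset}, with 1-based line and column).
// ASSUMPTION: lines end in `\n`; a `\r` before it counts as a column.
function createToPoint(value) {
  // Offsets at which each line starts; line 1 starts at offset 0.
  const lineStarts = [0]
  for (let index = 0; index < value.length; index++) {
    if (value[index] === '\n') lineStarts.push(index + 1)
  }

  return function toPoint(offset) {
    if (!Number.isInteger(offset) || offset < 0 || offset > value.length) {
      return undefined
    }

    // Binary search for the last line start that is <= offset.
    let low = 0
    let high = lineStarts.length - 1
    while (low < high) {
      const middle = (low + high + 1) >> 1
      if (lineStarts[middle] <= offset) low = middle
      else high = middle - 1
    }

    return {line: low + 1, column: offset - lineStarts[low] + 1, offset}
  }
}

// Usage:
const toPoint = createToPoint('<a>\n</a>\n')
console.log(toPoint(3)) // { line: 1, column: 4, offset: 3 } (the first newline)
console.log(toPoint(9)) // { line: 3, column: 1, offset: 9 } (end of input)
```

Precomputing line starts and binary-searching them keeps each lookup logarithmic in the number of lines, which matters when converting the start and end of every node in a large document.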
wooorm committed Feb 5, 2023
1 parent 2c32bd5 commit 07a5e57
Showing 35 changed files with 812 additions and 1,014 deletions.
796 changes: 248 additions & 548 deletions lib/index.js

Large diffs are not rendered by default.

14 changes: 6 additions & 8 deletions package.json
@@ -38,10 +38,9 @@
"index.js"
],
"dependencies": {
"@types/sax": "^1.0.0",
"@types/unist": "^2.0.0",
"@rgrove/parse-xml": "^4.1.0",
"@types/xast": "^1.0.0",
"sax": "^1.0.0",
"vfile-location": "^4.0.0",
"vfile-message": "^3.0.0"
},
"devDependencies": {
@@ -75,26 +74,25 @@
 "xo": {
 "prettier": true,
 "rules": {
-"unicorn/prefer-code-point": "off"
+"unicorn/prefer-switch": "off"
 },
 "overrides": [
 {
 "files": "test/**/*.js",
 "rules": {
-"no-await-in-loop": 0
+"no-await-in-loop": "off"
 }
 }
 ]
 },
 "remarkConfig": {
 "plugins": [
-"preset-wooorm"
+"remark-preset-wooorm"
 ]
 },
 "typeCoverage": {
 "atLeast": 100,
 "detail": true,
-"strict": true,
-"ignoreCatch": true
+"strict": true
 }
 }
22 changes: 10 additions & 12 deletions readme.md
@@ -27,15 +27,14 @@

 ## What is this?

-This package is a utility that takes XML input and turns it into a [xast][]
-syntax tree.
-It uses [`sax`][sax], which turns XML into events, while it turns those events
-into nodes.
+This package is a utility that takes serialized XML as input and turns it into
+a [xast][] syntax tree.
+It uses [`@rgrove/parse-xml`][parse-xml], which is a good and fast XML parser,
+and turns its results into xast.

 ## When should I use this?

-If you want to handle syntax trees, use this.
-Use [`sax`][sax] itself instead when you want to do other things.
+If you want to use xast syntax trees, use this.

 The utility [`xast-util-to-xml`][xast-util-to-xml] does the inverse of this
 utility.
@@ -84,7 +83,7 @@ import {fromXml} from 'xast-util-from-xml'

 const tree = fromXml(await fs.readFile('example.xml'))

-console.log(tree)
+console.dir(tree, {depth: null})
 ```

 …now running `node example.js` yields (positional info removed for brevity):
@@ -121,15 +120,14 @@ console.log(tree)
 },
 {type: 'text', value: '\n'}
 ]
-},
-{type: 'text', value: '\n'}
+}
 ]
 }
 ```

 ## API

-This package exports the identifier [`fromXml`][fromxml].
+This package exports the identifier [`fromXml`][api-from-xml].
 There is no default export.

 ### `fromXml(value)`
@@ -236,8 +234,8 @@ abide by its terms.

 [root]: https://github.com/syntax-tree/xast#root

-[sax]: https://github.com/isaacs/sax-js
+[parse-xml]: https://github.com/rgrove/parse-xml

 [xast-util-to-xml]: https://github.com/syntax-tree/xast-util-to-xml

-[fromxml]: #fromxmlvalue
+[api-from-xml]: #fromxmlvalue
28 changes: 12 additions & 16 deletions test/fixtures/attribute/index.json
@@ -109,22 +109,18 @@
"offset": 144
}
}
}
],
"position": {
"start": {
"line": 1,
"column": 1,
"offset": 0
},
{
"type": "text",
"value": "\n",
"position": {
"start": {
"line": 10,
"column": 8,
"offset": 144
},
"end": {
"line": 11,
"column": 1,
"offset": 145
}
}
"end": {
"line": 11,
"column": 1,
"offset": 145
}
]
}
}