
Comparing changes

base repository: fb55/htmlparser2
base: v8.0.1
...
head repository: fb55/htmlparser2
compare: v8.0.2
Showing with 9,357 additions and 8,194 deletions.
  1. +17 −3 .eslintrc.json
  2. +1 −1 .github/workflows/dependabot-automerge.yml
  3. +12 −5 .github/workflows/nodejs-test.yml
  4. +19 −15 README.md
  5. +2,703 −2,740 package-lock.json
  6. +19 −14 package.json
  7. +31 −19 src/FeedHandler.spec.ts
  8. +3 −3 src/Parser.spec.ts
  9. +15 −15 src/Parser.ts
  10. +39 −18 src/Tokenizer.spec.ts
  11. +125 −64 src/Tokenizer.ts
  12. +66 −29 src/WritableStream.spec.ts
  13. +6 −6 src/WritableStream.ts
  14. +0 −42 src/__fixtures__/Events/01-simple.json
  15. +0 −60 src/__fixtures__/Events/02-template.json
  16. +0 −47 src/__fixtures__/Events/03-lowercase_tags.json
  17. +0 −53 src/__fixtures__/Events/04-cdata.json
  18. +0 −30 src/__fixtures__/Events/05-cdata-special.json
  19. +0 −12 src/__fixtures__/Events/06-leading-lt.json
  20. +0 −48 src/__fixtures__/Events/07a-end_slash--void.json
  21. +0 −48 src/__fixtures__/Events/07b-end_slash--void_without.json
  22. +0 −53 src/__fixtures__/Events/07c-end_slash--void_without--xmlMode.json
  23. +0 −48 src/__fixtures__/Events/07d-end_slash--non_void.json
  24. +0 −53 src/__fixtures__/Events/07e-end_slash--non_void--xmlmode.json
  25. +0 −53 src/__fixtures__/Events/07f-end_slash--non_void--recognize_self_closing.json
  26. +0 −54 src/__fixtures__/Events/07g-end_slash--consumed_by_attrib_value_in_void.json
  27. +0 −60 src/__fixtures__/Events/07h-end_slash--consumed_by_attrib_value_in_non_void.json
  28. +0 −492 src/__fixtures__/Events/08-implicit-close-tags.json
  29. +0 −63 src/__fixtures__/Events/09-attributes.json
  30. +0 −49 src/__fixtures__/Events/10-crazy-attrib.json
  31. +0 −48 src/__fixtures__/Events/11-script_in_script.json
  32. +0 −78 src/__fixtures__/Events/12-long-comment-end.json
  33. +0 −83 src/__fixtures__/Events/13-long-cdata-end.json
  34. +0 −108 src/__fixtures__/Events/14-implicit-open-tags.json
  35. +0 −12 src/__fixtures__/Events/15-lt-whitespace.json
  36. +0 −42 src/__fixtures__/Events/16-double_attribs.json
  37. +0 −12 src/__fixtures__/Events/17-numeric_entities.json
  38. +0 −12 src/__fixtures__/Events/18-legacy_entities.json
  39. +0 −12 src/__fixtures__/Events/19-named_entities.json
  40. +0 −17 src/__fixtures__/Events/20-xml_entities.json
  41. +0 −40 src/__fixtures__/Events/21-entity_in_attribute.json
  42. +0 −36 src/__fixtures__/Events/22-double_brackets.json
  43. +0 −12 src/__fixtures__/Events/23-legacy_entity_fail.json
  44. +0 −222 src/__fixtures__/Events/24-special_special.json
  45. +0 −12 src/__fixtures__/Events/25-empty_tag_name.json
  46. +0 −36 src/__fixtures__/Events/26-not-quite-closed.json
  47. +0 −57 src/__fixtures__/Events/27-entities_in_attributes.json
  48. +0 −18 src/__fixtures__/Events/28-cdata_in_html.json
  49. +0 −36 src/__fixtures__/Events/29-comment_edge-cases.json
  50. +0 −53 src/__fixtures__/Events/30-cdata_edge-cases.json
  51. +0 −18 src/__fixtures__/Events/31-comment_false-ending.json
  52. +0 −30 src/__fixtures__/Events/32-script-ending-with-lessthan.json
  53. +0 −29 src/__fixtures__/Events/33-cdata_more-edge-cases.json
  54. +0 −24 src/__fixtures__/Events/34-not-alpha-tags.json
  55. +0 −47 src/__fixtures__/Events/35-non-br-void-close-tag.json
  56. +0 −42 src/__fixtures__/Events/36-entity-in-attrib.json
  57. +0 −30 src/__fixtures__/Events/37-entity-in-title.json
  58. +0 −35 src/__fixtures__/Events/38-entity-in-title-no-decode.json
  59. +0 −30 src/__fixtures__/Events/39-title-in-script.json
  60. +0 −47 src/__fixtures__/Events/40-xml_tags.json
  61. +0 −12 src/__fixtures__/Events/41-trailing-legacy-entity.json
  62. +0 −12 src/__fixtures__/Events/42-trailing-numeric-entity.json
  63. +0 −12 src/__fixtures__/Events/43-multibyte-entity.json
  64. +0 −156 src/__fixtures__/Events/44-indices.json
  65. +0 −65 src/__fixtures__/Events/45-self-closing-indices.json
  66. +0 −13 src/__fixtures__/Events/46-entity-after-lt.json
  67. +0 −5 src/__fixtures__/Feeds/01-rss.json
  68. +0 −5 src/__fixtures__/Feeds/02-atom.json
  69. +0 −5 src/__fixtures__/Feeds/03-rdf.json
  70. +0 −5 src/__fixtures__/Stream/01-basic.json
  71. +0 −6 src/__fixtures__/Stream/02-RSS.json
  72. +0 −6 src/__fixtures__/Stream/03-Atom.json
  73. +0 −6 src/__fixtures__/Stream/04-RDF.json
  74. +0 −5 src/__fixtures__/Stream/05-Attributes.json
  75. +0 −5 src/__fixtures__/Stream/06-Svg.json
  76. +61 −136 src/__fixtures__/test-helper.ts
  77. +28 −80 src/__snapshots__/FeedHandler.spec.ts.snap
  78. +119 −49 src/__snapshots__/Tokenizer.spec.ts.snap
  79. +2,105 −2,105 src/__snapshots__/WritableStream.spec.ts.snap
  80. +3 −3 src/__snapshots__/index.spec.ts.snap
  81. +3,749 −0 src/__tests__/__snapshots__/events.ts.snap
  82. +215 −10 src/__tests__/events.ts
  83. +3 −3 src/index.spec.ts
  84. +18 −15 src/index.ts
20 changes: 17 additions & 3 deletions .eslintrc.json
@@ -1,5 +1,10 @@
{
-"extends": ["eslint:recommended", "prettier"],
+"extends": [
+"eslint:recommended",
+"prettier",
+"plugin:n/recommended",
+"plugin:unicorn/recommended"
+],
"env": {
"node": true,
"es6": true
@@ -21,7 +26,14 @@
"spaced-comment": 2,
"yoda": [2, "never"],
"curly": [2, "multi-line"],
-"no-else-return": 2
+"no-else-return": 2,
+
+"unicorn/prefer-module": 0,
+"unicorn/filename-case": 0,
+"unicorn/no-null": 0,
+"unicorn/prefer-code-point": 0,
+"unicorn/prefer-string-slice": 0,
+"unicorn/prefer-add-event-listener": 0
},
"overrides": [
{
@@ -55,7 +67,9 @@
"@typescript-eslint/prefer-includes": 2,
"@typescript-eslint/no-unnecessary-condition": 2,
"@typescript-eslint/switch-exhaustiveness-check": 2,
-"@typescript-eslint/prefer-nullish-coalescing": 2
+"@typescript-eslint/prefer-nullish-coalescing": 2,
+
+"n/no-unsupported-features/es-syntax": 0
}
}
]
2 changes: 1 addition & 1 deletion .github/workflows/dependabot-automerge.yml
@@ -13,7 +13,7 @@ jobs:
steps:
- name: Dependabot metadata
id: metadata
-uses: dependabot/fetch-metadata@v1.3.1
+uses: dependabot/fetch-metadata@v1.3.6
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
- name: Enable auto-merge for Dependabot PRs
17 changes: 12 additions & 5 deletions .github/workflows/nodejs-test.yml
@@ -9,7 +9,10 @@ on:
env:
CI: true
FORCE_COLOR: 2
-NODE_COV: 16 # The Node.js version to run coveralls on
+NODE_COV: lts/* # The Node.js version to run coveralls on

+permissions:
+contents: read # to fetch code (actions/checkout)
+
jobs:
lint:
@@ -18,23 +21,27 @@ jobs:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
-node-version: 16
+node-version: lts/*
cache: npm
- run: npm ci
- run: npm run lint

test:
+permissions:
+contents: read # to fetch code (actions/checkout)
+checks: write # to create new checks (coverallsapp/github-action)
+
name: Node ${{ matrix.node }}
runs-on: ubuntu-latest

strategy:
fail-fast: false
matrix:
node:
-- 10
-- 12
- 14
- 16
+- 18
+- lts/*

steps:
- uses: actions/checkout@v3
@@ -55,7 +62,7 @@ jobs:
if: matrix.node == env.NODE_COV

- name: Run Coveralls
-uses: coverallsapp/github-action@1.1.3
+uses: coverallsapp/github-action@v2.0.0
if: matrix.node == env.NODE_COV
continue-on-error: true
with:
34 changes: 19 additions & 15 deletions README.md
@@ -1,9 +1,9 @@
# htmlparser2

-[![NPM version](http://img.shields.io/npm/v/htmlparser2.svg?style=flat)](https://npmjs.org/package/htmlparser2)
-[![Downloads](https://img.shields.io/npm/dm/htmlparser2.svg?style=flat)](https://npmjs.org/package/htmlparser2)
-[![Build Status](https://img.shields.io/github/workflow/status/fb55/htmlparser2/Node.js%20Test?label=tests&style=flat)](https://github.com/fb55/htmlparser2/actions?query=workflow%3A%22Node.js+Test%22)
-[![Coverage](http://img.shields.io/coveralls/fb55/htmlparser2.svg?style=flat)](https://coveralls.io/r/fb55/htmlparser2)
+[![NPM version](https://img.shields.io/npm/v/htmlparser2.svg)](https://npmjs.org/package/htmlparser2)
+[![Downloads](https://img.shields.io/npm/dm/htmlparser2.svg)](https://npmjs.org/package/htmlparser2)
+[![Node.js CI](https://github.com/fb55/htmlparser2/actions/workflows/nodejs-test.yml/badge.svg)](https://github.com/fb55/htmlparser2/actions/workflows/nodejs-test.yml)
+[![Coverage](https://img.shields.io/coveralls/fb55/htmlparser2.svg)](https://coveralls.io/r/fb55/htmlparser2)

The fast & forgiving HTML/XML parser.

@@ -13,7 +13,7 @@ _htmlparser2 is [the fastest HTML parser](#performance), and takes some shortcut

npm install htmlparser2

-A live demo of `htmlparser2` is available [here](https://astexplorer.net/#/2AmVrGuGVJ).
+A live demo of `htmlparser2` is available [on AST Explorer](https://astexplorer.net/#/2AmVrGuGVJ).

## Ecosystem

@@ -31,8 +31,9 @@ A live demo of `htmlparser2` is available [here](https://astexplorer.net/#/2AmVr
`htmlparser2` itself provides a callback interface that allows consumption of documents with minimal allocations.
For a more ergonomic experience, read [Getting a DOM](#getting-a-dom) below.

-```javascript
-const htmlparser2 = require("htmlparser2");
+```js
+import * as htmlparser2 from "htmlparser2";
+
const parser = new htmlparser2.Parser({
onopentag(name, attributes) {
/*
@@ -50,7 +51,7 @@ const parser = new htmlparser2.Parser({
* Fires whenever a section of text was processed.
*
* Note that this can fire at any point within text and you might
-* have to stich together multiple pieces.
+* have to stitch together multiple pieces.
*/
console.log("-->", text);
},
@@ -68,7 +69,7 @@ const parser = new htmlparser2.Parser({
},
});
parser.write(
-"Xyz <script type='text/javascript'>const foo = '<<bar>>';</ script>"
+"Xyz <script type='text/javascript'>const foo = '<<bar>>';</script>"
);
parser.end();
```
@@ -90,8 +91,9 @@ Read more about the parser, its events and options in the [wiki](https://github.
While the `Parser` interface closely resembles Node.js streams, it's not a 100% match.
Use the `WritableStream` interface to process a streaming input:

-```javascript
-const { WritableStream } = require("htmlparser2/lib/WritableStream");
+```js
+import { WritableStream } from "htmlparser2/lib/WritableStream";
+
const parserStream = new WritableStream({
ontext(text) {
console.log("Streaming:", text);
@@ -107,7 +109,7 @@ htmlStream.pipe(parserStream).on("finish", () => console.log("done"));
The `DomHandler` produces a DOM (document object model) that can be manipulated using the [`DomUtils`](https://github.com/fb55/DomUtils) helper.

```js
-const htmlparser2 = require("htmlparser2");
+import * as htmlparser2 from "htmlparser2";

const dom = htmlparser2.parseDocument(htmlString);
```
@@ -149,12 +151,14 @@ html5 : 120.844 ms/file ± 153.944
## How does this module differ from [node-htmlparser](https://github.com/tautologistics/node-htmlparser)?

In 2011, this module started as a fork of the `htmlparser` module.
-`htmlparser2` was rewritten multiple times and, while it maintains an API that's mostly compatible with `htmlparser` in most cases, the projects don't share any code anymore.
+`htmlparser2` was rewritten multiple times and, while it maintains an API that's mostly compatible with `htmlparser`, the projects don't share any code anymore.

The parser now provides a callback interface inspired by [sax.js](https://github.com/isaacs/sax-js) (originally targeted at [readabilitySAX](https://github.com/fb55/readabilitysax)).
As a result, old handlers won't work anymore.

-The `DefaultHandler` and the `RssHandler` were renamed to clarify their purpose (to `DomHandler` and `FeedHandler`). The old names are still available when requiring `htmlparser2`, your code should work as expected.
+The `DefaultHandler` was renamed to clarify its purpose (to `DomHandler`). The old name is still available when requiring `htmlparser2` and your code should work as expected.
+
+The `RssHandler` was replaced with a `getFeed` function that takes a `DomHandler` DOM and returns a feed object. There is a `parseFeed` helper function that can be used to parse a feed from a string.

## Security contact information

@@ -163,6 +167,6 @@ Tidelift will coordinate the fix and disclosure.

## `htmlparser2` for enterprise

-Available as part of the Tidelift Subscription
+Available as part of the Tidelift Subscription.

The maintainers of `htmlparser2` and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source dependencies you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact dependencies you use. [Learn more.](https://tidelift.com/subscription/pkg/npm-htmlparser2?utm_source=npm-htmlparser2&utm_medium=referral&utm_campaign=enterprise&utm_term=repo)
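
Note on the README hunk at `@@ -50,7 +51,7 @@`: the corrected comment says `ontext` "can fire at any point within text and you might have to stitch together multiple pieces." A minimal sketch of one way a consumer might do that is shown here. The `createTextCollector` helper is hypothetical (it is not part of `htmlparser2`); it only reuses the callback names that appear in the README example (`onopentag`, `ontext`, `onclosetag`). To stay dependency-free, the callbacks are driven by hand rather than by a real `Parser`, and text is attributed only to the innermost open element.

```js
// Hypothetical helper, not part of htmlparser2: returns a handler object
// whose callback names match the ones used in the README example.
function createTextCollector() {
    const results = []; // one entry per closed element: { name, text }
    const stack = []; // currently open elements with their buffered chunks

    return {
        results,
        onopentag(name) {
            stack.push({ name, chunks: [] });
        },
        ontext(text) {
            // Text may arrive in several pieces, so buffer it
            // instead of treating each call as a complete text node.
            const current = stack[stack.length - 1];
            if (current) current.chunks.push(text);
        },
        onclosetag() {
            const current = stack.pop();
            if (current) {
                results.push({
                    name: current.name,
                    text: current.chunks.join(""),
                });
            }
        },
    };
}

// Simulate a parser that splits a single text node into two `ontext` calls.
const collector = createTextCollector();
collector.onopentag("script");
collector.ontext("const foo = ");
collector.ontext("'<<bar>>';");
collector.onclosetag("script");

console.log(collector.results);
```

In real use, the object returned by the helper would presumably be passed as the handler argument to `new htmlparser2.Parser(...)`, as in the README snippet; that wiring is left out here so the sketch runs without the package installed.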