Merge pull request #2217 from sparklemotion/2204-merge-nokogumbo
merge nokogumbo history

---

**What problem is this PR intended to solve?**

This is one step of many to merge Nokogumbo into Nokogiri (see [Epic: merge Nokogumbo into Nokogiri · Issue #2204 · sparklemotion/nokogiri](#2204)).

- Commit history for Nokogumbo is preserved in the Nokogiri repository
- Nokogumbo contributors are added to the Nokogiri gemspec, README, and copyright declarations
- All nokogumbo files should mention they are originally licensed under Apache 2.0 (an interpretation of APL2.0 clause 4.c) and mention that they have been changed (clause 4.b)
Showing 86 changed files with 50,674 additions and 1 deletion.

@@ -0,0 +1,86 @@
name: CI Test

on:
  schedule:
    - cron: '0 0 * * 5'
  push:
    branches:
      - master
  pull_request:
    types: [opened, synchronize]
    branches:
      - '*'

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu, macos, windows]
        ruby: [2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0]
        system_libraries: [true, false]
        exclude:
          - {os: macos, ruby: 2.1}
          - {os: macos, ruby: 2.2}
          - {os: macos, ruby: 2.3}
          - {os: windows, ruby: 2.1}
          - {os: windows, ruby: 2.2}
          - {os: windows, ruby: 2.3}
    runs-on: ${{ matrix.os }}-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}
          bundler-cache: true

      - name: install html5lib tests
        run: git clone --depth 1 --branch all-error-fixes --single-branch https://github.com/stevecheckoway/html5lib-tests.git test/html5lib-tests

      - name: Run tests
        env:
          USE_SYSTEM_LIBRARIES: ${{ matrix.system_libraries }}
        shell: bash
        run: ./scripts/ci-test.sh

  package:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu, macos]
    runs-on: ${{ matrix.os }}-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: 2.7
          bundler-cache: true

      - name: Install ragel
        if: matrix.os == 'ubuntu'
        run: sudo apt-get install -y ragel

      - name: Install ragel
        if: matrix.os == 'macos'
        run: brew install ragel

      - name: Test Gumbo and gem packaging
        shell: bash
        run: ./scripts/ci-package-test.sh

  gentoo:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Docker pull
        run: docker pull stevecheckoway/gentoo-ruby

      - name: Test Gentoo Linux
        shell: bash
        run: ./scripts/gentoo-test.sh
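
The `test` job's matrix above combines three axes and then drops some combinations through `exclude`. As a rough illustration of how many jobs that yields, the sketch below expands the same axes in plain Ruby. The `expand_matrix` helper and the constant names are hypothetical, written only for this example, and the Ruby versions are kept as strings here for readability. It assumes the usual GitHub Actions behaviour that an `exclude` entry naming only some axes removes every combination matching those axes.

```ruby
# Axes copied from the `test` job's matrix in the workflow above.
MATRIX = {
  os: %w[ubuntu macos windows],
  ruby: %w[2.1 2.2 2.3 2.4 2.5 2.6 2.7 3.0],
  system_libraries: [true, false]
}.freeze

# Exclusions copied from the same job; each names only `os` and `ruby`.
EXCLUDE = [
  { os: 'macos',   ruby: '2.1' },
  { os: 'macos',   ruby: '2.2' },
  { os: 'macos',   ruby: '2.3' },
  { os: 'windows', ruby: '2.1' },
  { os: 'windows', ruby: '2.2' },
  { os: 'windows', ruby: '2.3' }
].freeze

# Hypothetical helper: build the cartesian product of all axes, then drop
# every combination that matches all key/value pairs of some exclude rule.
def expand_matrix(matrix, exclude)
  keys = matrix.keys
  first, *rest = matrix.values
  combos = first.product(*rest).map { |values| keys.zip(values).to_h }
  combos.reject do |combo|
    exclude.any? { |rule| rule.all? { |key, value| combo[key] == value } }
  end
end

jobs = expand_matrix(MATRIX, EXCLUDE)
puts "#{jobs.size} jobs"  # 3 * 8 * 2 = 48 combinations, minus 6 * 2 excluded = 36
```

Because each rule leaves `system_libraries` unspecified, it removes both the `true` and the `false` variant, which is why six rules drop twelve jobs.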

@@ -0,0 +1,12 @@
Gemfile.lock
ext/nokogumbo/*
!ext/nokogumbo/extconf.rb
!ext/nokogumbo/nokogumbo.c
/lib/nokogumbo/nokogumbo.bundle
/lib/nokogumbo/nokogumbo.so
/lib/nokogumbo/nokogumbo.dll
/pkg
/tmp
/gumbo-parser/googletest
/gumbo-parser/build
/test/html5lib-tests

@@ -0,0 +1,127 @@
# Changelog

All notable changes to Nokogumbo will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
### Changed
### Deprecated
### Removed
### Fixed
### Security

## [2.0.5] - 2021-03-19
### Fixed
- Support Mageia distros when libxml2/libxslt system libraries are installed. #165 (Thank you,
  @pterjan!)

### Added
- Forward-looking support for a version of Nokogiri that will provide HTML5 parsing. #171

### Improved
- Update extconf.rb to use Nokogiri v1.11's CPPFLAGS for more reliable installation. #163


## [2.0.4] - 2020-11-27
### Fixed
- Fixed a bug where `Nokogiri::HTML5.fragment(nil)` would raise an error. Now
  it returns an empty `DocumentFragment` like it did in v2.0.2.
- Fixed assertion failure when a tag immediately followed the UTF-8 BOM.


## [2.0.3] - 2020-11-21
### Added
- Limit enforced on number of attributes per element, defaulting to 400 and
  configurable with the `:max_attributes` argument.
### Fixed
- Ignore UTF-8 byte order mark at the beginning of the input.
- Fix content sniffing for Unicode strings.
- Fixed crash where Ruby objects constructed in C can be garbage collected.

## [2.0.2] - 2019-11-19
### Added
- Support Ruby 2.6
### Fixed
- Fix assertion failures with nonstandard HTML tags.
- Fix the handling of mis-nested formatting tags (the adoption agency
  algorithm).
- Fix crash with zero-length HTML tags.
### Security
- Prevent 1-byte buffer over read when constructing an error message about an
  unexpected EOF.

## [2.0.1] - 2018-11-11
### Fixed
- Fix line numbers on elements from `#line`.

## [2.0.0] - 2018-10-04
### Added
- Experimental support for errors (it was supported in 1.5.0 but
  undocumented).
- Added proper HTML5 serialization.
- Added option `:max_errors` to control the maximum number of errors reported
  by `#errors`.
- Added option `:max_tree_depth` to control the maximum parse tree depth.
- Line number support via `Nokogiri::XML::Node#line` as long as Nokogumbo has
  been compiled with libxml2 support.

### Changed
- Integrated [Gumbo parser](https://github.com/google/gumbo-parser) into
  Nokogumbo. A system version will not be used.
- The undocumented (but publicly mentioned) `:max_parse_errors` renamed to `:max_errors`;
  `:max_parse_errors` is deprecated and will go away
- The various `#parse` and `#fragment` (and `Nokogiri.HTML5`) methods return
  `Nokogiri::HTML5::Document` and `Nokogiri::HTML5::DocumentFragment` classes
  rather than `Nokogiri::HTML::Document` and
  `Nokogiri::HTML::DocumentFragment`.
- Changed the top-level API to more closely match Nokogiri's while maintaining
  backwards compatibility. The new APIs are
  * `Nokogiri::HTML5(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5.parse(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5.fragment(html, encoding = nil, **options)`
  * `Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **options)`
  * `Nokogiri::HTML5::DocumentFragment.new(document, html = nil, ctx = nil)`
  * `Nokogiri::HTML5::Document#fragment(html = nil)`
  * `Nokogiri::XML::Node#fragment(html = nil)`
  In all cases, `html` can be a string or an `IO` object (something that
  responds to `#read`). The `url` parameter is entirely for error reporting,
  as in Nokogiri. The `encoding` parameter only signals what encoding `html`
  should have on input; the output `Document` or `DocumentFragment` will be in
  UTF-8. Currently, the only option supported is `:max_errors`, which controls
  the maximum number of errors reported by `#errors`.
- Minimum supported version of Ruby changed to 2.1.
- Minimum supported version of Nokogiri changed to 1.8.0.
- `Nokogiri::HTML5::DocumentFragment#errors` returns errors for the document
  fragment itself, not the underlying document.
- The five XML namespaces described in the HTML spec, MathML, SVG, XLink, XML,
  and XMLNS, are now supported. Thus `<svg>` will create an `svg` element in
  the SVG namespace and `<math>` will create a `math` element in the MathML
  namespace. An attribute `xml:lang=en`, for example, will create a `lang`
  attribute in the XML namespace, **but only in foreign elements (i.e., those
  in the SVG or MathML namespaces)**. On HTML elements, this creates an
  attribute with the name `xml:lang`. This changes the `#xpath` and related
  APIs.
- Calling `#to_xml` on a `Nokogiri::HTML5::Document` will produce XML output
  rather than HTML.

### Deprecated
- `:max_parse_errors`; use `:max_errors`

### Fixed
- Fixed documents failing to serialize (via `to_html`) if they contain certain
  `meta` elements that set the `charset`.
- Documents are now properly marked as UTF-8 after parsing.
- Fixed `Nokogiri::HTML5.fragment` reporting an error due to a missing
  `<!DOCTYPE html>`.
- Fixed crash when input contains U+0000 NULL bytes and error reporting is
  enabled.

### Security
- The most recent, released version of Gumbo has a [potential security
  issue](https://github.com/google/gumbo-parser/pull/375) that could result in
  a cross-site scripting vulnerability. This has been fixed by integrating
  Gumbo into Nokogumbo.
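
The entry points listed under 2.0.0 in the changelog above might be exercised roughly as follows. This is a minimal sketch, not code from the commit: it assumes a Nokogiri release that already bundles the HTML5 parser (with standalone Nokogumbo you would `require 'nokogumbo'` instead), and `parse_with_error_limit` is a hypothetical helper named only for this example. The block skips itself when the gem is unavailable.

```ruby
begin
  require 'nokogiri'
rescue LoadError
  warn 'nokogiri is not installed; skipping the HTML5 examples'
end

# Hypothetical helper (not part of Nokogiri or Nokogumbo): parse a whole
# document and return it together with at most `limit` recorded parse errors.
def parse_with_error_limit(html, limit)
  doc = Nokogiri::HTML5(html, max_errors: limit)
  [doc, doc.errors]
end

if defined?(Nokogiri::HTML5)
  # The doctype is deliberately missing so the parser has something to report.
  doc, errors = parse_with_error_limit('<p>Hello, <b>world</b>', 5)
  puts doc.at_css('p').text
  puts "parse errors recorded: #{errors.size}"

  # Fragments accept the same keyword options as whole documents.
  fragment = Nokogiri::HTML5.fragment('<li>one</li><li>two</li>', max_errors: 5)
  puts fragment.css('li').map(&:text).inspect
end
```

Passing `max_errors` explicitly matters in this sketch because the changelog describes it as the option that controls how many errors `#errors` reports.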

@@ -0,0 +1,16 @@
source 'https://rubygems.org'

# Nokogiri depends on pkg-config when built with system libraries but it
# doesn't declare this dependency. Unfortunately, bundler provides no way to
# declare additional dependencies and it will install dependencies in
# alphabetical order so it tries to install Nokogiri before pkg-config and
# this fails.
gem 'fix-dep-order', :path => 'scripts'
gem 'nokogiri', '>= 1.8'

group :development, :test do
  gem 'minitest'
  gem 'rake'
  gem 'rake-compiler'
end