Merge pull request #2217 from sparklemotion/2204-merge-nokogumbo
merge nokogumbo history

---

**What problem is this PR intended to solve?**

This is one of many steps in merging Nokogumbo into Nokogiri (see #2204, "Epic: merge Nokogumbo into Nokogiri").

- Commit history for Nokogumbo is preserved in the Nokogiri repository
- Nokogumbo contributors are added to the Nokogiri gemspec, README, and copyright declarations
- All Nokogumbo files should mention that they were originally licensed under Apache 2.0 (an interpretation of Apache License 2.0 clause 4.c) and that they have been changed (clause 4.b)
flavorjones authored Apr 8, 2021
2 parents d244fb8 + c1a3d67 commit 8d96a4a
Showing 86 changed files with 50,674 additions and 1 deletion.
2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
The MIT License

-Copyright 2008 -- 2021 by Mike Dalessio, Aaron Patterson, Yoko Harada, Akinori MUSHA, John Shahid, Karol Bucek, Lars Kanis, Sergio Arbeo, Timothy Elliott, Nobuyoshi Nakada, Charles Nutter, Patrick Mahoney.
+Copyright 2008 -- 2021 by Mike Dalessio, Aaron Patterson, Yoko Harada, Akinori MUSHA, John Shahid, Karol Bucek, Sam Ruby, Craig Barnes, Stephen Checkoway, Lars Kanis, Sergio Arbeo, Timothy Elliott, Nobuyoshi Nakada, Charles Nutter, Patrick Mahoney.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

3 changes: 3 additions & 0 deletions README.md
@@ -271,6 +271,9 @@ Some additional libraries may be distributed with your version of Nokogiri. Plea
- Akinori MUSHA
- John Shahid
- Karol Bucek
- Sam Ruby
- Craig Barnes
- Stephen Checkoway
- Lars Kanis
- Sergio Arbeo
- Timothy Elliott
3 changes: 3 additions & 0 deletions nokogiri.gemspec
@@ -27,6 +27,9 @@ Gem::Specification.new do |spec|
"Akinori MUSHA",
"John Shahid",
"Karol Bucek",
"Sam Ruby",
"Craig Barnes",
"Stephen Checkoway",
"Lars Kanis",
"Sergio Arbeo",
"Timothy Elliott",
86 changes: 86 additions & 0 deletions nokogumbo-import/.github/workflows/ci.yml
@@ -0,0 +1,86 @@
name: CI Test

on:
  schedule:
    - cron: '0 0 * * 5'
  push:
    branches:
      - master
  pull_request:
    types: [opened, synchronize]
    branches:
      - '*'

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu, macos, windows]
        ruby: [2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0]
        system_libraries: [true, false]
        exclude:
          - {os: macos, ruby: 2.1}
          - {os: macos, ruby: 2.2}
          - {os: macos, ruby: 2.3}
          - {os: windows, ruby: 2.1}
          - {os: windows, ruby: 2.2}
          - {os: windows, ruby: 2.3}
    runs-on: ${{ matrix.os }}-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}
          bundler-cache: true

      - name: install html5lib tests
        run: git clone --depth 1 --branch all-error-fixes --single-branch https://github.com/stevecheckoway/html5lib-tests.git test/html5lib-tests

      - name: Run tests
        env:
          USE_SYSTEM_LIBRARIES: ${{ matrix.system_libraries }}
        shell: bash
        run: ./scripts/ci-test.sh

  package:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu, macos]
    runs-on: ${{ matrix.os }}-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: 2.7
          bundler-cache: true

      - name: Install ragel
        if: matrix.os == 'ubuntu'
        run: sudo apt-get install -y ragel

      - name: Install ragel
        if: matrix.os == 'macos'
        run: brew install ragel

      - name: Test Gumbo and gem packaging
        shell: bash
        run: ./scripts/ci-package-test.sh

  gentoo:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Docker pull
        run: docker pull stevecheckoway/gentoo-ruby

      - name: Test Gentoo Linux
        shell: bash
        run: ./scripts/gentoo-test.sh
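The `test` job's matrix runs every combination of `os`, `ruby`, and `system_libraries` except those matched by an `exclude` entry. Because each exclude entry names only `os` and `ruby`, it removes the job for both `system_libraries` values. The following Ruby sketch is a simplified model of that expansion, assuming `include` entries and other matrix features can be ignored and modelling version numbers as strings; it is not GitHub's implementation.

```ruby
# Simplified model of GitHub Actions matrix expansion (assumption: `include`
# entries and other matrix features are not modelled).
MATRIX = {
  os: %w[ubuntu macos windows],
  ruby: %w[2.1 2.2 2.3 2.4 2.5 2.6 2.7 3.0], # versions modelled as strings
  system_libraries: [true, false],
}.freeze

# Each rule names only some keys, so it removes every combination that
# agrees with it on those keys (here: both system_libraries values).
EXCLUDE = [
  { os: "macos", ruby: "2.1" },
  { os: "macos", ruby: "2.2" },
  { os: "macos", ruby: "2.3" },
  { os: "windows", ruby: "2.1" },
  { os: "windows", ruby: "2.2" },
  { os: "windows", ruby: "2.3" },
].freeze

# Returns one Hash per job, e.g. {os: "ubuntu", ruby: "2.1", system_libraries: true}.
def expand(matrix, exclude)
  keys = matrix.keys
  first, *rest = matrix.values
  # Cartesian product of all value lists, turned back into key => value Hashes.
  combos = first.product(*rest).map { |values| keys.zip(values).to_h }
  combos.reject do |combo|
    exclude.any? { |rule| rule.all? { |key, value| combo[key] == value } }
  end
end

puts expand(MATRIX, EXCLUDE).length # prints 36
```

With 3 operating systems, 8 Ruby versions, and 2 library settings there are 48 combinations; the six exclude rules each remove 2 of them, leaving 36 test jobs.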
12 changes: 12 additions & 0 deletions nokogumbo-import/.gitignore
@@ -0,0 +1,12 @@
Gemfile.lock
ext/nokogumbo/*
!ext/nokogumbo/extconf.rb
!ext/nokogumbo/nokogumbo.c
/lib/nokogumbo/nokogumbo.bundle
/lib/nokogumbo/nokogumbo.so
/lib/nokogumbo/nokogumbo.dll
/pkg
/tmp
/gumbo-parser/googletest
/gumbo-parser/build
/test/html5lib-tests
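The `ext/nokogumbo/*` rule followed by two `!` rules ignores generated files in `ext/nokogumbo/` while keeping `extconf.rb` and `nokogumbo.c` tracked: in `.gitignore` the last matching pattern wins, and a leading `!` re-includes a path. The rule uses `/*` rather than naming the directory itself because git cannot re-include a file whose parent directory is excluded. The following is a deliberately simplified Ruby model of those rules, assuming no `**`, no trailing-slash directory-only patterns, no escaping, and a single `.gitignore`; `git check-ignore -v <path>` shows what git itself decides.

```ruby
# Deliberately simplified .gitignore model (assumption: no "**", no
# trailing-slash directory-only patterns, no escaping, a single file).
PATTERNS = [
  "Gemfile.lock",
  "ext/nokogumbo/*",
  "!ext/nokogumbo/extconf.rb",
  "!ext/nokogumbo/nokogumbo.c",
  "/lib/nokogumbo/nokogumbo.bundle",
  "/lib/nokogumbo/nokogumbo.so",
  "/lib/nokogumbo/nokogumbo.dll",
  "/pkg",
  "/tmp",
  "/gumbo-parser/googletest",
  "/gumbo-parser/build",
  "/test/html5lib-tests",
].freeze

# Does +pattern+ match +path+ or one of its parent directories?
# Patterns with a leading or inner "/" are matched from the repository root;
# other patterns match a single name at any depth.
def pattern_matches?(pattern, anchored, path)
  parts = path.split("/")
  candidates = (1..parts.length).map { |n| parts.first(n).join("/") }
  candidates.any? do |candidate|
    if anchored || pattern.include?("/")
      File.fnmatch(pattern, candidate, File::FNM_PATHNAME)
    else
      File.fnmatch(pattern, File.basename(candidate), File::FNM_PATHNAME)
    end
  end
end

# The last matching pattern decides; "!" flips it back to "not ignored".
# Unlike git, this model lets "!" re-include a file inside an excluded
# directory, so it is only faithful for rules like "dir/*" plus "!dir/file".
def ignored?(path, patterns = PATTERNS)
  result = false
  patterns.each do |raw|
    negated = raw.start_with?("!")
    pattern = negated ? raw[1..-1] : raw
    anchored = pattern.start_with?("/")
    pattern = pattern.delete_prefix("/")
    result = !negated if pattern_matches?(pattern, anchored, path)
  end
  result
end

p ignored?("ext/nokogumbo/Makefile")   # prints true
p ignored?("ext/nokogumbo/extconf.rb") # prints false
```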
127 changes: 127 additions & 0 deletions nokogumbo-import/CHANGELOG.md
@@ -0,0 +1,127 @@
# Changelog

All notable changes to Nokogumbo will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
### Changed
### Deprecated
### Removed
### Fixed
### Security

## [2.0.5] - 2021-03-19
### Fixed
- Support Mageia distros when libxml2/libxslt system libraries are installed. #165 (Thank you,
@pterjan!)

### Added
- Forward-looking support for a version of Nokogiri that will provide HTML5 parsing. #171

### Improved
- Update extconf.rb to use Nokogiri v1.11's CPPFLAGS for more reliable installation. #163


## [2.0.4] - 2020-11-27
### Fixed
- Fixed a bug where `Nokogiri::HTML5.fragment(nil)` would raise an error. Now
it returns an empty `DocumentFragment` like it did in v2.0.2.
- Fixed assertion failure when a tag immediately followed the UTF-8 BOM.


## [2.0.3] - 2020-11-21
### Added
- Limit enforced on number of attributes per element, defaulting to 400 and
configurable with the `:max_attributes` argument.
### Fixed
- Ignore UTF-8 byte order mark at the beginning of the input.
- Fix content sniffing for Unicode strings.
- Fixed crash where Ruby objects constructed in C can be garbage collected.

## [2.0.2] - 2019-11-19
### Added
- Support Ruby 2.6
### Fixed
- Fix assertion failures with nonstandard HTML tags.
- Fix the handling of mis-nested formatting tags (the adoption agency
algorithm).
- Fix crash with zero-length HTML tags.
### Security
- Prevent 1-byte buffer over-read when constructing an error message about an
unexpected EOF.

## [2.0.1] - 2018-11-11
### Fixed
- Fix line numbers on elements from `#line`.

## [2.0.0] - 2018-10-04
### Added
- Experimental support for errors (it was supported in 1.5.0 but
undocumented).
- Added proper HTML5 serialization.
- Added option `:max_errors` to control the maximum number of errors reported
by `#errors`.
- Added option `:max_tree_depth` to control the maximum parse tree depth.
- Line number support via `Nokogiri::XML::Node#line` as long as Nokogumbo has
been compiled with libxml2 support.

### Changed
- Integrated [Gumbo parser](https://github.com/google/gumbo-parser) into
Nokogumbo. A system version will not be used.
- The undocumented (but publicly mentioned) `:max_parse_errors` has been renamed to `:max_errors`;
`:max_parse_errors` is deprecated and will go away.
- The various `#parse` and `#fragment` (and `Nokogiri.HTML5`) methods return
`Nokogiri::HTML5::Document` and `Nokogiri::HTML5::DocumentFragment` objects
rather than `Nokogiri::HTML::Document` and
`Nokogiri::HTML::DocumentFragment` objects.
- Changed the top-level API to more closely match Nokogiri's while maintaining
backwards compatibility. The new APIs are:
  * `Nokogiri::HTML5(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5.parse(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, **options, &block)`
  * `Nokogiri::HTML5.fragment(html, encoding = nil, **options)`
  * `Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **options)`
  * `Nokogiri::HTML5::DocumentFragment.new(document, html = nil, ctx = nil)`
  * `Nokogiri::HTML5::Document#fragment(html = nil)`
  * `Nokogiri::XML::Node#fragment(html = nil)`

  In all cases, `html` can be a string or an `IO` object (something that
  responds to `#read`). The `url` parameter is entirely for error reporting,
  as in Nokogiri. The `encoding` parameter only signals what encoding `html`
  should have on input; the output `Document` or `DocumentFragment` will be in
  UTF-8. Currently, the only option supported is `:max_errors`, which controls
  the maximum number of errors reported by `#errors`.
- Minimum supported version of Ruby changed to 2.1.
- Minimum supported version of Nokogiri changed to 1.8.0.
- `Nokogiri::HTML5::DocumentFragment#errors` returns errors for the document
fragment itself, not the underlying document.
- The five XML namespaces described in the HTML spec, MathML, SVG, XLink, XML,
and XMLNS, are now supported. Thus `<svg>` will create an `svg` element in
the SVG namespace and `<math>` will create a `math` element in the MathML
namespace. An attribute `xml:lang=en`, for example, will create a `lang`
attribute in the XML namespace, **but only in foreign elements (i.e., those
in the SVG or MathML namespaces)**. On HTML elements, this creates an
attribute with the name `xml:lang`. This changes the `#xpath` and related
APIs.
- Calling `#to_xml` on a `Nokogiri::HTML5::Document` will produce XML output
rather than HTML.

### Deprecated
- `:max_parse_errors`; use `:max_errors`

### Fixed
- Fixed documents failing to serialize (via `to_html`) if they contain certain
`meta` elements that set the `charset`.
- Documents are now properly marked as UTF-8 after parsing.
- Fixed `Nokogiri::HTML5.fragment` reporting an error due to a missing
`<!DOCTYPE html>`.
- Fixed crash when input contains U+0000 NULL bytes and error reporting is
enabled.

### Security
- The most recent, released version of Gumbo has a [potential security
issue](https://github.com/google/gumbo-parser/pull/375) that could result in
a cross-site scripting vulnerability. This has been fixed by integrating
Gumbo into Nokogumbo.
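Several changelog entries describe how the HTML5 entry points treat their arguments: `html` may be a String or anything that responds to `#read`, `fragment(nil)` yields an empty fragment, and `:max_parse_errors` is a deprecated alias of `:max_errors`. A minimal sketch of that normalization follows, written as a hypothetical helper `normalize_html5_args`; it is an illustration under those stated rules, not Nokogumbo's actual code.

```ruby
require "stringio"

# Hypothetical helper (not Nokogumbo's actual code) that normalizes the
# arguments accepted by the HTML5 parse/fragment entry points.
def normalize_html5_args(html, **options)
  # `html` may be a String or any object that responds to #read, such as an IO.
  html = html.read if html.respond_to?(:read)
  # Treat nil as empty input, in the spirit of the 2.0.4 fix for fragment(nil).
  html = html.to_s

  # Accept the deprecated :max_parse_errors, but let :max_errors win.
  if options.key?(:max_parse_errors)
    warn ":max_parse_errors is deprecated; use :max_errors"
    legacy = options.delete(:max_parse_errors)
    options[:max_errors] = legacy unless options.key?(:max_errors)
  end

  [html, options]
end

html, options = normalize_html5_args(StringIO.new("<p>hi</p>"), max_parse_errors: 10)
# html is "<p>hi</p>"; options now holds :max_errors (10) instead of :max_parse_errors.
```

Letting an explicit `:max_errors` take precedence over the deprecated key is a design choice of this sketch, so callers migrating piecemeal are not surprised.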
16 changes: 16 additions & 0 deletions nokogumbo-import/Gemfile
@@ -0,0 +1,16 @@
source 'https://rubygems.org'

# Nokogiri depends on pkg-config when built with system libraries but it
# doesn't declare this dependency. Unfortunately, bundler provides no way to
# declare additional dependencies and it will install dependencies in
# alphabetical order so it tries to install Nokogiri before pkg-config and
# this fails.
gem 'fix-dep-order', :path => 'scripts'
gem 'nokogiri', '>= 1.8'

group :development, :test do
  gem 'minitest'
  gem 'rake'
  gem 'rake-compiler'
end
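The Gemfile comment describes an ordering problem: Nokogiri needs pkg-config when built against system libraries but does not declare it, so, per the comment, nothing stops Bundler from installing Nokogiri first. A minimal sketch with Ruby's standard-library `TSort` shows how an explicit dependency edge changes a topological install order. The `InstallOrder` class and the dependency graph are hypothetical, and the sketch does not claim to reflect how Bundler or the local `fix-dep-order` gem is implemented.

```ruby
require "tsort"

# Hypothetical install-order model: each gem maps to the gems that must be
# installed before it.
class InstallOrder
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort hooks: enumerate every gem, then each gem's prerequisites.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @deps.fetch(name, []).each(&block)
  end
end

# As declared: nokogiri lists no dependency on pkg-config, so an
# alphabetical installer would build nokogiri first.
declared = { "nokogiri" => [], "pkg-config" => [] }
p declared.keys.sort             # prints ["nokogiri", "pkg-config"]

# With the build-time need made explicit, pkg-config comes first.
actual = { "nokogiri" => ["pkg-config"], "pkg-config" => [] }
p InstallOrder.new(actual).tsort # prints ["pkg-config", "nokogiri"]
```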
