* Extract CDDL definitions

  Needed for w3c/webref#1353.

  With this update, Reffy now looks for and extracts CDDL content defined in `<pre class="cddl">` blocks. The logic is very similar to the logic used for IDL. Shared code was factored out accordingly.

  Something specific about CDDL: on top of generating text extracts, the goal is also to create one extract per CDDL module that the spec defines. To associate a `<pre>` block with one or more CDDL modules, the code looks for a possible `data-cddl-module` attribute, or for module names in the `class` attribute (prefixed by `cddl-` or suffixed by `-cddl`). The former isn't used by any spec but is the envisioned mechanism in Bikeshed to define the association; the latter is the convention currently used in the WebDriver BiDi specification.

  When a spec defines modules, CDDL defined in a `<pre>` block with no explicit module annotation is considered to be defined for all modules (not doing so would essentially mean the CDDL would not be defined for any module, which seems weird).

  When there is CDDL, the extraction produces:

  1. an extract that contains all CDDL definitions: `cddl/[shortname].cddl`
  2. one extract per CDDL module: `cddl/[shortname]-[modulename].cddl`

  (I'm going to assume that no one is ever going to define a module name that would make `[shortname]-[modulename]` collide with the shortname of another spec).

  Note: some specs that define CDDL do not flag the `<pre>` blocks in any way (Open Screen Protocol, WebAuthn). Extraction won't work for them for now.

  Also, there are a couple of places in the WebDriver BiDi spec that use a `<pre class="cddl">` block to *reference* a CDDL construct defined elsewhere. Extraction will happily include these references as well, leading to CDDL extracts that contain invalid CDDL. These need fixing in the specs.

* Change name of "all" extract, allow CDDL defs for it

  When a spec defines CDDL modules, the union of all CDDL is now written to a file named `[shortname]-all.cddl` instead of simply `[shortname].cddl`. This is meant to make it slightly clearer that the union of all CDDL is not necessarily a panacea. For example, it may not contain a useful first rule against which a CBOR data item that would match any of the modules may be validated.

  In other words, when the crawler produces a `[shortname].cddl` file, that means there's no module. If it doesn't, it is best to check the modules, with "all" being a reserved module name in the spec that gets interpreted to mean "any module".

  When a spec defines CDDL modules, it may also define CDDL rules that only appear in the "all" file by specifying `data-cddl-module="all"`. This is useful to define a useful first type in the "all" extract.

* Integrate review feedback
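As a rough illustration of the file naming described in the commit message, the sketch below maps extracted modules to extract paths. The helper name `cddlExtractPaths`, the spec shortname, and the `local`/`remote` module names are assumptions made for the example; they are not part of the commit.

```javascript
// Hypothetical helper (not part of the commit): derive the extract path for
// each CDDL module, following the naming rules in the commit message.
// `modules` mimics the array of `{ name, cddl }` objects the extraction returns.
function cddlExtractPaths(specShortname, modules) {
  return modules.map(({ name }) =>
    // An empty name means the spec defines no module at all
    name === ''
      ? `cddl/${specShortname}.cddl`
      : `cddl/${specShortname}-${name}.cddl`);
}

// Illustrative input: a spec that defines two modules plus the "all" union
const paths = cddlExtractPaths('webdriver-bidi', [
  { name: 'all', cddl: '...' },
  { name: 'local', cddl: '...' },
  { name: 'remote', cddl: '...' }
]);
console.log(paths);
// → [ 'cddl/webdriver-bidi-all.cddl',
//     'cddl/webdriver-bidi-local.cddl',
//     'cddl/webdriver-bidi-remote.cddl' ]
```

With this scheme, the presence of a plain `[shortname].cddl` file signals that the spec defines no modules.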
Showing 9 changed files with 433 additions and 51 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,125 @@ | ||
import getCodeElements from './get-code-elements.mjs'; | ||
import trimSpaces from './trim-spaces.mjs'; | ||
|
||
/** | ||
* Extract the list of CDDL definitions in the current spec. | ||
* | ||
* A spec may define more than one CDDL module. For example, the WebDriver BiDi | ||
* spec has CDDL definitions that apply to either or both of the local end and | ||
* the remote end. The function returns an array that lists all CDDL modules. | ||
* | ||
* Each CDDL module is represented as an object with the following keys whose | ||
* values are strings: | ||
* - name: the CDDL module shortname. The name is "" if the spec does not | ||
* define any module, and "all" for the dump of all CDDL definitions. | ||
* - cddl: A dump of the CDDL definitions. | ||
* | ||
* If the spec defines more than one module, the first item in the array is the | ||
* "all" module that contains a dump of all CDDL definitions, regardless of the | ||
* module they are actually defined for (the assumption is that looking at the | ||
* union of all CDDL modules defined in a spec will always make sense, and that | ||
* a spec will never reuse the same rule name with a different definition for | ||
* different CDDL modules). | ||
* | ||
* @function | ||
* @public | ||
* @return {Array} A dump of the CDDL definitions per CDDL module, or an empty | ||
* array if the spec does not contain any CDDL. | ||
*/ | ||
export default function () { | ||
// Specs with CDDL are either recent enough that they all use the same | ||
// `<pre class="cddl">` convention, or they don't flag CDDL blocks in any | ||
// way, making it impossible to extract them. | ||
const cddlSelectors = ['pre.cddl:not(.exclude):not(.extract)']; | ||
const excludeSelectors = ['#cddl-index']; | ||
|
||
// Retrieve all elements that contain CDDL content | ||
const cddlEls = getCodeElements(cddlSelectors, { excludeSelectors }); | ||
|
||
// Start by assembling the list of modules | ||
const modules = {}; | ||
for (const el of cddlEls) { | ||
const elModules = getModules(el); | ||
for (const name of elModules) { | ||
// "all" does not create a module on its own, that's the name of | ||
// the CDDL module that contains all CDDL definitions. | ||
if (name !== 'all') { | ||
modules[name] = []; | ||
} | ||
} | ||
} | ||
|
||
// Assemble the CDDL per module | ||
const mergedCddl = []; | ||
for (const el of cddlEls) { | ||
const cddl = trimSpaces(el.textContent); | ||
if (!cddl) { | ||
continue; | ||
} | ||
// All CDDL appears in the "all" module. | ||
mergedCddl.push(cddl); | ||
let elModules = getModules(el); | ||
if (elModules.length === 0) { | ||
// No module means the CDDL is defined for all modules | ||
elModules = Object.keys(modules); | ||
} | ||
for (const name of elModules) { | ||
// CDDL defined for the "all" module is only defined for it | ||
if (name !== 'all') { | ||
if (!modules[name]) { | ||
modules[name] = []; | ||
} | ||
modules[name].push(cddl); | ||
} | ||
} | ||
} | ||
|
||
if (mergedCddl.length === 0) { | ||
return []; | ||
} | ||
|
||
const res = [{ | ||
name: Object.keys(modules).length > 0 ? 'all' : '', | ||
cddl: mergedCddl.join('\n\n') | ||
}]; | ||
for (const [name, cddl] of Object.entries(modules)) { | ||
res.push({ name, cddl: cddl.join('\n\n') }); | ||
} | ||
// Remove trailing spaces and use spaces throughout | ||
for (const cddlModule of res) { | ||
cddlModule.cddl = cddlModule.cddl | ||
.replace(/\s+$/gm, '\n') | ||
.replace(/\t/g, ' ') | ||
.trim(); | ||
} | ||
return res; | ||
} | ||
|
||
|
||
/** | ||
* Retrieve the list of CDDL module shortnames that the element references. | ||
* | ||
* This list of modules is either specified in a `data-cddl-module` attribute | ||
* or directly within the class attribute prefixed by `cddl-` or suffixed by | ||
* `-cddl`. | ||
*/ | ||
function getModules(el) { | ||
const moduleAttr = el.getAttribute('data-cddl-module'); | ||
if (moduleAttr) { | ||
return moduleAttr.split(',').map(str => str.trim()); | ||
} | ||
|
||
const list = []; | ||
const classes = el.classList.values(); | ||
for (const name of classes) { | ||
const match = name.match(/^(.*)-cddl$|^cddl-(.*)$/); | ||
if (match) { | ||
const shortname = match[1] ?? match[2]; | ||
if (!list.includes(shortname)) { | ||
list.push(shortname); | ||
} | ||
} | ||
} | ||
return list; | ||
} |
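The association and merging rules in the file above can be sketched without a DOM. This is a simplified restatement under assumptions, not the code from the commit: `modulesFromClassNames` takes a plain array of class names, `assembleModules` takes hypothetical `{ modules, cddl }` entries (one per `<pre>` block), and the CDDL snippets are made up for illustration.

```javascript
// Class-name convention: `cddl-<module>` or `<module>-cddl` names a module.
function modulesFromClassNames(classNames) {
  const list = [];
  for (const name of classNames) {
    const match = name.match(/^(.*)-cddl$|^cddl-(.*)$/);
    if (match) {
      const shortname = match[1] ?? match[2];
      if (!list.includes(shortname)) {
        list.push(shortname);
      }
    }
  }
  return list;
}

// Merge blocks per module. `blocks` is a hypothetical, DOM-free input shape.
function assembleModules(blocks) {
  // First pass: collect module names ("all" is reserved, not a real module)
  const modules = {};
  for (const { modules: names } of blocks) {
    for (const name of names) {
      if (name !== 'all') {
        modules[name] = [];
      }
    }
  }
  // Second pass: every block goes to "all"; unannotated blocks go everywhere
  const merged = [];
  for (const { modules: names, cddl } of blocks) {
    merged.push(cddl);
    const targets = names.length === 0 ? Object.keys(modules) : names;
    for (const name of targets) {
      if (name !== 'all') {
        modules[name].push(cddl);
      }
    }
  }
  if (merged.length === 0) {
    return [];
  }
  return [
    { name: Object.keys(modules).length > 0 ? 'all' : '', cddl: merged.join('\n\n') },
    ...Object.entries(modules).map(([name, cddl]) => ({ name, cddl: cddl.join('\n\n') }))
  ];
}

const result = assembleModules([
  { modules: modulesFromClassNames(['cddl', 'remote-cddl']), cddl: 'Command = { id: uint }' },
  { modules: modulesFromClassNames(['cddl', 'local-cddl']), cddl: 'Message = { type: text }' },
  { modules: [], cddl: 'Extensible = (*text => any)' },
  { modules: ['all'], cddl: 'Any = Command / Message' }
]);
console.log(result.map(m => m.name));
// → [ 'all', 'remote', 'local' ]
```

In this example the unannotated `Extensible` rule lands in both `remote` and `local`, while the `Any` rule flagged for `all` only appears in the `all` entry.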
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
import informativeSelector from './informative-selector.mjs'; | ||
import cloneAndClean from './clone-and-clean.mjs'; | ||
|
||
/** | ||
* Helper function that returns a set of code elements in document order based | ||
* on a given set of selectors, excluding elements that are within an index. | ||
* | ||
* The function excludes elements defined in informative sections. | ||
* | ||
* The code elements are cloned and cleaned before they are returned to strip | ||
* annotations and other asides. | ||
*/ | ||
export default function getCodeElements(codeSelectors, { excludeSelectors = [] } = {}) { | ||
return [...document.querySelectorAll(codeSelectors.join(', '))] | ||
// Skip excluded elements and those in informative content. The length check | ||
// avoids calling `closest('')`, which throws when nothing is excluded. | ||
.filter(el => !(excludeSelectors.length > 0 && el.closest(excludeSelectors.join(', ')))) | ||
.filter(el => !el.closest(informativeSelector)) | ||
|
||
// Clone and clean the elements | ||
.map(cloneAndClean); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
/** | ||
* Helper function that trims individual lines in a code block, removing as | ||
* much space as possible from the beginning of each line while preserving | ||
* relative indentation. | ||
* | ||
* Typically useful for CDDL and IDL extracts. | ||
* | ||
* Rules followed: | ||
* - Always trim the first line | ||
* - Remove whitespace from the end of each line | ||
* - Replace lines that contain only whitespace with empty lines | ||
* - Drop the same number of leading whitespace characters from all other lines | ||
*/ | ||
export default function trimSpaces(code) { | ||
const lines = code.trim().split('\n'); | ||
const toRemove = lines | ||
.slice(1) | ||
.filter(line => line.search(/\S/) > -1) | ||
.reduce( | ||
(min, line) => Math.min(min, line.search(/\S/)), | ||
Number.MAX_VALUE); | ||
return lines | ||
.map(line => { | ||
let firstRealChar = line.search(/\S/); | ||
if (firstRealChar === -1) { | ||
return ''; | ||
} | ||
else if (firstRealChar === 0) { | ||
return line.replace(/\s+$/, ''); | ||
} | ||
else { | ||
return line.substring(toRemove).replace(/\s+$/, ''); | ||
} | ||
}) | ||
.join('\n'); | ||
} |
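To see these rules applied, the sketch below repeats the function from the diff (with indentation restored and without the `export`, so that it runs standalone) on a made-up CDDL snippet. The sample input is an assumption for illustration only.

```javascript
// Same logic as the trimSpaces function in the diff above
function trimSpaces(code) {
  const lines = code.trim().split('\n');
  // Smallest indentation among the non-blank lines after the first one
  const toRemove = lines
    .slice(1)
    .filter(line => line.search(/\S/) > -1)
    .reduce(
      (min, line) => Math.min(min, line.search(/\S/)),
      Number.MAX_VALUE);
  return lines
    .map(line => {
      const firstRealChar = line.search(/\S/);
      if (firstRealChar === -1) {
        return '';
      }
      else if (firstRealChar === 0) {
        return line.replace(/\s+$/, '');
      }
      else {
        return line.substring(toRemove).replace(/\s+$/, '');
      }
    })
    .join('\n');
}

// Made-up input: indented block with trailing spaces and a whitespace-only line
const input = [
  '',
  '    foo = {',
  '      bar: int,   ',
  '\t',
  '      baz: text',
  '    }',
  '  '
].join('\n');

console.log(JSON.stringify(trimSpaces(input)));
// → "foo = {\n  bar: int,\n\n  baz: text\n}"
```

The common four-space indent is dropped, the two-space relative indent of the inner lines survives, and the tab-only line becomes an empty line.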