Common EPUB2 data parser for Ridibooks services written in ES6
- Detailed parsing for EPUB2
- Supports package validation, decompression and style extraction with various parsing options
- Extract files within EPUB with various reading options
- Add encryption and decryption function
- Add readOptions.spine.truncate and readOptions.spine.truncateMaxLength options
- Add readOptions.spine.minify and readOptions.css.minify options
- Support for EPUB3
- Support for CLI
- Support for other OCF spec (manifest.xml, metadata.xml, signatures.xml, encryption.xml, etc)
npm install @ridi/epub-parser
Basic:
import EpubParser from '@ridi/epub-parser';

const parser = new EpubParser('./foo/bar.epub');
parser.parse().then((book) => {
  const results = parser.read(book.spines);
  ...
});
Various inputs:
import fs from 'fs';
import EpubParser from '@ridi/epub-parser';
// Unzipped path of EPUB file.
new EpubParser('./foo/bar');
// EPUB file buffer.
const buffer = fs.readFileSync('./foo/bar.epub');
new EpubParser(buffer);
Book to Object, Object to Book:
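import EpubParser from '@ridi/epub-parser';

const parser = new EpubParser('./foo/bar.epub');
parser.parse().then((book) => {
  // Serialize the Book to a plain object.
  const rawBook = book.toRaw();
  // Restore a Book from the plain object. Book must be in scope here;
  // how the package exports it is not shown in this example.
  const newBook = new Book(rawBook);
  ...
});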
Returns Promise<Book> with:
- Book: Instance with metadata, spine list, table of contents, etc.
Or throws an exception.
parseOptions: Object
Returns string or Object or string[] or Object[] with:
- string (readOptions.spine.extractBody is false)
- Object (readOptions.spine.extractAdapter is undefined):
  - body: Same result as document.body.innerHTML.
  - attrs: Attributes in body tag.
- Object (readOptions.spine.extractAdapter is defaultExtractAdapter):
  - content: extractBody output transformed by adapter.
Or throws an exception.
target(s): Item, Item[] (see: Item Types)
readOptions: Object
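The three result shapes described above can be handled uniformly by a small helper. The following is only a sketch: resultToHtml and resultsToHtml are hypothetical names that are not part of the library, and it assumes that a single Item yields a single result while Item[] yields an array.

```javascript
// Hypothetical helper (not part of the library): reduce any of the
// documented result shapes to a single HTML string.
const resultToHtml = (result) => {
  if (typeof result === 'string') {
    // extractBody is false: the full document string is returned as-is.
    return result;
  }
  if (typeof result.content === 'string') {
    // An extract adapter produced { content }.
    return result.content;
  }
  // No adapter: { body, attrs } where attrs is a list of { key, value }.
  const attrs = (result.attrs || [])
    .map(({ key, value }) => ` ${key}="${value}"`)
    .join('');
  return `<body${attrs}>${result.body}</body>`;
};

// Assumption: one Item gives one result, Item[] gives an array of results.
const resultsToHtml = (results) => (
  Array.isArray(results) ? results.map(resultToHtml) : resultToHtml(results)
);
```

Rebuilding a body tag in the last branch is a choice made for this sketch only; callers that want the inner HTML alone can read result.body directly.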
- titles: string[]
- creators: Author[]
- subjects: string[]
- description: string?
- publisher: string?
- contributors: Author[]
- dates: DateTime[]
- type: string?
- format: string?
- identifiers: Identifier[]
- source: string?
- language: string?
- relation: string?
- rights: string?
- epubVersion: number?
- metas: Meta[]
- items: Item[]
- ncx: NcxItem?
- spines: SpineItem[]
- fonts: FontItem[]
- cover: ImageItem?
- images: ImageItem[]
- styles: CssItem[]
- guide: Guide[]
- deadItems: DeadItem[]
- name: string?
- role: string (Default: Author.Roles.UNDEFINED)
- value: string?
- event: string (Default: DateTime.Events.UNDEFINED)
- value: string?
- scheme: string? (Default: Identifier.Schemes.UNDEFINED)
- name: string?
- content: string?
- title: string?
- type: string (Default: Guide.Types.UNDEFINED)
- href: string?
- item: Item?
- id: string?
- href: string?
- mediaType: string?
- size: number?
- isFileExists: boolean (size !== undefined)
- defaultEncoding: string?
- navPoints: NavPoint[]
- spineIndex: number (Default: -1)
- isLinear: boolean (Default: true)
- styles: CssItem[]?
- namespace: string?
InlineCssItem (extend CssItem)
- text: string?
- isCover: boolean (Default: false)
- raw: Object
- id: string?
- label: string?
- src: string?
- anchor: string?
- depth: number (Default: 0)
- children: NavPoint[]
- spine: SpineItem?
- validatePackage
- validateXml
- allowNcxFileMissing
- unzipPath
- createIntermediateDirectories
- removePreviousFile
- ignoreLinear
- useStyleNamespace
- styleNamespacePrefix
If true, validates the package against the IDPF specifications listed below.
- Zip header should not be corrupted.
- mimetype file must be the first file in the archive.
- mimetype file should not be compressed.
- mimetype file should only contain the string application/epub+zip.
- Should not use the extra field feature of the ZIP format for the mimetype file.
Default: false
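As an illustration of the mimetype rules above, here is a minimal sketch that inspects only the first local file header of a ZIP archive held in a Buffer. checkMimetypeEntry is a hypothetical name and this is not the library's validation code. It assumes the sizes are stored in the local header, which is not the case for archives that use a trailing data descriptor.

```javascript
// Hypothetical sketch: check the first ZIP entry against the mimetype rules.
const checkMimetypeEntry = (zip) => {
  // A local file header is 30 bytes and starts with the signature 0x04034b50.
  if (zip.length < 30 || zip.readUInt32LE(0) !== 0x04034b50) {
    return ['zip header is corrupt'];
  }
  const method = zip.readUInt16LE(8);        // 0 means stored (uncompressed)
  const size = zip.readUInt32LE(18);         // compressed size
  const nameLength = zip.readUInt16LE(26);   // file name length
  const extraLength = zip.readUInt16LE(28);  // extra field length
  const name = zip.toString('ascii', 30, 30 + nameLength);

  const errors = [];
  if (name !== 'mimetype') errors.push('mimetype must be the first file');
  if (method !== 0) errors.push('mimetype must not be compressed');
  if (extraLength !== 0) errors.push('mimetype must not use the extra field');

  // The content can only be compared when it is stored uncompressed.
  const start = 30 + nameLength + extraLength;
  const content = zip.toString('ascii', start, start + size);
  if (method === 0 && content !== 'application/epub+zip') {
    errors.push('mimetype must contain application/epub+zip');
  }
  return errors;
};
```

An empty array means every check passed. Returning all violations instead of throwing on the first one is a design choice of this sketch.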
If true, stop parsing when XML parsing errors occur.
Default: false
If false, stops parsing when the NCX file does not exist.
Default: true
If specified, uncompress to that path.
Only if input is buffer or file path of EPUB file.
Default: undefined
If true, creates intermediate directories for unzipPath.
Default: true
If true, removes a previous file from unzipPath.
Default: true
If true, ignores the spineIndex difference caused by the isLinear property of SpineItem.
// e.g. Left: ignoreLinear is false. Right: ignoreLinear is true.
[{ spineIndex: 0, isLinear: true, ... },     [{ spineIndex: 0, isLinear: true, ... },
 { spineIndex: 1, isLinear: true, ... },      { spineIndex: 1, isLinear: true, ... },
 { spineIndex: -1, isLinear: false, ... },    { spineIndex: 2, isLinear: false, ... },
 { spineIndex: 2, isLinear: true, ... }]      { spineIndex: 3, isLinear: true, ... }]
Default: true
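The numbering rule shown in the comparison above can be sketched as a small pure function. assignSpineIndexes is a hypothetical name used only for illustration. It reproduces the documented outcome and is not taken from the library's source.

```javascript
// Hypothetical sketch: number spine items with or without ignoreLinear.
const assignSpineIndexes = (spines, ignoreLinear) => {
  let nextIndex = 0;
  return spines.map((spine) => {
    if (!ignoreLinear && !spine.isLinear) {
      // Non-linear items are skipped and keep the default index of -1.
      return { ...spine, spineIndex: -1 };
    }
    const indexed = { ...spine, spineIndex: nextIndex };
    nextIndex += 1;
    return indexed;
  });
};

const spines = [
  { isLinear: true },
  { isLinear: true },
  { isLinear: false },
  { isLinear: true },
];

// ignoreLinear false: spineIndex values are 0, 1, -1, 2
const strict = assignSpineIndexes(spines, false).map((s) => s.spineIndex);
// ignoreLinear true: spineIndex values are 0, 1, 2, 3
const relaxed = assignSpineIndexes(spines, true).map((s) => s.spineIndex);
```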
If true, one namespace is assigned to each CSS file or inline style, and the styles used by each spine are recorded in SpineItem.styles.
Otherwise, CssItem.namespace and SpineItem.styles are undefined.
In any list (Book.styles, Book.items, SpineItem.styles, ...), InlineCssItem is always positioned after CssItem.
Default: false
Prepend given string to namespace for identification.
Default: 'ridi_style'
- encoding
- ignoreEntryNotFoundError
- basePath
- spine.extractBody
- spine.extractAdapter
- css.removeAtrules
- css.removeTags
- css.removeIds
- css.removeClasses
If specified, returns a string. Otherwise, returns a buffer.
If 'default' is specified, Item.defaultEncoding is used.
Item.defaultEncoding // undefined (=buffer)
SpineItem.defaultEncoding // 'utf8'
CssItem.defaultEncoding // 'utf8'
InlineCssItem.defaultEncoding // 'utf8'
ImageItem.defaultEncoding // undefined (=buffer)
Default: 'default'
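The fallback described above can be sketched as follows. decode is a hypothetical helper, and the two item-like objects only mimic the defaultEncoding values listed above. Neither is the library's own implementation.

```javascript
// Hypothetical sketch of the 'default' encoding fallback.
const decode = (buffer, item, encoding = 'default') => {
  // 'default' defers to the item's own defaultEncoding.
  const resolved = encoding === 'default' ? item.defaultEncoding : encoding;
  // An undefined encoding means the data is left as a Buffer.
  return resolved === undefined ? buffer : buffer.toString(resolved);
};

const data = Buffer.from('<p>Hello</p>', 'utf8');
const spineLike = { defaultEncoding: 'utf8' };     // like SpineItem
const imageLike = { defaultEncoding: undefined };  // like ImageItem

const asText = decode(data, spineLike);            // string
const asBuffer = decode(data, imageLike);          // Buffer
const asBase64 = decode(data, imageLike, 'base64'); // explicit encoding wins
```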
If false, throws Errors.ITEM_NOT_FOUND when an entry is not found.
Default: true
If specified, change base path of paths used by spine and css.
HTML: SpineItem
...
<!-- Before -->
<div>
<img src="../Images/cover.jpg">
</div>
<!-- After -->
<div>
<img src="{basePath}/OEBPS/Images/cover.jpg">
</div>
...
CSS: CssItem, InlineCssItem
/* Before */
@font-face {
font-family: NotoSansRegular;
src: url("../Fonts/NotoSans-Regular.ttf");
}
/* After */
@font-face {
font-family: NotoSansRegular;
src: url("{basePath}/OEBPS/Fonts/NotoSans-Regular.ttf");
}
Default: undefined
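The before and after pairs above amount to resolving each relative reference against the location of the file that contains it, then prefixing the result with basePath. A minimal sketch of that idea follows. rebase is a hypothetical name, and the item paths OEBPS/Text/Section0001.xhtml and OEBPS/Styles/Style0001.css are assumed for illustration.

```javascript
// Hypothetical sketch: resolve a relative reference against the item's own
// location inside the EPUB, then prefix the result with basePath.
const rebase = (basePath, itemHref, reference) => {
  // The file: URL is only a vehicle for resolving "../" segments.
  const { pathname } = new URL(reference, `file:///${itemHref}`);
  return `${basePath}${pathname}`;
};

const imageSrc = rebase(
  '/books/foo',
  'OEBPS/Text/Section0001.xhtml',
  '../Images/cover.jpg',
); // '/books/foo/OEBPS/Images/cover.jpg'

const fontSrc = rebase(
  '/books/foo',
  'OEBPS/Styles/Style0001.css',
  '../Fonts/NotoSans-Regular.ttf',
); // '/books/foo/OEBPS/Fonts/NotoSans-Regular.ttf'
```

Note that URL percent-encodes characters such as spaces, and that this sketch does not special-case absolute URLs or data URIs.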
If true, extracts the body. Otherwise, returns the full document string.
true:
{
  body: '\n <p>Extract style</p>\n <img src=\"../Images/api-map.jpg\"/>\n',
  attrs: [
    {
      key: 'style',
      value: 'background-color: #000000;',
    },
    { // Only added if useStyleNamespace is true.
      key: 'class',
      value: '.ridi_style2, .ridi_style3, .ridi_style4, .ridi_style0, .ridi_style1',
    },
  ],
}
false:
'<!doctype><html>\n<head>\n</head>\n<body style="background-color: #000000;">\n <p>Extract style</p>\n <img src=\"../Images/api-map.jpg\"/>\n</body>\n</html>'
Default: false
If specified, transforms output of extractBody.
Define adapter:
// Wrap the extracted body in an <article> that carries over the body tag's attributes.
const extractAdapter = (body, attrs) => {
  let string = '';
  attrs.forEach((attr) => {
    string += ` ${attr.key}=\"${attr.value}\"`;
  });
  return {
    content: `<article${string}>${body}</article>`,
  };
};
Result:
{
  content: '<article style=\"background-color: #000000;\" class=\".ridi_style2, .ridi_style3, .ridi_style4, .ridi_style0, .ridi_style1\">\n <p>Extract style</p>\n <img src=\"../Images/api-map.jpg\"/>\n</article>',
}
Default: defaultExtractAdapter
Removes the specified at-rules.
Default: ['charset', 'import', 'keyframes', 'media', 'namespace', 'supports']
Removes selectors that point to the specified tags.
Default: []
Removes selectors that point to the specified ids.
Default: []
Removes selectors that point to the specified classes.
Default: []
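Putting the read options together, an options object might look like the sketch below. The option names and the at-rule defaults come from the descriptions above. The tag, id and class values are made up for illustration, and passing the object as the second argument of read is an assumption based on the target(s) and readOptions parameters.

```javascript
// Option names follow the list above; the removal values are illustrative.
const readOptions = {
  encoding: 'default',
  ignoreEntryNotFoundError: true,
  basePath: undefined,
  spine: {
    extractBody: true,
  },
  css: {
    removeAtrules: ['charset', 'import', 'keyframes', 'media', 'namespace', 'supports'],
    removeTags: ['html', 'body'],   // hypothetical values
    removeIds: ['title'],           // hypothetical values
    removeClasses: ['hidden'],      // hypothetical values
  },
};

// Assumed call shape (needs a parsed book, so it is left as a comment):
// const results = parser.read(book.styles, readOptions);
```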