Tags: iipc/jwarc
Release 0.31.1

Bugs fixed
* Fixed URIs.parseLeniently() returning a different value from new URI() when the path was empty or the input contained percent-encoded characters #90 #91
* Replaced some internal usages of record.targetURI() with record.target() to reduce the chance of runtime exceptions and to preserve the exact original value

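The two edge cases named in the 0.31.1 fix can be seen with the standard library alone. This sketch uses only java.net.URI (no jwarc classes; the class name UriEdgeCases is hypothetical) to show that new URI() keeps an empty path empty and keeps percent-encoded octets intact in the raw path, which is the behaviour a lenient parser presumably has to reproduce to agree with it:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; uses only java.net.URI, not jwarc's URIs helper.
class UriEdgeCases {
    // Returns the raw (still-encoded) path and the decoded path side by side.
    static String describe(String input) throws URISyntaxException {
        URI uri = new URI(input);
        return "raw=[" + uri.getRawPath() + "] decoded=[" + uri.getPath() + "]";
    }

    public static void main(String[] args) throws URISyntaxException {
        // Empty path: stays empty, no "/" is added.
        System.out.println(describe("http://example.org"));
        // Percent-encoded octets: preserved in the raw path, decoded by getPath().
        System.out.println(describe("http://example.org/a%20b"));
    }
}
```

java.net.URI also keeps the original string, so toString() returns the input unchanged in both cases.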
Release 0.31.0

New features
* Added optional support for brotli content encoding #88 (Sebastian Nagel)
* Added HttpMessage.bodyDecoded() #88 (Sebastian Nagel)
* WarcTool: Added `dedupe` subcommand
* DedupeTool: Added --verbose option and silenced default logging

Bugs fixed
* GunzipChannel: Fixed incorrect record length calculation when the gzip footer aligns with the end of the buffer
* ValidateTool: Fixed digest validation #87 (Sebastian Nagel)
* DedupeTool: Used matchType=exact to properly handle CDX queries for URLs ending with `*`
* DedupeTool: Fixed record copying when transferTo copies fewer bytes than requested
* DedupeTool: Prevented appending an empty gzip member when no records were deduplicated
* DedupeTool: Fixed exception when input files are in the current working directory

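FileChannel.transferTo() is allowed to transfer fewer bytes than requested, so a copy has to loop until the full count has been written. A minimal sketch of such a loop follows; the class and method names are hypothetical and this is not jwarc's DedupeTool code:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper; illustrates the pattern, not jwarc's implementation.
class ChannelCopy {
    // Copies exactly `count` bytes starting at `position` in `src` to `dst`.
    static void copyFully(FileChannel src, long position, long count, WritableByteChannel dst)
            throws IOException {
        while (count > 0) {
            // transferTo may return less than `count`, so keep going until done.
            long n = src.transferTo(position, count, dst);
            if (n <= 0) {
                // Nothing transferred: the requested range runs past the end of the file
                // (or the target accepted no bytes), so stop instead of looping forever.
                throw new EOFException("only copied up to position " + position);
            }
            position += n;
            count -= n;
        }
    }
}
```

Throwing on a zero-byte transfer is a design choice here: it turns a truncated source into an explicit error rather than an infinite loop or a silently short copy.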
Release 0.30.0

New features
* WarcReader and WarcParser gained a lenient parsing mode, which:
  - permits ASCII control characters in header field names and values
  - allows lines to end with LF instead of CRLF
  - permits multi-digit WARC minor versions like "0.18"

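To make the last two lenient-mode rules concrete, here is a hypothetical regex-based check of a WARC version line in a strict and a lenient form. It only illustrates the rules listed for 0.30.0; it is not jwarc's parser, and the single-digit versions in the strict pattern are an assumption made for this sketch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of two lenient-mode rules; not jwarc's WarcParser.
class VersionLine {
    // Strict (assumed for this sketch): single-digit major/minor, CRLF line ending.
    private static final Pattern STRICT = Pattern.compile("WARC/(\\d)\\.(\\d)\r\n");
    // Lenient: multi-digit versions such as "0.18", and LF accepted in place of CRLF.
    private static final Pattern LENIENT = Pattern.compile("WARC/(\\d+)\\.(\\d+)\r?\n");

    // Returns {major, minor}, or throws if the line is rejected in the chosen mode.
    static int[] parse(String line, boolean lenient) {
        Matcher m = (lenient ? LENIENT : STRICT).matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid version line: " + line.trim());
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }
}
```

With this sketch, `VersionLine.parse("WARC/0.18\n", true)` yields major 0 and minor 18, while the same line is rejected when `lenient` is false.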
Release 0.28.6

Bugs fixed
* Improved compatibility with ARC variants (version-block length off by one, v2 version-block, spurious linefeeds) #82
* WarcParser: Fixed parse error messages whose context incorrectly used the parser (file) position instead of the buffer position

Release 0.28.4

Bugs fixed
* CDX formatting now percent-encodes spaces, newlines and null characters in all string fields. This is non-standard but at least prevents us from outputting invalid CDX lines.
* CdxRequestEncoder now handles requests with an invalid content-type header

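The CDX escaping described for 0.28.4 can be pictured as a per-character pass over each string field. A hypothetical sketch follows; the exact character set (here space, CR, LF and NUL) and the uppercase hex digits are assumptions, not taken from jwarc's CDX formatter:

```java
// Hypothetical sketch of per-field CDX escaping; not jwarc's code.
class CdxFieldEscape {
    // Percent-encodes characters that would break a space-delimited, line-based CDX record.
    static String escape(String field) {
        StringBuilder sb = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            switch (c) {
                case ' ':  sb.append("%20"); break; // field separator
                case '\r': sb.append("%0D"); break; // assumed to count as a newline character
                case '\n': sb.append("%0A"); break; // record separator
                case '\0': sb.append("%00"); break; // null character
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Because `%` itself is left untouched in this sketch, the encoding is not reversible in general; its only job is to keep each record on one well-formed line.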
Release 0.28.1

Bugs fixed
* Fixed output truncation in the CDX CLI tool caused by the OutputStreamWriter buffer not being flushed or closed before exit
* CdxWriter.process(files, useAbsolutePaths) ignored the useAbsolutePaths=false option and always output absolute paths
* CdxRequestEncoder: Improved pywb compatibility for non-ASCII characters in URL-encoded request bodies
* CdxRequestEncoder: Fixed URLDecoder exception for large request bodies or those containing invalid percent encoding
* WarcWriter.fetch: Fixed bug where the maxTime limit accidentally used the value of the maxLength option instead

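The truncation bug fixed in 0.28.1 is a general Java pitfall: OutputStreamWriter buffers encoded bytes internally, and nothing flushes that buffer automatically when the JVM exits. A minimal sketch of writing output lines and flushing explicitly; the class name is hypothetical, and it flushes rather than closes so that an underlying System.out stays usable:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper showing the flush-before-exit pattern.
class LineOutput {
    static void writeLines(OutputStream out, List<String> lines) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String line : lines) {
            writer.write(line);
            writer.write('\n');
        }
        // Push buffered characters and encoded bytes through to `out`.
        // Without this, output still sitting in the buffers is lost at exit.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        writeLines(System.out, List.of("first line", "second line"));
    }
}
```

When the stream is not shared, closing the writer with try-with-resources achieves the same result, since close() flushes first.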