Tags: iipc/jwarc

v0.31.1

Release 0.31.1

Bugs fixed

* Fixed URIs.parseLeniently() returning a different value to new URI() if the path was empty or the input contained percent encoded characters #90 #91
* Replaced some internal usages of record.targetURI() with record.target() to reduce the chance of runtime exceptions and preserve the exact original value
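
The two edge cases named above (an empty path and percent-encoded input) can be observed with plain `java.net.URI`. The following is a minimal sketch using only the JDK; it does not call jwarc, and the class and method names are made up for illustration. `URI.create(s)` is used as a convenience wrapper around `new URI(s)`.

```java
import java.net.URI;

public final class UriEdgeCases {
    /** Decoded path: percent escapes are resolved, so "%20" becomes a space. */
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    /** Raw path: percent escapes are returned exactly as written. */
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    public static void main(String[] args) {
        // An absent path is reported as the empty string, not "/".
        System.out.println("[" + decodedPath("http://example.org") + "]");
        // The raw and decoded views differ for percent-encoded input.
        System.out.println(rawPath("http://example.org/a%20b"));
        System.out.println(decodedPath("http://example.org/a%20b"));
        // toString() keeps the escapes of the original string.
        System.out.println(URI.create("http://example.org/a%20b"));
    }
}
```

A lenient parser that rebuilds the URI from decoded components, or that normalises an empty path to `/`, would therefore produce a value that differs from `new URI()` for these inputs.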

v0.31.0

Release 0.31.0

New features

* Added optional support for brotli content encoding #88 (Sebastian Nagel)
* Added HttpMessage.bodyDecoded() #88 (Sebastian Nagel)
* WarcTool: Added `dedupe` subcommand
* DedupeTool: Added --verbose option and silenced default logging

Bugs fixed

* GunzipChannel: Fixed incorrect record length calculation when gzip footer aligns with the end of the buffer
* ValidateTool: Fixed digest validation #87 (Sebastian Nagel)
* DedupeTool: Used matchType=exact to properly handle CDX queries for URLs ending with `*`
* DedupeTool: Fixed record copying when transferTo copies fewer bytes than requested
* DedupeTool: Prevented appending of an empty gzip member when no records were deduplicated
* DedupeTool: Fixed exception when input files are in the current working directory
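
`FileChannel.transferTo` may copy fewer bytes than requested, so a caller that needs an exact range has to loop. The sketch below shows one way to do that with the JDK alone; it is not DedupeTool's actual code, and `TransferFully`, `transferFully` and `copySlice` are hypothetical names. Treating a zero-byte transfer as end of input is a simplifying assumption that only holds for blocking targets.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class TransferFully {
    /** Copies exactly count bytes starting at position, retrying after short transfers. */
    static void transferFully(FileChannel src, long position, long count,
                              WritableByteChannel dst) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long n = src.transferTo(position, remaining, dst);
            if (n <= 0) {
                // Assumption: dst is blocking, so no progress means the source ran out.
                throw new EOFException(remaining + " bytes left to copy");
            }
            position += n;
            remaining -= n;
        }
    }

    /** Demo helper: writes data to a temp file, then copies a slice of it back out. */
    static byte[] copySlice(byte[] data, long position, long count) {
        try {
            Path tmp = Files.createTempFile("transfer-demo", ".bin");
            try {
                Files.write(tmp, data);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                try (FileChannel src = FileChannel.open(tmp, StandardOpenOption.READ);
                     WritableByteChannel dst = Channels.newChannel(out)) {
                    transferFully(src, position, count, dst);
                }
                return out.toByteArray();
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] slice = copySlice("hello, warc".getBytes(StandardCharsets.US_ASCII), 7, 4);
        System.out.println(new String(slice, StandardCharsets.US_ASCII));
    }
}
```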

v0.30.0

Release 0.30.0

New features

* WarcReader and WarcParser gained a lenient parsing mode which:
   - permits ASCII control characters in header field names and values
   - allows lines to end with LF instead of CRLF
   - permits multi-digit WARC minor versions like "0.18"
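
To illustrate two of the relaxations listed above, here is a rough sketch of a strict versus lenient check on the WARC version line. It is not jwarc's parser and does not use its API; the class name, the patterns, and the assumption that strict mode accepts only a single-digit minor version are all illustrative.

```java
import java.util.regex.Pattern;

public final class LenientVersionLine {
    // Assumed strict form: single-digit major and minor version, terminated by CRLF.
    private static final Pattern STRICT = Pattern.compile("WARC/[0-9]\\.[0-9]\r\n");
    // Lenient form: multi-digit minor version allowed, CR optional before LF.
    private static final Pattern LENIENT = Pattern.compile("WARC/[0-9]\\.[0-9]+\r?\n");

    /** Returns true if the whole version line is acceptable in the chosen mode. */
    static boolean accepts(String versionLine, boolean lenient) {
        return (lenient ? LENIENT : STRICT).matcher(versionLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("WARC/1.1\r\n", false)); // well-formed line, strict mode
        System.out.println(accepts("WARC/0.18\n", false));  // multi-digit minor and bare LF, strict mode
        System.out.println(accepts("WARC/0.18\n", true));   // same line, lenient mode
    }
}
```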

v0.29.0

Release 0.29.0

New features

* Added MediaType.parseLeniently() and .isValid()

Changes

* Message.contentType() and other methods that internally call it now use the lenient MediaType parser instead of throwing IllegalArgumentException #83

v0.28.6

Release 0.28.6

Bugs fixed

* Improved compatibility with ARC variants (version-block length off by one, v2 version-block, spurious linefeeds) #82
* WarcParser: Context in parse error messages was incorrectly using the parser (file) position instead of buffer position

v0.28.5

Release 0.28.5

Bugs fixed

* Fixed ClosedChannelException when reading a WarcRevisit body
  after closing a previous one due to reuse of empty MessageBody. #80

v0.28.4

Release 0.28.4

Bugs fixed

* CDX formatting now percent encodes spaces, newlines and null characters in all string fields. This is non-standard but at least prevents us outputting invalid CDX lines.
* CdxRequestEncoder now handles requests with an invalid content-type header
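
The field escaping described in the first item can be sketched as below. This is only an illustration of the idea, not jwarc's implementation: `CdxFieldEscaper` is a hypothetical name, and escaping carriage returns alongside line feeds is an assumption about what "newlines" covers.

```java
public final class CdxFieldEscaper {
    /** Percent-encodes the characters that would corrupt a space-separated CDX line. */
    static String escape(String field) {
        StringBuilder sb = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            switch (c) {
                case ' ':  sb.append("%20"); break; // field separator
                case '\n': sb.append("%0A"); break; // record separator
                case '\r': sb.append("%0D"); break; // assumption: CR treated as a newline too
                case '\0': sb.append("%00"); break; // null character
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("text/html; charset=utf-8"));
    }
}
```

Because a literal `%` is left untouched in this sketch, the escaping is not reversible; its only purpose is to keep each CDX line well formed.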

v0.28.3

Release 0.28.3

Bugs fixed

* Fixed a multithreading issue when GzipChannel writes its header #69

v0.28.2

Release 0.28.2

Changes

* HttpRequest+HttpResponse in lenient mode recover when parsing the Content-Length header throws NumberFormatException
* WarcParser now tries to leniently parse ARC records containing corrupt dates
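
Recovering from a malformed Content-Length header might look roughly like the following. This is a sketch under stated assumptions rather than jwarc's code: `LenientContentLength` is a hypothetical helper, and reporting an unparseable or negative value as "absent" is one possible recovery strategy among several.

```java
import java.util.OptionalLong;

public final class LenientContentLength {
    /** Parses a Content-Length value, returning empty instead of throwing on bad input. */
    static OptionalLong parse(String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        try {
            long length = Long.parseLong(headerValue.trim());
            // A negative length is meaningless, so treat it like a missing header.
            return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1024"));
        System.out.println(parse("12abc"));
    }
}
```

A caller can then fall back to reading the body until the end of the record when the result is empty.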

v0.28.1

Release 0.28.1

Bugs fixed

* Fixed output truncation with the CDX CLI tool due to
  OutputStreamWriter buffer not being flushed or closed before exit
* CdxWriter.process(files, useAbsolutePaths) ignored
  useAbsolutePaths=false and always output absolute paths
* CdxRequestEncoder: Improved pywb compatibility for non-ASCII characters
  in url encoded request bodies
* CdxRequestEncoder: Fixed URLDecoder exception for large request bodies
  or those including invalid percent encoding
* WarcWriter.fetch: Fixed bug where maxTime limit accidentally used
  the value of maxLength option instead
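
The output truncation fixed in the first item of this release is a common pitfall with buffered writers: `OutputStreamWriter` holds encoded bytes in an internal buffer until it is flushed or closed. A minimal sketch of the usual remedy, using try-with-resources so the writer is always closed, is shown below. It uses only the JDK; `FlushOnExit` and `writeLines` are hypothetical names, and the sample line is made-up data rather than real tool output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public final class FlushOnExit {
    /** Writes each line followed by LF; closing the writer flushes its buffer to the sink. */
    static byte[] writeLines(List<String> lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed here, so every buffered byte has reached the sink.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] out = writeLines(List.of("org,example)/ 20200101000000 http://example.org/"));
        System.out.print(new String(out, StandardCharsets.UTF_8));
    }
}
```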