Tags · iipc/jwarc

v0.31.1

Release 0.31.1

Bugs fixed

* Fixed URIs.parseLeniently() returning a different value to new URI() if the path was empty or the input contained percent encoded characters #90 #91
* Replaced some internal usages of record.targetURI() with record.target() to reduce the chance of runtime exceptions and preserve the exact original value
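
The kind of mismatch described above can be reproduced with `java.net.URI` alone. The sketch below is not jwarc's code; `UriRoundTrip` and its `rebuilt` helper are hypothetical names, used only to show how rebuilding a URI from its components can change a percent-encoded value that the single-argument constructor keeps intact.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriRoundTrip {
    // Parsing with the single-argument form keeps the input string unchanged.
    static String parsedOnly(String input) {
        return URI.create(input).toString();
    }

    // Hypothetical "rebuild from components" step. The multi-argument
    // constructors always quote '%', so an already-encoded path is encoded twice.
    static String rebuilt(String input) {
        URI parsed = URI.create(input);
        try {
            return new URI(parsed.getScheme(), parsed.getHost(),
                    parsed.getRawPath(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String input = "http://example.org/a%20b";
        System.out.println(parsedOnly(input));                // http://example.org/a%20b
        System.out.println(URI.create(input).getRawPath());   // /a%20b
        System.out.println(URI.create(input).getPath());      // /a b
        System.out.println(rebuilt(input));                   // http://example.org/a%2520b
        // An absent path is reported as an empty string, not null.
        System.out.println(URI.create("http://example.org").getPath().isEmpty()); // true
    }
}
```

Code that needs the exact original value is therefore safer keeping the raw string than round-tripping it through a parsed URI.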

Nov 20, 2024
f207143

v0.31.0

Release 0.31.0

New features

* Added optional support for brotli content encoding #88 (Sebastian Nagel)
* Added HttpMessage.bodyDecoded() #88 (Sebastian Nagel)
* WarcTool: Added `dedupe` subcommand
* DedupeTool: Added --verbose option and silenced default logging

Bugs fixed

* GunzipChannel: Fixed incorrect record length calculation when gzip footer aligns with the end of the buffer
* ValidateTool: Fixed digest validation #87 (Sebastian Nagel)
* DedupeTool: Used matchType=exact to properly handle CDX queries for URLs ending with `*`
* DedupeTool: Fixed record copying when transferTo copies fewer bytes than requested
* DedupeTool: Prevented appending of an empty gzip member when no records were deduplicated
* DedupeTool: Fixed exception when input files are in the current working directory
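
The `transferTo` fix above concerns a general property of `FileChannel.transferTo`: it may transfer fewer bytes than requested, so callers have to loop. The following is a minimal sketch of such a loop, not the DedupeTool code; `TransferFully`, `transferFully` and `copyRange` are hypothetical names, and treating a zero-byte transfer as end of file is a simplifying assumption that suits blocking targets only.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class TransferFully {
    // Keeps calling transferTo until exactly `count` bytes have been copied.
    static void transferFully(FileChannel src, long position, long count,
                              WritableByteChannel dst) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long n = src.transferTo(position, remaining, dst);
            if (n <= 0) {
                // Assumption: no progress means the source ended early.
                throw new EOFException(remaining + " bytes left to copy");
            }
            position += n;
            remaining -= n;
        }
    }

    // Small demo helper: copies a byte range through temporary files.
    static byte[] copyRange(byte[] data, long offset, long length) {
        try {
            Path in = Files.createTempFile("transfer-in", ".bin");
            Path out = Files.createTempFile("transfer-out", ".bin");
            try {
                Files.write(in, data);
                try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
                     FileChannel dst = FileChannel.open(out, StandardOpenOption.WRITE)) {
                    transferFully(src, offset, length, dst);
                }
                return Files.readAllBytes(out);
            } finally {
                Files.deleteIfExists(in);
                Files.deleteIfExists(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] copied = copyRange(new byte[] {1, 2, 3, 4, 5}, 1, 3);
        System.out.println(java.util.Arrays.toString(copied)); // [2, 3, 4]
    }
}
```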

Nov 14, 2024
14b80be

v0.30.0

Release 0.30.0

New features

* WarcReader and WarcParser gained a lenient parsing mode which:
   - permits ASCII control characters in header field names and values
   - allows lines to end with LF instead of CRLF
   - permits multi-digit WARC minor versions like "0.18"
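
To make two of these relaxations concrete, here is a minimal sketch of strict versus lenient handling of the WARC version line. It is not jwarc's parser (`VersionLine` is a hypothetical class), and the strict pattern, one digit on each side of the dot followed by CRLF, is an assumption made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionLine {
    // Assumed strict form: single-digit major and minor version, CRLF terminator.
    private static final Pattern STRICT = Pattern.compile("WARC/(\\d)\\.(\\d)\r\n");
    // Lenient form: multi-digit minor version allowed, CR before LF is optional.
    private static final Pattern LENIENT = Pattern.compile("WARC/(\\d)\\.(\\d+)\r?\n");

    // Returns "major.minor" or throws if the line does not match the chosen mode.
    static String parse(String line, boolean lenient) {
        Matcher m = (lenient ? LENIENT : STRICT).matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("invalid WARC version line");
        }
        return m.group(1) + "." + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(parse("WARC/1.1\r\n", false)); // 1.1
        System.out.println(parse("WARC/0.18\n", true));   // 0.18
    }
}
```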

Jun 28, 2024
20d2971

v0.29.0

Release 0.29.0

New features

* Added MediaType.parseLeniently() and .isValid()

Changes

* Message.contentType() and other methods that internally call it now use the lenient MediaType parser instead of throwing IllegalArgumentException #83

Feb 14, 2024
e4ff0fe

v0.28.6

Release 0.28.6

Bugs fixed

* Improved compatibility with ARC variants (version-block length off by one, v2 version-block, spurious linefeeds) #82
* WarcParser: Context in parse error messages was incorrectly using the parser (file) position instead of buffer position

Feb 9, 2024
29afed7

v0.28.5

Release 0.28.5

Bugs fixed

* Fixed ClosedChannelException when reading a WarcRevisit body
  after closing a previous one due to reuse of empty MessageBody. #80

Dec 13, 2023
e50593f

v0.28.4

Release 0.28.4

Bugs fixed

* CDX formatting now percent encodes spaces, newlines and null characters in all string fields. This is non-standard but at least prevents us outputting invalid CDX lines.
* CdxRequestEncoder now handles requests with an invalid content-type header
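
CDX lines are space-separated, one record per line, which is why a raw space or newline inside a field corrupts the output. A minimal sketch of the escaping idea follows. It is not the jwarc implementation; `CdxEscape.escapeField` is a hypothetical helper, and treating carriage return as a "newline" is an assumption.

```java
class CdxEscape {
    // Percent-encodes the characters that would break a space-separated,
    // line-oriented record. Other characters, including '%', pass through.
    static String escapeField(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case ' ':  sb.append("%20"); break;
                case '\n': sb.append("%0A"); break;
                case '\r': sb.append("%0D"); break; // assumption: CR counts as a newline
                case '\0': sb.append("%00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeField("a b\nc")); // a%20b%0Ac
    }
}
```

Because `%` is left untouched in this sketch, the transformation is not reversible; it only guarantees that each field stays on one line without spaces.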

Dec 6, 2023
a6846ae

v0.28.3

Release 0.28.3

Bugs fixed:

* Fixed multithreading issue on GzipChannel write header #69

Sep 28, 2023
0c2503f

v0.28.2

Release 0.28.2

Changes:

* HttpRequest and HttpResponse in lenient mode now recover when parsing the Content-Length header throws NumberFormatException
* WarcParser now tries to leniently parse ARC records containing corrupt dates
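
The Content-Length recovery can be pictured as follows. This is a sketch of the general pattern only; `ContentLength.parse` is a hypothetical helper, and falling back to "length unknown" is an assumption, since the notes above do not say what value the parser recovers to.

```java
import java.util.OptionalLong;

class ContentLength {
    // Strict mode propagates NumberFormatException; lenient mode reports
    // the length as unknown instead (assumed fallback).
    static OptionalLong parse(String headerValue, boolean lenient) {
        try {
            return OptionalLong.of(Long.parseLong(headerValue.trim()));
        } catch (NumberFormatException e) {
            if (lenient) {
                return OptionalLong.empty();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" 42 ", false).getAsLong()); // 42
        System.out.println(parse("12abc", true).isPresent()); // false
    }
}
```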

Sep 15, 2023
f1f8470

v0.28.1

Release 0.28.1

Bugs fixed:

* Fixed output truncation with the CDX CLI tool due to
  OutputStreamWriter buffer not being flushed or closed before exit
* CdxWriter.process(files, useAbsolutePaths) ignored
  useAbsolutePaths=false and always output absolute paths
* CdxRequestEncoder: Improved pywb compatibility for non-ASCII characters
  in url encoded request bodies
* CdxRequestEncoder: Fixed URLDecoder exception for large request bodies
  or those including invalid percent encoding
* WarcWriter.fetch: Fixed bug where maxTime limit accidentally used
  the value of maxLength option instead
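
The truncation bug in the first item is a common pitfall: `OutputStreamWriter` buffers encoded bytes, so output can be lost if the process exits before the writer is flushed or closed. A minimal sketch of the usual remedy, try-with-resources, is shown below; `FlushOnExit.writeLines` is a hypothetical helper and not the CDX tool's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class FlushOnExit {
    // Writes each line followed by '\n'. Closing the writer via
    // try-with-resources flushes any bytes still held in its buffer.
    static byte[] writeLines(String... lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writeLines("first", "second");
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
    }
}
```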

Aug 2, 2023
7088551
