convert from runtime sigs to storing them in an rbi/ directory
Shipping the RBI file with the gem like this means downstream users of
pdf-reader who also use sorbet will know the types pdf-reader expects
for its public API.

Storing the sigs outside the source files also means there's no need to
add sorbet-runtime as a dependency of the gem, and downstream users of
pdf-reader who do not use sorbet will see no change.
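
To illustrate the trade-off described above, here is a minimal sketch (not code from this commit) of a method written without runtime sigs, next to the kind of signature an RBI file could carry for it. The names `ZeroWidthSketch` and `Run` are hypothetical stand-ins, and the RBI text is held in a heredoc only so the example runs without Sorbet installed.

```ruby
# Hypothetical stand-in for PDF::Reader::TextRun; only `width` matters here.
Run = Struct.new(:text, :width)

# Plain Ruby: no `extend T::Sig` and no `sig` block, so nothing in this class
# needs sorbet-runtime to be loaded.
class ZeroWidthSketch
  def self.exclude_zero_width_runs(runs)
    runs.reject { |run| run.width == 0 }
  end
end

# The type information moves into an RBI file that only the static checker
# reads. It is shown as a string because it is never executed at runtime.
RBI_SKETCH = <<~RBI
  class ZeroWidthSketch
    sig { params(runs: T::Array[Run]).returns(T::Array[Run]) }
    def self.exclude_zero_width_runs(runs); end
  end
RBI

kept = ZeroWidthSketch.exclude_zero_width_runs([Run.new("a", 5), Run.new("", 0)])
puts kept.map(&:text).inspect
```

The method body is unchanged for users who never install Sorbet, while users who do can still have the signature checked statically.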

I generated the RBI file with parlour:

    bundle exec parlour

I assume I'll have to regenerate it over time as the methods on each
class evolve. That sounds like a huge hassle, but this is an experiment
so let's see how it goes.
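
One possible way to notice when regeneration is due, sketched under assumptions: the helper `rbi_stale?` is hypothetical and is not part of this commit or of parlour. It compares file modification times, which are not reliable after a fresh clone, so treat it as an illustration rather than a recommended workflow.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: true when the RBI file is missing, or when any source
# file matched by the glob has been modified more recently than the RBI file.
def rbi_stale?(rbi_path, source_glob)
  return true unless File.exist?(rbi_path)

  rbi_mtime = File.mtime(rbi_path)
  Dir.glob(source_glob).any? { |path| File.mtime(path) > rbi_mtime }
end

# Demonstration against a throwaway directory rather than a real checkout.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "lib"))
  FileUtils.mkdir_p(File.join(dir, "rbi"))
  source = File.join(dir, "lib", "example.rb")
  rbi = File.join(dir, "rbi", "pdf-reader.rbi")
  sources = File.join(dir, "lib", "**", "*.rb")

  File.write(source, "class Example; end\n")
  puts rbi_stale?(rbi, sources) # no RBI file exists yet

  File.write(rbi, "class Example; end\n")
  File.utime(Time.now, Time.now + 60, rbi) # pretend the RBI was generated later
  puts rbi_stale?(rbi, sources)
end
```

When the check reports a stale file, the command from the commit message (`bundle exec parlour`) is what would be run again.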

The parlour config file ignores lib/pdf/reader/parser.rb because I get a
NoMethodError if I leave it in.
yob committed Nov 19, 2021
1 parent d7f30ef commit a765212
Showing 16 changed files with 1,695 additions and 46 deletions.
9 changes: 9 additions & 0 deletions .parlour
@@ -0,0 +1,9 @@
+output_file:
+  rbi: rbi/pdf-reader.rbi
+
+
+parser:
+  included_paths:
+    - lib
+  excluded_paths:
+    - lib/pdf/reader/parser.rb
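
The configuration above can be sanity-checked by loading it as YAML. This is a minimal sketch, assuming `rbi` nests under `output_file` and the two path lists nest under `parser`. It only reads the keys back and says nothing about how parlour itself interprets them.

```ruby
require "yaml"

# The .parlour contents from this commit, inlined so the example is
# self-contained. The nesting is an assumption, as noted above.
PARLOUR_CONFIG = <<~YAML
  output_file:
    rbi: rbi/pdf-reader.rbi

  parser:
    included_paths:
      - lib
    excluded_paths:
      - lib/pdf/reader/parser.rb
YAML

config = YAML.safe_load(PARLOUR_CONFIG)

# `fetch` raises KeyError on a missing key, which surfaces typos early.
output   = config.fetch("output_file").fetch("rbi")
included = config.fetch("parser").fetch("included_paths")
excluded = config.fetch("parser").fetch("excluded_paths")

puts "RBI written to: #{output}"
puts "scanned:        #{included.join(', ')}"
puts "skipped:        #{excluded.join(', ')}"
```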
4 changes: 0 additions & 4 deletions lib/pdf/reader.rb
@@ -29,8 +29,6 @@
################################################################################

require 'stringio'
-require 'sorbet-runtime'
-

module PDF
################################################################################
@@ -95,7 +93,6 @@ module PDF
# reader = PDF::Reader.new("somefile.pdf", :password => "apples")
#
class Reader
-extend T::Sig

# lowlevel hash-like access to all objects in the underlying PDF
attr_reader :objects
@@ -235,7 +232,6 @@ def doc_strings_to_utf8(obj)
end
end

-sig { params(str: String).returns(T::Boolean)}
def has_utf16_bom?(str)
first_bytes = str[0,2]

8 changes: 2 additions & 6 deletions lib/pdf/reader/cmap.rb
@@ -1,5 +1,5 @@
# coding: utf-8
-# typed: true
+# typed: false
# frozen_string_literal: true

################################################################################
@@ -33,7 +33,6 @@ class PDF::Reader
# extracting various useful information.
#
class CMap # :nodoc:
-extend T::Sig

CMAP_KEYWORDS = {
"begincodespacerange" => 1,
@@ -49,16 +48,14 @@ class CMap # :nodoc:

attr_reader :map

-sig { params(data: String).void}
def initialize(data)
@map = {}
process_data(data)
end

-sig { params(data: String).void}
def process_data(data)
parser = build_parser(data)
-mode = T.let(:none, Symbol)
+mode = :none
instructions = []

while token = parser.parse_token(CMAP_KEYWORDS)
@@ -80,7 +77,6 @@ def process_data(data)
end
end

-sig { returns(Integer) }
def size
@map.size
end
2 changes: 0 additions & 2 deletions lib/pdf/reader/filter/ascii85.rb
@@ -8,7 +8,6 @@ class PDF::Reader
module Filter # :nodoc:
# implementation of the Ascii85 filter
class Ascii85
-extend T::Sig

def initialize(options = {})
@options = options
@@ -18,7 +17,6 @@ def initialize(options = {})
# Decode the specified data using the Ascii85 algorithm. Relies on the AScii85
# rubygem.
#
-sig {params(data: String).returns(String)}
def filter(data)
data = "<~#{data}" unless data.to_s[0,2] == "<~"
if defined?(::Ascii85Native)
2 changes: 0 additions & 2 deletions lib/pdf/reader/filter/ascii_hex.rb
@@ -7,7 +7,6 @@ class PDF::Reader
module Filter # :nodoc:
# implementation of the AsciiHex stream filter
class AsciiHex
-extend T::Sig

def initialize(options = {})
@options = options
@@ -16,7 +15,6 @@ def initialize(options = {})
################################################################################
# Decode the specified data using the AsciiHex algorithm.
#
-sig {params(data: String).returns(String)}
def filter(data)
data.chop! if data[-1,1] == ">"
data = data[1,data.size] if data[0,1] == "<"
2 changes: 0 additions & 2 deletions lib/pdf/reader/filter/depredict.rb
@@ -7,7 +7,6 @@ module Filter # :nodoc:
# some filter implementations support preprocessing of the data to
# improve compression
class Depredict
-extend T::Sig

def initialize(options = {})
@options = options || {}
@@ -17,7 +16,6 @@ def initialize(options = {})
# Streams can be preprocessed to improve compression. This reverses the
# preprocessing
#
-sig {params(data: String).returns(String)}
def filter(data)
predictor = @options[:Predictor].to_i

2 changes: 0 additions & 2 deletions lib/pdf/reader/filter/flate.rb
@@ -9,7 +9,6 @@ class PDF::Reader
module Filter # :nodoc:
# implementation of the Flate (zlib) stream filter
class Flate
-extend T::Sig

ZLIB_AUTO_DETECT_ZLIB_OR_GZIP = 47 # Zlib::MAX_WBITS + 32
ZLIB_RAW_DEFLATE = -15 # Zlib::MAX_WBITS * -1
@@ -20,7 +19,6 @@ def initialize(options = {})

################################################################################
# Decode the specified data with the Zlib compression algorithm
-sig {params(data: String).returns(String)}
def filter(data)
deflated = zlib_inflate(data) || zlib_inflate(data[0, data.bytesize-1])

2 changes: 0 additions & 2 deletions lib/pdf/reader/filter/lzw.rb
@@ -7,15 +7,13 @@ class PDF::Reader
module Filter # :nodoc:
# implementation of the LZW stream filter
class Lzw
-extend T::Sig

def initialize(options = {})
@options = options
end

################################################################################
# Decode the specified data with the LZW compression algorithm
-sig {params(data: String).returns(String)}
def filter(data)
data = PDF::Reader::LZW.decode(data)
Depredict.new(@options).filter(data)
2 changes: 0 additions & 2 deletions lib/pdf/reader/filter/run_length.rb
@@ -7,15 +7,13 @@ class PDF::Reader # :nodoc:
module Filter # :nodoc:
# implementation of the run length stream filter
class RunLength
-extend T::Sig

def initialize(options = {})
@options = options
end

################################################################################
# Decode the specified data with the RunLengthDecode compression algorithm
-sig {params(data: String).returns(String)}
def filter(data)
pos = 0
out = "".dup
15 changes: 1 addition & 14 deletions lib/pdf/reader/overlapping_runs_filter.rb
@@ -1,18 +1,16 @@
-# typed: strict
+# typed: true
# coding: utf-8
# frozen_string_literal: true

class PDF::Reader
# remove duplicates from a collection of TextRun objects. This can be helpful when a PDF
# uses slightly offset overlapping characters to achieve a fake 'bold' effect.
class OverlappingRunsFilter
-extend T::Sig

# This should be between 0 and 1. If TextRun B obscures this much of TextRun A (and they
# have identical characters) then one will be discarded
OVERLAPPING_THRESHOLD = 0.5

-sig {params(runs: T::Array[PDF::Reader::TextRun]).returns(T::Array[PDF::Reader::TextRun])}
def self.exclude_redundant_runs(runs)
sweep_line_status = Array.new
event_point_schedule = Array.new
@@ -40,12 +38,6 @@ def self.exclude_redundant_runs(runs)
runs - to_exclude
end

-sig {
-params(
-sweep_line_status: T::Array[PDF::Reader::TextRun],
-event_point: EventPoint
-).returns(T::Boolean)
-}
def self.detect_intersection(sweep_line_status, event_point)
sweep_line_status.each do |open_text_run|
if event_point.x >= open_text_run.x &&
@@ -61,21 +53,16 @@ def self.detect_intersection(sweep_line_status, event_point)
# Utility class used to avoid modifying the underlying TextRun objects while we're
# looking for duplicates
class EventPoint
-extend T::Sig

-sig { returns(Numeric) }
attr_reader :x

-sig { returns(PDF::Reader::TextRun) }
attr_reader :run

-sig {params(x: Numeric, run: PDF::Reader::TextRun).void }
def initialize(x, run)
@x = x
@run = run
end

-sig { returns(T::Boolean) }
def start?
@x == @run.x
end
2 changes: 0 additions & 2 deletions lib/pdf/reader/page_layout.rb
@@ -13,7 +13,6 @@ class PDF::Reader
# media box should be a 4 number array that describes the dimensions of the
# page to be rendered as described by the page's MediaBox attribute
class PageLayout
-extend T::Sig

DEFAULT_FONT_SIZE = 12

@@ -109,7 +108,6 @@ def merge_runs(runs)
}.flatten.sort
end

-sig {params(chars: T::Array[PDF::Reader::TextRun]).returns(T::Array[PDF::Reader::TextRun])}
def group_chars_into_runs(chars)
chars.each_with_object([]) do |char, runs|
if runs.empty?
1 change: 0 additions & 1 deletion lib/pdf/reader/resource_methods.rb
@@ -11,7 +11,6 @@ class Reader
# mixin for common methods in Page and FormXobjects
#
module ResourceMethods
-extend T::Helpers

# Returns a Hash of color spaces that are available to this page
#
3 changes: 0 additions & 3 deletions lib/pdf/reader/text_run.rb
@@ -5,7 +5,6 @@
class PDF::Reader
# A value object that represents one or more consecutive characters on a page.
class TextRun
-extend T::Sig
include Comparable

attr_reader :x, :y, :width, :font_size, :text
@@ -48,12 +47,10 @@ def mean_character_width
@width / character_count
end

-sig {params(other: PDF::Reader::TextRun).returns(T::Boolean)}
def mergable?(other)
y.to_i == other.y.to_i && font_size == other.font_size && mergable_range.include?(other.x)
end

-sig {params(other: PDF::Reader::TextRun).returns(PDF::Reader::TextRun)}
def +(other)
raise ArgumentError, "#{other} cannot be merged with this run" unless mergable?(other)

2 changes: 0 additions & 2 deletions lib/pdf/reader/zero_width_runs_filter.rb
@@ -5,9 +5,7 @@
class PDF::Reader
# There's no point rendering zero-width characters
class ZeroWidthRunsFilter
-extend T::Sig

-sig {params(runs: T::Array[PDF::Reader::TextRun]).returns(T::Array[PDF::Reader::TextRun])}
def self.exclude_zero_width_runs(runs)
runs.reject { |run| run.width == 0 }
end
4 changes: 2 additions & 2 deletions pdf-reader.gemspec
@@ -6,7 +6,7 @@ Gem::Specification.new do |spec|
spec.summary = "A library for accessing the content of PDF files"
spec.description = "The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe"
spec.license = "MIT"
-spec.files = Dir.glob("{examples,lib}/**/**/*") + ["Rakefile"]
+spec.files = Dir.glob("{examples,lib,rbi}/**/**/*") + ["Rakefile"]
spec.executables << "pdf_object"
spec.executables << "pdf_text"
spec.executables << "pdf_callbacks"
@@ -34,7 +34,7 @@ Gem::Specification.new do |spec|
spec.add_development_dependency("pry")
spec.add_development_dependency("rdoc")
spec.add_development_dependency("sorbet")
-spec.add_development_dependency("sorbet-runtime")
+spec.add_development_dependency('parlour')

spec.add_dependency('Ascii85', '~> 1.0')
spec.add_dependency('ruby-rc4')
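
The effect of adding `rbi` to the `spec.files` glob can be checked in isolation. The sketch below builds a throwaway directory tree (the file names are hypothetical stand-ins for the real gem layout) and compares what the old and new patterns match. `files_matching` is a helper invented for this example.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: relative paths of regular files under `root` that match
# `pattern`, sorted so the result is stable across platforms.
def files_matching(root, pattern)
  Dir.glob(pattern, base: root)
     .select { |rel| File.file?(File.join(root, rel)) }
     .sort
end

added = Dir.mktmpdir do |root|
  # Mimic the gem layout with empty placeholder files.
  %w[lib/pdf/reader.rb rbi/pdf-reader.rbi examples/text.rb spec/reader_spec.rb].each do |rel|
    path = File.join(root, rel)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, "")
  end

  old_files = files_matching(root, "{examples,lib}/**/**/*")
  new_files = files_matching(root, "{examples,lib,rbi}/**/**/*")
  new_files - old_files
end

puts added.inspect
```

In this sample tree, only the file under `rbi/` is newly matched. Files under `spec/` stay out of the packaged gem under both patterns.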