forked from rouge-ruby/rouge
Commit
factor out the guessing infra to Guesser classes
http://jneen.net/ committed Jun 7, 2016
1 parent 7db5e04, commit b2d086a
Showing 6 changed files with 139 additions and 85 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
module Rouge | ||
class Guesser | ||
def self.guess(guessers, lexers) | ||
original_size = lexers.size | ||
|
||
guessers.each do |g| | ||
new_lexers = g.filter(lexers) | ||
lexers = new_lexers.any? ? new_lexers : lexers | ||
end | ||
|
||
# if we haven't filtered the input at *all*, | ||
# then we have no idea what language it is, | ||
# so we bail and return []. | ||
lexers.size < original_size ? lexers : [] | ||
end | ||
|
||
def filter(lexers) | ||
raise 'abstract' | ||
end | ||
end | ||
end |
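
The chaining in `Guesser.guess` above can be illustrated with a minimal standalone sketch. It mirrors the logic from the diff without requiring Rouge. `GuessSketch`, `FakeLexer`, and `ByMimetype` are hypothetical names introduced only for this illustration.

```ruby
# Standalone sketch: mirrors the chaining logic of Guesser.guess from the diff.
# All names here (GuessSketch, FakeLexer, ByMimetype) are hypothetical.
module GuessSketch
  # Apply each guesser in turn; a guesser that matches nothing is ignored.
  def self.guess(guessers, lexers)
    original_size = lexers.size

    guessers.each do |g|
      narrowed = g.filter(lexers)
      lexers = narrowed if narrowed.any?
    end

    # No narrowing at all means no information, so report no guess.
    lexers.size < original_size ? lexers : []
  end

  # Stand-in for a lexer class: a tag plus the mimetypes it claims.
  FakeLexer = Struct.new(:tag, :mimetypes)

  # Guesser that keeps only the lexers claiming a given mimetype.
  class ByMimetype
    def initialize(mimetype)
      @mimetype = mimetype
    end

    def filter(lexers)
      lexers.select { |lexer| lexer.mimetypes.include?(@mimetype) }
    end
  end
end

ruby = GuessSketch::FakeLexer.new('ruby', ['text/x-ruby'])
json = GuessSketch::FakeLexer.new('json', ['application/json'])
all  = [ruby, json]

narrowed  = GuessSketch.guess([GuessSketch::ByMimetype.new('application/json')], all)
unmatched = GuessSketch.guess([GuessSketch::ByMimetype.new('text/plain')], all)

puts narrowed.map(&:tag).inspect  # ["json"]
puts unmatched.inspect            # []
```

A guesser that matches nothing leaves the candidate list untouched, so the second call never narrows the list and the result is `[]`.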
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
module Rouge | ||
module Guessers | ||
class Filename < Guesser | ||
attr_reader :filename | ||
def initialize(filename) | ||
@filename = filename | ||
@basename = File.basename(filename) | ||
end | ||
|
||
# returns a list of lexers that match the given filename with | ||
# equal specificity (i.e. number of wildcards in the pattern). | ||
# This helps disambiguate between, e.g. the Nginx lexer, which | ||
# matches `nginx.conf`, and the Conf lexer, which matches `*.conf`. | ||
# In this case, nginx will win because the pattern has no wildcards, | ||
# while `*.conf` has one. | ||
def filter(lexers) | ||
out = [] | ||
best_seen = nil | ||
lexers.each do |lexer| | ||
score = lexer.filenames.map do |pattern| | ||
if File.fnmatch?(pattern, @basename, File::FNM_DOTMATCH) | ||
# specificity is better the fewer wildcards there are | ||
pattern.scan(/[*?\[]/).size | ||
end | ||
end.compact.min | ||
|
||
next unless score | ||
|
||
if best_seen.nil? || score < best_seen | ||
best_seen = score | ||
out = [lexer] | ||
elsif score == best_seen | ||
out << lexer | ||
end | ||
end | ||
|
||
out.any? ? out : lexers | ||
end | ||
end | ||
end | ||
end |
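
The specificity score used by the filename guesser can be tried in isolation. The sketch below uses the same `File.fnmatch?` call and wildcard count as the diff. `best_specificity` is a hypothetical helper name, and the pattern lists are made-up examples rather than the real lexers' `filenames`.

```ruby
# Hypothetical helper: the wildcard count of the most specific pattern that
# matches basename, or nil when no pattern matches.
def best_specificity(patterns, basename)
  patterns.map do |pattern|
    next unless File.fnmatch?(pattern, basename, File::FNM_DOTMATCH)

    # Fewer wildcard characters means a more specific pattern.
    pattern.scan(/[*?\[]/).size
  end.compact.min
end

nginx_score = best_specificity(['nginx.conf'], 'nginx.conf')
conf_score  = best_specificity(['*.conf', '*.cfg'], 'nginx.conf')
ruby_score  = best_specificity(['*.rb'], 'nginx.conf')

puts nginx_score.inspect  # 0
puts conf_score.inspect   # 1
puts ruby_score.inspect   # nil
```

The literal pattern scores 0 and the glob scores 1, so a lexer with the literal pattern would be the only one kept for `nginx.conf`.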
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
module Rouge | ||
module Guessers | ||
class Mimetype < Guesser | ||
attr_reader :mimetype | ||
def initialize(mimetype) | ||
@mimetype = mimetype | ||
end | ||
|
||
def filter(lexers) | ||
lexers.select { |lexer| lexer.mimetypes.include? @mimetype } | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
module Rouge | ||
module Guessers | ||
class Source < Guesser | ||
attr_reader :source | ||
def initialize(source) | ||
@source = source | ||
end | ||
|
||
def filter(lexers) | ||
# don't bother reading the input if | ||
# we've already filtered to 1 | ||
return lexers if lexers.size == 1 | ||
|
||
# If we're filtering against *all* lexers, we only use confident return | ||
# values from analyze_text. But if we've filtered down already, we can trust | ||
# the analysis more. | ||
threshold = lexers.size < 10 ? 0 : 0.5 | ||
|
||
source_text = case @source | ||
when String | ||
@source | ||
when ->(s){ s.respond_to? :read } | ||
@source.read | ||
else | ||
raise 'invalid source' | ||
end | ||
|
||
Lexer.assert_utf8!(source_text) | ||
|
||
source_text = TextAnalyzer.new(source_text) | ||
|
||
best_result = threshold | ||
best_match = nil | ||
lexers.each do |lexer| | ||
result = lexer.analyze_text(source_text) || 0 | ||
return [lexer] if result == 1 | ||
|
||
if result > best_result | ||
best_match = lexer | ||
best_result = result | ||
end | ||
end | ||
|
||
best_match ? [best_match] : [] | ||
end | ||
end | ||
end | ||
end |
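
The effect of the threshold in the source guesser is easier to see with fixed scores. The following is a sketch only: `ScoredLexer` and `pick_best` are hypothetical stand-ins, and the scores are invented rather than produced by any real `analyze_text`.

```ruby
# Hypothetical stand-in for a lexer: a tag plus a fixed analyze_text score.
ScoredLexer = Struct.new(:tag, :score) do
  def analyze_text(_text)
    score
  end
end

# Sketch of the selection loop: returns the best lexer above the threshold,
# or nil when nothing clears it.
def pick_best(lexers, text)
  # Small candidate pools accept any positive score; large pools need > 0.5.
  threshold = lexers.size < 10 ? 0 : 0.5
  best_result = threshold
  best_match = nil

  lexers.each do |lexer|
    result = lexer.analyze_text(text) || 0
    return lexer if result == 1

    if result > best_result
      best_match = lexer
      best_result = result
    end
  end

  best_match
end

weak   = ScoredLexer.new('weak', 0.3)
filler = Array.new(10) { |i| ScoredLexer.new("filler#{i}", nil) }

small_pool = pick_best([weak, ScoredLexer.new('none', nil)], 'text')
large_pool = pick_best([weak] + filler, 'text')

puts small_pool&.tag.inspect  # "weak"
puts large_pool.inspect       # nil
```

The same score of 0.3 wins in the two-lexer pool but is rejected in the eleven-lexer pool, which is the behavior the comment about trusting the analysis more after earlier filtering describes.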