Improve the performance of line regexes #244

Closed
@KevinHock

Description

Because we pass the whole file to every plugin, rather than filtering each line once up front, we end up calling regex.search on the same line P times, where P is the number of plugins. This applies to both ALLOWLIST_REGEXES and --exclude-lines. For large diffs on a tightly provisioned box, this can be quite inefficient.
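To make the cost concrete, here is a toy model of the pattern described above. It is only a sketch: `CountingPlugin`, `scan`, and the single allowlist pattern are hypothetical stand-ins, not the real detect-secrets classes.

```python
import re

# Hypothetical stand-in for the module-level allowlist patterns.
ALLOWLIST_REGEXES = [re.compile(r'pragma: ?allowlist[ -]secret')]


class CountingPlugin:
    """Toy plugin that counts how often the allowlist check runs."""

    searches = 0  # shared counter across all plugin instances

    def analyze(self, lines):
        # Every plugin repeats the same per-line allowlist check.
        for line in lines:
            for regex in ALLOWLIST_REGEXES:
                CountingPlugin.searches += 1
                regex.search(line)


def scan(lines, plugins):
    """Run every plugin over every line and return the search count."""
    CountingPlugin.searches = 0
    for plugin in plugins:
        plugin.analyze(lines)
    return CountingPlugin.searches
```

With L lines, P plugins, and R allowlist patterns, this model performs L × P × R searches, whereas a single up-front check per line would need only L × R.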

The relevant control flow is as follows:

try:
    log.info('Checking file: %s', filename)
    for results, plugin in self._results_accumulator(filename):
        results.update(plugin.analyze(f, filename))
        f.seek(0)

def analyze(self, file, filename):
    """
    :param file: The File object itself.
    :param filename: string; filename of File object, used for creating
                     PotentialSecret objects
    :returns dictionary representation of set (for random access by hash)
             { detect_secrets.core.potential_secret.__hash__:
                   detect_secrets.core.potential_secret }
    """
    potential_secrets = {}
    file_lines = tuple(file.readlines())
    for line_num, line in enumerate(file_lines, start=1):
        results = self.analyze_string(line, line_num, filename)

def analyze_string(self, string, line_num, filename):
    """
    :param string: string; the line to analyze
    :param line_num: integer; line number that is currently being analyzed
    :param filename: string; name of file being analyzed
    :returns: dictionary
    NOTE: line_num and filename are used for PotentialSecret creation only.
    """
    if (
        any(
            allowlist_regex.search(string) for allowlist_regex in ALLOWLIST_REGEXES
        )
        or (
            self.exclude_lines_regex and
            self.exclude_lines_regex.search(string)
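
One possible direction, sketched under assumptions rather than taken from the actual codebase: run the allowlist and exclude-lines checks once per line, before any plugin sees it, and pass only the surviving lines to the plugins. The names `is_line_excluded` and `scan_lines`, the callable plugin interface, and the allowlist pattern are all hypothetical.

```python
import re

# Hypothetical stand-in for the module-level allowlist patterns.
ALLOWLIST_REGEXES = [re.compile(r'pragma: ?allowlist[ -]secret')]


def is_line_excluded(line, exclude_lines_regex=None):
    """Return True if the line should be skipped by every plugin."""
    if any(regex.search(line) for regex in ALLOWLIST_REGEXES):
        return True
    return bool(exclude_lines_regex and exclude_lines_regex.search(line))


def scan_lines(lines, plugins, exclude_lines_regex=None):
    """Filter each line once, then hand the survivors to every plugin.

    `plugins` is assumed to be a list of callables taking
    (line, line_num) and returning a dict of findings.
    """
    results = {}
    for line_num, line in enumerate(lines, start=1):
        if is_line_excluded(line, exclude_lines_regex):
            continue  # checked once here, not once per plugin
        for plugin in plugins:
            results.update(plugin(line, line_num))
    return results
```

The trade-off is that plugins would receive lines rather than a file object, so any plugin that needs whole-file context would have to be handled separately.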
