Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor chunking algorithm changes #1676

Merged
merged 1 commit into from
Dec 28, 2024

Conversation

rgmz
Copy link
Contributor

@rgmz rgmz commented Dec 22, 2024

Description:

This is a follow-up to #1665.

Changes include:

  • Moving the logic into a separate function and adding test cases
  • Also implementing the changes in reader.go.
  • Using bufio.Reader in directory.go to speed up execution

Checklist:

  • Does your PR pass tests?
  • Have you written new tests for your changes?
  • Have you lint your code locally prior to submission?

@rgmz rgmz force-pushed the feat/chunking-tests branch 5 times, most recently from fe0d9b5 to 78f67c2 Compare December 22, 2024 18:09
Comment on lines 54 to 55
// Buffer to hold file chunks
reader = bufio.NewReader(f)
Copy link
Contributor Author

@rgmz rgmz Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using #1432 as a test case. @ahrav @zricethezav any obvious errors with my use of bufio.Reader? It does feel strange to mix os.File, bufio.NewReader, and bytes.NewBuffer.

Before

12:33PM INF scanned ~104857630 bytes (104.86 MB) in 5.63s
12:33PM WRN leaks found: 1

After

1:05PM INF scanned ~104857630 bytes (104.86 MB) in 1.42s
1:05PM WRN leaks found: 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. I agree—it does feel a bit odd to use all three functions, but if I’m understanding this correctly, they each serve different purposes, so it actually makes sense.

Copy link
Contributor Author

@rgmz rgmz Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that f.Read and r.ReadByte may not place nicely. E.g., after the first iteration does calling f.Read also progress the internal reader used by bufio? Perhaps I should replace the f.Read call with reader.Read. (Done)

The number of bytes read (from #1670) appears to be consistent. 🤷

// https://pkg.go.dev/io#Reader
if n > 0 {
// Only check the filetype at the start of file.
if totalLines == 0 {
Copy link
Contributor Author

@rgmz rgmz Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be run after the first chunk. This is a rough way to check if it's the first chunk.

Comment on lines +139 to +165
} else if lastChar == '\r' || lastChar == ' ' || lastChar == '\t' {
// The presence of other whitespace characters (`\r`, ` `, `\t`) shouldn't reset the count.
// (Intentionally do nothing.)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handles #1665 (comment)

@rgmz rgmz force-pushed the feat/chunking-tests branch 6 times, most recently from f84fcc7 to dda1f02 Compare December 24, 2024 02:13
@rgmz rgmz force-pushed the feat/chunking-tests branch from dda1f02 to 19bede8 Compare December 24, 2024 02:14
@zricethezav zricethezav merged commit 4c3da6e into gitleaks:master Dec 28, 2024
1 check passed
@rgmz rgmz deleted the feat/chunking-tests branch December 28, 2024 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants