Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust detection of bzip2 magic number #6308

Merged
merged 3 commits into from
Jul 24, 2024
Merged

Conversation

MichaelChirico
Copy link
Member

Closes #6304

Also took the opportunity to factor out some helpers for easier maintenance/readability. That could be split to its own precursor PR if so desired.

We could extend is_bzip() to also look for "raw pi" (as.raw(c(0x31, 0x41, 0x59, 0x26, 0x53, 0x59))) in the 5th-10th bytes, but skip that for now. The example seems pretty pathological already.

Also, I don't think that any of our test_R.utils tests have been checking the "infer from magic number" functionality from fread(), so this PR also adds some nice regression coverage for this behavior.

Copy link

github-actions bot commented Jul 23, 2024

Comparison Plot

Generated via commit 2db0960

Download link for the artifact containing the test results: ↓ atime-results.zip

Time taken to finish the standard R installation steps: 11 minutes and 35 seconds

Time taken to run atime::atime_pkg on the tests: 3 minutes and 20 seconds

@ben-schwen
Copy link
Member

Not sure we really need the fread(header=TRUE) test here, but I guess it can't hurt.

Besides that, LGTM

R/fread.R Show resolved Hide resolved
@MichaelChirico
Copy link
Member Author

MichaelChirico commented Jul 23, 2024

Not sure we really need the fread(header=TRUE) test here, but I guess it can't hurt.

yea for the problem as diagnosed it's kinda silly. but wanted to keep some examples close to OP's report, that way we're ironclad sure we've closed the issue & it'll stay closed.

@ben-schwen ben-schwen merged commit 0c022e2 into master Jul 24, 2024
5 checks passed
@ben-schwen ben-schwen deleted the fread-bz2-digit branch July 24, 2024 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fread bug in v1.15.4 when reading a CSV with no headers and first variable is BZh
2 participants