fread bug in v1.15.4 when reading a CSV with no headers and first variable is BZh #6304
Comments
Thanks for the report. Not reproducible on Ubuntu, but I can reproduce it on current master and Windows.
library(data.table)
dt = data.table(c1="BZh")
f = tempfile()
fwrite(dt, f, col.names=FALSE)
fread(f)
#> Error in fread(f) :
#> File is empty: C:\Users\~\AppData\Local\Temp\Rtmp4Cc6ek\file1ff86e2d449f
From stepping through
Out of curiosity, is this a real file starting with "BZh"? Interestingly, readLines also seems to have problems here:
f = tempfile()
fwrite(data.table(c1=c("BZh")), f, col.names=FALSE)
readLines(f)
#> [1] "BZh"
fwrite(data.table(c1=c("BZh", "x")), f, col.names=FALSE)
readLines(f)
#> character(0)
Per here, we could make our detection safer against false positives like this by also checking that the 4th byte is a digit 1-9.
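A minimal sketch of that idea in base R, not the actual fread code: the helper names `looks_like_bz2_loose` and `looks_like_bz2_strict` are hypothetical, and the signature bytes are simply the ASCII codes for "BZh". It compares a 3-byte check with one that also requires the 4th byte to be an ASCII digit 1-9.

```r
# The three signature bytes: ASCII "B", "Z", "h"
bz2_magic <- as.raw(c(0x42, 0x5a, 0x68))

# Hypothetical helper: 3-byte check only, which a plain-text "BZh" field also passes
looks_like_bz2_loose <- function(path) {
  head <- readBin(path, what = "raw", n = 3L)
  length(head) == 3L && identical(head, bz2_magic)
}

# Hypothetical helper: additionally require the 4th byte to be ASCII "1".."9"
looks_like_bz2_strict <- function(path) {
  head <- readBin(path, what = "raw", n = 4L)
  if (length(head) < 4L || !identical(head[1:3], bz2_magic)) return(FALSE)
  level <- as.integer(head[4L])
  level >= 49L && level <= 57L  # 49 is "1", 57 is "9"
}

# A plain-text file whose first field is "BZh", as in the report
f <- tempfile()
writeBin(charToRaw("BZh\nx\n"), f)

looks_like_bz2_loose(f)   # TRUE: the false positive
looks_like_bz2_strict(f)  # FALSE: the 4th byte is a newline, not a digit
```

The stricter check would still misfire on a text file that really starts with "BZh" followed by a digit 1-9, so it only narrows the false-positive window rather than closing it.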
@ben-schwen yes, we found the issue in a real CSV with no headers. It contains many more variables and rows, but the "BZh" in the first entry is enough to trigger the false positive.
Apparently this is also a problem for At R source:
Thanks @grainnemcguire, can you share
Went ahead and filed https://bugs.r-project.org/show_bug.cgi?id=18768 before I forget.
Thanks for the rapid response to this. Just FYI, the data field was a character field more than 3 characters long; the next 7 characters after the "BZh" were a mixture of letters and numbers. Thanks in general for maintaining data.table. It's such a useful package.
fread produces an error or incorrect results when the first variable in a CSV file with no headers is "BZh". fread() thinks that the file is a bz2 file and attempts to process it accordingly via the R.utils package. fread runs as expected if the data have headers. data.table v1.14.8 also runs as expected.
This looks to be related to the issue in #5461 and the related changes in PR #5474, and is triggered by matching against bz2_signature in fread().
Reproducible example
Output of sessionInfo()