Way to differentiate a file that ends without a newline from a line that was truncated to the buffer size #3

Open
dimo414 opened this issue Oct 3, 2022 · 0 comments


dimo414 commented Oct 3, 2022

When next_batch() returns a slice that doesn't end in the delimiter, there's no easy way to tell whether this is because the input ended without a trailing delimiter (there's nothing more to read) or because the line is longer than the buffer.

  • You can call next_*() once more to see whether it returns None, but because the returned slice borrows the reader's buffer, you need to copy the previously returned line(s) before you can do so. This is tedious to get right (e.g., bstr does something like this).
  • You can configure the reader's capacity explicitly (since there isn't a capacity() method on LineReader to query it) and then check whether the returned slice is the same size as that capacity. This is roundabout and can still produce false positives, e.g., when an unterminated final line happens to be exactly as long as the buffer.
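To make the second workaround's false positive concrete, here is a minimal Rust sketch of that heuristic. `looks_truncated` is a hypothetical helper, not part of LineReader, and it assumes the caller already knows the capacity it configured.

```rust
/// Hypothetical helper for the capacity-based workaround: a chunk that lacks
/// the delimiter and exactly fills the buffer is *assumed* to be truncated.
fn looks_truncated(chunk: &[u8], capacity: usize, delim: u8) -> bool {
    chunk.last() != Some(&delim) && chunk.len() == capacity
}

fn main() {
    let capacity = 4;

    // Ends in the delimiter: clearly a complete line.
    assert!(!looks_truncated(b"ab\n", capacity, b'\n'));
    // Short and unterminated: the heuristic treats this as the end of the input.
    assert!(!looks_truncated(b"xy", capacity, b'\n'));
    // Full buffer, no delimiter: could be a truncated long line...
    assert!(looks_truncated(b"abcd", capacity, b'\n'));
    // ...but an input whose final line is "wxyz" with no trailing newline
    // yields exactly the same chunk, so the heuristic cannot tell them apart.
    assert!(looks_truncated(b"wxyz", capacity, b'\n'));
}
```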

It would be great if the API made it apparent whether the returned slice was incomplete, for example by returning an error or a different type that carries this bit.
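One possible shape for such an API, sketched in Rust: a three-way enum that separates a complete line, an unterminated final line, and a line cut off by the buffer. `Chunk`, `FixedLineReader`, and `next_chunk` are hypothetical names, and the reader is a small std-only stand-in with a fixed buffer, not LineReader's actual implementation.

```rust
use std::io::{self, Read};

/// Hypothetical return type that carries the "is this complete?" bit.
#[derive(Debug, PartialEq)]
enum Chunk<'a> {
    /// The chunk ends with the delimiter.
    Complete(&'a [u8]),
    /// The input ended without a trailing delimiter; this is the last data.
    Unterminated(&'a [u8]),
    /// The buffer filled before a delimiter was found; the line continues.
    Truncated(&'a [u8]),
}

/// Minimal fixed-capacity reader used only to illustrate the idea.
struct FixedLineReader<R> {
    inner: R,
    buf: Vec<u8>, // fixed-size storage; never grows
    start: usize, // first unconsumed byte
    end: usize,   // one past the last valid byte
    delim: u8,
    eof: bool,
}

impl<R: Read> FixedLineReader<R> {
    fn with_capacity(capacity: usize, delim: u8, inner: R) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { inner, buf: vec![0; capacity], start: 0, end: 0, delim, eof: false }
    }

    fn next_chunk(&mut self) -> io::Result<Option<Chunk<'_>>> {
        let delim = self.delim;
        loop {
            // 1. A delimiter in the unconsumed data ends a complete line.
            if let Some(pos) = self.buf[self.start..self.end].iter().position(|&b| b == delim) {
                let s = self.start;
                self.start += pos + 1;
                return Ok(Some(Chunk::Complete(&self.buf[s..self.start])));
            }
            // 2. At EOF, whatever is left was never terminated.
            if self.eof {
                if self.start == self.end {
                    return Ok(None);
                }
                let s = self.start;
                self.start = self.end;
                return Ok(Some(Chunk::Unterminated(&self.buf[s..self.end])));
            }
            // 3. Move unconsumed bytes to the front to make room for more input.
            if self.start > 0 {
                self.buf.copy_within(self.start..self.end, 0);
                self.end -= self.start;
                self.start = 0;
            }
            // 4. Buffer full and still no delimiter: report a truncated line.
            if self.end == self.buf.len() {
                self.start = self.end;
                return Ok(Some(Chunk::Truncated(&self.buf[..self.end])));
            }
            // 5. Otherwise read more input and try again.
            match self.inner.read(&mut self.buf[self.end..]) {
                Ok(0) => self.eof = true,
                Ok(n) => self.end += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
    }
}

fn main() -> io::Result<()> {
    // With a 4-byte buffer, "abcdefg\n" is too long and gets split.
    let input: &[u8] = b"ab\nabcdefg\nxyz";
    let mut reader = FixedLineReader::with_capacity(4, b'\n', input);
    while let Some(chunk) = reader.next_chunk()? {
        match chunk {
            Chunk::Complete(s) => println!("complete:     {:?}", String::from_utf8_lossy(s)),
            Chunk::Unterminated(s) => println!("unterminated: {:?}", String::from_utf8_lossy(s)),
            Chunk::Truncated(s) => println!("truncated:    {:?}", String::from_utf8_lossy(s)),
        }
    }
    Ok(())
}
```

Keeping "unterminated" separate from "truncated" is the key choice here: a caller that only wants whole records can treat `Truncated` as an error, while one that wants the full line knows it must keep reading and concatenating.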

Taking this a step further, would it be feasible (and welcome) to eliminate this limitation of LineReader, possibly as optional behavior? For a caller that wants to support arbitrarily long lines there's really no option other than allocating enough memory to fit the whole line, so it seems like LineReader could just do this for the caller by resizing its buffer in response to overly long lines.
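For comparison, Rust's standard library already takes the grow-the-buffer approach in `BufRead::read_until`, which appends into a caller-owned `Vec<u8>` until it sees the delimiter or EOF. The sketch below uses it to read lines of any length and to flag an unterminated final line; `for_each_record` is a hypothetical helper, and this is std behavior, not LineReader's.

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: calls `f` with each `delim`-terminated record, however
/// long, plus a flag saying whether the record actually ended in `delim`.
fn for_each_record<R: BufRead>(
    mut reader: R,
    delim: u8,
    mut f: impl FnMut(&[u8], bool),
) -> io::Result<()> {
    let mut record = Vec::new();
    loop {
        record.clear();
        // read_until appends up to and including `delim` (or up to EOF),
        // growing `record` as needed, so no line is ever cut short.
        if reader.read_until(delim, &mut record)? == 0 {
            return Ok(()); // EOF with nothing left
        }
        // Only the final record of an input without a trailing delimiter
        // can end in some other byte.
        let terminated = record.last() == Some(&delim);
        f(record.as_slice(), terminated);
    }
}

fn main() -> io::Result<()> {
    // A 4-byte internal buffer is deliberately smaller than the longest line.
    let reader = BufReader::with_capacity(4, &b"ab\nabcdefg\nxyz"[..]);
    for_each_record(reader, b'\n', |record, terminated| {
        println!("{:?} (terminated: {})", String::from_utf8_lossy(record), terminated);
    })
}
```

The cost is that each record is copied out of the `BufReader`'s internal buffer into `record`, whose allocation only grows to fit the longest line seen so far: the same memory a caller supporting arbitrarily long lines has to pay for anyway.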

I might be able to contribute some of these changes if there's interest.
