Skip to content

Commit

Permalink
Add a next_batch() call that returns all complete lines from the buffer.
Browse files Browse the repository at this point in the history
  • Loading branch information
Freaky committed Apr 22, 2018
1 parent 42a6e4b commit 1d35ab1
Show file tree
Hide file tree
Showing 2 changed files with 56 additions and 0 deletions.
9 changes: 9 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,15 @@ Lines are limited to the size of the internal buffer (default 1MB).
// line is a &[u8] owned by reader.
}

Lines can also be read in batches for group processing - e.g. in threads:

while let Some(lines) = reader.next_batch() {
send(&chan, lines.unwrap().to_vec());
}

This should be more efficient than finding each intermediate delimiter in the main
thread, and allocating and sending each individual line.

## Performance

Comparison with using typical BufReader methods against pwned-passwords-2.0.txt:
Expand Down
47 changes: 47 additions & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,39 @@ impl<R: io::Read> LineReader<R> {
}
}

/// Return a slice of complete lines, up to the size of the internal buffer.
///
/// This is functionally identical to next_line, only instead of getting up
/// to the *first* instance of the delimiter, you get up to the *last*.
///
/// This is anticipated to be used in multithreaded processing; take a batch
/// of complete lines, copy the slice and pass it onto workers having done the
/// minimum work in the input thread; a `read()`, a `memrchr()` to find the last
/// delimiter, and a copy.
///
/// The copy is left up to you in case you have other reasons for wanting
/// batches. Something higher level, like an iterator, will be forthcoming.
pub fn next_batch(&mut self) -> Option<io::Result<&[u8]>> {
if self.pos < self.end_of_complete {
let ret = &self.buf[self.pos..self.end_of_complete];
self.pos = self.end_of_complete;
return Some(Ok(ret));
}

match self.refill() {
Ok(true) => self.next_batch(),
Ok(false) => {
if self.end_of_buffer == self.pos {
None
} else {
self.pos = self.end_of_buffer;
Some(Ok(&self.buf[..self.end_of_buffer]))
}
}
Err(e) => Some(Err(e)),
}
}

fn refill(&mut self) -> io::Result<bool> {
assert!(self.pos == self.end_of_complete);
assert!(self.end_of_complete <= self.end_of_buffer);
Expand Down Expand Up @@ -262,6 +295,20 @@ mod tests {
assert!(reader.next_line().is_none());
}

#[test]
fn test_next_batch() {
let buf: &[u8] = b"0a0\n1bb1\n2ccc2\n3dddd3\n4eeeee4\n5ffffffff5\n6ggggg6\n7hhhhhh7";
let mut reader = LineReader::with_capacity(19, buf);

assert_eq!(b"0a0\n1bb1\n2ccc2\n", reader.next_batch().unwrap().unwrap());
assert_eq!(b"3dddd3\n4eeeee4\n", reader.next_batch().unwrap().unwrap());
assert_eq!(
b"5ffffffff5\n6ggggg6\n",
reader.next_batch().unwrap().unwrap()
);
assert_eq!(b"7hhhhhh7", reader.next_batch().unwrap().unwrap());
}

extern crate rand;
use std::io::BufRead;
use std::io::{Cursor, Read};
Expand Down

0 comments on commit 1d35ab1

Please sign in to comment.