-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added parallelization support to StreamIterator #234
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - I like that it didn't break existing pattern. One minor suggestion in the code and test. Regarding testing this - one possibility could be to try to reproduce a case where accessing some stream in parallel will yield different results across multiple iterations of the same code and see if it happens with your changes. Example: reproduce a race condition when multiple threads are trying to update the same variable, like updating a single total when processing various additions/subtractions to that total. Running that code in parallel would likely yield different results run over run. Maybe more trouble than it's worth, but that's the only thing I can think of :(
@Test | ||
public void testAllMatchParallel() throws Exception | ||
{ | ||
final StreamIterable<Integer> streamIterable = new StreamIterable<>(asList(1, 2, 3, 4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs the true
flag to be parallel, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops. put the true
flag in the wrong test.
*/ | ||
public void disableParallelization() | ||
{ | ||
this.parallel = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on withParallelization()
and withoutParallelization()
that toggles the flag and return the StreamIterable
object back? Might be easier to read inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also planning on mentioning this, but you beat me to it. I do have one thought about this solution - it debilitates thread safety for shared instances of StreamIterable
. The safest possible form of this API would be to keep the this.parallel
variable but completely remove the enableParallelization
and disableParallelization
methods.
All that being said, I suppose it does not make too much sense for multiple threads to ever share a StreamIterable
instance in practice - and the race condition that is introduced would simply involve one thread unintentionally using the parallel version of a StreamIterable
method when it preempts another thread that just called enableParallelization()
on the shared StreamIterable
instance.
If we are fine with keeping this mutability (which I ultimately am, sorry for being pedantic), I would also prefer @MikeGost 's version for ease of use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. it's also pretty easy to just have it return a new StreamIterable object instead of changing the switch on the current object-- slightly worse for memory usage, but probably nbd in the grand scheme of things, and definitely more thread safe. Opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep it as is, echoing @lucaspcram's point above about having multiple threads sharing a StreamIterable
not making much sense. However, if we have a fairly large StreamIterable
, that could potentially be a significant memory usage impact. Might be worth putting a note about this not being thread-safe in the parallelization case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @MikeGost . Let's leave as is, but just note in the docstring that the class is not completely safe when shared between multiple threads (which of course, would be very strange way to use the class anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 cool, I'm on board with that. I added in a note to the docstring-- i feel like a shorter warning is more likely to be seen than a long explanation, so I kept it simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great changes. We've been needing this for awhile.
@MikeGost I thought about the tests for a little while longer, and realized we can just construct a test where performance will be substantially better in parallel vs sequential. Performance as a measure of correctness is extremely iffy IMO, but that applies more when applied generally and not to carefully chosen specific cases. I added an example test, but I'm definitely ok removing it if we'd rather not rely on that. Here's some example output from it:
|
Good call @adahn6 - your approach is better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great set of changes, thanks @adahn6!
Description:
Per issue #124, adds in a new method
Iterables.parallelStream()
that replicates the functionality ofIterables.stream()
but with parallelization enabled.Potential Impact:
Minimal. While StreamIterable was changed, the default value for the parallelization is set to false, and is only changed through either explicit calls to
enableParallelization()
or through a different constructor.Unit Test Approach:
It's difficult in unit tests to verify that parallelization was actually used (open to suggestions!!). As such, the existing tests were duplicated but with parallelization enabled; correct expected results were considered sufficient.
Manual debug checking did show thread pools, so parallelization has been confirmed / sanity tested.
Test Results:
Describe other (non-unit) test results here.
In doubt: Contributing Guidelines