Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added parallelization support to StreamIterator #234

Merged
merged 3 commits into from
Oct 10, 2018
Merged

Conversation

adahn6
Copy link
Contributor

@adahn6 adahn6 commented Oct 9, 2018

Description:

Per issue #124, adds in a new method Iterables.parallelStream() that replicates the functionality of Iterables.stream() but with parallelization enabled.

Potential Impact:

Minimal. While StreamIterable was changed, the default value for the parallelization is set to false, and is only changed through either explicit calls to enableParallelization() or through a different constructor.

Unit Test Approach:

It's difficult in unit tests to verify that parallelization was actually used (open to suggestions!!). As such, the existing tests were duplicated but with parallelization enabled; correct expected results were considered sufficient.

Manual debug checking did show thread pools, so parallelization has been confirmed / sanity tested.

Test Results:

Describe other (non-unit) test results here.


In doubt: Contributing Guidelines

Copy link
Contributor

@MikeGost MikeGost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - I like that it didn't break existing pattern. One minor suggestion in the code and test. Regarding testing this - one possibility could be to try to reproduce a case where accessing some stream in parallel will yield different results across multiple iterations of the same code and see if it happens with your changes. Example: reproduce a race condition when multiple threads are trying to update the same variable, like updating a single total when processing various additions/subtractions to that total. Running that code in parallel would likely yield different results run over run. Maybe more trouble than it's worth, but that's the only thing I can think of :(

@Test
public void testAllMatchParallel() throws Exception
{
final StreamIterable<Integer> streamIterable = new StreamIterable<>(asList(1, 2, 3, 4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs the true flag to be parallel, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops. put the true flag in the wrong test.

*/
public void disableParallelization()
{
this.parallel = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on withParallelization() and withoutParallelization() that toggles the flag and return the StreamIterable object back? Might be easier to read inline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed!

Copy link
Contributor

@lucaspcram lucaspcram Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also planning on mentioning this, but you beat me to it. I do have one thought about this solution - it debilitates thread safety for shared instances of StreamIterable. The safest possible form of this API would be to keep the this.parallel variable but completely remove the enableParallelization and disableParallelization methods.

All that being said, I suppose it does not make too much sense for multiple threads to ever share a StreamIterable instance in practice - and the race condition that is introduced would simply involve one thread unintentionally using the parallel version of a StreamIterable method when it preempts another thread that just called enableParallelization() on the shared StreamIterable instance.

If we are fine with keeping this mutability (which I ultimately am, sorry for being pedantic), I would also prefer @MikeGost 's version for ease of use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. it's also pretty easy to just have it return a new StreamIterable object instead of changing the switch on the current object-- slightly worse for memory usage, but probably nbd in the grand scheme of things, and definitely more thread safe. Opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it as is, echoing @lucaspcram's point above about having multiple threads sharing a StreamIterable not making much sense. However, if we have a fairly large StreamIterable, that could potentially be a significant memory usage impact. Might be worth putting a note about this not being thread-safe in the parallelization case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @MikeGost . Let's leave as is, but just note in the docstring that the class is not completely safe when shared between multiple threads (which of course, would be very strange way to use the class anyway).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 cool, I'm on board with that. I added in a note to the docstring-- i feel like a shorter warning is more likely to be seen than a long explanation, so I kept it simple.

lucaspcram
lucaspcram previously approved these changes Oct 10, 2018
Copy link
Contributor

@lucaspcram lucaspcram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great changes. We've been needing this for awhile.

lucaspcram
lucaspcram previously approved these changes Oct 10, 2018
@adahn6
Copy link
Contributor Author

adahn6 commented Oct 10, 2018

@MikeGost I thought about the tests for a little while longer, and realized we can just construct a test where performance will be substantially better in parallel vs sequential. Performance as a measure of correctness is extremely iffy IMO, but that applies more when applied generally and not to carefully chosen specific cases. I added an example test, but I'm definitely ok removing it if we'd rather not rely on that.

Here's some example output from it:

2018-10-10 09:41:42 ERROR [main] StreamIterableTest:83 - Sequential duration was 5 ms
2018-10-10 09:41:42 ERROR [main] StreamIterableTest:89 - Parallel duration was 1 ms

@MikeGost
Copy link
Contributor

Good call @adahn6 - your approach is better.

lucaspcram
lucaspcram previously approved these changes Oct 10, 2018
Copy link
Contributor

@MikeGost MikeGost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great set of changes, thanks @adahn6!

@MikeGost MikeGost merged commit ec8d9f8 into osmlab:dev Oct 10, 2018
@matthieun matthieun added this to the 5.1.14 milestone Oct 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants