Refactors MonteCarloProfile so that it can be used with both sklearn's Parallel and an Apache Spark/pyspark sparkContext. The Spark version has been tested on Databricks' platform. In particular:
Each parallel Spark worker (think: one CPU core per cluster machine) needs its own separately seeded instance of np.random.RandomState; otherwise every worker produces the same results. This is OK, right? The seeds are derived automatically from an initial user-supplied seed in the class.
The non-Spark version is unchanged in behavior (only refactored).
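
For reviewers, here is a minimal sketch of the seeding idea described above. It is not the code in this PR: `derive_seeds` and `simulate_on_worker` are hypothetical names, and the workers are simulated with a plain loop rather than a real Spark cluster. It only illustrates that seeds drawn from one master-seeded np.random.RandomState give each worker a different but reproducible stream.

```python
import numpy as np


def derive_seeds(master_seed, n_workers):
    """Draw one integer seed per worker from a master-seeded RandomState.

    Hypothetical helper for illustration; the PR derives its seeds inside
    the class and may do so differently.
    """
    master = np.random.RandomState(master_seed)
    # Collisions are possible in principle but extremely unlikely
    # for a small number of workers.
    return master.randint(0, 2**31 - 1, size=n_workers).tolist()


def simulate_on_worker(seed, n_draws):
    """Stand-in for the work done on one worker, using its own RandomState."""
    rng = np.random.RandomState(seed)
    return rng.normal(size=n_draws)


seeds = derive_seeds(master_seed=42, n_workers=4)

# Local stand-in for the parallel map. With pyspark this could look like
# sc.parallelize(seeds).map(lambda s: simulate_on_worker(s, 3)).collect(),
# assuming `sc` is an existing SparkContext.
draws = [simulate_on_worker(s, 3) for s in seeds]
```

Because every seed comes from the single user-supplied value, rerunning with the same master seed reproduces the same per-worker streams, while different workers still see different random numbers.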
r @mnarayan ?