Feature/spark monte carlo #68

Merged
merged 9 commits into from
Dec 9, 2016

Conversation

jasonlaska
Member

Refactors MonteCarloProfile so that it can be used with both sklearn's Parallel and an Apache Spark (pyspark) SparkContext. The Spark version has been tested on the Databricks platform.

In particular:

  • picks the graphs and learns all lambdas (the model selection step) as an independent first step
  • maps over all graphs, lambdas, and trials in one large iteration step; this part now follows a map-reduce style much more closely than it did previously.
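The two steps above can be sketched roughly as follows. This is only an illustration of the shape of the computation, not the code in this PR: `select_lambdas`, `run_trial`, the placeholder scores, and the assumption that each graph is paired with a single selected lambda are all hypothetical.

```python
from collections import defaultdict

def select_lambdas(graphs):
    """Step 1 (hypothetical stand-in): pick one lambda per graph index."""
    return {idx: 0.1 * (idx + 1) for idx in range(len(graphs))}

def run_trial(task):
    """Step 2 (hypothetical stand-in): score one (graph, lambda, trial) task."""
    graph_idx, lam, trial = task
    # Placeholder score; a real trial would draw samples and fit an estimator.
    return graph_idx, lam * (trial + 1)

graphs = ["graph_a", "graph_b"]  # placeholder graph objects
n_trials = 3

# Model selection runs once, up front, independently of the trials.
lambdas = select_lambdas(graphs)

# Flatten every (graph, lambda, trial) combination into one task list.
tasks = [(idx, lambdas[idx], trial) for idx in lambdas for trial in range(n_trials)]

# Map: each task is independent, so this step can be handed to any backend.
mapped = list(map(run_trial, tasks))

# Reduce: group the per-trial scores by graph and average them.
scores = defaultdict(list)
for graph_idx, score in mapped:
    scores[graph_idx].append(score)
averages = {idx: sum(vals) / len(vals) for idx, vals in scores.items()}
```

Because the task list is flat and each task carries everything it needs, the map step does not care whether it runs in a local loop, in a process pool, or on a cluster.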

Each parallel Spark worker (think: one CPU core per cluster machine) needs its own separately seeded instance of np.random.RandomState, otherwise the results will be identical on every worker. This is OK, right? The seeds are derived automatically from an initial user-supplied seed in the class.
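A minimal sketch of one way to derive per-task seeds from a single user seed. The helper names `derive_seeds` and `run_trial` are hypothetical and the derivation scheme is an assumption; the class in this PR may do it differently.

```python
import numpy as np

def derive_seeds(base_seed, n_tasks):
    """Derive one integer seed per task from a single user-supplied seed."""
    master = np.random.RandomState(base_seed)
    # Seeds are drawn at random, so a collision is possible but very unlikely.
    return [int(s) for s in master.randint(0, 2**31 - 1, size=n_tasks)]

def run_trial(seed):
    """Hypothetical trial: builds its own RandomState from the seed it is given."""
    prng = np.random.RandomState(seed)
    return prng.normal(size=3)

seeds = derive_seeds(base_seed=1, n_tasks=4)
results = [run_trial(seed) for seed in seeds]
```

The whole run stays reproducible from the one base seed, while each task draws from its own stream. Passing the same seed to every task would make every `run_trial` call return the same numbers, which is the failure mode described above.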

The non-spark version remains unchanged (only refactored).

r @mnarayan ?

@jasonlaska
Member Author

Screenshot showing this running. Once the initial model selection step has completed, the Monte Carlo trials step runs much faster on Spark than in the regular (non-Spark) version (we can easily do a side-by-side comparison).

[Screenshot: 2016-12-04 at 1:44:18 AM]

@jasonlaska
Member Author

Usage: pass the parameter sc=spark.sparkContext, where spark is a Spark session (spark is the default name for the session on the Databricks platform).
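To illustrate how an optional `sc` argument can switch backends, here is a small sketch. `run_tasks` and `square` are hypothetical names, not part of this PR; the only Spark calls used are the standard RDD methods `parallelize`, `map`, and `collect`, and the local branch is a plain loop standing in for the Parallel-based path.

```python
def run_tasks(fn, tasks, sc=None):
    """Map fn over tasks, on Spark when a SparkContext is supplied."""
    if sc is not None:
        # Distribute the tasks as an RDD, apply fn on the workers, gather results.
        return sc.parallelize(tasks).map(fn).collect()
    # Local fallback (assumption: stands in for the non-Spark code path).
    return [fn(task) for task in tasks]

def square(x):
    """Toy task function; defined at module level so it can be pickled."""
    return x * x

# Without sc, everything runs locally.
local_results = run_tasks(square, [1, 2, 3])

# On Databricks (not executed here), the same call would take the session's context:
# spark_results = run_tasks(square, [1, 2, 3], sc=spark.sparkContext)
```

Keeping `sc` optional with a default of `None` means existing callers of the non-Spark path do not need to change.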

@jasonlaska merged commit c5df0be into develop Dec 9, 2016
@jasonlaska deleted the feature/spark_monte_carlo branch December 9, 2016 04:11