Feature/spark monte carlo #68

jasonlaska · 2016-12-04T09:43:52Z

Refactors MonteCarloProfile so that it can be used with both sklearn's Parallel and an Apache Spark (PySpark) SparkContext. The Spark version has been tested on the Databricks platform.

In particular, it:

- picks graphs and learns all lambdas (the model-selection step) as an independent first step;
- then maps over all graphs, lambdas, and trials as one large iteration step. This second part follows a map-reduce style much more closely than the previous implementation.
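The two-phase structure described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: select_lambdas and run_trial are hypothetical stand-ins for the model-selection and per-trial estimation steps, and the "graphs" are plain integer ids. Phase 1 runs once per graph; phase 2 flattens every (graph, lambda, trial) combination into one list of independent tasks, maps over it, and reduces the results by (graph, lambda). Shared parameters are bound once with functools.partial so each task carries only its own values.

```python
from collections import defaultdict
from functools import partial
from statistics import mean


def select_lambdas(graph_id, n_lambdas=3):
    # Hypothetical model-selection step: returns candidate lambdas for one graph.
    return [0.1 * (graph_id + 1) * (k + 1) for k in range(n_lambdas)]


def run_trial(task, n_samples):
    # Hypothetical single Monte Carlo trial. `n_samples` is shared by all
    # tasks (bound via partial); `task` holds only per-task values.
    graph_id, lam, trial = task
    return lam * n_samples + trial  # placeholder metric


def monte_carlo(graph_ids, n_trials, n_samples, map_fn=map):
    # Phase 1: model selection, done independently up front.
    lambdas = {g: select_lambdas(g) for g in graph_ids}

    # Phase 2 (map): one flat list of independent (graph, lambda, trial) tasks.
    tasks = [
        (g, lam, t)
        for g in graph_ids
        for lam in lambdas[g]
        for t in range(n_trials)
    ]
    results = list(map_fn(partial(run_trial, n_samples=n_samples), tasks))

    # Phase 2 (reduce): aggregate trial results per (graph, lambda).
    grouped = defaultdict(list)
    for (g, lam, _), value in zip(tasks, results):
        grouped[(g, lam)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


out = monte_carlo([0, 1], n_trials=4, n_samples=10)
```

Because map_fn is a parameter, the same flattened task list can be handed to a serial map, a joblib/sklearn Parallel wrapper, or a Spark-backed map without changing the reduce step.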

Each parallel Spark worker (think: one CPU core per cluster machine) needs its own separately seeded instance of np.random.RandomState; otherwise every worker produces the same results. Is this OK? The per-worker seeds are derived automatically from an initial user-supplied seed stored in the class.
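A minimal sketch of that seeding scheme, assuming NumPy's legacy RandomState API as used in this PR. derive_seeds and worker_draws are hypothetical helper names, not functions from the codebase, and this variant assigns one seed per task (a slightly finer-grained version of one seed per worker).

```python
import numpy as np


def derive_seeds(seed, n_tasks):
    # Draw one integer seed per task from a master RandomState seeded by the user.
    # The same `seed` always yields the same list of per-task seeds.
    master = np.random.RandomState(seed)
    return master.randint(0, 2**31 - 1, size=n_tasks).tolist()


def worker_draws(task_seed, size=3):
    # Each task builds its own RandomState, so tasks do not share
    # (and therefore do not repeat) the same random stream.
    rng = np.random.RandomState(task_seed)
    return rng.normal(size=size).tolist()


seeds = derive_seeds(seed=2, n_tasks=4)
draws = [worker_draws(s) for s in seeds]
```

Tying seeds to tasks rather than to physical workers keeps results reproducible for a given user seed regardless of how Spark happens to partition the tasks across the cluster.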

The non-spark version remains unchanged (only refactored).

@mnarayan, could you review?


jasonlaska · 2016-12-04T09:46:22Z

Screenshot showing this running. Once the initial model-selection step has completed, the Monte Carlo trials step runs much faster on Spark than the regular (non-Spark) version; we can easily do a side-by-side comparison.

jasonlaska · 2016-12-04T18:42:56Z

Usage: pass the parameter sc=spark.sparkContext, where spark is a Spark session (spark is the default session name on the Databricks platform).
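The sc switch amounts to choosing which map implementation runs the trial tasks. A minimal sketch, assuming only the standard PySpark calls SparkContext.parallelize(...).map(...).collect(); parallel_map is a hypothetical name (not the PR's _spark_map), and the real MonteCarloProfile constructor arguments other than sc are not shown here.

```python
def parallel_map(fn, items, sc=None):
    """Apply `fn` to every item, on Spark if a SparkContext is given.

    Hypothetical helper for illustration only.
    """
    items = list(items)
    if sc is not None:
        # One partition per task so Spark can spread the trials across
        # executors; PySpark serializes `fn` and ships it to them.
        # collect() returns results in the original item order.
        return sc.parallelize(items, numSlices=max(len(items), 1)).map(fn).collect()
    # Local fallback: a plain serial map. (The non-Spark path in the PR
    # uses sklearn's Parallel instead.)
    return [fn(item) for item in items]


# On Databricks, where `spark` is the session (run_one_trial and tasks
# are hypothetical):
# results = parallel_map(run_one_trial, tasks, sc=spark.sparkContext)
```

One partition per task suits a modest number of expensive trials; for very many cheap tasks, fewer partitions would cut scheduling overhead.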


jasonlaska added 8 commits December 3, 2016 17:42

Major refactor to monte_carlo_profile that enables parallelizing a lot more and packs data in such a way that it can be used with a spark context.

b307767

add support for doing map operation via spark_context.parallelize

5fd9e75

Bump version

7da14ee

Fix _spark_map. Works on databricks (but not sure of the values coming out)

77ef6c5

Fix big bug, must pass in correct lambda before fitting model!

1946241

Minor edits

dcba9b2

Checkin

d0c04e2

It was tricky to get the random number generator to work in parallel, but this thing is working with expected behavior now.

f1049e5

Only partial the parameters that are shared (to avoid confusion and throw errors if something is wrong)

0f733f3

jasonlaska merged commit c5df0be into develop Dec 9, 2016

jasonlaska deleted the feature/spark_monte_carlo branch December 9, 2016 04:11
