pyspark

The Python API is almost identical to the Scala version. It maps ndarray to Tensor for the training samples, so users basically only need to care about how to manipulate ndarray.
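
For example, a training sample is constructed directly from numpy arrays. The snippet below is a minimal sketch, assuming the Sample wrapper exposed by the BigDL Python API (Sample.from_ndarray); check the sources under pyspark/dl for the exact import path in your version:

    import numpy as np
    from bigdl.util.common import Sample  # import path may differ across versions

    # A 28x28 single-channel image and its label, both plain ndarrays;
    # BigDL converts them to Tensors internally when the sample is consumed.
    features = np.random.uniform(0, 1, (28, 28)).astype("float32")
    label = np.array(1.0)
    sample = Sample.from_ndarray(features, label)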

This Python binding has been tested with Python 2.7 and Spark 1.6.0 / Spark 2.0.0.

Run the Python tests

  • Package the Scala code by running: $BigDL_HOME/make-dist.sh

  • Set SPARK_HOME and then run: $BigDL_HOME/pyspark/test/dev/run-all.sh

Installing on Ubuntu

  1. Build BigDL (see the Build Page for details)

    • With Spark 1.6: $BIGDL_HOME/make-dist.sh -P spark_1.6
    • With Spark 2.0 or later: $BIGDL_HOME/make-dist.sh -P spark_2.x
  2. Install the Python dependencies (in cluster mode, you need to install them on the client and on each worker node; a quick worker-side check is sketched after this list):

  • Install NumPy: sudo apt-get install python-numpy

  • Install Python setuptools: sudo apt-get install -y python-setuptools python-pip

  • Install Jupyter on the client node (only if you need to use BigDL within a Jupyter notebook): sudo pip install jupyter

  • Install any other Python libraries your application depends on
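
To confirm the dependencies are actually visible on every worker, a quick sanity check can be run from any PySpark shell. This is a minimal sketch, assuming an already-created SparkContext named sc:

    def worker_numpy_version(_):
        # Import inside the function so it executes on the worker, not the driver.
        import numpy
        return numpy.__version__

    # Collect the distinct NumPy versions seen across the cluster; a missing
    # install on any worker surfaces as an ImportError in the executor logs.
    print(sc.parallelize(range(100)).map(worker_numpy_version).distinct().collect())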

Run a LeNet example on a standalone cluster

    BigDL_HOME=...
    SPARK_HOME=...
    MASTER=...
    PYTHON_API_ZIP_PATH=${BigDL_HOME}/dist/lib/bigdl-VERSION-python-api.zip
    BigDL_JAR_PATH=${BigDL_HOME}/dist/lib/bigdl-VERSION-jar-with-dependencies.jar
    export PYTHONPATH=${PYTHON_API_ZIP_PATH}:$PYTHONPATH

    ${SPARK_HOME}/bin/spark-submit \
        --master ${MASTER} \
        --driver-cores 5 \
        --driver-memory 10g \
        --total-executor-cores 80 \
        --executor-cores 10 \
        --executor-memory 20g \
        --py-files ${PYTHON_API_ZIP_PATH},${BigDL_HOME}/pyspark/dl/models/lenet/lenet5.py \
        --properties-file ${BigDL_HOME}/dist/conf/spark-bigdl.conf \
        --jars ${BigDL_JAR_PATH} \
        --conf spark.driver.extraClassPath=${BigDL_JAR_PATH} \
        --conf spark.executor.extraClassPath=bigdl-VERSION-jar-with-dependencies.jar \
        ${BigDL_HOME}/pyspark/dl/models/lenet/lenet5.py

More details can be found on the LeNet5 page.
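
For reference, the network definition inside lenet5.py looks roughly like the sketch below: the classic LeNet-5 topology expressed with layer classes such as Sequential, SpatialConvolution and Linear from the BigDL Python API (consult lenet5.py itself for the authoritative version):

    from bigdl.nn.layer import *  # import path may differ across versions

    def build_model(class_num):
        # Two convolution/pooling stages followed by two fully connected layers.
        model = Sequential()
        model.add(Reshape([1, 28, 28]))
        model.add(SpatialConvolution(1, 6, 5, 5))
        model.add(Tanh())
        model.add(SpatialMaxPooling(2, 2, 2, 2))
        model.add(SpatialConvolution(6, 12, 5, 5))
        model.add(Tanh())
        model.add(SpatialMaxPooling(2, 2, 2, 2))
        model.add(Reshape([12 * 4 * 4]))
        model.add(Linear(12 * 4 * 4, 100))
        model.add(Tanh())
        model.add(Linear(100, class_num))
        model.add(LogSoftMax())
        return model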

Launch Jupyter on a standalone cluster

    BigDL_HOME=...
    SPARK_HOME=...
    MASTER=...
    PYTHON_API_ZIP_PATH=${BigDL_HOME}/dist/lib/bigdl-VERSION-python-api.zip
    BigDL_JAR_PATH=${BigDL_HOME}/dist/lib/bigdl-VERSION-jar-with-dependencies.jar

    export PYTHONPATH=${PYTHON_API_ZIP_PATH}:$PYTHONPATH
    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=./ --ip=* --no-browser"

    ${SPARK_HOME}/bin/pyspark \
        --master ${MASTER} \
        --properties-file ${BigDL_HOME}/dist/conf/spark-bigdl.conf \
        --driver-cores 5 \
        --driver-memory 10g \
        --total-executor-cores 8 \
        --executor-cores 1 \
        --executor-memory 20g \
        --conf spark.akka.frameSize=64 \
        --py-files ${PYTHON_API_ZIP_PATH} \
        --jars ${BigDL_JAR_PATH} \
        --conf spark.driver.extraClassPath=${BigDL_JAR_PATH} \
        --conf spark.executor.extraClassPath=bigdl-VERSION-jar-with-dependencies.jar
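
Once the notebook is up, BigDL still needs its engine initialized before any model code runs. A minimal first cell might look like this sketch, assuming init_engine from the BigDL Python API (the notebook inherits the SparkContext sc created by pyspark):

    from bigdl.util.common import *  # import path may differ across versions

    # Initialize the BigDL engine on driver and executors before building models.
    init_engine()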

Run a CNN/LSTM/GRU text classifier example on a standalone cluster

Please refer to the Python text classifier page.