Scala for Machine Learning Version 0.99.1
Copyright Patrick Nicolas All rights reserved 2013-2016
=================================================================
Overview
Latest release
Documentation
Minimum requirements
History
Project
Installation
Build
Run examples
Persistent models and configurations
Appendix
Source code guidelines are defined in the companion document SourceCodeGuide.html
The examples are related to investment portfolio management and trading strategies. For readers interested in the mathematics or the techniques implemented in this library, I strongly recommend the following readings:
- "Machine Learning: A Probabilistic Perspective" K. Murphy - MIT Press - 2012
- "The Elements of Statistical Learning" T. Hastie, R. Tibshirani, J. Friedman - Springer - 2001
- "Pattern Recognition and Machine Learning" C. Bishop - Springer 2006
The Appendix contains an introduction to the basic concepts of investment and trading strategies as well as technical analysis of financial markets.
Here is the list of changes introduced in version 0.99.1 of "Scala for Machine Learning":
- Add description of some algorithms in Scaladoc class header
- Resolve issues with shadowing types and variables
- Rename some file names to match class names (SingleLinearRegressionEval, NaiveBayesLikelihood, TensorFunctor...)
- Read the appropriate chapter (e.g. Chapter 5: Naive Bayes models)
- Review source code guidelines used in the book SourceCodeGuide.html
- Review scaladoc in scala_2.11-0.99-sources.jar or scala_2.10-0.99-sources.jar depending on the version of Scala you are using.
- Look at the examples related to the chapter (e.g. org/scalaml/app/chap5/Chap5)
- Browse through the implementation code (e.g. org/scalaml/supervised/bayes)
Hardware: 4 CPU cores and 8+ GB of RAM for datasets of size 75,000 or larger and/or with 50 or more features
Operating system: no specific requirement
Software: JDK 1.7.0_45 or 1.8.0_25, Scala 2.10.4 (for Apache Spark) or 2.11.2 (for Akka) and SBT 0.13+ (see the installation section for deployment).
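The software requirements above can be checked at runtime before running the examples. The following is a minimal sketch and not part of the library: the object EnvironmentCheck and its methods are hypothetical names, and the check only looks at the Scala major.minor version (2.10 or 2.11).

```scala
object EnvironmentCheck {
  // Extracts (major, minor) from a version string such as "2.11.2" or "1.8.0_25".
  def majorMinor(version: String): Option[(Int, Int)] = {
    val Pattern = """^(\d+)\.(\d+).*""".r
    version match {
      case Pattern(major, minor) => Some((major.toInt, minor.toInt))
      case _ => None
    }
  }

  // Assumption: only the 2.10 and 2.11 lines named in the requirements are accepted.
  def isSupportedScala(version: String): Boolean =
    majorMinor(version).exists { case (major, minor) =>
      major == 2 && (minor == 10 || minor == 11)
    }

  def main(args: Array[String]): Unit = {
    val scalaVersion = scala.util.Properties.versionNumberString
    val javaVersion = System.getProperty("java.version")
    println(s"Scala $scalaVersion, Java $javaVersion")
    if (!isSupportedScala(scalaVersion))
      println("Warning: this Scala version is not one of those listed in the requirements")
  }
}
```

The check reports a warning instead of failing, since other versions may still work.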
See Latest release
- Broader use of higher-order methods such as aggregate, collect, partition, groupBy ...
- Strict monadic encoding of data transformation from an explicit model, and data transformation from a model derived from a training set.
- Correction and update of documentation for some statistical formulas.
- Reimplementation of the training of logistic regression, Q-Learning and hidden Markov models, and of the execution of the genetic algorithm, using tail recursion
- Implementation of magnet pattern for overloaded methods with different return types
- Definition of covariant and contravariant functors
- Fix bugs in training of Multilayer perceptron
- Generic monitoring class for profiling execution of optimizers
- Introduction of monadic kernel functions with a test case
- Introduction to manifolds
- Introduction to Convolutional Neural Networks
- Fisher-Yates shuffle for stochastic and batched gradient descent
- Implementation of 1-fold and K-fold cross-validation
- Standardization of the application of tail recursion for dynamic programming algorithms
- Use of views to reduce unnecessary generation of intermediate objects in processing pipelines
- Introduction to streams in Chapter 12 with example and test code
- Stricter adherence to coding convention for implicits, traits, abstract classes
- Improved scaladoc documentation
- Added support for Scala 2.11.2, Akka 2.3.4 and Apache Spark 1.5.0 (with Scala 2.10.4)
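As an illustration of the Fisher-Yates shuffle mentioned in the list above, here is a minimal sketch of how shuffled indices could feed a batched gradient descent. It is not the library's implementation: the object FisherYates and the helper batches are hypothetical names chosen for this example.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object FisherYates {
  // Returns a shuffled copy of the input; the input itself is left untouched.
  def shuffle[T](xs: IndexedSeq[T], rng: Random): Vector[T] = {
    val buf = ArrayBuffer.empty[T]
    buf ++= xs
    // Walk from the last index down to 1, swapping each element with one
    // chosen uniformly among the positions at or before it.
    var i = buf.length - 1
    while (i > 0) {
      val j = rng.nextInt(i + 1)
      val tmp = buf(i)
      buf(i) = buf(j)
      buf(j) = tmp
      i -= 1
    }
    buf.toVector
  }

  // Hypothetical helper: shuffles the indices 0 until n and groups them into batches.
  def batches(n: Int, batchSize: Int, rng: Random): Iterator[Vector[Int]] =
    shuffle(0 until n, rng).grouped(batchSize)
}
```

Passing the random generator explicitly keeps the shuffle reproducible for a given seed, which helps when comparing training runs.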
Added function minimization as a test case for Genetic algorithms
Added monitoring callback for reproduction cycle of the genetic algorithm and update implementation of trading signals
Standardized string representation of collection using mkString
Added plots to the performance benchmark of parallel collection (Chap. 12)
Simplified and re-implemented the Viterbi algorithm (HMM - decoding) as a tail recursion and normalized the lambda probability matrices
Expanded scaladocs with reference to the chapters of "Scala for Machine Learning"
Replaced some enumerations with case classes
Added scalastyle options
Added comments to test cases
Added Scala source guide
Wrapped Scalatest routines into futures
Expanded the number of tests/evaluations from 39 to 60
Initial implementation
Directory structure of the source code library for Scala for Machine Learning:
Directory structure of the source code of the examples for Scala for Machine Learning:
The installation and build workflow is described in the following diagram:
Eclipse
The Scala for Machine Learning library is compatible with Eclipse Scala IDE 3.0
Specify the links to the source in Project/properties/Java Build Path/Source. The two links should be project_name/src/main/scala and project_name/src/test/scala
Add the jars required to build and execute the code within Eclipse using Project/properties/Java Build Path/Add External Jars, as declared in the project_name/.classpath
Update the JVM heap parameters in the eclipse.ini file to -Xms512m -Xmx8192m or the maximum allowed on your specific machine.
The Simple Build Tool (SBT) must be used to build the library from the source code, using the build.sbt file in the root directory
Executing the examples/tests in Scala for Machine Learning requires sufficient JVM heap memory (~2G):
In sbt/conf/sbtconfig.txt, set -Xmx to 2058m or higher and -XX:MaxPermSize to 512m or higher, e.g. -Xmx4096m -Xms512m -XX:MaxPermSize=512m
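To confirm that the heap settings were actually picked up by the JVM, the maximum heap can be read at runtime. This is a minimal sketch, not part of the library: the object HeapCheck and the 1800 MB threshold are assumptions made for this example. The threshold is set below 2048 MB because Runtime.maxMemory may report slightly less than the -Xmx value.

```scala
object HeapCheck {
  private val BytesPerMb = 1024L * 1024L

  // Hypothetical threshold, chosen a little below the ~2G recommended above.
  val DefaultRequiredMb = 1800L

  def toMb(bytes: Long): Long = bytes / BytesPerMb

  def hasEnoughHeap(maxHeapBytes: Long, requiredMb: Long = DefaultRequiredMb): Boolean =
    toMb(maxHeapBytes) >= requiredMb

  def main(args: Array[String]): Unit = {
    // Maximum heap the JVM will attempt to use, in bytes.
    val maxHeap = Runtime.getRuntime.maxMemory
    println(s"Maximum JVM heap: ${toMb(maxHeap)} MB")
    if (!hasEnoughHeap(maxHeap))
      println("Warning: the heap may be too small to run the examples; increase -Xmx")
  }
}
```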
Build script for Scala for Machine Learning:
To build the Scala for Machine Learning library package
$(ROOT)/sbt clean publish-local
To build the package including test and resource files
$(ROOT)/sbt clean package
To generate scala doc for the library
$(ROOT)/sbt doc
To generate scala doc for the examples
$(ROOT)/sbt test:doc
To generate report for compliance to Scala style guide:
$(ROOT)/sbt scalastyle
To compile all examples:
$(ROOT)/sbt test:compile
A simple pom.xml is available to build the library and execute the test cases:
$(ROOT)/mvn compile to compile the library
$(ROOT)/mvn test to compile and run the examples
To run the examples of a particular chapter (e.g. Chapter 4)
$(ROOT)/sbt
>test-only org.scalaml.app.chap4.Chap4
To run all examples with output configuration:
$(ROOT)/sbt "test:run options" where options is a list of possible outputs
- console to output results onto standard output
- logger to output results into a log file (log4j)
- chart to plot results using jFreeChart
$(ROOT)/sbt test:run writes test results to the standard output and the charts.
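The handling of the output options can be pictured with the sketch below. It is only an illustration of the behavior described above and not the library's code: the object OutputOptions, the type Output and its three case objects are hypothetical names. The default of standard output plus charts mirrors what test:run does when no option is given.

```scala
object OutputOptions {
  // Hypothetical representation of the three documented output targets.
  sealed trait Output
  case object ConsoleOut extends Output
  case object LoggerOut extends Output
  case object ChartOut extends Output

  // With no option, results go to the standard output and the charts.
  val Default: Set[Output] = Set(ConsoleOut, ChartOut)

  def parseOne(arg: String): Option[Output] = arg.trim.toLowerCase match {
    case "console" => Some(ConsoleOut)
    case "logger"  => Some(LoggerOut)
    case "chart"   => Some(ChartOut)
    case _         => None
  }

  // Returns the selected outputs, or an error message naming the unknown options.
  def parse(args: Seq[String]): Either[String, Set[Output]] =
    if (args.isEmpty) Right(Default)
    else {
      val unknown = args.filter(parseOne(_).isEmpty)
      if (unknown.nonEmpty) Left("Unknown output option(s): " + unknown.mkString(", "))
      else Right(args.flatMap(parseOne(_)).toSet)
    }
}
```

For instance, parse(Seq("console", "logger")) selects the standard output and the log file, while an unrecognized option is reported and nothing is silently ignored.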
$(ROOT)/mvn test to compile and run the examples
The package object org.scalaml.core.Design provides the traits (or skeleton implementations) of the persistent model, Design.Model, and configuration, Design.Config.
The persistence mechanism is implemented for a couple of supervised learning models, for illustration purposes only. The reader should be able to implement persistence of configurations and models for all relevant learning algorithms using the template operators << and >>
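A persistence template built on the operators >> and << might look like the sketch below. It is a simplified illustration under stated assumptions, not the actual Design.Model or Design.Config traits: the names PersistenceSketch, Persistable and LinearModel, the signatures of >> and <<, and the comma-separated text format are all hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.util.Try

object PersistenceSketch {
  // Hypothetical trait; the library's Design.Model may differ.
  trait Persistable {
    // Text encoding of the state to be saved.
    def serialize: String

    // Writes the state to the given file; returns true on success.
    def >>(path: String): Boolean =
      Try(Files.write(Paths.get(path), serialize.getBytes(StandardCharsets.UTF_8))).isSuccess
  }

  // Hypothetical model holding only a vector of weights.
  final case class LinearModel(weights: Vector[Double]) extends Persistable {
    def serialize: String = weights.mkString(",")
  }

  object LinearModel {
    // Reads a model back from a file; returns None if the file is missing or malformed.
    def <<(path: String): Option[LinearModel] =
      Try {
        val content = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8).trim
        val weights =
          if (content.isEmpty) Vector.empty[Double]
          else content.split(",").map(_.trim.toDouble).toVector
        LinearModel(weights)
      }.toOption
  }
}
```

With these definitions, model >> "model.csv" saves a model and LinearModel << "model.csv" restores it. Returning Boolean and Option keeps I/O failures out of the training code, at the cost of hiding the cause of an error.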
The examples have been built and tested with the following libraries:
Java libraries
CRF-Trove_3.0.2.jar
LBFGS.jar
colt.jar
CRF-1.1.jar
commons-math3-3.5.jar
libsvm_sml-3.18.jar
jfreechart-1.0.17/lib/jcommon-1.0.21.jar
jfreechart-1.0.17/lib/servlets.jar
junit-4.11.jar
jfreechart-1.0.17/lib/jfreechart-1.0.17.jar
Scala 2.10 related libraries
com.typesafe/config/1.2.1/bundles/config.jar
akka-actor_2.10-2.2.3.jar
scalatest_2.1.16.jar
spark-assembly-1.5.0-hadoop2.4.0.jar
Scala 2.11 related libraries
com.typesafe/config/1.2.2/bundles/config.jar
scalatest_2.2.2.jar
akka-actor_2.11-2.3.4.jar
spark-assembly-1.5.0-hadoop2.4.0.jar