vaex-ml: package centred around machine learning related tasks #254
Awesome! I'll fix Travis.
packages/vaex-ml/vaex/ml/__pycache__/ should not go in.
@@ -21,7 +21,8 @@
 'vaex-server==0.2',
 'vaex-hdf5==0.4',
 'vaex-astro==0.4',
-'vaex-arrow==0.3'
+'vaex-arrow==0.3',
+'vaex-ml==0.4'
I don't think vaex-ml should be installed by default when you install vaex, just like vaex-ui. What do you think?
Especially since vaex-ml currently still depends on numba.
Why not? I think it would be good to have it as a default. Otherwise, I am afraid it will not get noticed or will add extra complexity.
For production environments, people can of course choose what to install.
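For reference, the alternative raised in this thread (keeping vaex-ml out of the default install) could be sketched with setuptools' `extras_require`. This is only an illustration under assumptions: the extra name `ml` is hypothetical, and the dependency list contains just the pins visible in the diff above, not the full list from the real setup.py.

```python
# Hypothetical sketch of the dependency lists for the vaex meta-package.
# Only the pins visible in the diff above are listed here.
install_requires = [
    'vaex-server==0.2',
    'vaex-hdf5==0.4',
    'vaex-astro==0.4',
    'vaex-arrow==0.3',
]

# Optional extras are installed only on request.
# The extra name "ml" is an assumption made for this example.
extras_require = {
    'ml': ['vaex-ml==0.4'],
}

# Both would then be handed to setuptools, roughly as:
# setup(..., install_requires=install_requires, extras_require=extras_require)
```

With a layout like this, a plain `pip install vaex` would skip vaex-ml (and its numba dependency), while `pip install vaex[ml]` would pull it in.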
@@ -0,0 +1 @@
+!coverage.py: This is a private format, don't read it directly!{"lines":{}}
I don't think this file should go into the repo.
Agreed! I tried to exclude it. I will check the .gitignore file again.
Same for the __pycache__/ like things.
bacb96b to 1ed152d
548aff4 to 9acd38c
…aring vaex.ml to pure sklearn
…to test similarity of results up to a certain precision.
…th the binaries, we do run on windows/osx
🎉 yeah!
This is a big PR in which we are introducing vaex-ml, a vaex package centred around machine learning related tasks and applications. The following describes the contents:

- vaex.ml.transformations: methods related to preprocessing: scalers, categorical encoders, PCA
- vaex.ml.cluster: provides an efficient KMeans clustering algorithm
- vaex.ml.ui: provides means to construct an ipywidget for nearly any transformer in this package
- vaex.ml.xgboost: a binding to the xgboost library
- vaex.ml.lightgbm: a binding to the lightgbm library
- vaex.ml.catboost: a binding to the catboost library
- vaex.ml.sklearn: a binding to the scikit-learn library. At the moment, only the estimators are supported.
- vaex.ml.incubator: a module housing various machine learning models. The bindings in the incubator are considered experimental and are under testing. The API, implementation or support may change without notice.
- vaex.ml.datasets: contains datasets for experimentation and training. Currently contains the titanic and the iris classical datasets. It also contains methods for replicating the iris dataset such that it contains a total of 10^9 samples, creating a "big data" example.
- vaex.ml.generate: module that auto-generates an alternative API for the transformers and ML models.
- vaex.ml.pipeline: provides a pipeline object for the vaex-ml transformers and estimators
- vaex.ml.state: methods for serialisation of vaex objects
- vaex.ml.linear_model: provides an implementation of linear models that operate on a grid (binned data) instead of individual samples.

Everything comes with a full suite of tests handled by pytest.

NB: The contents above and this description itself may change slightly as vaex-ml is integrated with vaex.
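To illustrate the idea behind fitting a linear model on a grid rather than on individual samples, here is a minimal pure-Python sketch. It is not the vaex.ml.linear_model implementation; the function name `fit_line_on_grid` and its parameters are hypothetical. It assumes a single feature, reduces the samples to per-bin counts and sums of y, and then solves a count-weighted least-squares fit on the bin centres.

```python
def fit_line_on_grid(xs, ys, x_min, x_max, n_bins):
    """Fit y = a*x + b from per-bin aggregates instead of raw samples.

    Hypothetical helper for illustration only.
    """
    width = (x_max - x_min) / n_bins
    counts = [0] * n_bins
    y_sums = [0.0] * n_bins

    # Single pass over the data: reduce samples to per-bin statistics.
    for x, y in zip(xs, ys):
        if x < x_min or x > x_max:
            continue  # ignore samples outside the grid
        i = min(int((x - x_min) / width), n_bins - 1)
        counts[i] += 1
        y_sums[i] += y

    # Weighted least squares on the bin centres, weighted by bin counts.
    centres = [x_min + (i + 0.5) * width for i in range(n_bins)]
    n = sum(counts)
    if n == 0:
        raise ValueError("no samples fall inside the grid")
    sx = sum(c * xc for c, xc in zip(counts, centres))
    sy = sum(y_sums)
    sxx = sum(c * xc * xc for c, xc in zip(counts, centres))
    sxy = sum(xc * ysum for xc, ysum in zip(centres, y_sums))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all samples fall into a single bin")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


# Samples placed exactly on bin centres of the line y = 2x + 1.
slope, intercept = fit_line_on_grid(
    [0.5, 1.5, 2.5, 3.5], [2.0, 4.0, 6.0, 8.0], x_min=0.0, x_max=4.0, n_bins=4
)
```

After the binning pass, the cost of the fit depends only on the number of bins, not the number of samples. The trade-off is that every sample is treated as if it sat at its bin centre, so the result is an approximation whose quality depends on the grid resolution.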