Add Dask Integration: vineyard as the data source for dask #409
Codecov Report
@@            Coverage Diff             @@
##             main     #409      +/-   ##
==========================================
+ Coverage   69.64%   69.68%   +0.04%
==========================================
  Files          63       63
  Lines        5452     5453       +1
==========================================
+ Hits         3797     3800       +3
+ Misses       1655     1653       -2
Could we have a package called vineyard-dask, under vineyard/contrib/dask?
It is quite strange to put it under ml/distributed....
def get_partition(socket, obj_id):
    client = vineyard.connect(socket)
    np_value = client.get(obj_id)
    return da.from_array(np_value)
Is there a copy? (I'm not sure.)
Seems the resolver assumes both the tensor and dataframe are chunked along the […] We haven't put enough effort into other ML integrations. I know it is not trivial, but it does need to be fixed. As dask already natively supports splitting chunks along both the […]
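For reference, dask arrays can be chunked along more than one axis. The sketch below is only an illustration with made-up in-memory data (nothing here comes from vineyard or from this pull request): it assembles a 2x2 grid of blocks with `da.block`, which is one way a resolver could avoid assuming a single chunking axis.

```python
import numpy as np
import dask.array as da

# Illustrative data only: a 4x4 array split into a 2x2 grid of blocks.
full = np.arange(16).reshape(4, 4)

# Each block is wrapped as its own dask array; in a real resolver these
# would be the per-chunk objects rather than slices of one local array.
grid = [
    [da.from_array(full[:2, :2]), da.from_array(full[:2, 2:])],
    [da.from_array(full[2:, :2]), da.from_array(full[2:, 2:])],
]

# da.block stitches the nested list together along both axes.
arr2d = da.block(grid)
result2d = arr2d.compute()
```

The resulting array keeps one chunk per block along both dimensions, so downstream dask operations can still run block by block.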
Andy @andydiwenzhu I didn't mean to close the issue. We could […]
Then we could get this pull request landed.
OK. That was a bad click, don't worry :)
Signed-off-by: Diwen Zhu <diwen.zdw@alibaba-inc.com>
Fixes for v6d-io#409. Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Add two resolvers for dask:
- GlobalTensor --> dask.Array
- GlobalDataFrame --> dask.dataframe
Signed-off-by: Diwen Zhu <diwen.zdw@alibaba-inc.com>
Signed-off-by: Sijie <lsjrosej@gmail.com>
…ge. (v6d-io#417)
* Add missing __init__.py to dask module otherwise it is an empty package, fixes for v6d-io#409.
* Revisit the CI script.
* Set LD_LIBRARY_PATH for running tests.
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Sijie <lsjrosej@gmail.com>
Signed-off-by: Diwen Zhu <diwen.zdw@alibaba-inc.com>
What do these changes do?
Add two resolvers for dask: GlobalTensor --> dask.Array, GlobalDataFrame --> dask.dataframe
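A minimal sketch of the general idea behind such a resolver, not the code in this pull request: each chunk is loaded lazily and the chunks are joined into one dask array. The `load_chunk` helper and the `_FAKE_STORE` dictionary are hypothetical stand-ins so the example runs without a vineyard instance; a real resolver would fetch each chunk with something like `vineyard.connect(socket).get(obj_id)`, as in the snippet reviewed above. Chunks are assumed here to be stacked along axis 0.

```python
import numpy as np
import dask
import dask.array as da

# Hypothetical stand-in for a vineyard instance: object id -> chunk data.
_FAKE_STORE = {
    "chunk-0": np.arange(6, dtype=np.int64).reshape(2, 3),
    "chunk-1": np.arange(6, 12, dtype=np.int64).reshape(2, 3),
}

def load_chunk(obj_id):
    # A real resolver would call vineyard.connect(socket).get(obj_id) here.
    return _FAKE_STORE[obj_id]

def to_dask_array(chunk_meta):
    """Build a dask array from (obj_id, shape, dtype) tuples.

    Assumes the chunks are stacked along axis 0.
    """
    parts = [
        # The load is deferred until the graph is computed.
        da.from_delayed(dask.delayed(load_chunk)(obj_id), shape=shape, dtype=dtype)
        for obj_id, shape, dtype in chunk_meta
    ]
    return da.concatenate(parts, axis=0)

arr = to_dask_array([
    ("chunk-0", (2, 3), np.int64),
    ("chunk-1", (2, 3), np.int64),
])
result = arr.compute()
```

Wrapping the load in `dask.delayed` means each chunk is fetched by whichever worker executes that task, which is where data locality would matter for a vineyard-backed source.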
Related issue number
Part of #412.