Implement to_arrow in C++ for JS/Python #850

Merged
merged 15 commits into from
Dec 29, 2019

Conversation

sc1f
Contributor

@sc1f sc1f commented Dec 16, 2019

This PR uses the C++ Apache Arrow library to serialize Perspective views into the Arrow format. It includes:

  • Upgrades Arrow to 0.15.1, which requires a rebuild of the Docker images.
  • Serializes to Arrow for 0-, 1-, and 2-sided contexts, respecting start/end row and column.
  • Generates the row delta (the rows changed after update()) using the C++ to_arrow, and implements it in Python.
  • Fixes an issue where WebsocketClosedError broke the streaming example: clients closing their websocket should no longer break lingering callbacks.
  • Refactors PerspectiveManager to use _PerspectiveCallbackCache instead of a dictionary.
  • Adds a remote.py example in Python, which uses to_arrow and row_delta to stream changed rows to clients every millisecond.
  • Splits arrow.cpp into arrow_loader and arrow_writer for cleanliness.
  • Removes the scalar_vec_to_val and scalar_vec_to_string methods in C++; use scalar_to_val instead.
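
To make the row delta and start/end windowing ideas concrete, here is a minimal pure-Python sketch. It is not Perspective's implementation (that lives in C++ in this PR); the names `apply_update` and `window` are hypothetical, and the sketch assumes an indexed table in which an update replaces the row with the same key. Whether Perspective's delta also reports rows re-sent with identical values is not stated here; this sketch omits them.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_update(table: Dict[Any, Row], updates: List[Row], index: str) -> List[Row]:
    """Apply `updates` keyed on `index` and return only the rows that changed.

    Hypothetical helper: the returned list plays the role of a "row delta".
    """
    delta: List[Row] = []
    for row in updates:
        key = row[index]
        # Partial updates are merged over the existing row, if any.
        merged = {**table.get(key, {}), **row}
        if table.get(key) != merged:
            table[key] = merged
            delta.append(merged)
    return delta


def window(
    rows: List[Row],
    columns: List[str],
    start_row: int = 0,
    end_row: Optional[int] = None,
    start_col: int = 0,
    end_col: Optional[int] = None,
) -> List[Row]:
    """Restrict `rows` to a half-open row range and a half-open column range."""
    cols = columns[start_col:end_col]
    return [{c: r.get(c) for c in cols} for r in rows[start_row:end_row]]


table = {1: {"id": 1, "price": 10.0}, 2: {"id": 2, "price": 20.0}}
updates = [{"id": 2, "price": 25.0}, {"id": 1, "price": 10.0}, {"id": 3, "price": 5.0}]
delta = apply_update(table, updates, index="id")
first_price = window(delta, ["id", "price"], end_row=1, start_col=1)
```

Only the delta (rows 2 and 3 here) would need to be serialized and sent to clients, which is what makes streaming frequent updates cheap compared with re-sending the whole view.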

@timkpaine timkpaine added the C++ and enhancement (Feature requests or improvements) labels Dec 19, 2019
@texodus
Member

texodus commented Dec 29, 2019

to_arrow() in JavaScript is nearly 2x faster on this branch! Very nice!

[Image: Screen Shot 2019-12-28 at 4 50 29 AM]

Member

@texodus texodus left a comment

Thanks for the PR! This is a really cool feature, great performance improvement, better portability, smaller asset size! I've done a review of this offline and added 2 commits:

  • Removes @arrow/es5-esm now that this is entirely implemented in C++, saving ~200k in addition to the runtime performance improvements.
  • Adds PSP_CHECK_ARROW_STATUS to wrap Status::invalid() in a PSP_COMPLAIN_AND_ABORT, squashing the build warnings and providing better error messages.

@texodus
Member

texodus commented Dec 29, 2019

Also fixed a longstanding test flap, which was due to incorrect JavaScript Date API usage.

@texodus texodus merged commit a59f5c7 into master Dec 29, 2019
@texodus texodus deleted the to-arrow branch December 29, 2019 11:05
@RandomFractals

This looks great! Can't wait to try it in JS. Are all the arrow date types properly handled in this update?

@sc1f
Contributor Author

sc1f commented Dec 30, 2019

Yes - date32 and date64 are handled as date types, while timestamp is handled as a datetime.
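
For readers unfamiliar with the three Arrow types, the sketch below shows what each one stores and how it could map to a date or a datetime. It follows the Arrow format definitions (date32 is days since the UNIX epoch, date64 is milliseconds since the epoch, timestamp is an integer count in a given unit), not Perspective's C++ code, which is not shown in this thread. The function names are hypothetical, and treating timestamps as UTC is an assumption.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DT = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Microseconds per timestamp unit; nanoseconds are handled separately below.
_UNIT_TO_MICROS = {"s": 1_000_000, "ms": 1_000, "us": 1}


def date32_to_date(days: int) -> date:
    """date32: 32-bit count of days since the UNIX epoch -> a date."""
    return EPOCH_DATE + timedelta(days=days)


def date64_to_date(millis: int) -> date:
    """date64: 64-bit count of milliseconds since the UNIX epoch -> a date."""
    return (EPOCH_DT + timedelta(milliseconds=millis)).date()


def timestamp_to_datetime(value: int, unit: str = "ms") -> datetime:
    """timestamp: 64-bit count in `unit` since the epoch -> a datetime.

    Assumes UTC; nanosecond values are truncated to microseconds because
    Python's datetime has microsecond resolution.
    """
    if unit == "ns":
        micros = value // 1_000
    else:
        micros = value * _UNIT_TO_MICROS[unit]
    return EPOCH_DT + timedelta(microseconds=micros)


d32 = date32_to_date(18260)
d64 = date64_to_date(1_577_664_000_000)
ts = timestamp_to_datetime(1_577_664_000, unit="s")
```

All three calls above refer to 2019-12-30: the two date types carry only the calendar day, while the timestamp keeps the time of day (midnight UTC in this case).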

@RandomFractals

sweet! will wait for the next rc to bundle it in my data ext. that uses this lib.

would be a nice way to close this year & thank you guys for this awesome data analytics lib & contribs!
