Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add PyO3(Hence CPython as a Optional Backend #976

Merged
merged 44 commits into from
Mar 1, 2023
Merged

Conversation

discord9
Copy link
Contributor

@discord9 discord9 commented Feb 13, 2023

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Add PyO3 as a Optional Backend
Pros: C API(numpy etc.), faster python execute speed
Cons: GIL might have concurrent problem, i.e. too many parallel python thread may slow script execution down

Please explain IN DETAIL what the changes are in this PR and why they are needed:

Refactor scripts into three folder: ffi_types, rspython and pyo3, where ffi_types store common type defination(Yes those types can be used both in RustPython and PyO3) and other two store respective impl details

Major Changes:

  • Added cpython PyVector impls & cpython coprocessor impls

  • Added cpython greptime python modules i.e. import greptime as gt

  • Paired tests for both RustPython and CPython

  • Refactor of old RustPython codes, now backend-independ code is in ffi_types folder with RustPython detailed impls in rspython folder and cpython relevant methods in cpython folder

  • Summarize your change (mandatory)

  • How does this PR work? Need a brief introduction for the changed logic (optional)

  • Describe clearly one logical change and avoid lazy messages (optional)

  • Describe any limitations of the current code (optional)
    TODO: Refactor(move into greptime module)&Unit tests in PyO3 for PyDataFrame&SQL query in python&More allow_thread so that it release GIL as much as possible for maximizing parallelism

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.(TODO: PyDataFrame&SQL Query's Unit tests

Refer to a related PR or issue link (optional)

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 752 files.

Valid Invalid Ignored Fixed
583 5 164 0
Click to see the invalid file list
  • src/script/src/ffi_types.rs
  • src/script/src/pyo3.rs
  • src/script/src/pyo3/vector_impl.rs
  • src/script/src/rspython.rs
  • src/script/src/rspython/tests.rs

src/script/src/ffi_types.rs Outdated Show resolved Hide resolved
src/script/src/pyo3.rs Outdated Show resolved Hide resolved
src/script/src/pyo3/vector_impl.rs Outdated Show resolved Hide resolved
src/script/src/rspython.rs Outdated Show resolved Hide resolved
src/script/src/rspython/tests.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 758 files.

Valid Invalid Ignored Fixed
587 7 164 0
Click to see the invalid file list
  • src/script/src/ffi_types.rs
  • src/script/src/ffi_types/vector/tests.rs
  • src/script/src/pyo3.rs
  • src/script/src/pyo3/corp_impl.rs
  • src/script/src/pyo3/vector_impl.rs
  • src/script/src/rspython.rs
  • src/script/src/rspython/tests.rs

src/script/src/ffi_types/vector/tests.rs Outdated Show resolved Hide resolved
src/script/src/pyo3/corp_impl.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 773 files.

Valid Invalid Ignored Fixed
593 8 172 0
Click to see the invalid file list
  • src/script/src/ffi_types.rs
  • src/script/src/ffi_types/vector/tests.rs
  • src/script/src/pyo3.rs
  • src/script/src/pyo3/copr_impl.rs
  • src/script/src/pyo3/vector_impl.rs
  • src/script/src/rspython.rs
  • src/script/src/rspython/copr_impl.rs
  • src/script/src/rspython/tests.rs

src/script/src/pyo3/copr_impl.rs Outdated Show resolved Hide resolved
src/script/src/rspython/copr_impl.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 792 files.

Valid Invalid Ignored Fixed
606 8 178 0
Click to see the invalid file list
  • src/script/src/python/ffi_types/testcases_builtins.rs
  • src/script/src/python/ffi_types/utils.rs
  • src/script/src/python/pyo3/builtins.rs
  • src/script/src/python/pyo3/copr_impl.rs
  • src/script/src/python/pyo3/utils.rs
  • src/script/src/python/pyo3/vector_impl.rs
  • src/script/src/python/rspython.rs
  • src/script/src/python/rspython/tests.rs

src/script/src/python/ffi_types/testcases_builtins.rs Outdated Show resolved Hide resolved
src/script/src/python/ffi_types/utils.rs Show resolved Hide resolved
src/script/src/python/pyo3/builtins.rs Show resolved Hide resolved
src/script/src/python/pyo3/copr_impl.rs Show resolved Hide resolved
src/script/src/python/pyo3/utils.rs Show resolved Hide resolved
src/script/src/python/pyo3/vector_impl.rs Outdated Show resolved Hide resolved
src/script/src/python/rspython.rs Show resolved Hide resolved
src/script/src/python/rspython/tests.rs Outdated Show resolved Hide resolved
@discord9 discord9 marked this pull request as ready for review February 17, 2023 09:19
@v0y4g3r v0y4g3r requested a review from killme2008 February 20, 2023 02:40
@sunng87
Copy link
Member

sunng87 commented Feb 20, 2023

As we have two backends, I suggest to add feature gate for the optional one and ci to check correctness for both.

@killme2008
Copy link
Contributor

@discord9 A lot of conflicts, please fix them.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 795 files.

Valid Invalid Ignored Fixed
615 2 178 0
Click to see the invalid file list
  • src/script/src/python/ffi_types/pair_tests/sample_testcases.rs
  • src/script/src/python/pyo3/dataframe_impl.rs

@codecov
Copy link

codecov bot commented Feb 20, 2023

Codecov Report

Merging #976 (12fce34) into develop (50d2685) will decrease coverage by 0.39%.
The diff coverage is 69.79%.

@@             Coverage Diff             @@
##           develop     #976      +/-   ##
===========================================
- Coverage    85.33%   84.95%   -0.39%     
===========================================
  Files          459      471      +12     
  Lines        66885    68513    +1628     
===========================================
+ Hits         57078    58202    +1124     
- Misses        9807    10311     +504     

@killme2008
Copy link
Contributor

killme2008 commented Feb 22, 2023

Test failure on mac:

---- python::ffi_types::pair_tests::pyo3_rspy_test_in_pairs stdout ----
thread 'python::ffi_types::pair_tests::pyo3_rspy_test_in_pairs' panicked at '(RsPy)code:
from greptime import *
ret = tan(values)
ret
Real: PyVector { vector: PrimitiveVector { array: PrimitiveArray<Float64>
[
  1.557407724654902,
  -2.185039863261519,
  -0.1425465430742778,
] } }!=Expected: PrimitiveVector { array: PrimitiveArray<Float64>
[
  1.5574077246549023,
  -2.185039863261519,
  -0.1425465430742778,
] }', src/script/src/python/ffi_types/pair_tests.rs:81:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    python::ffi_types::pair_tests::pyo3_rspy_test_in_pairs

Reply: fixed, used epsilon for float array comparsion

Copy link
Contributor

@killme2008 killme2008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is huge. Plesase let me take a break.

src/script/src/python/engine.rs Outdated Show resolved Hide resolved
src/script/src/python/ffi_types/copr.rs Outdated Show resolved Hide resolved
src/script/src/python/ffi_types/copr.rs Outdated Show resolved Hide resolved
src/script/Cargo.toml Outdated Show resolved Hide resolved
src/script/src/python/ffi_types/copr.rs Show resolved Hide resolved
src/script/src/python/pyo3/utils.rs Outdated Show resolved Hide resolved
src/script/src/python/pyo3/utils.rs Outdated Show resolved Hide resolved
src/script/src/python/pyo3/vector_impl.rs Outdated Show resolved Hide resolved
src/script/src/python/pyo3/vector_impl.rs Outdated Show resolved Hide resolved
src/script/src/python/pyo3/vector_impl.rs Outdated Show resolved Hide resolved
Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we gain the ability to interact with cpython via pyo3, I'm wondering if it's possible to directly forward array compute functions to Arrow's python (which is from C or CPP) native implementation? This may reduce lots of FFI calls and make our code slimmer.

src/script/src/python/ffi_types/vector.rs Outdated Show resolved Hide resolved
src/script/src/python/ffi_types/vector.rs Outdated Show resolved Hide resolved
src/script/src/python/ffi_types/vector.rs Outdated Show resolved Hide resolved
@discord9
Copy link
Contributor Author

As we gain the ability to interact with cpython via pyo3, I'm wondering if it's possible to directly forward array compute functions to Arrow's python (which is from C or CPP) native implementation? This may reduce lots of FFI calls and make our code slimmer.

Wouldn't that complex our dependecy management? Like have to maintance both cargo and python pip or conda to get python arrow?

@killme2008
Copy link
Contributor

@discord9 There is a conflict file.

Copy link
Contributor

@killme2008 killme2008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost LGTM. But we stil have to refactor this crate, make it more clear and clean.

src/script/src/python/rspython/testcases.ron Outdated Show resolved Hide resolved
@waynexia
Copy link
Member

Almost LGTM. But we stil have to refactor this crate, make it more clear and clean.

One thing I'm thinking of while reviewing is, the ffi_types folder contains the major structs' definition and its methods of two backends. Do you think it's necessary to also split those implementations into different places like pyo3 and rspython?

@discord9
Copy link
Contributor Author

Almost LGTM. But we stil have to refactor this crate, make it more clear and clean.

One thing I'm thinking of while reviewing is, the ffi_types folder contains the major structs' definition and its methods of two backends. Do you think it's necessary to also split those implementations into different places like pyo3 and rspython?

Some types&function are way too similar to put in same file(i.e. dataframe&builtins functions), which are basically the same expect error handling and some impl details, so at least them should be put into different folder I guess?

src/script/Cargo.toml Outdated Show resolved Hide resolved
Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give my thumb up to this brilliant work 👍

@killme2008 killme2008 merged commit c5c6494 into develop Mar 1, 2023
@killme2008 killme2008 deleted the pyo3 branch March 1, 2023 02:45
paomian pushed a commit to paomian/greptimedb that referenced this pull request Oct 19, 2023
* refactor: ffi_types

* style: fmt

* refactor: use `String` for return when possible

* todo: vector_impl

* feat: pyobj_try_typed_val

* refactor: more backend indep function

* feat: +-*/ magic methods

* refactor: copr

* style: fmt

* feat: add paired tests

* refactor: more

* refactor: move inside `python` folder

* refactor: all but test code

* feat: builtins for PyO3

* chore: add licenses

* chore: remove unused&add todos

* refactor: remove old files

* chore: mark unused

* chore: fmt

* chore: license

* feat: query in PyO3

* test: paired testcases for rspy&pyo3

* feat: PyDataFrame(Untested)

* feat: some allow_threads

* style: fmt

* style: add license

* feat: rebase manually of GreptimeTeam#962

* feat: more `allow_threads`

* chore: typo

* chore: remove some `TODO`

* test: allow margin of epsilon

* chore: code review advices

* chore: more CR adjust

* chore: more adjust

* feat: kwargs&its test

* chore: remove some `dbg!`

* chore: allow params

* fix: put `dataframe` into scope

* chore: newline

* fix: adjust after rebase

* fix: test serde skip attr

* style: taplo

* feat: add `pyo3_backend` feature

* doc: update CI&readme
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants