[pull] master from kuzudb:master #2

Merged 174 commits on Apr 7, 2024
Commits
0d84c73
Support Polars DataFrame export from QueryResult (#2985)
alexander-beedie Mar 4, 2024
89598fd
clean up transaction pointer in physical operator
ray6080 Mar 4, 2024
1131621
Merge pull request #2990 from kuzudb/clean-operator-transaction
ray6080 Mar 4, 2024
3415ff1
Store a stable reference instead of a duplicate string in the ColumnC…
benjaminwinger Mar 4, 2024
c1b2220
fix reset empty heap overflow
ray6080 Mar 2, 2024
93e6b3e
Merge pull request #2994 from kuzudb/string-column-chunk-index
benjaminwinger Mar 5, 2024
1b858cf
Merge pull request #2996 from kuzudb/fix-reset-empty
ray6080 Mar 6, 2024
9e23995
Rework CSV_TO_PARQUET testing feature
manh9203 Feb 29, 2024
ad24bf7
Avoid moving DictionaryChunks
benjaminwinger Mar 6, 2024
74c2f80
Merge pull request #2999 from kuzudb/dictionary-memory-fix
ray6080 Mar 7, 2024
5e598ec
update links to website (#3000)
ray6080 Mar 7, 2024
38e4398
Re-write partitioner to use ColumnChunks instead of ValueVectors
benjaminwinger Feb 27, 2024
b2f50ac
Abstract client config
andyfengHKU Mar 8, 2024
b7e3bc7
Merge pull request #2979 from kuzudb/rel-memory-fix
benjaminwinger Mar 8, 2024
c554a20
Merge pull request #3010 from kuzudb/add-client-config
andyfengHKU Mar 8, 2024
45c5aa9
Support use of QueryResult as a context manager (#3009)
alexander-beedie Mar 9, 2024
a3a6c2a
Pass client context to binder
andyfengHKU Mar 8, 2024
4692c54
Merge pull request #3015 from kuzudb/pass-client-context-binder
andyfengHKU Mar 10, 2024
020c09b
Refactor cast functions
andyfengHKU Mar 9, 2024
dc9771f
Merge pull request #3016 from kuzudb/refactor-cast-function-binding
andyfengHKU Mar 10, 2024
c149349
Combine append(ValueVector) with appendOne
ray6080 Mar 10, 2024
2a5948f
clean up unique_ptr of LogicalType in NodeGroup and BatchInsert
ray6080 Mar 10, 2024
f4a95ab
Merge pull request #3018 from kuzudb/clean-unique-ptr
ray6080 Mar 11, 2024
6c01c80
handle multiple database instantiations for import caching
mxwli Feb 28, 2024
3f84585
Revert "Revert "Implement Python Import Caching""
mxwli Mar 11, 2024
67e9204
Merge pull request #3017 from kuzudb/remove-append-one
ray6080 Mar 11, 2024
8642ccb
Merge pull request #3025 from kuzudb/import-cache-fix-and-revert-revert
mxwli Mar 11, 2024
7c25a3b
Rewrite the Hash Index overflow file to support multiple copies and f…
benjaminwinger Mar 7, 2024
2397c02
Fix issue-2984
andyfengHKU Mar 11, 2024
a6c7e21
Merge pull request #3026 from kuzudb/issue-2984
andyfengHKU Mar 11, 2024
8f5f64a
Add multiplaform test report bot (#3027)
mewim Mar 12, 2024
3bdc752
Python API typing, lint, config/makefile (#3023)
alexander-beedie Mar 12, 2024
18c2c8f
Fix unicode conversion for pandas dataframe (#3029)
mewim Mar 12, 2024
339a471
Update LICENSE
semihsalihoglu-uw Mar 12, 2024
1d5df7f
Merge pull request #3031 from kuzudb/semihsalihoglu-uw-patch-1
semihsalihoglu-uw Mar 12, 2024
0c26056
Merge pull request #3012 from kuzudb/multi-copy-overflow-file
benjaminwinger Mar 12, 2024
bdd650e
Add copy from subquery
andyfengHKU Mar 4, 2024
af50489
Insert into the hash index builder one chunk at a time
benjaminwinger Mar 5, 2024
7a3ff60
Merge pull request #3020 from kuzudb/copy-from-subquery
andyfengHKU Mar 12, 2024
d9d277f
Fix issue-3004
andyfengHKU Mar 12, 2024
891c115
Merge pull request #3036 from kuzudb/issue-3004
andyfengHKU Mar 13, 2024
7da8a62
Optimise Python unit test runtime (~7x speedup) (#3032)
alexander-beedie Mar 13, 2024
7c16897
Add more parameter types for Node.js API (#3037)
mewim Mar 13, 2024
d8487a0
Merge pull request #2997 from kuzudb/hash-index-builder-chunks
benjaminwinger Mar 13, 2024
b304389
Remove the constraint on the HashIndexBuilder template parameter
benjaminwinger Mar 12, 2024
0dbcef6
Allow CI workflow to be manually dispatched (#3043)
mewim Mar 13, 2024
2dfe495
Bump extensions version to 0.2.0 (#3041)
mewim Mar 13, 2024
7110f91
First-pass lint/format for Python `shell` tests (#3034)
alexander-beedie Mar 13, 2024
930ba45
Bump master branch version to 0.3.2.1 (#3044)
mewim Mar 14, 2024
1b6f741
Fixed failing shell tests (#3045)
MSebanc Mar 14, 2024
ff186a5
Add shell tests to CI (#3039)
mewim Mar 14, 2024
77489a5
fix issue 3042
ray6080 Mar 13, 2024
0b7adb9
fix sliding out-of-place commit and null strings
ray6080 Mar 12, 2024
f8efa2a
Merge pull request #3055 from kuzudb/fix-rel-insert-bug
ray6080 Mar 14, 2024
1f88b3f
rework local storage: separate the storage of insertions and updates
ray6080 Mar 14, 2024
4f06cf1
Merge pull request #2982 from kuzudb/multi-copy-rel-s1
ray6080 Mar 14, 2024
d348228
Merge pull request #3046 from kuzudb/fix-3042
ray6080 Mar 14, 2024
4a7b109
Update Debian version in build workflows (#3056)
mewim Mar 14, 2024
a9454b3
Implement duckdb scanner extension
acquamarin Mar 4, 2024
c3556e2
Merge pull request #3052 from kuzudb/duckdb-scanner
acquamarin Mar 15, 2024
2a3012c
Fix Hash index split slot ID when reserving a number of slots which a…
benjaminwinger Mar 15, 2024
28bd03b
Copy table function instead of passing raw pointer
andyfengHKU Mar 16, 2024
a612c0f
Merge pull request #3067 from kuzudb/table-function-copy
andyfengHKU Mar 16, 2024
1d7b9f3
Add replace func
andyfengHKU Mar 13, 2024
a0ee10e
Merge pull request #3069 from kuzudb/replace-func
andyfengHKU Mar 18, 2024
3db0f95
Merge pull request #3030 from kuzudb/hash_index_template_types
benjaminwinger Mar 18, 2024
f12e5e7
Remove unnecessary components for pip package (#3074)
mewim Mar 18, 2024
826927e
Merge pull request #3066 from kuzudb/hash-index-reserve-fix
benjaminwinger Mar 18, 2024
cc93226
Implement catalog cache for postgres scanner
acquamarin Mar 18, 2024
bd963c1
Merge pull request #3071 from kuzudb/catalog-cache
acquamarin Mar 18, 2024
35b9438
Rework Fixed-list
manh9203 Mar 11, 2024
e7c6d73
Merge pull request #3057 from kuzudb/fixed-list-rework
manh9203 Mar 18, 2024
e963df1
Implemented Progress Bar for ScanNodeID Operator (#3051)
MSebanc Mar 18, 2024
775d2e6
replace ValueVector with ColumnChunk in LocalStorage
ray6080 Mar 14, 2024
8854ebd
Merge pull request #3028 from kuzudb/refactor-local-storage
ray6080 Mar 19, 2024
3ce3b1f
fix rel insert and append sanityCheck for column chunk
ray6080 Mar 15, 2024
0531afe
Exclude extension files from the rust crate (#3076)
benjaminwinger Mar 19, 2024
907d831
Remove unnecessary components for pip package (#3085)
mewim Mar 19, 2024
c3decc2
Merge pull request #3081 from kuzudb/fix-rel-insert
ray6080 Mar 19, 2024
c39704d
fix deadlock issue due to bm no frame to claim exception and fix used…
ray6080 Mar 19, 2024
8b2c768
Merge pull request #3082 from kuzudb/fix-node-insert
ray6080 Mar 19, 2024
efdc1e4
Refactor arithmetic functions
manh9203 Mar 18, 2024
8fa40d6
Merge pull request #3079 from kuzudb/arithmetic-functions-refactor
manh9203 Mar 19, 2024
0ced885
Allowed for progress bar to be configurable by CALL (#3080)
MSebanc Mar 19, 2024
7a3ca59
Implement array functions
acquamarin Mar 19, 2024
04fcdec
Merge pull request #3087 from kuzudb/array-functions
acquamarin Mar 19, 2024
6860af0
Remove underscore from the badges in README (#3094)
mewim Mar 20, 2024
f69ad02
Fix python prepared statement null value
acquamarin Mar 20, 2024
e49bb30
Merge pull request #3098 from kuzudb/python-prepared-statement
acquamarin Mar 20, 2024
568e08e
Refactor string functions
manh9203 Mar 19, 2024
f8fe205
Merge pull request #3091 from kuzudb/string-functions-refactor
manh9203 Mar 20, 2024
05359c7
Arrow chunk_size as keyword argument (#3084)
prrao87 Mar 21, 2024
3c90c16
Update rustdoc to show how to enable parallel compilation (#3099)
prrao87 Mar 21, 2024
f6b1d6a
Improve copy-to-parquet perf
acquamarin Mar 21, 2024
7817cc9
Merge pull request #3105 from kuzudb/copy-to-parquet-perf
acquamarin Mar 21, 2024
68c2856
Refactor list functions
manh9203 Mar 19, 2024
96d9a91
Merge pull request #3100 from kuzudb/list-functions-refactor
manh9203 Mar 22, 2024
9effbb1
Refactor cast functions
manh9203 Mar 20, 2024
bdae55f
Merge pull request #3107 from kuzudb/cast-functions-refactor
manh9203 Mar 22, 2024
f9e1b12
Update `get_as_pl` (should always return a single chunk) (#3110)
alexander-beedie Mar 22, 2024
3f817f2
Add standard Python module __version__ attr (#3111)
alexander-beedie Mar 22, 2024
6d39076
Fix DuckDB build for macOS ARM and 32-bit (#3115)
mewim Mar 22, 2024
6e52e22
Add external object scan replacement
andyfengHKU Mar 16, 2024
d65c2b8
clean
andyfengHKU Mar 18, 2024
8f976e4
clean
andyfengHKU Mar 18, 2024
23144c3
pyarrow backend scanning for pandas
mxwli Feb 27, 2024
f0507b0
CLANG-TIDY
mxwli Mar 21, 2024
b97aab5
clang fix
mxwli Mar 21, 2024
cb4d757
clang
mxwli Mar 21, 2024
d4b261b
Merge pull request #3058 from kuzudb/pandas-pyarrow-backend
mxwli Mar 22, 2024
003a706
Add pull request template (#3118)
andyfengHKU Mar 22, 2024
8f37501
Added customizable delay before displaying progress bar (#3092)
MSebanc Mar 22, 2024
c8e4d5b
Hash index cleanup (#3088)
benjaminwinger Mar 22, 2024
167bb87
Fix launch database using homedir (#3108)
acquamarin Mar 22, 2024
7ec590a
remove dummy transactions (#3106)
hououou Mar 22, 2024
9247fd2
fix import database path (#3063)
hououou Mar 22, 2024
f9bc0c6
enable compression for INTERNAL_ID (#3116)
ray6080 Mar 23, 2024
e60e8cd
close 1646 (#3122)
ray6080 Mar 23, 2024
365815b
Refactor Partitioner to use ChunkedNodeGroupCollection (#3123)
ray6080 Mar 23, 2024
3a6bd7e
Replace with client context (#3121)
hououou Mar 23, 2024
599b80f
Rework var list storage layout (#3093)
hououou Mar 24, 2024
3ce064d
Fix 3127 (#3130)
acquamarin Mar 24, 2024
3813eed
Fix issue-3129 (#3131)
andyfengHKU Mar 24, 2024
53ef58e
Refactor scalar function registration (#3119)
manh9203 Mar 25, 2024
b208d15
Support multiple COPY statements on rel tables (#2989)
ray6080 Mar 25, 2024
ad31f02
initialize readfds via FD_ZERO before use (#3132)
neeraj9 Mar 25, 2024
a8b15dc
table scan/update/insert/delete state (#3072)
ray6080 Mar 25, 2024
4d21128
Support read after update (#3126)
andyfengHKU Mar 25, 2024
80b3e94
Factor out benchmark workflow and enable manual trigger for it (#3144)
mewim Mar 26, 2024
3237e6f
Implement postgres-scanner (#3139)
acquamarin Mar 26, 2024
de72fc9
Python List and Map Parameter Support (#3090)
mxwli Mar 26, 2024
a85f4fe
Cache DiskArray write header in-memory (#3109)
benjaminwinger Mar 26, 2024
fc3b4a7
Fix postgres scanner issues (#3148)
acquamarin Mar 26, 2024
c1f68cd
Refactor path functions and RDF functions (#3134)
manh9203 Mar 26, 2024
9ea80ec
Refactor aggregate functions (#3136)
manh9203 Mar 27, 2024
73ed1ea
Pandas Pyarrow Backend Bugfix and Tests (#3152)
mxwli Mar 27, 2024
677d35e
List Auxiliary Buffer NullMask Fix (#3156)
mxwli Mar 27, 2024
c747899
Add support to compute hash on list of struct (#3157)
acquamarin Mar 27, 2024
015bf23
Prepare Statement Improvement (#3140)
hououou Mar 28, 2024
6c82aad
resolve weird ANY resolution (#3160)
mxwli Mar 28, 2024
20bde3a
fix export test (#3164)
hououou Mar 28, 2024
956b3e3
Implement initcap/concat functions (#3161)
acquamarin Mar 28, 2024
2ec13b2
Fix issue 3070: Support extend from unwind node (#3153)
andyfengHKU Mar 28, 2024
08fd180
Add Pyarrow Map Scanning (#3158)
mxwli Mar 28, 2024
293b4e6
Fix export database regression (#3171)
andyfengHKU Mar 28, 2024
37b58bb
Fix hash aggregate edge case (#3172)
andyfengHKU Mar 28, 2024
20e5cbb
Added progress for in_query_call operators (#3120)
MSebanc Mar 28, 2024
cf71770
Fixed shell incorrect command seg fault (#3173)
MSebanc Mar 29, 2024
fb8f4c7
Cache files when replaying WAL (#3137)
benjaminwinger Mar 29, 2024
f80a6eb
Support join hash table on aggregate types (#3174)
acquamarin Mar 29, 2024
fa528c1
Fix delete then scan bug (#3176)
andyfengHKU Mar 30, 2024
4e406a1
Refactor sel vector interface (#3177)
andyfengHKU Mar 31, 2024
6f0d8f8
Fix issue 3151: disable null on internalID columns (#3165)
ray6080 Mar 31, 2024
6b1d45a
Rework DDL operators (#3178)
ray6080 Apr 1, 2024
ac9cbf3
Refactor table functions (#3155)
manh9203 Apr 1, 2024
a99ff6c
Rename VAR_LIST to LIST (#3170)
manh9203 Apr 1, 2024
add8473
Remove unused keywords in test runner (#3193)
hououou Apr 1, 2024
94fd5eb
Split extension tests as separate jobs (#2987)
mewim Apr 2, 2024
a95b29e
Added progress for aggregate scan and order by scan (#3192)
MSebanc Apr 2, 2024
0ad815e
Fix is null executor bug (#3197)
andyfengHKU Apr 2, 2024
f62e7c8
Fix order by radix sort bug (#3201)
acquamarin Apr 3, 2024
1f03f5a
Updated shell result truncation (#3206)
MSebanc Apr 3, 2024
1aaa21f
Fix-3200 (#3203)
prrao87 Apr 3, 2024
b3c6dc9
skip empty history file line (#3184)
neeraj9 Apr 4, 2024
fa0ef79
Merge duplicate key fix (#3207)
acquamarin Apr 4, 2024
37de692
Implemented progress for in memory RDF scan (#3208)
MSebanc Apr 4, 2024
ec6e309
Rework multiple query result (#3191)
hououou Apr 4, 2024
2100fa3
Fix constant compression in-place check for bools (#3211)
benjaminwinger Apr 5, 2024
8923c7f
Replace Slack link with Discord in contributing guideline (#3217)
mewim Apr 5, 2024
33111c8
fix pyarrow segfaulting on fedora 39 (#3213)
mxwli Apr 5, 2024
b3917d9
Bump clang-format to v18 and enable auto format (#3222)
mewim Apr 6, 2024
d946982
Check for format changes on master branch (#3223)
mewim Apr 6, 2024
8006723
CMAKE_CXX_FLAGS handling fails when variable is empty (#3228)
zaddach Apr 7, 2024
c6897b4
Remove extension test from `clang-build-test` job (#3231)
mewim Apr 7, 2024
Update get_as_pl (should always return a single chunk) (kuzudb#3110)
alexander-beedie authored Mar 22, 2024
commit f9e1b12f5a0d36b216551e41f78d6cda97291046
33 changes: 13 additions & 20 deletions tools/python_api/src_py/query_result.py
@@ -123,20 +123,10 @@ def get_as_df(self) -> pd.DataFrame:
 
         return self._query_result.getAsDF()
 
-    def get_as_pl(self, chunk_size: int | None = None) -> pl.DataFrame:
+    def get_as_pl(self) -> pl.DataFrame:
         """
         Get the query result as a Polars DataFrame.
 
-        Parameters
-        ----------
-        chunk_size : Number of rows to include in each chunk.
-            None
-                The chunk size is adaptive and depends on the number of columns in the query result.
-            -1 or 0
-                The entire result is returned as a single chunk.
-            > 0
-                The chunk size is the number of elements specified.
-
         See Also
         --------
         get_as_df : Get the query result as a Pandas DataFrame.
@@ -151,7 +141,11 @@ def get_as_pl(self, chunk_size: int | None = None) -> pl.DataFrame:
 
         self.check_for_query_result_close()
 
-        return pl.from_arrow(data=self.get_as_arrow(chunk_size=chunk_size))
+        # note: polars should always export just a single chunk,
+        # (eg: "-1") otherwise it will just need to rechunk anyway
+        return pl.from_arrow(  # type: ignore[return-value]
+            data=self.get_as_arrow(chunk_size=-1),
+        )
 
     def get_as_arrow(self, chunk_size: int | None = None) -> pa.Table:
         """
@@ -165,7 +159,7 @@ def get_as_arrow(self, chunk_size: int | None = None) -> pa.Table:
             -1 or 0
                 The entire result is returned as a single chunk.
             > 0
-                The chunk size is the number of elements specified.
+                The chunk size is the number of rows specified.
 
         See Also
         --------
@@ -180,16 +174,15 @@ def get_as_arrow(self, chunk_size: int | None = None) -> pa.Table:
         self.check_for_query_result_close()
 
         if chunk_size is None:
-            # Adaptive chunk_size; target number of elements per chunk_size
-            target_chunk_size = max(1_000_000 // len(self.get_column_names()), 10)
+            # Adaptive; target 10m total elements in each chunk.
+            # (eg: if we had 10 cols, this would result in a 1m row chunk_size).
+            target_n_elems = 10_000_000
+            chunk_size = max(target_n_elems // len(self.get_column_names()), 10)
         elif chunk_size <= 0:
             # No chunking: return the entire result as a single chunk
-            target_chunk_size = self.get_num_tuples()
-        else:
-            # Chunk size is the number of elements specified
-            target_chunk_size = chunk_size
+            chunk_size = self.get_num_tuples()
 
-        return self._query_result.getAsArrow(target_chunk_size)
+        return self._query_result.getAsArrow(chunk_size)
 
     def get_column_data_types(self) -> list[str]:
         """
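
For readers following the `get_as_arrow` change, the three-way chunk size resolution can be sketched as a standalone function. This is only an illustration, not code from the PR: `resolve_chunk_size`, `n_columns`, and `n_rows` are hypothetical names standing in for the method body, `len(self.get_column_names())`, and `self.get_num_tuples()`. The 10,000,000 target and the floor of 10 rows are taken from the diff above.

```python
from __future__ import annotations

# Target number of total elements (rows x columns) per chunk, as in the diff.
TARGET_N_ELEMS = 10_000_000


def resolve_chunk_size(chunk_size: int | None, n_columns: int, n_rows: int) -> int:
    """Hypothetical helper mirroring the branch logic in get_as_arrow."""
    if chunk_size is None:
        # Adaptive: wider results get fewer rows per chunk, never fewer than 10.
        return max(TARGET_N_ELEMS // n_columns, 10)
    if chunk_size <= 0:
        # No chunking: one chunk holding every row of the result.
        return n_rows
    # A positive value is used as given (number of rows per chunk).
    return chunk_size
```

With 10 columns the adaptive branch yields 1,000,000 rows per chunk, which matches the example in the new code comment. Dropping the old `else` branch works because a positive `chunk_size` already holds the value to pass through.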
7 changes: 5 additions & 2 deletions tools/python_api/test/test_arrow.py
@@ -84,8 +84,11 @@
 
 
 def get_result(query_result: kuzu.QueryResult, result_type: str, chunk_size: int | None) -> Any:
-    sz = [] if chunk_size is None else [chunk_size]
-    return getattr(query_result, f"get_as_{result_type}")(*sz)
+    sz = [] if (chunk_size is None or result_type == "pl") else [chunk_size]
+    res = getattr(query_result, f"get_as_{result_type}")(*sz)
+    if result_type == "arrow" and chunk_size:
+        assert res[0].num_chunks == max((len(res) // chunk_size), 1)
+    return res
 
 
 def assert_column_equals(data: Any, col_name: str, return_type: str, expected_values: list[Any]) -> None:
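
The reason the test helper now skips `chunk_size` for the `"pl"` result type can be shown with a small stand-in. This is a sketch under stated assumptions: `FakeQueryResult` is a hypothetical stub, not part of kuzu, and it only records which arguments reach each exporter. The dispatch line follows the updated `get_result` in the diff above, without the Arrow chunk-count assertion.

```python
from __future__ import annotations

from typing import Any


class FakeQueryResult:
    """Hypothetical stub for kuzu.QueryResult; returns a string describing the call."""

    def get_as_pl(self) -> str:
        # After this commit, get_as_pl takes no chunk_size argument.
        return "pl()"

    def get_as_arrow(self, chunk_size: int | None = None) -> str:
        return f"arrow({chunk_size})"


def get_result(query_result: Any, result_type: str, chunk_size: int | None) -> Any:
    # Forward chunk_size only to exporters that still accept it.
    args = [] if (chunk_size is None or result_type == "pl") else [chunk_size]
    return getattr(query_result, f"get_as_{result_type}")(*args)
```

Without the `result_type == "pl"` check, a parametrized test that passes a chunk size would call `get_as_pl(1000)` and raise a `TypeError`, since the method no longer takes that argument.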