[pull] master from kuzudb:master #2

Merged
merged 174 commits into from
Apr 7, 2024
0d84c73
Support Polars DataFrame export from QueryResult (#2985)
alexander-beedie Mar 4, 2024
89598fd
clean up transaction pointer in physical operator
ray6080 Mar 4, 2024
1131621
Merge pull request #2990 from kuzudb/clean-operator-transaction
ray6080 Mar 4, 2024
3415ff1
Store a stable reference instead of a duplicate string in the ColumnC…
benjaminwinger Mar 4, 2024
c1b2220
fix reset empty heap overflow
ray6080 Mar 2, 2024
93e6b3e
Merge pull request #2994 from kuzudb/string-column-chunk-index
benjaminwinger Mar 5, 2024
1b858cf
Merge pull request #2996 from kuzudb/fix-reset-empty
ray6080 Mar 6, 2024
9e23995
Rework CSV_TO_PARQUET testing feature
manh9203 Feb 29, 2024
ad24bf7
Avoid moving DictionaryChunks
benjaminwinger Mar 6, 2024
74c2f80
Merge pull request #2999 from kuzudb/dictionary-memory-fix
ray6080 Mar 7, 2024
5e598ec
update links to website (#3000)
ray6080 Mar 7, 2024
38e4398
Re-write partitioner to use ColumnChunks instead of ValueVectors
benjaminwinger Feb 27, 2024
b2f50ac
Abstract client config
andyfengHKU Mar 8, 2024
b7e3bc7
Merge pull request #2979 from kuzudb/rel-memory-fix
benjaminwinger Mar 8, 2024
c554a20
Merge pull request #3010 from kuzudb/add-client-config
andyfengHKU Mar 8, 2024
45c5aa9
Support use of QueryResult as a context manager (#3009)
alexander-beedie Mar 9, 2024
a3a6c2a
Pass client context to binder
andyfengHKU Mar 8, 2024
4692c54
Merge pull request #3015 from kuzudb/pass-client-context-binder
andyfengHKU Mar 10, 2024
020c09b
Refactor cast functions
andyfengHKU Mar 9, 2024
dc9771f
Merge pull request #3016 from kuzudb/refactor-cast-function-binding
andyfengHKU Mar 10, 2024
c149349
Combine append(ValueVector) with appendOne
ray6080 Mar 10, 2024
2a5948f
clean up unique_ptr of LogicalType in NodeGroup and BatchInsert
ray6080 Mar 10, 2024
f4a95ab
Merge pull request #3018 from kuzudb/clean-unique-ptr
ray6080 Mar 11, 2024
6c01c80
handle multiple database instantiations for import caching
mxwli Feb 28, 2024
3f84585
Revert "Revert "Implement Python Import Caching""
mxwli Mar 11, 2024
67e9204
Merge pull request #3017 from kuzudb/remove-append-one
ray6080 Mar 11, 2024
8642ccb
Merge pull request #3025 from kuzudb/import-cache-fix-and-revert-revert
mxwli Mar 11, 2024
7c25a3b
Rewrite the Hash Index overflow file to support multiple copies and f…
benjaminwinger Mar 7, 2024
2397c02
Fix issue-2984
andyfengHKU Mar 11, 2024
a6c7e21
Merge pull request #3026 from kuzudb/issue-2984
andyfengHKU Mar 11, 2024
8f5f64a
Add multiplaform test report bot (#3027)
mewim Mar 12, 2024
3bdc752
Python API typing, lint, config/makefile (#3023)
alexander-beedie Mar 12, 2024
18c2c8f
Fix unicode conversion for pandas dataframe (#3029)
mewim Mar 12, 2024
339a471
Update LICENSE
semihsalihoglu-uw Mar 12, 2024
1d5df7f
Merge pull request #3031 from kuzudb/semihsalihoglu-uw-patch-1
semihsalihoglu-uw Mar 12, 2024
0c26056
Merge pull request #3012 from kuzudb/multi-copy-overflow-file
benjaminwinger Mar 12, 2024
bdd650e
Add copy from subquery
andyfengHKU Mar 4, 2024
af50489
Insert into the hash index builder one chunk at a time
benjaminwinger Mar 5, 2024
7a3ff60
Merge pull request #3020 from kuzudb/copy-from-subquery
andyfengHKU Mar 12, 2024
d9d277f
Fix issue-3004
andyfengHKU Mar 12, 2024
891c115
Merge pull request #3036 from kuzudb/issue-3004
andyfengHKU Mar 13, 2024
7da8a62
Optimise Python unit test runtime (~7x speedup) (#3032)
alexander-beedie Mar 13, 2024
7c16897
Add more parameter types for Node.js API (#3037)
mewim Mar 13, 2024
d8487a0
Merge pull request #2997 from kuzudb/hash-index-builder-chunks
benjaminwinger Mar 13, 2024
b304389
Remove the constraint on the HashIndexBuilder template parameter
benjaminwinger Mar 12, 2024
0dbcef6
Allow CI workflow to be manually dispatched (#3043)
mewim Mar 13, 2024
2dfe495
Bump extensions version to 0.2.0 (#3041)
mewim Mar 13, 2024
7110f91
First-pass lint/format for Python `shell` tests (#3034)
alexander-beedie Mar 13, 2024
930ba45
Bump master branch version to 0.3.2.1 (#3044)
mewim Mar 14, 2024
1b6f741
Fixed failing shell tests (#3045)
MSebanc Mar 14, 2024
ff186a5
Add shell tests to CI (#3039)
mewim Mar 14, 2024
77489a5
fix issue 3042
ray6080 Mar 13, 2024
0b7adb9
fix sliding out-of-place commit and null strings
ray6080 Mar 12, 2024
f8efa2a
Merge pull request #3055 from kuzudb/fix-rel-insert-bug
ray6080 Mar 14, 2024
1f88b3f
rework local storage: separate the storage of insertions and updates
ray6080 Mar 14, 2024
4f06cf1
Merge pull request #2982 from kuzudb/multi-copy-rel-s1
ray6080 Mar 14, 2024
d348228
Merge pull request #3046 from kuzudb/fix-3042
ray6080 Mar 14, 2024
4a7b109
Update Debian version in build workflows (#3056)
mewim Mar 14, 2024
a9454b3
Implement duckdb scanner extension
acquamarin Mar 4, 2024
c3556e2
Merge pull request #3052 from kuzudb/duckdb-scanner
acquamarin Mar 15, 2024
2a3012c
Fix Hash index split slot ID when reserving a number of slots which a…
benjaminwinger Mar 15, 2024
28bd03b
Copy table function instead of passing raw pointer
andyfengHKU Mar 16, 2024
a612c0f
Merge pull request #3067 from kuzudb/table-function-copy
andyfengHKU Mar 16, 2024
1d7b9f3
Add replace func
andyfengHKU Mar 13, 2024
a0ee10e
Merge pull request #3069 from kuzudb/replace-func
andyfengHKU Mar 18, 2024
3db0f95
Merge pull request #3030 from kuzudb/hash_index_template_types
benjaminwinger Mar 18, 2024
f12e5e7
Remove unnecessary components for pip package (#3074)
mewim Mar 18, 2024
826927e
Merge pull request #3066 from kuzudb/hash-index-reserve-fix
benjaminwinger Mar 18, 2024
cc93226
Implement catalog cache for postgres scanner
acquamarin Mar 18, 2024
bd963c1
Merge pull request #3071 from kuzudb/catalog-cache
acquamarin Mar 18, 2024
35b9438
Rework Fixed-list
manh9203 Mar 11, 2024
e7c6d73
Merge pull request #3057 from kuzudb/fixed-list-rework
manh9203 Mar 18, 2024
e963df1
Implemented Progress Bar for ScanNodeID Operator (#3051)
MSebanc Mar 18, 2024
775d2e6
replace ValueVector with ColumnChunk in LocalStorage
ray6080 Mar 14, 2024
8854ebd
Merge pull request #3028 from kuzudb/refactor-local-storage
ray6080 Mar 19, 2024
3ce3b1f
fix rel insert and append sanityCheck for column chunk
ray6080 Mar 15, 2024
0531afe
Exclude extension files from the rust crate (#3076)
benjaminwinger Mar 19, 2024
907d831
Remove unnecessary components for pip package (#3085)
mewim Mar 19, 2024
c3decc2
Merge pull request #3081 from kuzudb/fix-rel-insert
ray6080 Mar 19, 2024
c39704d
fix deadlock issue due to bm no frame to claim exception and fix used…
ray6080 Mar 19, 2024
8b2c768
Merge pull request #3082 from kuzudb/fix-node-insert
ray6080 Mar 19, 2024
efdc1e4
Refactor arithmetic functions
manh9203 Mar 18, 2024
8fa40d6
Merge pull request #3079 from kuzudb/arithmetic-functions-refactor
manh9203 Mar 19, 2024
0ced885
Allowed for progress bar to be configurable by CALL (#3080)
MSebanc Mar 19, 2024
7a3ca59
Implement array functions
acquamarin Mar 19, 2024
04fcdec
Merge pull request #3087 from kuzudb/array-functions
acquamarin Mar 19, 2024
6860af0
Remove underscore from the badges in README (#3094)
mewim Mar 20, 2024
f69ad02
Fix python prepared statement null value
acquamarin Mar 20, 2024
e49bb30
Merge pull request #3098 from kuzudb/python-prepared-statement
acquamarin Mar 20, 2024
568e08e
Refactor string functions
manh9203 Mar 19, 2024
f8fe205
Merge pull request #3091 from kuzudb/string-functions-refactor
manh9203 Mar 20, 2024
05359c7
Arrow chunk_size as keyword argument (#3084)
prrao87 Mar 21, 2024
3c90c16
Update rustdoc to show how to enable parallel compilation (#3099)
prrao87 Mar 21, 2024
f6b1d6a
Improve copy-to-parquet perf
acquamarin Mar 21, 2024
7817cc9
Merge pull request #3105 from kuzudb/copy-to-parquet-perf
acquamarin Mar 21, 2024
68c2856
Refactor list functions
manh9203 Mar 19, 2024
96d9a91
Merge pull request #3100 from kuzudb/list-functions-refactor
manh9203 Mar 22, 2024
9effbb1
Refactor cast functions
manh9203 Mar 20, 2024
bdae55f
Merge pull request #3107 from kuzudb/cast-functions-refactor
manh9203 Mar 22, 2024
f9e1b12
Update `get_as_pl` (should always return a single chunk) (#3110)
alexander-beedie Mar 22, 2024
3f817f2
Add standard Python module __version__ attr (#3111)
alexander-beedie Mar 22, 2024
6d39076
Fix DuckDB build for macOS ARM and 32-bit (#3115)
mewim Mar 22, 2024
6e52e22
Add external object scan replacement
andyfengHKU Mar 16, 2024
d65c2b8
clean
andyfengHKU Mar 18, 2024
8f976e4
clean
andyfengHKU Mar 18, 2024
23144c3
pyarrow backend scanning for pandas
mxwli Feb 27, 2024
f0507b0
CLANG-TIDY
mxwli Mar 21, 2024
b97aab5
clang fix
mxwli Mar 21, 2024
cb4d757
clang
mxwli Mar 21, 2024
d4b261b
Merge pull request #3058 from kuzudb/pandas-pyarrow-backend
mxwli Mar 22, 2024
003a706
Add pull request template (#3118)
andyfengHKU Mar 22, 2024
8f37501
Added customizable delay before displaying progress bar (#3092)
MSebanc Mar 22, 2024
c8e4d5b
Hash index cleanup (#3088)
benjaminwinger Mar 22, 2024
167bb87
Fix launch database using homedir (#3108)
acquamarin Mar 22, 2024
7ec590a
remove dummy transactions (#3106)
hououou Mar 22, 2024
9247fd2
fix import database path (#3063)
hououou Mar 22, 2024
f9bc0c6
enable compression for INTERNAL_ID (#3116)
ray6080 Mar 23, 2024
e60e8cd
close 1646 (#3122)
ray6080 Mar 23, 2024
365815b
Refactor Partitioner to use ChunkedNodeGroupCollection (#3123)
ray6080 Mar 23, 2024
3a6bd7e
Replace with client context (#3121)
hououou Mar 23, 2024
599b80f
Rework var list storage layout (#3093)
hououou Mar 24, 2024
3ce064d
Fix 3127 (#3130)
acquamarin Mar 24, 2024
3813eed
Fix issue-3129 (#3131)
andyfengHKU Mar 24, 2024
53ef58e
Refactor scalar function registration (#3119)
manh9203 Mar 25, 2024
b208d15
Support multiple COPY statements on rel tables (#2989)
ray6080 Mar 25, 2024
ad31f02
initialize readfds via FD_ZERO before use (#3132)
neeraj9 Mar 25, 2024
a8b15dc
table scan/update/insert/delete state (#3072)
ray6080 Mar 25, 2024
4d21128
Support read after update (#3126)
andyfengHKU Mar 25, 2024
80b3e94
Factor out benchmark workflow and enable manual trigger for it (#3144)
mewim Mar 26, 2024
3237e6f
Implement postgres-scanner (#3139)
acquamarin Mar 26, 2024
de72fc9
Python List and Map Parameter Support (#3090)
mxwli Mar 26, 2024
a85f4fe
Cache DiskArray write header in-memory (#3109)
benjaminwinger Mar 26, 2024
fc3b4a7
Fix postgres scanner issues (#3148)
acquamarin Mar 26, 2024
c1f68cd
Refactor path functions and RDF functions (#3134)
manh9203 Mar 26, 2024
9ea80ec
Refactor aggregate functions (#3136)
manh9203 Mar 27, 2024
73ed1ea
Pandas Pyarrow Backend Bugfix and Tests (#3152)
mxwli Mar 27, 2024
677d35e
List Auxiliary Buffer NullMask Fix (#3156)
mxwli Mar 27, 2024
c747899
Add support to compute hash on list of struct (#3157)
acquamarin Mar 27, 2024
015bf23
Prepare Statement Improvement (#3140)
hououou Mar 28, 2024
6c82aad
resolve weird ANY resolution (#3160)
mxwli Mar 28, 2024
20bde3a
fix export test (#3164)
hououou Mar 28, 2024
956b3e3
Implement initcap/concat functions (#3161)
acquamarin Mar 28, 2024
2ec13b2
Fix issue 3070: Support extend from unwind node (#3153)
andyfengHKU Mar 28, 2024
08fd180
Add Pyarrow Map Scanning (#3158)
mxwli Mar 28, 2024
293b4e6
Fix export database regression (#3171)
andyfengHKU Mar 28, 2024
37b58bb
Fix hash aggregate edge case (#3172)
andyfengHKU Mar 28, 2024
20e5cbb
Added progress for in_query_call operators (#3120)
MSebanc Mar 28, 2024
cf71770
Fixed shell incorrect command seg fault (#3173)
MSebanc Mar 29, 2024
fb8f4c7
Cache files when replaying WAL (#3137)
benjaminwinger Mar 29, 2024
f80a6eb
Support join hash table on aggregate types (#3174)
acquamarin Mar 29, 2024
fa528c1
Fix delete then scan bug (#3176)
andyfengHKU Mar 30, 2024
4e406a1
Refactor sel vector interface (#3177)
andyfengHKU Mar 31, 2024
6f0d8f8
Fix issue 3151: disable null on internalID columns (#3165)
ray6080 Mar 31, 2024
6b1d45a
Rework DDL operators (#3178)
ray6080 Apr 1, 2024
ac9cbf3
Refactor table functions (#3155)
manh9203 Apr 1, 2024
a99ff6c
Rename VAR_LIST to LIST (#3170)
manh9203 Apr 1, 2024
add8473
Remove unused keywords in test runner (#3193)
hououou Apr 1, 2024
94fd5eb
Split extension tests as separate jobs (#2987)
mewim Apr 2, 2024
a95b29e
Added progress for aggregate scan and order by scan (#3192)
MSebanc Apr 2, 2024
0ad815e
Fix is null executor bug (#3197)
andyfengHKU Apr 2, 2024
f62e7c8
Fix order by radix sort bug (#3201)
acquamarin Apr 3, 2024
1f03f5a
Updated shell result truncation (#3206)
MSebanc Apr 3, 2024
1aaa21f
Fix-3200 (#3203)
prrao87 Apr 3, 2024
b3c6dc9
skip empty history file line (#3184)
neeraj9 Apr 4, 2024
fa0ef79
Merge duplicate key fix (#3207)
acquamarin Apr 4, 2024
37de692
Implemented progress for in memory RDF scan (#3208)
MSebanc Apr 4, 2024
ec6e309
Rework multiple query result (#3191)
hououou Apr 4, 2024
2100fa3
Fix constant compression in-place check for bools (#3211)
benjaminwinger Apr 5, 2024
8923c7f
Replace Slack link with Discord in contributing guideline (#3217)
mewim Apr 5, 2024
33111c8
fix pyarrow segfaulting on fedora 39 (#3213)
mxwli Apr 5, 2024
b3917d9
Bump clang-format to v18 and enable auto format (#3222)
mewim Apr 6, 2024
d946982
Check for format changes on master branch (#3223)
mewim Apr 6, 2024
8006723
CMAKE_CXX_FLAGS handling fails when variable is empty (#3228)
zaddach Apr 7, 2024
c6897b4
Remove extension test from `clang-build-test` job (#3231)
mewim Apr 7, 2024
rework local storage: separate the storage of insertions and updates
ray6080 committed Mar 14, 2024
commit 1f88b3f1a2160d416bfce6a0693b2f094ce94479
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cmake_minimum_required(VERSION 3.15)

project(Kuzu VERSION 0.3.2.1 LANGUAGES CXX C)
project(Kuzu VERSION 0.3.2.2 LANGUAGES CXX C)

find_package(Threads REQUIRED)

Expand Down
41 changes: 24 additions & 17 deletions src/catalog/catalog_entry/rel_table_catalog_entry.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,14 @@

#include "catalog/catalog.h"

using namespace kuzu::common;

namespace kuzu {
namespace catalog {

RelTableCatalogEntry::RelTableCatalogEntry(std::string name, common::table_id_t tableID,
RelTableCatalogEntry::RelTableCatalogEntry(std::string name, table_id_t tableID,
common::RelMultiplicity srcMultiplicity, common::RelMultiplicity dstMultiplicity,
common::table_id_t srcTableID, common::table_id_t dstTableID)
table_id_t srcTableID, table_id_t dstTableID)
: TableCatalogEntry{CatalogEntryType::REL_TABLE_ENTRY, std::move(name), tableID},
srcMultiplicity{srcMultiplicity}, dstMultiplicity{dstMultiplicity}, srcTableID{srcTableID},
dstTableID{dstTableID} {}
Expand All @@ -20,27 +22,32 @@ RelTableCatalogEntry::RelTableCatalogEntry(const RelTableCatalogEntry& other)
dstTableID = other.dstTableID;
}

bool RelTableCatalogEntry::isParent(common::table_id_t tableID) {
bool RelTableCatalogEntry::isParent(table_id_t tableID) {
return srcTableID == tableID || dstTableID == tableID;
}

bool RelTableCatalogEntry::isSingleMultiplicity(common::RelDataDirection direction) const {
column_id_t RelTableCatalogEntry::getColumnID(property_id_t propertyID) const {
auto it = std::find_if(properties.begin(), properties.end(),
[&propertyID](const auto& property) { return property.getPropertyID() == propertyID; });
// Skip the first column in the rel table, which is reserved for nbrID.
return it == properties.end() ? common::INVALID_COLUMN_ID :
std::distance(properties.begin(), it) + 1;
}

bool RelTableCatalogEntry::isSingleMultiplicity(RelDataDirection direction) const {
return getMultiplicity(direction) == common::RelMultiplicity::ONE;
}
common::RelMultiplicity RelTableCatalogEntry::getMultiplicity(
common::RelDataDirection direction) const {
return direction == common::RelDataDirection::FWD ? dstMultiplicity : srcMultiplicity;
common::RelMultiplicity RelTableCatalogEntry::getMultiplicity(RelDataDirection direction) const {
return direction == RelDataDirection::FWD ? dstMultiplicity : srcMultiplicity;
}
common::table_id_t RelTableCatalogEntry::getBoundTableID(
common::RelDataDirection relDirection) const {
return relDirection == common::RelDataDirection::FWD ? srcTableID : dstTableID;
table_id_t RelTableCatalogEntry::getBoundTableID(RelDataDirection relDirection) const {
return relDirection == RelDataDirection::FWD ? srcTableID : dstTableID;
}
common::table_id_t RelTableCatalogEntry::getNbrTableID(
common::RelDataDirection relDirection) const {
return relDirection == common::RelDataDirection::FWD ? dstTableID : srcTableID;
table_id_t RelTableCatalogEntry::getNbrTableID(RelDataDirection relDirection) const {
return relDirection == RelDataDirection::FWD ? dstTableID : srcTableID;
}

void RelTableCatalogEntry::serialize(common::Serializer& serializer) const {
void RelTableCatalogEntry::serialize(Serializer& serializer) const {
TableCatalogEntry::serialize(serializer);
serializer.write(srcMultiplicity);
serializer.write(dstMultiplicity);
Expand All @@ -49,11 +56,11 @@ void RelTableCatalogEntry::serialize(common::Serializer& serializer) const {
}

std::unique_ptr<RelTableCatalogEntry> RelTableCatalogEntry::deserialize(
common::Deserializer& deserializer) {
Deserializer& deserializer) {
common::RelMultiplicity srcMultiplicity;
common::RelMultiplicity dstMultiplicity;
common::table_id_t srcTableID;
common::table_id_t dstTableID;
table_id_t srcTableID;
table_id_t dstTableID;
deserializer.deserializeValue(srcMultiplicity);
deserializer.deserializeValue(dstMultiplicity);
deserializer.deserializeValue(srcTableID);
Expand Down
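The getColumnID override added in this file shifts every property position by one, because the first column of a rel table is reserved for nbrID. The following is a minimal, self-contained sketch of that idea and not Kuzu's actual code: `Property`, `TableEntry`, `RelTableEntry`, and the helper `findProperty` are hypothetical stand-ins, and the base-class behaviour (returning the plain position) is an assumption, since the base implementation is not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the catalog types; not Kuzu's real definitions.
using property_id_t = uint32_t;
using column_id_t = uint32_t;
constexpr column_id_t INVALID_COLUMN_ID = std::numeric_limits<column_id_t>::max();

struct Property {
    property_id_t id;
    std::string name;
};

class TableEntry {
public:
    explicit TableEntry(std::vector<Property> properties) : properties{std::move(properties)} {}
    virtual ~TableEntry() = default;

    // Assumed base behaviour: the column ID is the position of the property.
    virtual column_id_t getColumnID(property_id_t propertyID) const {
        auto it = findProperty(propertyID);
        return it == properties.end() ?
                   INVALID_COLUMN_ID :
                   static_cast<column_id_t>(std::distance(properties.begin(), it));
    }

protected:
    // Protected (not private) so that derived entries can reuse the lookup.
    std::vector<Property>::const_iterator findProperty(property_id_t propertyID) const {
        return std::find_if(properties.begin(), properties.end(),
            [propertyID](const Property& property) { return property.id == propertyID; });
    }

    std::vector<Property> properties;
};

class RelTableEntry final : public TableEntry {
public:
    using TableEntry::TableEntry;

    // Column 0 is reserved (for nbrID in the diff), so positions shift by one.
    column_id_t getColumnID(property_id_t propertyID) const override {
        auto it = findProperty(propertyID);
        return it == properties.end() ?
                   INVALID_COLUMN_ID :
                   static_cast<column_id_t>(std::distance(properties.begin(), it) + 1);
    }
};
```

The override can only read `properties` if the base class exposes it to derived classes, which matches the private-to-protected change made to table_catalog_entry.h in this commit.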
42 changes: 16 additions & 26 deletions src/common/data_chunk/data_chunk_collection.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,35 +7,35 @@ DataChunkCollection::DataChunkCollection(storage::MemoryManager* mm) : mm{mm} {}

void DataChunkCollection::append(DataChunk& chunk) {
auto numTuplesToAppend = chunk.state->selVector->selectedSize;
auto chunkToAppendInfo = chunks.empty() ? allocateChunk(chunk) : chunks.back().get();
auto numTuplesAppended = 0u;
while (numTuplesAppended < numTuplesToAppend) {
if (chunkToAppendInfo->state->selVector->selectedSize == DEFAULT_VECTOR_CAPACITY) {
chunkToAppendInfo = allocateChunk(chunk);
if (chunks.empty() ||
chunks.back().state->selVector->selectedSize == DEFAULT_VECTOR_CAPACITY) {
allocateChunk(chunk);
}
auto& chunkToAppend = chunks.back();
auto numTuplesToCopy = std::min(numTuplesToAppend - numTuplesAppended,
DEFAULT_VECTOR_CAPACITY - chunkToAppendInfo->state->selVector->selectedSize);
DEFAULT_VECTOR_CAPACITY - chunkToAppend.state->selVector->selectedSize);
for (auto vectorIdx = 0u; vectorIdx < chunk.getNumValueVectors(); vectorIdx++) {
for (auto i = 0u; i < numTuplesToCopy; i++) {
auto srcPos = chunk.state->selVector->selectedPositions[numTuplesAppended + i];
auto dstPos = chunkToAppendInfo->state->selVector->selectedSize + i;
chunkToAppendInfo->getValueVector(vectorIdx)->copyFromVectorData(
auto dstPos = chunkToAppend.state->selVector->selectedSize + i;
chunkToAppend.getValueVector(vectorIdx)->copyFromVectorData(
dstPos, chunk.getValueVector(vectorIdx).get(), srcPos);
}
}
chunkToAppendInfo->state->selVector->selectedSize += numTuplesToCopy;
chunkToAppend.state->selVector->selectedSize += numTuplesToCopy;
numTuplesAppended += numTuplesToCopy;
}
}

void DataChunkCollection::append(std::unique_ptr<DataChunk> chunk) {
KU_ASSERT(chunk);
void DataChunkCollection::merge(DataChunk chunk) {
if (chunks.empty()) {
initTypes(*chunk);
initTypes(chunk);
}
KU_ASSERT(chunk->getNumValueVectors() == types.size());
for (auto vectorIdx = 0u; vectorIdx < chunk->getNumValueVectors(); vectorIdx++) {
KU_ASSERT(chunk->getValueVector(vectorIdx)->dataType == types[vectorIdx]);
KU_ASSERT(chunk.getNumValueVectors() == types.size());
for (auto vectorIdx = 0u; vectorIdx < chunk.getNumValueVectors(); vectorIdx++) {
KU_ASSERT(chunk.getValueVector(vectorIdx)->dataType == types[vectorIdx]);
}
chunks.push_back(std::move(chunk));
}
Expand All @@ -47,28 +47,18 @@ void DataChunkCollection::initTypes(DataChunk& chunk) {
}
}

std::vector<common::DataChunk*> DataChunkCollection::getChunks() const {
std::vector<common::DataChunk*> ret;
ret.reserve(chunks.size());
for (auto& chunk : chunks) {
ret.push_back(chunk.get());
}
return ret;
}

DataChunk* DataChunkCollection::allocateChunk(DataChunk& chunk) {
void DataChunkCollection::allocateChunk(DataChunk& chunk) {
if (chunks.empty()) {
types.reserve(chunk.getNumValueVectors());
for (auto vectorIdx = 0u; vectorIdx < chunk.getNumValueVectors(); vectorIdx++) {
types.push_back(chunk.getValueVector(vectorIdx)->dataType);
}
}
auto newChunk = std::make_unique<DataChunk>(types.size(), std::make_shared<DataChunkState>());
DataChunk newChunk(types.size(), std::make_shared<DataChunkState>());
for (auto i = 0u; i < types.size(); i++) {
newChunk->insert(i, std::make_shared<ValueVector>(types[i], mm));
newChunk.insert(i, std::make_shared<ValueVector>(types[i], mm));
}
chunks.push_back(std::move(newChunk));
return chunks.back().get();
}

} // namespace common
Expand Down
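The reworked append fills the last chunk up to capacity and allocates a new one only when the collection is empty or the last chunk is full, while chunks are now stored by value instead of behind std::unique_ptr. A rough sketch of that loop follows, under simplifying assumptions: `Chunk`, `ChunkCollection`, and `CHUNK_CAPACITY` are hypothetical names, a chunk holds plain integers rather than value vectors, and the capacity is shrunk to 4 to keep the behaviour easy to follow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical capacity; plays the role of DEFAULT_VECTOR_CAPACITY in the diff.
constexpr std::size_t CHUNK_CAPACITY = 4;

// Hypothetical, simplified chunk: move-only, like an object that owns buffers.
struct Chunk {
    std::vector<int> values;

    Chunk() { values.reserve(CHUNK_CAPACITY); }
    Chunk(const Chunk&) = delete;
    Chunk& operator=(const Chunk&) = delete;
    Chunk(Chunk&&) = default;
    Chunk& operator=(Chunk&&) = default;

    std::size_t size() const { return values.size(); }
};

class ChunkCollection {
public:
    // Copy `input` into the collection, splitting it across fixed-size chunks.
    void append(const std::vector<int>& input) {
        std::size_t numAppended = 0;
        while (numAppended < input.size()) {
            if (chunks.empty() || chunks.back().size() == CHUNK_CAPACITY) {
                chunks.emplace_back();
            }
            // Fetch the reference only after a possible reallocation of `chunks`.
            auto& dst = chunks.back();
            auto numToCopy =
                std::min(input.size() - numAppended, CHUNK_CAPACITY - dst.size());
            auto first = input.begin() + static_cast<std::ptrdiff_t>(numAppended);
            dst.values.insert(dst.values.end(), first,
                first + static_cast<std::ptrdiff_t>(numToCopy));
            numAppended += numToCopy;
        }
    }

    // Take ownership of an already built chunk without copying its contents.
    void merge(Chunk chunk) { chunks.push_back(std::move(chunk)); }

    std::size_t getNumChunks() const { return chunks.size(); }

    // Read-only access by const reference.
    const Chunk& getChunk(std::size_t idx) const {
        assert(idx < chunks.size());
        return chunks[idx];
    }

    // Mutable access; the "Unsafe" suffix mirrors the naming used in the diff.
    Chunk& getChunkUnsafe(std::size_t idx) {
        assert(idx < chunks.size());
        return chunks[idx];
    }

private:
    std::vector<Chunk> chunks;
};
```

Storing the chunks by value means any reference into the vector can be invalidated when it grows, which is why the sketch, like the diff, takes `chunks.back()` after the allocation step instead of caching a pointer across iterations.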
9 changes: 9 additions & 0 deletions src/common/types/types.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -451,6 +451,15 @@ std::vector<LogicalType> LogicalType::copy(const std::vector<LogicalType>& types
return typesCopy;
}

std::vector<LogicalType> LogicalType::copy(const std::vector<LogicalType*>& types) {
std::vector<LogicalType> typesCopy;
typesCopy.reserve(types.size());
for (auto& type : types) {
typesCopy.push_back(*type->copy());
}
return typesCopy;
}

PhysicalTypeID LogicalType::getPhysicalType(LogicalTypeID typeID) {
switch (typeID) {
case LogicalTypeID::ANY: {
Expand Down
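The new LogicalType::copy overload turns a vector of non-owning pointers into a vector of independent values. The sketch below illustrates the same conversion on a hypothetical `TypeInfo` struct with nested children; `TypeInfo` and `copyTypes` are made-up names, and the sketch copies through the copy constructor directly rather than going through `copy()` as the diff does.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a type descriptor that owns nested child types.
struct TypeInfo {
    std::string name;
    std::vector<std::unique_ptr<TypeInfo>> children;

    explicit TypeInfo(std::string name) : name{std::move(name)} {}

    // Deep copy: every child is cloned, so the copy shares nothing with the source.
    TypeInfo(const TypeInfo& other) : name{other.name} {
        children.reserve(other.children.size());
        for (const auto& child : other.children) {
            children.push_back(child->copy());
        }
    }

    std::unique_ptr<TypeInfo> copy() const { return std::make_unique<TypeInfo>(*this); }
};

// Build a vector of owned values from a vector of non-owning pointers.
std::vector<TypeInfo> copyTypes(const std::vector<TypeInfo*>& types) {
    std::vector<TypeInfo> typesCopy;
    typesCopy.reserve(types.size());
    for (const auto* type : types) {
        assert(type != nullptr);
        typesCopy.push_back(*type);
    }
    return typesCopy;
}
```

Once the copies exist, the caller no longer depends on the lifetime of the pointed-to objects, which is presumably why a by-value overload is convenient for code that previously passed `LogicalType*` around.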
8 changes: 3 additions & 5 deletions src/function/table/call/storage_info.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,6 @@ struct StorageInfoSharedState final : public CallFuncSharedState {
columns.push_back(relTable->getCSRLengthColumn(RelDataDirection::FWD));
columns.push_back(relTable->getCSROffsetColumn(RelDataDirection::BWD));
columns.push_back(relTable->getCSRLengthColumn(RelDataDirection::BWD));
columns.push_back(relTable->getAdjColumn(RelDataDirection::FWD));
columns.push_back(relTable->getAdjColumn(RelDataDirection::BWD));
for (auto columnID = 0u; columnID < relTable->getNumColumns(); columnID++) {
auto column = relTable->getColumn(columnID, RelDataDirection::FWD);
auto collectedColumns = collectColumns(column);
Expand Down Expand Up @@ -167,10 +165,10 @@ static common::offset_t tableFunc(TableFuncInput& input, TableFuncOutput& output
while (true) {
if (localState->currChunkIdx < localState->dataChunkCollection->getNumChunks()) {
// Copy from local state chunk.
auto chunk = localState->dataChunkCollection->getChunk(localState->currChunkIdx);
auto numValuesToOutput = chunk->state->selVector->selectedSize;
auto& chunk = localState->dataChunkCollection->getChunkUnsafe(localState->currChunkIdx);
auto numValuesToOutput = chunk.state->selVector->selectedSize;
for (auto columnIdx = 0u; columnIdx < dataChunk.getNumValueVectors(); columnIdx++) {
auto localVector = chunk->getValueVector(columnIdx);
auto localVector = chunk.getValueVector(columnIdx);
auto outputVector = dataChunk.getValueVector(columnIdx);
for (auto i = 0u; i < numValuesToOutput; i++) {
outputVector->copyFromVectorData(i, localVector.get(), i);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ class RelTableCatalogEntry final : public TableCatalogEntry {
//===--------------------------------------------------------------------===//
bool isParent(common::table_id_t tableID) override;
common::TableType getTableType() const override { return common::TableType::REL; }
common::column_id_t getColumnID(common::property_id_t propertyID) const override;
common::table_id_t getSrcTableID() const { return srcTableID; }
common::table_id_t getDstTableID() const { return dstTableID; }
bool isSingleMultiplicity(common::RelDataDirection direction) const;
Expand Down
4 changes: 2 additions & 2 deletions src/include/catalog/catalog_entry/table_catalog_entry.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ class TableCatalogEntry : public CatalogEntry {
bool containProperty(const std::string& propertyName) const;
common::property_id_t getPropertyID(const std::string& propertyName) const;
const Property* getProperty(common::property_id_t propertyID) const;
common::column_id_t getColumnID(common::property_id_t propertyID) const;
virtual common::column_id_t getColumnID(common::property_id_t propertyID) const;
bool containPropertyType(const common::LogicalType& logicalType) const;
void addProperty(std::string propertyName, std::unique_ptr<common::LogicalType> dataType);
void dropProperty(common::property_id_t propertyID);
Expand All @@ -52,7 +52,7 @@ class TableCatalogEntry : public CatalogEntry {
static std::unique_ptr<TableCatalogEntry> deserialize(
common::Deserializer& deserializer, CatalogEntryType type);

private:
protected:
common::table_id_t tableID;
std::string comment;
common::property_id_t nextPID;
Expand Down
2 changes: 1 addition & 1 deletion src/include/common/column_data_format.h
Original file line number Diff line number Diff line change
Expand Up @@ -7,5 +7,5 @@ namespace common {

enum class ColumnDataFormat : uint8_t { REGULAR = 0, CSR = 1 };

}
} // namespace common
} // namespace kuzu
24 changes: 14 additions & 10 deletions src/include/common/data_chunk/data_chunk_collection.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,36 +5,40 @@
namespace kuzu {
namespace common {

// TODO(Guodong/Ziyi): We should extend this to ColumnDataCollection, which takes ResultSet into
// consideration for storage and scan.
// TODO(Guodong): Should rework this to use ColumnChunk.
class DataChunkCollection {
public:
explicit DataChunkCollection(storage::MemoryManager* mm);
DELETE_COPY_DEFAULT_MOVE(DataChunkCollection);

void append(DataChunk& chunk);
void append(std::unique_ptr<DataChunk> chunk);
std::vector<common::DataChunk*> getChunks() const;

inline const std::vector<common::DataChunk>& getChunks() const { return chunks; }
inline std::vector<common::DataChunk>& getChunksUnsafe() { return chunks; }
inline uint64_t getNumChunks() const { return chunks.size(); }
inline DataChunk* getChunk(uint64_t idx) const {
inline const DataChunk& getChunk(uint64_t idx) const {
KU_ASSERT(idx < chunks.size());
return chunks[idx];
}
inline DataChunk& getChunkUnsafe(uint64_t idx) {
KU_ASSERT(idx < chunks.size());
return chunks[idx].get();
return chunks[idx];
}
inline void merge(DataChunkCollection* other) {
for (auto& chunk : other->chunks) {
append(std::move(chunk));
merge(std::move(chunk));
}
}
void merge(DataChunk chunk);

private:
DataChunk* allocateChunk(DataChunk& chunk);
void allocateChunk(DataChunk& chunk);

void initTypes(DataChunk& chunk);

private:
storage::MemoryManager* mm;
std::vector<LogicalType> types;
std::vector<std::unique_ptr<DataChunk>> chunks;
std::vector<DataChunk> chunks;
};

} // namespace common
Expand Down
1 change: 1 addition & 0 deletions src/include/common/types/types.h
Original file line number Diff line number Diff line change
Expand Up @@ -302,6 +302,7 @@ class LogicalType {
static std::vector<std::unique_ptr<LogicalType>> copy(
const std::vector<std::unique_ptr<LogicalType>>& types);
static std::vector<LogicalType> copy(const std::vector<LogicalType>& types);
static std::vector<LogicalType> copy(const std::vector<LogicalType*>& types);

static std::unique_ptr<LogicalType> ANY() {
return std::make_unique<LogicalType>(LogicalTypeID::ANY);
Expand Down
5 changes: 4 additions & 1 deletion src/include/processor/operator/partitioner.h
Original file line number Diff line number Diff line change
Expand Up @@ -112,10 +112,13 @@ class Partitioner : public Sink {
std::vector<common::partition_idx_t> numPartitions);

private:
common::DataChunk constructDataChunk(const std::vector<DataPos>& columnPositions,
const std::vector<common::LogicalType>& columnTypes, const ResultSet& resultSet,
const std::shared_ptr<common::DataChunkState>& state);
// TODO: For now, RelBatchInsert will guarantee all data are inside one data chunk. Should be
// generalized to resultSet later if needed.
void copyDataToPartitions(
common::partition_idx_t partitioningIdx, common::DataChunk* chunkToCopyFrom);
common::partition_idx_t partitioningIdx, common::DataChunk chunkToCopyFrom);

private:
// Same size as a value vector. Each thread will allocate a chunk for each node group,
Expand Down
34 changes: 10 additions & 24 deletions src/include/storage/local_storage/local_node_table.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,35 +11,26 @@ class LocalNodeNG final : public LocalNodeGroup {
public:
LocalNodeNG(common::offset_t nodeGroupStartOffset,
const std::vector<common::LogicalType*>& dataTypes, MemoryManager* mm)
: LocalNodeGroup{nodeGroupStartOffset, dataTypes, mm} {
insertInfo.resize(dataTypes.size());
updateInfo.resize(dataTypes.size());
}
: LocalNodeGroup{nodeGroupStartOffset, dataTypes, mm} {}

void scan(common::ValueVector* nodeIDVector, const std::vector<common::column_id_t>& columnIDs,
const std::vector<common::ValueVector*>& outputVectors);
void lookup(common::offset_t nodeOffset, common::column_id_t columnID,
common::ValueVector* outputVector, common::sel_t posInOutputVector);
void insert(common::ValueVector* nodeIDVector,
const std::vector<common::ValueVector*>& propertyVectors);
void update(common::ValueVector* nodeIDVector, common::column_id_t columnID,
common::ValueVector* propertyVector);
void delete_(common::ValueVector* nodeIDVector);

common::row_idx_t getRowIdx(common::column_id_t columnID, common::offset_t nodeOffset);
bool insert(std::vector<common::ValueVector*> nodeIDVectors,
std::vector<common::ValueVector*> propertyVectors) override;
bool update(std::vector<common::ValueVector*> nodeIDVectors, common::column_id_t columnID,
common::ValueVector* propertyVector) override;
bool delete_(
common::ValueVector* nodeIDVector, common::ValueVector* /*extraVector*/ = nullptr) override;

inline const offset_to_row_idx_t& getInsertInfoRef(common::column_id_t columnID) {
KU_ASSERT(columnID < insertInfo.size());
return insertInfo[columnID];
inline const offset_to_row_idx_t& getInsertInfoRef() {
return insertChunks.getOffsetToRowIdx();
}
inline const offset_to_row_idx_t& getUpdateInfoRef(common::column_id_t columnID) {
KU_ASSERT(columnID < updateInfo.size());
return updateInfo[columnID];
return getUpdateChunks(columnID).getOffsetToRowIdx();
}

private:
std::vector<offset_to_row_idx_t> insertInfo;
std::vector<offset_to_row_idx_t> updateInfo;
};

class LocalNodeTableData final : public LocalTableData {
Expand All @@ -52,11 +43,6 @@ class LocalNodeTableData final : public LocalTableData {
void lookup(common::ValueVector* nodeIDVector,
const std::vector<common::column_id_t>& columnIDs,
const std::vector<common::ValueVector*>& outputVectors);
void insert(common::ValueVector* nodeIDVector,
const std::vector<common::ValueVector*>& propertyVectors);
void update(common::ValueVector* nodeIDVector, common::column_id_t columnID,
common::ValueVector* propertyVector);
void delete_(common::ValueVector* nodeIDVector);

private:
LocalNodeGroup* getOrCreateLocalNodeGroup(common::ValueVector* nodeIDVector) override;
Expand Down
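In this header the insert info becomes a single offset-to-row map for the whole node group (getInsertInfoRef() takes no column ID), while the update info stays per column (getUpdateInfoRef(columnID)). The sketch below is one possible reading of that split and not the real LocalNodeGroup: `RowStore` and `LocalGroupSketch` are hypothetical, values are plain 64-bit integers, `offset_to_row_idx_t` is assumed to be a hash map, and the rule that an update to a locally inserted row is applied in place is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

using offset_t = uint64_t;
using row_idx_t = uint64_t;
using column_id_t = uint32_t;
// Assumed representation; the real alias is not shown in the diff.
using offset_to_row_idx_t = std::unordered_map<offset_t, row_idx_t>;

// Hypothetical row container with an index from node offset to row position.
struct RowStore {
    offset_to_row_idx_t offsetToRowIdx;
    std::vector<std::vector<int64_t>> rows;

    void upsert(offset_t offset, std::vector<int64_t> values) {
        auto it = offsetToRowIdx.find(offset);
        if (it == offsetToRowIdx.end()) {
            offsetToRowIdx.emplace(offset, rows.size());
            rows.push_back(std::move(values));
        } else {
            rows[it->second] = std::move(values);
        }
    }

    const std::vector<int64_t>* find(offset_t offset) const {
        auto it = offsetToRowIdx.find(offset);
        return it == offsetToRowIdx.end() ? nullptr : &rows[it->second];
    }
};

class LocalGroupSketch {
public:
    explicit LocalGroupSketch(std::size_t numColumns)
        : numColumns{numColumns}, updateChunks(numColumns) {}

    // Insertions store whole rows, covering all columns at once.
    void insert(offset_t offset, std::vector<int64_t> values) {
        assert(values.size() == numColumns);
        insertChunks.upsert(offset, std::move(values));
    }

    // Updates are tracked per column. Assumption: a row inserted in the same
    // local state is patched in place instead of being recorded as an update.
    void update(offset_t offset, column_id_t columnID, int64_t value) {
        assert(columnID < numColumns);
        auto it = insertChunks.offsetToRowIdx.find(offset);
        if (it != insertChunks.offsetToRowIdx.end()) {
            insertChunks.rows[it->second][columnID] = value;
            return;
        }
        updateChunks[columnID].upsert(offset, {value});
    }

    // Local lookup: inserted rows first, then the per-column updates.
    std::optional<int64_t> lookup(offset_t offset, column_id_t columnID) const {
        assert(columnID < numColumns);
        if (const auto* row = insertChunks.find(offset)) {
            return (*row)[columnID];
        }
        if (const auto* row = updateChunks[columnID].find(offset)) {
            return (*row)[0];
        }
        return std::nullopt;
    }

    const offset_to_row_idx_t& getInsertInfoRef() const { return insertChunks.offsetToRowIdx; }
    const offset_to_row_idx_t& getUpdateInfoRef(column_id_t columnID) const {
        assert(columnID < numColumns);
        return updateChunks[columnID].offsetToRowIdx;
    }

private:
    std::size_t numColumns;
    RowStore insertChunks;              // one store for all inserted rows
    std::vector<RowStore> updateChunks; // one store per updated column
};
```

Keeping a single map for insertions avoids duplicating the same offset-to-row entry once per column, whereas updates usually touch a single column and so remain indexed per column.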