[pull] master from kuzudb:master #2

Merged on Apr 7, 2024
174 commits
0d84c73
Support Polars DataFrame export from QueryResult (#2985)
alexander-beedie Mar 4, 2024
89598fd
clean up transaction pointer in physical operator
ray6080 Mar 4, 2024
1131621
Merge pull request #2990 from kuzudb/clean-operator-transaction
ray6080 Mar 4, 2024
3415ff1
Store a stable reference instead of a duplicate string in the ColumnC…
benjaminwinger Mar 4, 2024
c1b2220
fix reset empty heap overflow
ray6080 Mar 2, 2024
93e6b3e
Merge pull request #2994 from kuzudb/string-column-chunk-index
benjaminwinger Mar 5, 2024
1b858cf
Merge pull request #2996 from kuzudb/fix-reset-empty
ray6080 Mar 6, 2024
9e23995
Rework CSV_TO_PARQUET testing feature
manh9203 Feb 29, 2024
ad24bf7
Avoid moving DictionaryChunks
benjaminwinger Mar 6, 2024
74c2f80
Merge pull request #2999 from kuzudb/dictionary-memory-fix
ray6080 Mar 7, 2024
5e598ec
update links to website (#3000)
ray6080 Mar 7, 2024
38e4398
Re-write partitioner to use ColumnChunks instead of ValueVectors
benjaminwinger Feb 27, 2024
b2f50ac
Abstract client config
andyfengHKU Mar 8, 2024
b7e3bc7
Merge pull request #2979 from kuzudb/rel-memory-fix
benjaminwinger Mar 8, 2024
c554a20
Merge pull request #3010 from kuzudb/add-client-config
andyfengHKU Mar 8, 2024
45c5aa9
Support use of QueryResult as a context manager (#3009)
alexander-beedie Mar 9, 2024
a3a6c2a
Pass client context to binder
andyfengHKU Mar 8, 2024
4692c54
Merge pull request #3015 from kuzudb/pass-client-context-binder
andyfengHKU Mar 10, 2024
020c09b
Refactor cast functions
andyfengHKU Mar 9, 2024
dc9771f
Merge pull request #3016 from kuzudb/refactor-cast-function-binding
andyfengHKU Mar 10, 2024
c149349
Combine append(ValueVector) with appendOne
ray6080 Mar 10, 2024
2a5948f
clean up unique_ptr of LogicalType in NodeGroup and BatchInsert
ray6080 Mar 10, 2024
f4a95ab
Merge pull request #3018 from kuzudb/clean-unique-ptr
ray6080 Mar 11, 2024
6c01c80
handle multiple database instantiations for import caching
mxwli Feb 28, 2024
3f84585
Revert "Revert "Implement Python Import Caching""
mxwli Mar 11, 2024
67e9204
Merge pull request #3017 from kuzudb/remove-append-one
ray6080 Mar 11, 2024
8642ccb
Merge pull request #3025 from kuzudb/import-cache-fix-and-revert-revert
mxwli Mar 11, 2024
7c25a3b
Rewrite the Hash Index overflow file to support multiple copies and f…
benjaminwinger Mar 7, 2024
2397c02
Fix issue-2984
andyfengHKU Mar 11, 2024
a6c7e21
Merge pull request #3026 from kuzudb/issue-2984
andyfengHKU Mar 11, 2024
8f5f64a
Add multiplaform test report bot (#3027)
mewim Mar 12, 2024
3bdc752
Python API typing, lint, config/makefile (#3023)
alexander-beedie Mar 12, 2024
18c2c8f
Fix unicode conversion for pandas dataframe (#3029)
mewim Mar 12, 2024
339a471
Update LICENSE
semihsalihoglu-uw Mar 12, 2024
1d5df7f
Merge pull request #3031 from kuzudb/semihsalihoglu-uw-patch-1
semihsalihoglu-uw Mar 12, 2024
0c26056
Merge pull request #3012 from kuzudb/multi-copy-overflow-file
benjaminwinger Mar 12, 2024
bdd650e
Add copy from subquery
andyfengHKU Mar 4, 2024
af50489
Insert into the hash index builder one chunk at a time
benjaminwinger Mar 5, 2024
7a3ff60
Merge pull request #3020 from kuzudb/copy-from-subquery
andyfengHKU Mar 12, 2024
d9d277f
Fix issue-3004
andyfengHKU Mar 12, 2024
891c115
Merge pull request #3036 from kuzudb/issue-3004
andyfengHKU Mar 13, 2024
7da8a62
Optimise Python unit test runtime (~7x speedup) (#3032)
alexander-beedie Mar 13, 2024
7c16897
Add more parameter types for Node.js API (#3037)
mewim Mar 13, 2024
d8487a0
Merge pull request #2997 from kuzudb/hash-index-builder-chunks
benjaminwinger Mar 13, 2024
b304389
Remove the constraint on the HashIndexBuilder template parameter
benjaminwinger Mar 12, 2024
0dbcef6
Allow CI workflow to be manually dispatched (#3043)
mewim Mar 13, 2024
2dfe495
Bump extensions version to 0.2.0 (#3041)
mewim Mar 13, 2024
7110f91
First-pass lint/format for Python `shell` tests (#3034)
alexander-beedie Mar 13, 2024
930ba45
Bump master branch version to 0.3.2.1 (#3044)
mewim Mar 14, 2024
1b6f741
Fixed failing shell tests (#3045)
MSebanc Mar 14, 2024
ff186a5
Add shell tests to CI (#3039)
mewim Mar 14, 2024
77489a5
fix issue 3042
ray6080 Mar 13, 2024
0b7adb9
fix sliding out-of-place commit and null strings
ray6080 Mar 12, 2024
f8efa2a
Merge pull request #3055 from kuzudb/fix-rel-insert-bug
ray6080 Mar 14, 2024
1f88b3f
rework local storage: separate the storage of insertions and updates
ray6080 Mar 14, 2024
4f06cf1
Merge pull request #2982 from kuzudb/multi-copy-rel-s1
ray6080 Mar 14, 2024
d348228
Merge pull request #3046 from kuzudb/fix-3042
ray6080 Mar 14, 2024
4a7b109
Update Debian version in build workflows (#3056)
mewim Mar 14, 2024
a9454b3
Implement duckdb scanner extension
acquamarin Mar 4, 2024
c3556e2
Merge pull request #3052 from kuzudb/duckdb-scanner
acquamarin Mar 15, 2024
2a3012c
Fix Hash index split slot ID when reserving a number of slots which a…
benjaminwinger Mar 15, 2024
28bd03b
Copy table function instead of passing raw pointer
andyfengHKU Mar 16, 2024
a612c0f
Merge pull request #3067 from kuzudb/table-function-copy
andyfengHKU Mar 16, 2024
1d7b9f3
Add replace func
andyfengHKU Mar 13, 2024
a0ee10e
Merge pull request #3069 from kuzudb/replace-func
andyfengHKU Mar 18, 2024
3db0f95
Merge pull request #3030 from kuzudb/hash_index_template_types
benjaminwinger Mar 18, 2024
f12e5e7
Remove unnecessary components for pip package (#3074)
mewim Mar 18, 2024
826927e
Merge pull request #3066 from kuzudb/hash-index-reserve-fix
benjaminwinger Mar 18, 2024
cc93226
Implement catalog cache for postgres scanner
acquamarin Mar 18, 2024
bd963c1
Merge pull request #3071 from kuzudb/catalog-cache
acquamarin Mar 18, 2024
35b9438
Rework Fixed-list
manh9203 Mar 11, 2024
e7c6d73
Merge pull request #3057 from kuzudb/fixed-list-rework
manh9203 Mar 18, 2024
e963df1
Implemented Progress Bar for ScanNodeID Operator (#3051)
MSebanc Mar 18, 2024
775d2e6
replace ValueVector with ColumnChunk in LocalStorage
ray6080 Mar 14, 2024
8854ebd
Merge pull request #3028 from kuzudb/refactor-local-storage
ray6080 Mar 19, 2024
3ce3b1f
fix rel insert and append sanityCheck for column chunk
ray6080 Mar 15, 2024
0531afe
Exclude extension files from the rust crate (#3076)
benjaminwinger Mar 19, 2024
907d831
Remove unnecessary components for pip package (#3085)
mewim Mar 19, 2024
c3decc2
Merge pull request #3081 from kuzudb/fix-rel-insert
ray6080 Mar 19, 2024
c39704d
fix deadlock issue due to bm no frame to claim exception and fix used…
ray6080 Mar 19, 2024
8b2c768
Merge pull request #3082 from kuzudb/fix-node-insert
ray6080 Mar 19, 2024
efdc1e4
Refactor arithmetic functions
manh9203 Mar 18, 2024
8fa40d6
Merge pull request #3079 from kuzudb/arithmetic-functions-refactor
manh9203 Mar 19, 2024
0ced885
Allowed for progress bar to be configurable by CALL (#3080)
MSebanc Mar 19, 2024
7a3ca59
Implement array functions
acquamarin Mar 19, 2024
04fcdec
Merge pull request #3087 from kuzudb/array-functions
acquamarin Mar 19, 2024
6860af0
Remove underscore from the badges in README (#3094)
mewim Mar 20, 2024
f69ad02
Fix python prepared statement null value
acquamarin Mar 20, 2024
e49bb30
Merge pull request #3098 from kuzudb/python-prepared-statement
acquamarin Mar 20, 2024
568e08e
Refactor string functions
manh9203 Mar 19, 2024
f8fe205
Merge pull request #3091 from kuzudb/string-functions-refactor
manh9203 Mar 20, 2024
05359c7
Arrow chunk_size as keyword argument (#3084)
prrao87 Mar 21, 2024
3c90c16
Update rustdoc to show how to enable parallel compilation (#3099)
prrao87 Mar 21, 2024
f6b1d6a
Improve copy-to-parquet perf
acquamarin Mar 21, 2024
7817cc9
Merge pull request #3105 from kuzudb/copy-to-parquet-perf
acquamarin Mar 21, 2024
68c2856
Refactor list functions
manh9203 Mar 19, 2024
96d9a91
Merge pull request #3100 from kuzudb/list-functions-refactor
manh9203 Mar 22, 2024
9effbb1
Refactor cast functions
manh9203 Mar 20, 2024
bdae55f
Merge pull request #3107 from kuzudb/cast-functions-refactor
manh9203 Mar 22, 2024
f9e1b12
Update `get_as_pl` (should always return a single chunk) (#3110)
alexander-beedie Mar 22, 2024
3f817f2
Add standard Python module __version__ attr (#3111)
alexander-beedie Mar 22, 2024
6d39076
Fix DuckDB build for macOS ARM and 32-bit (#3115)
mewim Mar 22, 2024
6e52e22
Add external object scan replacement
andyfengHKU Mar 16, 2024
d65c2b8
clean
andyfengHKU Mar 18, 2024
8f976e4
clean
andyfengHKU Mar 18, 2024
23144c3
pyarrow backend scanning for pandas
mxwli Feb 27, 2024
f0507b0
CLANG-TIDY
mxwli Mar 21, 2024
b97aab5
clang fix
mxwli Mar 21, 2024
cb4d757
clang
mxwli Mar 21, 2024
d4b261b
Merge pull request #3058 from kuzudb/pandas-pyarrow-backend
mxwli Mar 22, 2024
003a706
Add pull request template (#3118)
andyfengHKU Mar 22, 2024
8f37501
Added customizable delay before displaying progress bar (#3092)
MSebanc Mar 22, 2024
c8e4d5b
Hash index cleanup (#3088)
benjaminwinger Mar 22, 2024
167bb87
Fix launch database using homedir (#3108)
acquamarin Mar 22, 2024
7ec590a
remove dummy transactions (#3106)
hououou Mar 22, 2024
9247fd2
fix import database path (#3063)
hououou Mar 22, 2024
f9bc0c6
enable compression for INTERNAL_ID (#3116)
ray6080 Mar 23, 2024
e60e8cd
close 1646 (#3122)
ray6080 Mar 23, 2024
365815b
Refactor Partitioner to use ChunkedNodeGroupCollection (#3123)
ray6080 Mar 23, 2024
3a6bd7e
Replace with client context (#3121)
hououou Mar 23, 2024
599b80f
Rework var list storage layout (#3093)
hououou Mar 24, 2024
3ce064d
Fix 3127 (#3130)
acquamarin Mar 24, 2024
3813eed
Fix issue-3129 (#3131)
andyfengHKU Mar 24, 2024
53ef58e
Refactor scalar function registration (#3119)
manh9203 Mar 25, 2024
b208d15
Support multiple COPY statements on rel tables (#2989)
ray6080 Mar 25, 2024
ad31f02
initialize readfds via FD_ZERO before use (#3132)
neeraj9 Mar 25, 2024
a8b15dc
table scan/update/insert/delete state (#3072)
ray6080 Mar 25, 2024
4d21128
Support read after update (#3126)
andyfengHKU Mar 25, 2024
80b3e94
Factor out benchmark workflow and enable manual trigger for it (#3144)
mewim Mar 26, 2024
3237e6f
Implement postgres-scanner (#3139)
acquamarin Mar 26, 2024
de72fc9
Python List and Map Parameter Support (#3090)
mxwli Mar 26, 2024
a85f4fe
Cache DiskArray write header in-memory (#3109)
benjaminwinger Mar 26, 2024
fc3b4a7
Fix postgres scanner issues (#3148)
acquamarin Mar 26, 2024
c1f68cd
Refactor path functions and RDF functions (#3134)
manh9203 Mar 26, 2024
9ea80ec
Refactor aggregate functions (#3136)
manh9203 Mar 27, 2024
73ed1ea
Pandas Pyarrow Backend Bugfix and Tests (#3152)
mxwli Mar 27, 2024
677d35e
List Auxiliary Buffer NullMask Fix (#3156)
mxwli Mar 27, 2024
c747899
Add support to compute hash on list of struct (#3157)
acquamarin Mar 27, 2024
015bf23
Prepare Statement Improvement (#3140)
hououou Mar 28, 2024
6c82aad
resolve weird ANY resolution (#3160)
mxwli Mar 28, 2024
20bde3a
fix export test (#3164)
hououou Mar 28, 2024
956b3e3
Implement initcap/concat functions (#3161)
acquamarin Mar 28, 2024
2ec13b2
Fix issue 3070: Support extend from unwind node (#3153)
andyfengHKU Mar 28, 2024
08fd180
Add Pyarrow Map Scanning (#3158)
mxwli Mar 28, 2024
293b4e6
Fix export database regression (#3171)
andyfengHKU Mar 28, 2024
37b58bb
Fix hash aggregate edge case (#3172)
andyfengHKU Mar 28, 2024
20e5cbb
Added progress for in_query_call operators (#3120)
MSebanc Mar 28, 2024
cf71770
Fixed shell incorrect command seg fault (#3173)
MSebanc Mar 29, 2024
fb8f4c7
Cache files when replaying WAL (#3137)
benjaminwinger Mar 29, 2024
f80a6eb
Support join hash table on aggregate types (#3174)
acquamarin Mar 29, 2024
fa528c1
Fix delete then scan bug (#3176)
andyfengHKU Mar 30, 2024
4e406a1
Refactor sel vector interface (#3177)
andyfengHKU Mar 31, 2024
6f0d8f8
Fix issue 3151: disable null on internalID columns (#3165)
ray6080 Mar 31, 2024
6b1d45a
Rework DDL operators (#3178)
ray6080 Apr 1, 2024
ac9cbf3
Refactor table functions (#3155)
manh9203 Apr 1, 2024
a99ff6c
Rename VAR_LIST to LIST (#3170)
manh9203 Apr 1, 2024
add8473
Remove unused keywords in test runner (#3193)
hououou Apr 1, 2024
94fd5eb
Split extension tests as separate jobs (#2987)
mewim Apr 2, 2024
a95b29e
Added progress for aggregate scan and order by scan (#3192)
MSebanc Apr 2, 2024
0ad815e
Fix is null executor bug (#3197)
andyfengHKU Apr 2, 2024
f62e7c8
Fix order by radix sort bug (#3201)
acquamarin Apr 3, 2024
1f03f5a
Updated shell result truncation (#3206)
MSebanc Apr 3, 2024
1aaa21f
Fix-3200 (#3203)
prrao87 Apr 3, 2024
b3c6dc9
skip empty history file line (#3184)
neeraj9 Apr 4, 2024
fa0ef79
Merge duplicate key fix (#3207)
acquamarin Apr 4, 2024
37de692
Implemented progress for in memory RDF scan (#3208)
MSebanc Apr 4, 2024
ec6e309
Rework multiple query result (#3191)
hououou Apr 4, 2024
2100fa3
Fix constant compression in-place check for bools (#3211)
benjaminwinger Apr 5, 2024
8923c7f
Replace Slack link with Discord in contributing guideline (#3217)
mewim Apr 5, 2024
33111c8
fix pyarrow segfaulting on fedora 39 (#3213)
mxwli Apr 5, 2024
b3917d9
Bump clang-format to v18 and enable auto format (#3222)
mewim Apr 6, 2024
d946982
Check for format changes on master branch (#3223)
mewim Apr 6, 2024
8006723
CMAKE_CXX_FLAGS handling fails when variable is empty (#3228)
zaddach Apr 7, 2024
c6897b4
Remove extension test from `clang-build-test` job (#3231)
mewim Apr 7, 2024
replace ValueVector with ColumnChunk in LocalStorage
ray6080 committed Mar 19, 2024
commit 775d2e691ad99dacebc9151d6eb93c5763e760e8
4 changes: 4 additions & 0 deletions src/common/vector/value_vector.cpp
@@ -369,6 +369,10 @@ template<>
 void ValueVector::setValue(uint32_t pos, std::string val) {
     StringVector::addString(this, pos, val.data(), val.length());
 }
+template<>
+void ValueVector::setValue(uint32_t pos, std::string_view val) {
+    StringVector::addString(this, pos, val.data(), val.length());
+}

 void ValueVector::setNull(uint32_t pos, bool isNull) {
     nullMask->setNull(pos, isNull);
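
Review note: the hunk above adds a `std::string_view` specialization next to the existing `std::string` one, so a caller holding a non-owning view can store it without first building a temporary `std::string`. The sketch below is a rough illustration of that pattern only. `TinyVector` and its private `addString` are hypothetical stand-ins for `ValueVector` and `StringVector::addString`, not Kuzu's actual classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for ValueVector: one owned string per position.
class TinyVector {
public:
    explicit TinyVector(std::size_t capacity) : values(capacity) {}

    // Primary template is only declared; each supported type gets a specialization.
    template<typename T>
    void setValue(uint32_t pos, T val);

    const std::string& getValue(uint32_t pos) const { return values.at(pos); }

private:
    // Hypothetical analogue of StringVector::addString: copies the bytes in.
    void addString(uint32_t pos, const char* data, std::size_t length) {
        values.at(pos).assign(data, length);
    }

    std::vector<std::string> values;
};

// Owning input: the caller passes a std::string.
template<>
void TinyVector::setValue(uint32_t pos, std::string val) {
    addString(pos, val.data(), val.length());
}

// Non-owning input: the bytes are read straight from the view.
template<>
void TinyVector::setValue(uint32_t pos, std::string_view val) {
    addString(pos, val.data(), val.length());
}
```

Both specializations forward `data()` and `length()` rather than relying on a null terminator, so a view into the middle of a larger buffer is stored correctly.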
2 changes: 1 addition & 1 deletion src/include/processor/operator/persistent/batch_insert.h
@@ -52,7 +52,7 @@ struct BatchInsertSharedState {
 };

 struct BatchInsertLocalState {
-    std::unique_ptr<storage::NodeGroup> nodeGroup;
+    std::unique_ptr<storage::ChunkedNodeGroup> nodeGroup;

     virtual ~BatchInsertLocalState() = default;
 };
6 changes: 3 additions & 3 deletions src/include/processor/operator/persistent/node_batch_insert.h
@@ -44,7 +44,7 @@ struct NodeBatchInsertSharedState final : public BatchInsertSharedState {
     uint64_t currentNodeGroupIdx;
     // The sharedNodeGroup is to accumulate left data within local node groups in NodeBatchInsert
     // ops.
-    std::unique_ptr<storage::NodeGroup> sharedNodeGroup;
+    std::unique_ptr<storage::ChunkedNodeGroup> sharedNodeGroup;

     NodeBatchInsertSharedState(
         storage::Table* table, std::shared_ptr<FactorizedTable> fTable, storage::WAL* wal)
@@ -60,7 +60,7 @@ struct NodeBatchInsertSharedState final : public BatchInsertSharedState {

     inline uint64_t getCurNodeGroupIdx() const { return currentNodeGroupIdx; }

-    void appendIncompleteNodeGroup(std::unique_ptr<storage::NodeGroup> localNodeGroup,
+    void appendIncompleteNodeGroup(std::unique_ptr<storage::ChunkedNodeGroup> localNodeGroup,
         std::optional<IndexBuilder>& indexBuilder);

     inline common::offset_t getNextNodeGroupIdxWithoutLock() { return currentNodeGroupIdx++; }
@@ -107,7 +107,7 @@ class NodeBatchInsert final : public BatchInsert {

     static void writeAndResetNodeGroup(common::node_group_idx_t nodeGroupIdx,
         std::optional<IndexBuilder>& indexBuilder, common::column_id_t pkColumnID,
-        storage::NodeTable* table, storage::NodeGroup* nodeGroup);
+        storage::NodeTable* table, storage::ChunkedNodeGroup* nodeGroup);

 private:
     void copyToNodeGroup();
6 changes: 3 additions & 3 deletions src/include/processor/operator/persistent/rel_batch_insert.h
@@ -66,18 +66,18 @@ class RelBatchInsert final : public BatchInsert {

     static common::length_t getGapSize(common::length_t length);
     static std::vector<common::offset_t> populateStartCSROffsetsAndLengths(
-        storage::CSRHeaderChunks& csrHeader, common::offset_t numNodes,
+        storage::ChunkedCSRHeader& csrHeader, common::offset_t numNodes,
         PartitioningBuffer::Partition& partition, common::vector_idx_t offsetVectorIdx);
     static void populateEndCSROffsets(
-        storage::CSRHeaderChunks& csrHeader, std::vector<common::offset_t>& gaps);
+        storage::ChunkedCSRHeader& csrHeader, std::vector<common::offset_t>& gaps);
     static void setOffsetToWithinNodeGroup(
         storage::ColumnChunk& chunk, common::offset_t startOffset);
     static void setOffsetFromCSROffsets(
         storage::ColumnChunk* nodeOffsetChunk, storage::ColumnChunk* csrOffsetChunk);

     // We only check rel multiplcity constraint (MANY_ONE, ONE_ONE) for now.
     std::optional<common::offset_t> checkRelMultiplicityConstraint(
-        const storage::CSRHeaderChunks& csrHeader);
+        const storage::ChunkedCSRHeader& csrHeader);

 private:
     std::shared_ptr<PartitionerSharedState> partitionerSharedState;
12 changes: 7 additions & 5 deletions src/include/storage/local_storage/local_node_table.h
@@ -2,16 +2,18 @@

 #include <utility>

+#include "common/copy_constructors.h"
 #include "local_table.h"

 namespace kuzu {
 namespace storage {

 class LocalNodeNG final : public LocalNodeGroup {
 public:
-    LocalNodeNG(common::offset_t nodeGroupStartOffset,
-        const std::vector<common::LogicalType*>& dataTypes, MemoryManager* mm)
-        : LocalNodeGroup{nodeGroupStartOffset, dataTypes, mm} {}
+    LocalNodeNG(
+        common::offset_t nodeGroupStartOffset, const std::vector<common::LogicalType>& dataTypes)
+        : LocalNodeGroup{nodeGroupStartOffset, dataTypes} {}
+    DELETE_COPY_DEFAULT_MOVE(LocalNodeNG);

     void scan(common::ValueVector* nodeIDVector, const std::vector<common::column_id_t>& columnIDs,
         const std::vector<common::ValueVector*>& outputVectors);
@@ -35,8 +37,8 @@ class LocalNodeNG final : public LocalNodeGroup {

 class LocalNodeTableData final : public LocalTableData {
 public:
-    LocalNodeTableData(std::vector<common::LogicalType*> dataTypes, MemoryManager* mm)
-        : LocalTableData{std::move(dataTypes), mm} {}
+    explicit LocalNodeTableData(std::vector<common::LogicalType> dataTypes)
+        : LocalTableData{std::move(dataTypes)} {}

     void scan(common::ValueVector* nodeIDVector, const std::vector<common::column_id_t>& columnIDs,
         const std::vector<common::ValueVector*>& outputVectors);
12 changes: 7 additions & 5 deletions src/include/storage/local_storage/local_rel_table.h
@@ -1,5 +1,6 @@
 #pragma once

+#include "common/copy_constructors.h"
 #include "common/enums/rel_multiplicity.h"
 #include "common/vector/value_vector.h"
 #include "storage/local_storage/local_table.h"
@@ -14,8 +15,9 @@ class LocalRelNG final : public LocalNodeGroup {
     friend class RelTableData;

 public:
-    LocalRelNG(common::offset_t nodeGroupStartOffset, std::vector<common::LogicalType*> dataTypes,
-        MemoryManager* mm, common::RelMultiplicity multiplicity);
+    LocalRelNG(common::offset_t nodeGroupStartOffset, std::vector<common::LogicalType> dataTypes,
+        common::RelMultiplicity multiplicity);
+    DELETE_COPY_DEFAULT_MOVE(LocalRelNG);

     common::row_idx_t scanCSR(common::offset_t srcOffset, common::offset_t posToReadForOffset,
         const std::vector<common::column_id_t>& columnIDs,
@@ -53,9 +55,9 @@ class LocalRelTableData final : public LocalTableData {
     friend class RelTableData;

 public:
-    LocalRelTableData(common::RelMultiplicity multiplicity,
-        std::vector<common::LogicalType*> dataTypes, MemoryManager* mm)
-        : LocalTableData{std::move(dataTypes), mm}, multiplicity{multiplicity} {}
+    LocalRelTableData(
+        common::RelMultiplicity multiplicity, std::vector<common::LogicalType> dataTypes)
+        : LocalTableData{std::move(dataTypes)}, multiplicity{multiplicity} {}

 private:
     LocalNodeGroup* getOrCreateLocalNodeGroup(common::ValueVector* nodeIDVector) override;
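
Review note: in `local_node_table.h` and `local_rel_table.h` the constructors switch from `std::vector<common::LogicalType*>` to `std::vector<common::LogicalType>`, taken by value and moved into the base class. The sketch below shows that by-value sink parameter idiom in isolation. `SketchType` and `SketchTableData` are hypothetical names, and it assumes the element type is copyable, which this diff does not show for `LogicalType`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical value type standing in for common::LogicalType.
struct SketchType {
    std::string name;
};

// Hypothetical analogue of LocalTableData: owns its data types by value.
class SketchTableData {
public:
    // By-value sink parameter: the caller decides whether to copy or move in,
    // and the constructor moves the argument into the member either way.
    explicit SketchTableData(std::vector<SketchType> dataTypes)
        : dataTypes(std::move(dataTypes)) {}

    const std::vector<SketchType>& getDataTypes() const { return dataTypes; }

private:
    std::vector<SketchType> dataTypes;
};
```

Because the object holds values rather than raw pointers, it no longer depends on the lifetime of whatever originally owned the types.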
10 changes: 3 additions & 7 deletions src/include/storage/local_storage/local_storage.h
@@ -2,23 +2,20 @@

 #include <unordered_map>

+#include "common/copy_constructors.h"
 #include "storage/local_storage/local_table.h"

 namespace kuzu {
-namespace catalog {
-class TableCatalogEntry;
-}
 namespace storage {

-class MemoryManager;
-
 // Data structures in LocalStorage are not thread-safe.
 // For now, we only support single thread insertions and updates. Once we optimize them with
 // multiple threads, LocalStorage and its related data structures should be reworked to be
 // thread-safe.
 class LocalStorage {
 public:
-    explicit LocalStorage(storage::MemoryManager* mm);
+    explicit LocalStorage() {}
+    DELETE_COPY_AND_MOVE(LocalStorage);

     // This function will create the local table data if not exists.
     LocalTableData* getOrCreateLocalTableData(common::table_id_t tableID,
@@ -32,7 +29,6 @@ class LocalStorage {

 private:
     std::unordered_map<common::table_id_t, std::unique_ptr<LocalTable>> tables;
-    storage::MemoryManager* mm;
 };

 } // namespace storage
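
Review note: this commit starts using `DELETE_COPY_DEFAULT_MOVE` and `DELETE_COPY_AND_MOVE` from `common/copy_constructors.h`, whose definitions are not part of the diff. Judging only by the names, they presumably declare the copy and move special members explicitly. The sketch below shows one plausible shape under that assumption. The `SKETCH_`-prefixed macros and both classes are hypothetical and may differ from the real header.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical macro: forbid copying, keep the compiler-generated moves.
#define SKETCH_DELETE_COPY_DEFAULT_MOVE(Type) \
    Type(const Type&) = delete;               \
    Type& operator=(const Type&) = delete;    \
    Type(Type&&) = default;                   \
    Type& operator=(Type&&) = default

// Hypothetical macro: forbid both copying and moving.
#define SKETCH_DELETE_COPY_AND_MOVE(Type)  \
    Type(const Type&) = delete;            \
    Type& operator=(const Type&) = delete; \
    Type(Type&&) = delete;                 \
    Type& operator=(Type&&) = delete

// Move-only type, in the spirit of LocalNodeNG or LocalChunkedGroupCollection.
class MovableGroup {
public:
    explicit MovableGroup(std::vector<int> rows) : rows(std::move(rows)) {}
    SKETCH_DELETE_COPY_DEFAULT_MOVE(MovableGroup);

    std::size_t size() const { return rows.size(); }

private:
    std::vector<int> rows;
};

// Type that can be neither copied nor moved, in the spirit of LocalStorage.
class PinnedStorage {
public:
    PinnedStorage() {}
    SKETCH_DELETE_COPY_AND_MOVE(PinnedStorage);
};

// Compile-time checks of the intended semantics.
static_assert(!std::is_copy_constructible<MovableGroup>::value, "copy must be deleted");
static_assert(std::is_move_constructible<MovableGroup>::value, "move must stay available");
static_assert(!std::is_move_constructible<PinnedStorage>::value, "move must be deleted");
```

Spelling out the deleted copy makes an accidental deep copy of a large local buffer a compile error instead of a silent cost.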
74 changes: 33 additions & 41 deletions src/include/storage/local_storage/local_table.h
@@ -2,10 +2,10 @@

 #include <unordered_map>

-#include "common/data_chunk/data_chunk_collection.h"
 #include "common/enums/rel_multiplicity.h"
 #include "common/enums/table_type.h"
 #include "common/vector/value_vector.h"
+#include "storage/store/node_group.h"

 namespace kuzu {
 namespace storage {
@@ -18,26 +18,21 @@ using offset_set_t = std::unordered_set<common::offset_t>;
 static constexpr common::column_id_t NBR_ID_COLUMN_ID = 0;
 static constexpr common::column_id_t REL_ID_COLUMN_ID = 1;

-struct LocalVectorCollection {
-    std::vector<common::ValueVector*> vectors;
+using ChunkCollection = std::vector<ColumnChunk*>;

-    static LocalVectorCollection empty() { return LocalVectorCollection{}; }
-
-    inline bool isEmpty() const { return vectors.empty(); }
-    inline void appendVector(common::ValueVector* vector) { vectors.push_back(vector); }
-    inline common::ValueVector* getLocalVector(common::row_idx_t rowIdx) const {
-        auto vectorIdx = rowIdx >> common::DEFAULT_VECTOR_CAPACITY_LOG_2;
-        KU_ASSERT(vectorIdx < vectors.size());
-        return vectors[vectorIdx];
-    }
+class LocalChunkedGroupCollection {
+public:
+    static constexpr uint64_t CHUNK_CAPACITY = 2048;

-    LocalVectorCollection getStructChildVectorCollection(common::struct_field_idx_t idx) const;
-};
+    explicit LocalChunkedGroupCollection(std::vector<common::LogicalType> dataTypes)
+        : dataTypes{std::move(dataTypes)}, numRows{0} {}
+    DELETE_COPY_DEFAULT_MOVE(LocalChunkedGroupCollection);

-class LocalDataChunkCollection {
-public:
-    LocalDataChunkCollection(MemoryManager* mm, std::vector<common::LogicalType> dataTypes)
-        : dataChunkCollection{mm}, mm{mm}, dataTypes{std::move(dataTypes)}, numRows{0} {}
+    static inline std::pair<uint32_t, uint64_t> getChunkIdxAndOffsetInChunk(
+        common::row_idx_t rowIdx) {
+        return std::make_pair(rowIdx / LocalChunkedGroupCollection::CHUNK_CAPACITY,
+            rowIdx % LocalChunkedGroupCollection::CHUNK_CAPACITY);
+    }

     inline common::row_idx_t getRowIdxFromOffset(common::offset_t offset) {
         KU_ASSERT(offsetToRowIdx.contains(offset));
@@ -61,12 +56,12 @@ class LocalDataChunkCollection {

     bool isEmpty() const { return offsetToRowIdx.empty() && srcNodeOffsetToRelOffsets.empty(); }
     void readValueAtRowIdx(common::row_idx_t rowIdx, common::column_id_t columnID,
-        common::ValueVector* outputVector, common::sel_t posInOutputVector);
+        common::ValueVector* outputVector, common::sel_t posInOutputVector) const;
     bool read(common::offset_t offset, common::column_id_t columnID,
         common::ValueVector* outputVector, common::sel_t posInOutputVector);

     inline void append(common::offset_t offset, std::vector<common::ValueVector*> vectors) {
-        offsetToRowIdx[offset] = appendToDataChunkCollection(vectors);
+        offsetToRowIdx[offset] = append(vectors);
     }
     // Only used for rel tables. Should be moved out later.
     inline void append(common::offset_t nodeOffset, common::offset_t relOffset,
@@ -84,23 +79,21 @@ class LocalDataChunkCollection {
     // Only used for rel tables. Should be moved out later.
     void remove(common::offset_t srcNodeOffset, common::offset_t relOffset);

-    inline LocalVectorCollection getLocalChunk(common::column_id_t columnID) {
-        LocalVectorCollection localVectorCollection;
-        for (auto& chunk : dataChunkCollection.getChunksUnsafe()) {
-            localVectorCollection.appendVector(chunk.getValueVector(columnID).get());
+    inline ChunkCollection getLocalChunk(common::column_id_t columnID) {
+        ChunkCollection localChunkCollection;
+        for (auto& chunkedGroup : chunkedGroups.getChunkedGroups()) {
+            localChunkCollection.push_back(chunkedGroup->getColumnChunkUnsafe(columnID));
         }
-        return localVectorCollection;
+        return localChunkCollection;
     }

 private:
-    common::row_idx_t appendToDataChunkCollection(std::vector<common::ValueVector*> vectors);
-    common::DataChunk createNewDataChunk();
+    common::row_idx_t append(std::vector<common::ValueVector*> vectors);

 private:
-    common::DataChunkCollection dataChunkCollection;
+    ChunkedNodeGroupCollection chunkedGroups;
     // The offset here can either be nodeOffset ( for node table) or relOffset (for rel table).
     offset_to_row_idx_t offsetToRowIdx;
-    storage::MemoryManager* mm;
     std::vector<common::LogicalType> dataTypes;
     common::row_idx_t numRows;

@@ -147,8 +140,9 @@ class LocalDeletionInfo {

 class LocalNodeGroup {
 public:
-    LocalNodeGroup(common::offset_t nodeGroupStartOffset,
-        std::vector<common::LogicalType*> dataTypes, MemoryManager* mm);
+    LocalNodeGroup(
+        common::offset_t nodeGroupStartOffset, const std::vector<common::LogicalType>& dataTypes);
+    DELETE_COPY_DEFAULT_MOVE(LocalNodeGroup);
     virtual ~LocalNodeGroup() = default;

     virtual bool insert(std::vector<common::ValueVector*> nodeIDVectors,
@@ -157,29 +151,28 @@ class LocalNodeGroup {
         common::column_id_t columnID, common::ValueVector* propertyVector) = 0;
     virtual bool delete_(common::ValueVector* IDVector, common::ValueVector* extraVector) = 0;

-    LocalDataChunkCollection& getUpdateChunks(common::column_id_t columnID) {
+    LocalChunkedGroupCollection& getUpdateChunks(common::column_id_t columnID) {
         KU_ASSERT(columnID < updateChunks.size());
         return updateChunks[columnID];
     }
-    LocalDataChunkCollection& getInsesrtChunks() { return insertChunks; }
+    LocalChunkedGroupCollection& getInsesrtChunks() { return insertChunks; }

     bool hasUpdatesOrDeletions() const;

 protected:
     common::offset_t nodeGroupStartOffset;
-    storage::MemoryManager* mm;

-    LocalDataChunkCollection insertChunks;
+    LocalChunkedGroupCollection insertChunks;
     LocalDeletionInfo deleteInfo;
-    std::vector<LocalDataChunkCollection> updateChunks;
+    std::vector<LocalChunkedGroupCollection> updateChunks;
 };

 class LocalTableData {
     friend class NodeTableData;

 public:
-    LocalTableData(std::vector<common::LogicalType*> dataTypes, MemoryManager* mm)
-        : dataTypes{std::move(dataTypes)}, mm{mm} {}
+    explicit LocalTableData(std::vector<common::LogicalType> dataTypes)
+        : dataTypes{std::move(dataTypes)} {}
     virtual ~LocalTableData() = default;

     inline void clear() { nodeGroups.clear(); }
@@ -194,8 +187,7 @@ class LocalTableData {
     virtual LocalNodeGroup* getOrCreateLocalNodeGroup(common::ValueVector* nodeIDVector) = 0;

 protected:
-    std::vector<common::LogicalType*> dataTypes;
-    MemoryManager* mm;
+    std::vector<common::LogicalType> dataTypes;

     std::unordered_map<common::node_group_idx_t, std::unique_ptr<LocalNodeGroup>> nodeGroups;
 };
@@ -206,7 +198,7 @@ class LocalTable {
     explicit LocalTable(common::TableType tableType) : tableType{tableType} {};

     LocalTableData* getOrCreateLocalTableData(const std::vector<std::unique_ptr<Column>>& columns,
-        MemoryManager* mm, common::vector_idx_t dataIdx, common::RelMultiplicity multiplicity);
+        common::vector_idx_t dataIdx, common::RelMultiplicity multiplicity);
     inline LocalTableData* getLocalTableData(common::vector_idx_t dataIdx) {
         KU_ASSERT(dataIdx < localTableDataCollection.size());
         return localTableDataCollection[dataIdx].get();
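
Review note: `LocalChunkedGroupCollection` replaces the old shift by `DEFAULT_VECTOR_CAPACITY_LOG_2` with an explicit division and modulo by `CHUNK_CAPACITY = 2048`, and keeps an `offsetToRowIdx` map from a node or rel offset to the row where its latest value was appended. The sketch below illustrates how those two pieces could fit together for a single `int64_t` column. `SketchChunkedCollection` and its members are hypothetical simplifications and do not reproduce `ChunkedNodeGroupCollection` or `ColumnChunk`.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical single-column collection made of fixed-capacity chunks.
class SketchChunkedCollection {
public:
    static constexpr uint64_t CHUNK_CAPACITY = 2048;

    // Maps a global row index to (chunk index, offset inside that chunk),
    // e.g. row 2049 lands in chunk 1 at offset 1.
    static std::pair<uint32_t, uint64_t> getChunkIdxAndOffsetInChunk(uint64_t rowIdx) {
        return std::make_pair(
            static_cast<uint32_t>(rowIdx / CHUNK_CAPACITY), rowIdx % CHUNK_CAPACITY);
    }

    // Appends a new row and remembers that `offset` now lives at that row.
    void append(uint64_t offset, int64_t value) {
        const auto position = getChunkIdxAndOffsetInChunk(numRows);
        if (position.first == chunks.size()) {
            // The current chunk is full (or none exists yet): start a new one.
            chunks.emplace_back();
            chunks.back().reserve(CHUNK_CAPACITY);
        }
        chunks[position.first].push_back(value);
        offsetToRowIdx[offset] = numRows++;
    }

    // Reads the latest value stored for `offset`; returns false if there is none.
    bool read(uint64_t offset, int64_t& out) const {
        const auto it = offsetToRowIdx.find(offset);
        if (it == offsetToRowIdx.end()) {
            return false;
        }
        const auto position = getChunkIdxAndOffsetInChunk(it->second);
        out = chunks[position.first][position.second];
        return true;
    }

    std::size_t numChunks() const { return chunks.size(); }

private:
    std::vector<std::vector<int64_t>> chunks;
    std::unordered_map<uint64_t, uint64_t> offsetToRowIdx;
    uint64_t numRows = 0;
};
```

Appending for an offset that already has a value adds a fresh row and repoints the map entry, mirroring `offsetToRowIdx[offset] = append(vectors);` in the header. The superseded row simply becomes unreachable in this sketch.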