[pull] master from kuzudb:master #2

Merged
merged 174 commits into from
Apr 7, 2024
Changes from 2 commits
0d84c73
Support Polars DataFrame export from QueryResult (#2985)
alexander-beedie Mar 4, 2024
89598fd
clean up transaction pointer in physical operator
ray6080 Mar 4, 2024
1131621
Merge pull request #2990 from kuzudb/clean-operator-transaction
ray6080 Mar 4, 2024
3415ff1
Store a stable reference instead of a duplicate string in the ColumnC…
benjaminwinger Mar 4, 2024
c1b2220
fix reset empty heap overflow
ray6080 Mar 2, 2024
93e6b3e
Merge pull request #2994 from kuzudb/string-column-chunk-index
benjaminwinger Mar 5, 2024
1b858cf
Merge pull request #2996 from kuzudb/fix-reset-empty
ray6080 Mar 6, 2024
9e23995
Rework CSV_TO_PARQUET testing feature
manh9203 Feb 29, 2024
ad24bf7
Avoid moving DictionaryChunks
benjaminwinger Mar 6, 2024
74c2f80
Merge pull request #2999 from kuzudb/dictionary-memory-fix
ray6080 Mar 7, 2024
5e598ec
update links to website (#3000)
ray6080 Mar 7, 2024
38e4398
Re-write partitioner to use ColumnChunks instead of ValueVectors
benjaminwinger Feb 27, 2024
b2f50ac
Abstract client config
andyfengHKU Mar 8, 2024
b7e3bc7
Merge pull request #2979 from kuzudb/rel-memory-fix
benjaminwinger Mar 8, 2024
c554a20
Merge pull request #3010 from kuzudb/add-client-config
andyfengHKU Mar 8, 2024
45c5aa9
Support use of QueryResult as a context manager (#3009)
alexander-beedie Mar 9, 2024
a3a6c2a
Pass client context to binder
andyfengHKU Mar 8, 2024
4692c54
Merge pull request #3015 from kuzudb/pass-client-context-binder
andyfengHKU Mar 10, 2024
020c09b
Refactor cast functions
andyfengHKU Mar 9, 2024
dc9771f
Merge pull request #3016 from kuzudb/refactor-cast-function-binding
andyfengHKU Mar 10, 2024
c149349
Combine append(ValueVector) with appendOne
ray6080 Mar 10, 2024
2a5948f
clean up unique_ptr of LogicalType in NodeGroup and BatchInsert
ray6080 Mar 10, 2024
f4a95ab
Merge pull request #3018 from kuzudb/clean-unique-ptr
ray6080 Mar 11, 2024
6c01c80
handle multiple database instantiations for import caching
mxwli Feb 28, 2024
3f84585
Revert "Revert "Implement Python Import Caching""
mxwli Mar 11, 2024
67e9204
Merge pull request #3017 from kuzudb/remove-append-one
ray6080 Mar 11, 2024
8642ccb
Merge pull request #3025 from kuzudb/import-cache-fix-and-revert-revert
mxwli Mar 11, 2024
7c25a3b
Rewrite the Hash Index overflow file to support multiple copies and f…
benjaminwinger Mar 7, 2024
2397c02
Fix issue-2984
andyfengHKU Mar 11, 2024
a6c7e21
Merge pull request #3026 from kuzudb/issue-2984
andyfengHKU Mar 11, 2024
8f5f64a
Add multiplaform test report bot (#3027)
mewim Mar 12, 2024
3bdc752
Python API typing, lint, config/makefile (#3023)
alexander-beedie Mar 12, 2024
18c2c8f
Fix unicode conversion for pandas dataframe (#3029)
mewim Mar 12, 2024
339a471
Update LICENSE
semihsalihoglu-uw Mar 12, 2024
1d5df7f
Merge pull request #3031 from kuzudb/semihsalihoglu-uw-patch-1
semihsalihoglu-uw Mar 12, 2024
0c26056
Merge pull request #3012 from kuzudb/multi-copy-overflow-file
benjaminwinger Mar 12, 2024
bdd650e
Add copy from subquery
andyfengHKU Mar 4, 2024
af50489
Insert into the hash index builder one chunk at a time
benjaminwinger Mar 5, 2024
7a3ff60
Merge pull request #3020 from kuzudb/copy-from-subquery
andyfengHKU Mar 12, 2024
d9d277f
Fix issue-3004
andyfengHKU Mar 12, 2024
891c115
Merge pull request #3036 from kuzudb/issue-3004
andyfengHKU Mar 13, 2024
7da8a62
Optimise Python unit test runtime (~7x speedup) (#3032)
alexander-beedie Mar 13, 2024
7c16897
Add more parameter types for Node.js API (#3037)
mewim Mar 13, 2024
d8487a0
Merge pull request #2997 from kuzudb/hash-index-builder-chunks
benjaminwinger Mar 13, 2024
b304389
Remove the constraint on the HashIndexBuilder template parameter
benjaminwinger Mar 12, 2024
0dbcef6
Allow CI workflow to be manually dispatched (#3043)
mewim Mar 13, 2024
2dfe495
Bump extensions version to 0.2.0 (#3041)
mewim Mar 13, 2024
7110f91
First-pass lint/format for Python `shell` tests (#3034)
alexander-beedie Mar 13, 2024
930ba45
Bump master branch version to 0.3.2.1 (#3044)
mewim Mar 14, 2024
1b6f741
Fixed failing shell tests (#3045)
MSebanc Mar 14, 2024
ff186a5
Add shell tests to CI (#3039)
mewim Mar 14, 2024
77489a5
fix issue 3042
ray6080 Mar 13, 2024
0b7adb9
fix sliding out-of-place commit and null strings
ray6080 Mar 12, 2024
f8efa2a
Merge pull request #3055 from kuzudb/fix-rel-insert-bug
ray6080 Mar 14, 2024
1f88b3f
rework local storage: separate the storage of insertions and updates
ray6080 Mar 14, 2024
4f06cf1
Merge pull request #2982 from kuzudb/multi-copy-rel-s1
ray6080 Mar 14, 2024
d348228
Merge pull request #3046 from kuzudb/fix-3042
ray6080 Mar 14, 2024
4a7b109
Update Debian version in build workflows (#3056)
mewim Mar 14, 2024
a9454b3
Implement duckdb scanner extension
acquamarin Mar 4, 2024
c3556e2
Merge pull request #3052 from kuzudb/duckdb-scanner
acquamarin Mar 15, 2024
2a3012c
Fix Hash index split slot ID when reserving a number of slots which a…
benjaminwinger Mar 15, 2024
28bd03b
Copy table function instead of passing raw pointer
andyfengHKU Mar 16, 2024
a612c0f
Merge pull request #3067 from kuzudb/table-function-copy
andyfengHKU Mar 16, 2024
1d7b9f3
Add replace func
andyfengHKU Mar 13, 2024
a0ee10e
Merge pull request #3069 from kuzudb/replace-func
andyfengHKU Mar 18, 2024
3db0f95
Merge pull request #3030 from kuzudb/hash_index_template_types
benjaminwinger Mar 18, 2024
f12e5e7
Remove unnecessary components for pip package (#3074)
mewim Mar 18, 2024
826927e
Merge pull request #3066 from kuzudb/hash-index-reserve-fix
benjaminwinger Mar 18, 2024
cc93226
Implement catalog cache for postgres scanner
acquamarin Mar 18, 2024
bd963c1
Merge pull request #3071 from kuzudb/catalog-cache
acquamarin Mar 18, 2024
35b9438
Rework Fixed-list
manh9203 Mar 11, 2024
e7c6d73
Merge pull request #3057 from kuzudb/fixed-list-rework
manh9203 Mar 18, 2024
e963df1
Implemented Progress Bar for ScanNodeID Operator (#3051)
MSebanc Mar 18, 2024
775d2e6
replace ValueVector with ColumnChunk in LocalStorage
ray6080 Mar 14, 2024
8854ebd
Merge pull request #3028 from kuzudb/refactor-local-storage
ray6080 Mar 19, 2024
3ce3b1f
fix rel insert and append sanityCheck for column chunk
ray6080 Mar 15, 2024
0531afe
Exclude extension files from the rust crate (#3076)
benjaminwinger Mar 19, 2024
907d831
Remove unnecessary components for pip package (#3085)
mewim Mar 19, 2024
c3decc2
Merge pull request #3081 from kuzudb/fix-rel-insert
ray6080 Mar 19, 2024
c39704d
fix deadlock issue due to bm no frame to claim exception and fix used…
ray6080 Mar 19, 2024
8b2c768
Merge pull request #3082 from kuzudb/fix-node-insert
ray6080 Mar 19, 2024
efdc1e4
Refactor arithmetic functions
manh9203 Mar 18, 2024
8fa40d6
Merge pull request #3079 from kuzudb/arithmetic-functions-refactor
manh9203 Mar 19, 2024
0ced885
Allowed for progress bar to be configurable by CALL (#3080)
MSebanc Mar 19, 2024
7a3ca59
Implement array functions
acquamarin Mar 19, 2024
04fcdec
Merge pull request #3087 from kuzudb/array-functions
acquamarin Mar 19, 2024
6860af0
Remove underscore from the badges in README (#3094)
mewim Mar 20, 2024
f69ad02
Fix python prepared statement null value
acquamarin Mar 20, 2024
e49bb30
Merge pull request #3098 from kuzudb/python-prepared-statement
acquamarin Mar 20, 2024
568e08e
Refactor string functions
manh9203 Mar 19, 2024
f8fe205
Merge pull request #3091 from kuzudb/string-functions-refactor
manh9203 Mar 20, 2024
05359c7
Arrow chunk_size as keyword argument (#3084)
prrao87 Mar 21, 2024
3c90c16
Update rustdoc to show how to enable parallel compilation (#3099)
prrao87 Mar 21, 2024
f6b1d6a
Improve copy-to-parquet perf
acquamarin Mar 21, 2024
7817cc9
Merge pull request #3105 from kuzudb/copy-to-parquet-perf
acquamarin Mar 21, 2024
68c2856
Refactor list functions
manh9203 Mar 19, 2024
96d9a91
Merge pull request #3100 from kuzudb/list-functions-refactor
manh9203 Mar 22, 2024
9effbb1
Refactor cast functions
manh9203 Mar 20, 2024
bdae55f
Merge pull request #3107 from kuzudb/cast-functions-refactor
manh9203 Mar 22, 2024
f9e1b12
Update `get_as_pl` (should always return a single chunk) (#3110)
alexander-beedie Mar 22, 2024
3f817f2
Add standard Python module __version__ attr (#3111)
alexander-beedie Mar 22, 2024
6d39076
Fix DuckDB build for macOS ARM and 32-bit (#3115)
mewim Mar 22, 2024
6e52e22
Add external object scan replacement
andyfengHKU Mar 16, 2024
d65c2b8
clean
andyfengHKU Mar 18, 2024
8f976e4
clean
andyfengHKU Mar 18, 2024
23144c3
pyarrow backend scanning for pandas
mxwli Feb 27, 2024
f0507b0
CLANG-TIDY
mxwli Mar 21, 2024
b97aab5
clang fix
mxwli Mar 21, 2024
cb4d757
clang
mxwli Mar 21, 2024
d4b261b
Merge pull request #3058 from kuzudb/pandas-pyarrow-backend
mxwli Mar 22, 2024
003a706
Add pull request template (#3118)
andyfengHKU Mar 22, 2024
8f37501
Added customizable delay before displaying progress bar (#3092)
MSebanc Mar 22, 2024
c8e4d5b
Hash index cleanup (#3088)
benjaminwinger Mar 22, 2024
167bb87
Fix launch database using homedir (#3108)
acquamarin Mar 22, 2024
7ec590a
remove dummy transactions (#3106)
hououou Mar 22, 2024
9247fd2
fix import database path (#3063)
hououou Mar 22, 2024
f9bc0c6
enable compression for INTERNAL_ID (#3116)
ray6080 Mar 23, 2024
e60e8cd
close 1646 (#3122)
ray6080 Mar 23, 2024
365815b
Refactor Partitioner to use ChunkedNodeGroupCollection (#3123)
ray6080 Mar 23, 2024
3a6bd7e
Replace with client context (#3121)
hououou Mar 23, 2024
599b80f
Rework var list storage layout (#3093)
hououou Mar 24, 2024
3ce064d
Fix 3127 (#3130)
acquamarin Mar 24, 2024
3813eed
Fix issue-3129 (#3131)
andyfengHKU Mar 24, 2024
53ef58e
Refactor scalar function registration (#3119)
manh9203 Mar 25, 2024
b208d15
Support multiple COPY statements on rel tables (#2989)
ray6080 Mar 25, 2024
ad31f02
initialize readfds via FD_ZERO before use (#3132)
neeraj9 Mar 25, 2024
a8b15dc
table scan/update/insert/delete state (#3072)
ray6080 Mar 25, 2024
4d21128
Support read after update (#3126)
andyfengHKU Mar 25, 2024
80b3e94
Factor out benchmark workflow and enable manual trigger for it (#3144)
mewim Mar 26, 2024
3237e6f
Implement postgres-scanner (#3139)
acquamarin Mar 26, 2024
de72fc9
Python List and Map Parameter Support (#3090)
mxwli Mar 26, 2024
a85f4fe
Cache DiskArray write header in-memory (#3109)
benjaminwinger Mar 26, 2024
fc3b4a7
Fix postgres scanner issues (#3148)
acquamarin Mar 26, 2024
c1f68cd
Refactor path functions and RDF functions (#3134)
manh9203 Mar 26, 2024
9ea80ec
Refactor aggregate functions (#3136)
manh9203 Mar 27, 2024
73ed1ea
Pandas Pyarrow Backend Bugfix and Tests (#3152)
mxwli Mar 27, 2024
677d35e
List Auxiliary Buffer NullMask Fix (#3156)
mxwli Mar 27, 2024
c747899
Add support to compute hash on list of struct (#3157)
acquamarin Mar 27, 2024
015bf23
Prepare Statement Improvement (#3140)
hououou Mar 28, 2024
6c82aad
resolve weird ANY resolution (#3160)
mxwli Mar 28, 2024
20bde3a
fix export test (#3164)
hououou Mar 28, 2024
956b3e3
Implement initcap/concat functions (#3161)
acquamarin Mar 28, 2024
2ec13b2
Fix issue 3070: Support extend from unwind node (#3153)
andyfengHKU Mar 28, 2024
08fd180
Add Pyarrow Map Scanning (#3158)
mxwli Mar 28, 2024
293b4e6
Fix export database regression (#3171)
andyfengHKU Mar 28, 2024
37b58bb
Fix hash aggregate edge case (#3172)
andyfengHKU Mar 28, 2024
20e5cbb
Added progress for in_query_call operators (#3120)
MSebanc Mar 28, 2024
cf71770
Fixed shell incorrect command seg fault (#3173)
MSebanc Mar 29, 2024
fb8f4c7
Cache files when replaying WAL (#3137)
benjaminwinger Mar 29, 2024
f80a6eb
Support join hash table on aggregate types (#3174)
acquamarin Mar 29, 2024
fa528c1
Fix delete then scan bug (#3176)
andyfengHKU Mar 30, 2024
4e406a1
Refactor sel vector interface (#3177)
andyfengHKU Mar 31, 2024
6f0d8f8
Fix issue 3151: disable null on internalID columns (#3165)
ray6080 Mar 31, 2024
6b1d45a
Rework DDL operators (#3178)
ray6080 Apr 1, 2024
ac9cbf3
Refactor table functions (#3155)
manh9203 Apr 1, 2024
a99ff6c
Rename VAR_LIST to LIST (#3170)
manh9203 Apr 1, 2024
add8473
Remove unused keywords in test runner (#3193)
hououou Apr 1, 2024
94fd5eb
Split extension tests as separate jobs (#2987)
mewim Apr 2, 2024
a95b29e
Added progress for aggregate scan and order by scan (#3192)
MSebanc Apr 2, 2024
0ad815e
Fix is null executor bug (#3197)
andyfengHKU Apr 2, 2024
f62e7c8
Fix order by radix sort bug (#3201)
acquamarin Apr 3, 2024
1f03f5a
Updated shell result truncation (#3206)
MSebanc Apr 3, 2024
1aaa21f
Fix-3200 (#3203)
prrao87 Apr 3, 2024
b3c6dc9
skip empty history file line (#3184)
neeraj9 Apr 4, 2024
fa0ef79
Merge duplicate key fix (#3207)
acquamarin Apr 4, 2024
37de692
Implemented progress for in memory RDF scan (#3208)
MSebanc Apr 4, 2024
ec6e309
Rework multiple query result (#3191)
hououou Apr 4, 2024
2100fa3
Fix constant compression in-place check for bools (#3211)
benjaminwinger Apr 5, 2024
8923c7f
Replace Slack link with Discord in contributing guideline (#3217)
mewim Apr 5, 2024
33111c8
fix pyarrow segfaulting on fedora 39 (#3213)
mxwli Apr 5, 2024
b3917d9
Bump clang-format to v18 and enable auto format (#3222)
mewim Apr 6, 2024
d946982
Check for format changes on master branch (#3223)
mewim Apr 6, 2024
8006723
CMAKE_CXX_FLAGS handling fails when variable is empty (#3228)
zaddach Apr 7, 2024
c6897b4
Remove extension test from `clang-build-test` job (#3231)
mewim Apr 7, 2024
2 changes: 1 addition & 1 deletion src/binder/bind/bind_query.cpp
Expand Up @@ -36,7 +36,7 @@ void validateIsAllUnionOrUnionAll(const BoundRegularQuery& regularQuery) {
}
if ((0 < unionAllExpressionCounter) &&
(unionAllExpressionCounter < regularQuery.getNumSingleQueries() - 1)) {
throw BinderException("Union and union all can't be used together.");
throw BinderException("Union and union all can not be used together.");
}
}

Expand Down
25 changes: 11 additions & 14 deletions src/binder/bind_expression/bind_function_expression.cpp
@@ -1,7 +1,6 @@
#include "binder/binder.h"
#include "binder/expression/expression_util.h"
#include "binder/expression/function_expression.h"
#include "binder/expression/literal_expression.h"
#include "binder/expression/property_expression.h"
#include "binder/expression_binder.h"
#include "common/exception/binder.h"
Expand All @@ -13,6 +12,7 @@

using namespace kuzu::common;
using namespace kuzu::parser;
using namespace kuzu::function;

namespace kuzu {
namespace binder {
Expand Down Expand Up @@ -58,24 +58,21 @@ std::shared_ptr<Expression> ExpressionBinder::bindScalarFunctionExpression(
childrenTypes.push_back(child->dataType);
}
auto functions = context->getCatalog()->getFunctions(context->getTx());
auto function = ku_dynamic_cast<function::Function*, function::ScalarFunction*>(
auto function = ku_dynamic_cast<Function*, function::ScalarFunction*>(
function::BuiltInFunctionsUtils::matchFunction(functionName, childrenTypes, functions));
expression_vector childrenAfterCast;
std::unique_ptr<function::FunctionBindData> bindData;
if (functionName == CAST_FUNC_NAME) {
// If the expression to cast already has the same type as the target type, skip casting.
if (children.size() == 2) {
auto targetTypeStr = (ku_dynamic_cast<Expression&, LiteralExpression&>(*children[1]))
.getValue()
->getValue<std::string>();
auto outputType = binder::Binder::bindDataType(targetTypeStr);
if (*outputType == children[0]->dataType) {
return children[0];
}
}
bindData = function->bindFunc(children, function);
childrenAfterCast.push_back(
implicitCastIfNecessary(children[0], function->parameterTypeIDs[0]));
if (bindData == nullptr) {
return children[0];
}
auto childAfterCast = children[0];
// See castBindFunc for explanation.
if (children[0]->getDataType().getLogicalTypeID() == LogicalTypeID::ANY) {
childAfterCast = implicitCastIfNecessary(children[0], LogicalTypeID::STRING);
}
childrenAfterCast.push_back(std::move(childAfterCast));
} else {
for (auto i = 0u; i < children.size(); ++i) {
auto targetType = function->isVarLength ? function->parameterTypeIDs[0] :
Expand Down
3 changes: 0 additions & 3 deletions src/function/built_in_function_utils.cpp
Expand Up @@ -92,9 +92,6 @@ Function* BuiltInFunctionsUtils::matchFunction(const std::string& name,
uint32_t minCost = UINT32_MAX;
for (auto& function : functionSet) {
auto func = reinterpret_cast<Function*>(function.get());
if (name == CAST_FUNC_NAME) {
return func;
}
auto cost = getFunctionCost(inputTypes, func, isOverload);
if (cost == UINT32_MAX) {
continue;
Expand Down
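The hunk above removes the early return for CAST, so CAST now appears to go through the same cost-based matching as every other function. Below is a minimal, self-contained sketch of that kind of lowest-cost matching. The `TypeID`, `Candidate`, `matchCost`, and `NO_MATCH` names and the cost model (0 for an exact type match, 1 for an `ANY` parameter) are assumptions for illustration only; they are not Kuzu's `getFunctionCost` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the Kuzu types.
enum class TypeID { ANY, STRING, INT64 };

struct Candidate {
    std::string name;
    std::vector<TypeID> paramTypes;
};

// Sentinel meaning "this candidate cannot accept the inputs".
constexpr uint32_t NO_MATCH = std::numeric_limits<uint32_t>::max();

// Assumed cost model: exact match costs 0, an ANY parameter costs 1,
// any other mismatch (or a different arity) rules the candidate out.
inline uint32_t matchCost(const std::vector<TypeID>& inputs, const Candidate& candidate) {
    if (inputs.size() != candidate.paramTypes.size()) {
        return NO_MATCH;
    }
    uint32_t cost = 0;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (candidate.paramTypes[i] == inputs[i]) {
            continue;
        }
        if (candidate.paramTypes[i] == TypeID::ANY) {
            cost += 1;
            continue;
        }
        return NO_MATCH;
    }
    return cost;
}

// Returns the cheapest matching candidate, or nullptr when nothing matches.
inline const Candidate* matchFunction(
    const std::vector<TypeID>& inputs, const std::vector<Candidate>& functionSet) {
    const Candidate* best = nullptr;
    uint32_t minCost = NO_MATCH;
    for (const auto& candidate : functionSet) {
        uint32_t cost = matchCost(inputs, candidate);
        if (cost == NO_MATCH) {
            continue; // skip candidates that cannot accept the inputs
        }
        if (cost < minCost) {
            minCost = cost;
            best = &candidate;
        }
    }
    return best;
}
```

Under this assumed model, a candidate registered with parameter types `{ANY, STRING}` matches any first argument paired with a string second argument, which is consistent with CAST being registered with a second `STRING` parameter elsewhere in this diff.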
4 changes: 2 additions & 2 deletions src/function/cast/cast_fixed_list.cpp
Expand Up @@ -155,7 +155,7 @@ void CastFixedList::stringtoFixedListCastExecFunction<UnaryFunctionExecutor>(
const std::vector<std::shared_ptr<ValueVector>>& params, ValueVector& result, void* dataPtr) {
KU_ASSERT(params.size() == 1);
const auto& param = params[0];
auto option = &reinterpret_cast<CastFunctionBindData*>(dataPtr)->csvConfig.option;
auto option = &reinterpret_cast<CastFunctionBindData*>(dataPtr)->option;
if (param->state->isFlat()) {
auto inputPos = param->state->selVector->selectedPositions[0];
auto resultPos = result.state->selVector->selectedPositions[0];
Expand Down Expand Up @@ -197,7 +197,7 @@ void CastFixedList::stringtoFixedListCastExecFunction<CastChildFunctionExecutor>
const std::vector<std::shared_ptr<ValueVector>>& params, ValueVector& result, void* dataPtr) {
KU_ASSERT(params.size() == 1);
auto numOfEntries = reinterpret_cast<CastFunctionBindData*>(dataPtr)->numOfEntries;
auto option = &reinterpret_cast<CastFunctionBindData*>(dataPtr)->csvConfig.option;
auto option = &reinterpret_cast<CastFunctionBindData*>(dataPtr)->option;
auto inputVector = params[0].get();
for (auto i = 0u; i < numOfEntries; i++) {
result.setNull(i, inputVector->isNull(i));
Expand Down
63 changes: 31 additions & 32 deletions src/function/cast_from_string_functions.cpp
Expand Up @@ -823,144 +823,143 @@ void CastString::operation(const ku_string_t& input, union_entry_t& result,
}

void CastString::copyStringToVector(
ValueVector* vector, uint64_t rowToAdd, std::string_view strVal, const CSVOption* option) {
ValueVector* vector, uint64_t vectorPos, std::string_view strVal, const CSVOption* option) {
auto& type = vector->dataType;

if (strVal.empty() || isNull(strVal)) {
vector->setNull(rowToAdd, true /* isNull */);
vector->setNull(vectorPos, true /* isNull */);
return;
} else {
vector->setNull(rowToAdd, false /* isNull */);
}
vector->setNull(vectorPos, false /* isNull */);
switch (type.getLogicalTypeID()) {
case LogicalTypeID::INT128: {
int128_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::INT64: {
int64_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::INT32: {
int32_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::INT16: {
int16_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::INT8: {
int8_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::UINT64: {
uint64_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::UINT32: {
uint32_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::UINT16: {
uint16_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::UINT8: {
uint8_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::FLOAT: {
float val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::DOUBLE: {
double val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::BOOL: {
bool val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::BLOB: {
blob_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, rowToAdd, option);
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, vectorPos, option);
} break;
case LogicalTypeID::UUID: {
ku_uuid_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val.value);
vector->setValue(vectorPos, val.value);
} break;
case LogicalTypeID::STRING: {
if (!utf8proc::Utf8Proc::isValid(strVal.data(), strVal.length())) {
throw CopyException{"Invalid UTF8-encoded string."};
}
StringVector::addString(vector, rowToAdd, strVal.data(), strVal.length());
StringVector::addString(vector, vectorPos, strVal.data(), strVal.length());
} break;
case LogicalTypeID::DATE: {
date_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::TIMESTAMP_NS: {
timestamp_ns_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::TIMESTAMP_MS: {
timestamp_ms_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::TIMESTAMP_SEC: {
timestamp_sec_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::TIMESTAMP_TZ: {
timestamp_tz_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::TIMESTAMP: {
timestamp_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::INTERVAL: {
interval_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val);
vector->setValue(rowToAdd, val);
vector->setValue(vectorPos, val);
} break;
case LogicalTypeID::MAP: {
map_entry_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, rowToAdd, option);
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, vectorPos, option);
} break;
case LogicalTypeID::VAR_LIST: {
list_entry_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, rowToAdd, option);
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, vectorPos, option);
} break;
case LogicalTypeID::FIXED_LIST: {
CastStringHelper::castToFixedList(strVal.data(), strVal.length(), vector, rowToAdd, option);
CastStringHelper::castToFixedList(
strVal.data(), strVal.length(), vector, vectorPos, option);
} break;
case LogicalTypeID::STRUCT: {
struct_entry_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, rowToAdd, option);
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, vectorPos, option);
} break;
case LogicalTypeID::UNION: {
union_entry_t val;
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, rowToAdd, option);
CastStringHelper::cast(strVal.data(), strVal.length(), val, vector, vectorPos, option);
} break;
default: {
KU_UNREACHABLE;
Expand Down
34 changes: 19 additions & 15 deletions src/function/vector_cast_functions.cpp
Expand Up @@ -10,6 +10,7 @@
#include "function/cast/functions/cast_rdf_variant.h"

using namespace kuzu::common;
using namespace kuzu::binder;

namespace kuzu {
namespace function {
Expand Down Expand Up @@ -1003,35 +1004,38 @@ function_set CastToUInt8Function::getFunctionSet() {
return result;
}

std::unique_ptr<FunctionBindData> CastAnyFunction::bindFunc(
static std::unique_ptr<FunctionBindData> castBindFunc(
const binder::expression_vector& arguments, Function* function) {
// check the size of the arguments
if (arguments.size() != 2) {
throw BinderException(stringFormat(
"Invalid number of arguments for given function CAST. Expected: 2, Actual: {}.",
arguments.size()));
KU_ASSERT(arguments.size() == 2);
// Bind target type.
if (arguments[1]->expressionType != ExpressionType::LITERAL) {
throw BinderException(
stringFormat("Second parameter of CAST function must be a literal."));
}

auto literalExpr = ku_dynamic_cast<Expression*, LiteralExpression*>(arguments[1].get());
auto targetTypeStr = literalExpr->getValue()->getValue<std::string>();
auto targetType = binder::Binder::bindDataType(targetTypeStr);
if (*targetType == arguments[0]->getDataType()) { // No need to cast.
return nullptr;
}
// Assign default type if input is ANY type, e.g. NULL
auto inputTypeID = arguments[0]->dataType.getLogicalTypeID();
if (inputTypeID == LogicalTypeID::ANY) {
inputTypeID = LogicalTypeID::STRING;
}
auto str = ((binder::LiteralExpression&)*arguments[1]).getValue()->getValue<std::string>();
auto outputType = binder::Binder::bindDataType(str);
auto func = ku_dynamic_cast<Function*, ScalarFunction*>(function);
func->name = "CAST_TO_" + str;
func->parameterTypeIDs[0] = inputTypeID;
func->name = "CAST_TO_" + targetTypeStr;
func->execFunc =
CastFunction::bindCastFunction(func->name, inputTypeID, outputType->getLogicalTypeID())
CastFunction::bindCastFunction(func->name, inputTypeID, targetType->getLogicalTypeID())
->execFunc;
return std::make_unique<function::CastFunctionBindData>(std::move(outputType));
return std::make_unique<function::CastFunctionBindData>(std::move(targetType));
}

function_set CastAnyFunction::getFunctionSet() {
function_set result;
result.push_back(std::make_unique<ScalarFunction>(CAST_FUNC_NAME,
std::vector<LogicalTypeID>{LogicalTypeID::ANY}, LogicalTypeID::ANY, nullptr, nullptr,
bindFunc, false));
std::vector<LogicalTypeID>{LogicalTypeID::ANY, LogicalTypeID::STRING}, LogicalTypeID::ANY,
nullptr, nullptr, castBindFunc, false));
return result;
}

Expand Down
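The bind step for CAST in the diff above follows a small decision sequence: reject a non-literal target, return no bind data when the input already has the target type, and treat an `ANY` input (for example `NULL`) as `STRING`. A minimal sketch of that sequence follows. The `Arg`, `CastPlan`, `parseType`, and `bindCast` names are hypothetical stand-ins, and `parseType` only approximates what `Binder::bindDataType` does in the real code.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-ins; not the Kuzu types.
enum class TypeID { ANY, STRING, INT64 };

struct Arg {
    TypeID type;
    bool isLiteral;
    std::string literalValue; // only meaningful when isLiteral is true
};

struct CastPlan {
    std::string functionName;
    TypeID inputType;
    TypeID targetType;
};

// Assumed type-name parser covering two types, for illustration only.
inline TypeID parseType(const std::string& typeName) {
    if (typeName == "STRING") {
        return TypeID::STRING;
    }
    if (typeName == "INT64") {
        return TypeID::INT64;
    }
    throw std::invalid_argument("Unknown type: " + typeName);
}

// Returns nullptr when no cast is needed, mirroring the contract in the diff.
inline std::unique_ptr<CastPlan> bindCast(const Arg& input, const Arg& target) {
    if (!target.isLiteral) {
        throw std::invalid_argument("Second parameter of CAST function must be a literal.");
    }
    TypeID targetType = parseType(target.literalValue);
    if (targetType == input.type) {
        return nullptr; // input already has the target type
    }
    // An untyped input such as NULL is cast from STRING by default.
    TypeID inputType = input.type == TypeID::ANY ? TypeID::STRING : input.type;
    return std::make_unique<CastPlan>(
        CastPlan{"CAST_TO_" + target.literalValue, inputType, targetType});
}
```

Returning a null pointer for the "no cast needed" case lets the caller reuse the child expression unchanged, which is how the binder hunk for `bindScalarFunctionExpression` handles a null `bindData`.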
21 changes: 21 additions & 0 deletions src/include/function/cast/cast_function_bind_data.h
@@ -0,0 +1,21 @@
#pragma once

#include "common/copier_config/csv_reader_config.h"
#include "function/function.h"

namespace kuzu {
namespace function {

struct CastFunctionBindData : public FunctionBindData {
// The CAST function does not allow configuring CSV options such as delimiters.
// For performance, a default option object is generated at binding time.
common::CSVOption option;
// TODO(Mahn): the following field should be removed once we refactor fixed list.
uint64_t numOfEntries;

explicit CastFunctionBindData(std::unique_ptr<common::LogicalType> dataType)
: FunctionBindData{std::move(dataType)} {}
};

} // namespace function
} // namespace kuzu
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ namespace function {

struct CastString {
static void copyStringToVector(
ValueVector* vector, uint64_t rowToAdd, std::string_view strVal, const CSVOption* option);
ValueVector* vector, uint64_t vectorPos, std::string_view strVal, const CSVOption* option);

template<typename T>
static inline bool tryCast(const ku_string_t& input, T& result) {
Expand Down
2 changes: 0 additions & 2 deletions src/include/function/cast/vector_cast_functions.h
Expand Up @@ -98,8 +98,6 @@ struct CastToUInt8Function {
};

struct CastAnyFunction {
static std::unique_ptr<FunctionBindData> bindFunc(
const binder::expression_vector& arguments, Function* function);
static function_set getFunctionSet();
};

Expand Down
9 changes: 0 additions & 9 deletions src/include/function/function.h
@@ -1,7 +1,6 @@
#pragma once

#include "binder/expression/expression.h"
#include "common/copier_config/csv_reader_config.h"

namespace kuzu {
namespace function {
Expand All @@ -15,14 +14,6 @@ struct FunctionBindData {
virtual ~FunctionBindData() = default;
};

struct CastFunctionBindData : public FunctionBindData {
common::CSVReaderConfig csvConfig;
uint64_t numOfEntries;

explicit CastFunctionBindData(std::unique_ptr<common::LogicalType> dataType)
: FunctionBindData{std::move(dataType)} {}
};

struct Function;
using scalar_bind_func = std::function<std::unique_ptr<FunctionBindData>(
const binder::expression_vector&, Function* definition)>;
Expand Down