
Kuzu db can become inconsistent #3151

Closed
@lbuczkow

Description

It seems a Kuzu database can become inconsistent, and the inconsistency is neither reported nor repaired, in particular by kuzu_shell.

There is a kuzu database with the following structure:

CREATE NODE TABLE User(uid UUID, id INT64, PRIMARY KEY(uid)); 
CREATE NODE TABLE Note(uid UUID, id INT64, PRIMARY KEY(uid)); 
CREATE REL TABLE Links(FROM Note TO Note); 
CREATE REL TABLE NoteOf(FROM Note TO User);

The binary database files of the database are available here:

https://file.io/otuAQVrb4UHz

The database was originally populated by importing User, Note and Links from .csv files, while NoteOf was created by a node.js script. The script was interrupted several times. Its goal was to link notes to users so that each note belongs to some user. In the end, the database contains 1000 users, 1000000 notes and about 4000000 links, and is supposed to contain 1000000 NoteOf relationships.

The following query shows that no note remains unassigned:

kuzu> MATCH (n:Note) WHERE NOT EXISTS { MATCH (n)-[:NoteOf]->(:User) } RETURN count(n);
----------------
| COUNT(n._ID) |
----------------
| 0            |
----------------
(1 tuple)
(1 column)
Time: 3.65ms (compiling), 803.85ms (executing)

However, after exporting NoteOf using:

COPY (MATCH (n:Note)-[:NoteOf]->(u:User) RETURN n.uid, u.uid) TO 'note_of.csv';

it turned out that only 999990 of the expected 1000000 relationships were exported.

At first I thought it was a problem with the export, but I imported the exported .csv files into a new database and found the IDs of the notes whose NoteOf relationship was missing:

[743642,853676,684672,853633,992666,992667,602411,755386,831275,643211]
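The comparison that yields such a list can be sketched as a set difference between all notes and the note uids present in the exported file. This is only an illustrative sketch, not the procedure actually used above (which went through a second database): the helper names `exportedNoteUids` and `missingNoteIds`, the assumed CSV layout (note uid in the first column, no quoting, optional header row) and the user uid placeholders are all assumptions.

```javascript
// Hypothetical helpers; nothing here is part of the Kuzu API.

// Collect the note uids (first column) from the exported CSV text.
// Assumes a simple two-column file without quoted fields.
function exportedNoteUids(csvText, hasHeader = false) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  const rows = hasHeader ? lines.slice(1) : lines;
  return new Set(rows.map((line) => line.split(",")[0].trim()));
}

// Return the ids of notes whose uid does not occur in the export.
function missingNoteIds(allNotes, exportedUids) {
  return allNotes.filter((note) => !exportedUids.has(note.uid)).map((note) => note.id);
}

// Three notes taken from the query output in this report.
const allNotes = [
  { uid: "5e8c1059-533c-467b-9bfd-ab7bdcf76896", id: 1 },
  { uid: "ba05b55b-3d3b-4873-a37b-397581d45e8d", id: 2 },
  { uid: "a017c165-98ce-4f17-9761-cb822b8b2dfe", id: 684672 },
];

// Stand-in for note_of.csv; the user uid column holds placeholders.
const exportedCsv = [
  "5e8c1059-533c-467b-9bfd-ab7bdcf76896,user-uid-placeholder-a",
  "ba05b55b-3d3b-4873-a37b-397581d45e8d,user-uid-placeholder-b",
].join("\n");

const missing = missingNoteIds(allNotes, exportedNoteUids(exportedCsv));
console.log(missing); // [ 684672 ]
```

For the full data set, `allNotes` would be filled from a query such as `MATCH (n:Note) RETURN n.uid, n.id` and `exportedCsv` read from `note_of.csv`.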

I applied them (plus 1, 2, 3 as some "good" IDs for testing) to the following queries against the original database:

kuzu> MATCH (n:Note) WHERE n.id in [1, 2, 3, 743642,853676,684672,853633,992666,992667,602411,755386,831275,643211] RETURN n.uid, n.id;
-------------------------------------------------
| n.uid                                | n.id   |
-------------------------------------------------
| a017c165-98ce-4f17-9761-cb822b8b2dfe | 684672 |
-------------------------------------------------
| 85f0807e-a588-4db6-866c-d683befd34cc | 755386 |
-------------------------------------------------
| e2bb50df-ccc6-471f-a7e0-a50a17341164 | 743642 |
-------------------------------------------------
| 89dd8aa3-117a-4328-b316-25866d5f4b2b | 831275 |
-------------------------------------------------
| bdf820b4-b188-41b3-b336-f9f1eb94d662 | 853633 |
-------------------------------------------------
| c6c4c6bc-22a7-446f-a838-ab65497866aa | 853676 |
-------------------------------------------------
| 5e8c1059-533c-467b-9bfd-ab7bdcf76896 | 1      |
-------------------------------------------------
| ba05b55b-3d3b-4873-a37b-397581d45e8d | 2      |
-------------------------------------------------
| 2c24f177-ebe0-4695-91f9-5a6d84f379f3 | 3      |
-------------------------------------------------
| 48c27540-c017-4bba-a86b-ac6bc8ff1ae5 | 992666 |
-------------------------------------------------
| bd199b09-53f4-402b-bd56-b9478b35c849 | 992667 |
-------------------------------------------------
| 2b25ecf7-b705-4335-b3bd-4ba867b85be6 | 602411 |
-------------------------------------------------
| 6f0e78d7-c1af-434b-b4c9-cc1da87c7ad8 | 643211 |
-------------------------------------------------
(13 tuples)
(2 columns)
Time: 0.47ms (compiling), 49.18ms (executing)

The query returned all the specified notes as expected. But the next query:

kuzu> MATCH (n:Note)-[:NoteOf]->(u:User) WHERE n.id in [1, 2, 3, 743642,853676,684672,853633,992666,992667,602411,755386,831275,643211] RETURN n.uid, n.id, u.id;
------------------------------------------------------
| n.uid                                | n.id | u.id |
------------------------------------------------------
| 5e8c1059-533c-467b-9bfd-ab7bdcf76896 | 1    | 176  |
------------------------------------------------------
| ba05b55b-3d3b-4873-a37b-397581d45e8d | 2    | 954  |
------------------------------------------------------
| 2c24f177-ebe0-4695-91f9-5a6d84f379f3 | 3    | 176  |
------------------------------------------------------
(3 tuples)
(3 columns)
Time: 1.87ms (compiling), 18.75ms (executing)

returned only the "good" IDs.

Summarizing:

MATCH (n:Note) WHERE NOT EXISTS { MATCH (n)-[:NoteOf]->(:User) } RETURN count(n);

reports there are no unconnected notes, but:

MATCH (n:Note)-[:NoteOf]->(u:User) WHERE n.id in [1, 2, 3, 743642,853676,684672,853633,992666,992667,602411,755386,831275,643211] RETURN n.uid, n.id, u.id;

reveals that some notes are in fact unconnected.
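The contradiction can be stated as a simple accounting check: every note should be counted either by the NOT EXISTS query or by the NoteOf traversal. The sketch below uses the numbers from this report; the function `checkNoteCoverage` is hypothetical, and it assumes each note has at most one NoteOf relationship, so that the number of exported rows equals the number of connected notes.

```javascript
// Hypothetical check; the counts would come from the queries in this report.
function checkNoteCoverage({ totalNotes, unconnected, connected }) {
  // Notes seen by either query; on a consistent database this equals totalNotes.
  const accounted = unconnected + connected;
  return {
    consistent: accounted === totalNotes,
    unaccounted: totalNotes - accounted,
  };
}

const report = checkNoteCoverage({
  totalNotes: 1000000, // notes in the database
  unconnected: 0,      // result of the NOT EXISTS query
  connected: 999990,   // rows exported by the NoteOf traversal
});

console.log(report); // { consistent: false, unaccounted: 10 }
```

The 10 unaccounted notes match the 10 missing IDs listed earlier.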

I used kuzu_shell and kuzujs.node built from source on 23 March 21:13 (source code was pulled right before the build).

(This is not the first time I have noticed inconsistencies. I also had a case where the same query returned different results, each of which looked like a random subset of a larger set. Unfortunately, I did not keep the binary files.)

Labels

bug: Something isn't working
