Description
It seems a kuzu database can become inconsistent, and the inconsistency is neither reported nor repaired, by kuzu_shell in particular.
There is a kuzu database with the following structure:
CREATE NODE TABLE User(uid UUID, id INT64, PRIMARY KEY(uid));
CREATE NODE TABLE Note(uid UUID, id INT64, PRIMARY KEY(uid));
CREATE REL TABLE Links(FROM Note TO Note);
CREATE REL TABLE NoteOf(FROM Note TO User);
The binary files of the database are available here:
The database was originally populated by importing User, Note and Links from .csv files, while NoteOf was created by a node.js script. The script was interrupted several times. Its goal was to link notes to users so that each note belongs to some user. In the end, the database contains 1000 users, 1000000 notes and about 4000000 links, and is supposed to contain 1000000 NoteOf relationships.
The following query shows that no note remains unassigned:
kuzu> MATCH (n:Note) WHERE NOT EXISTS { MATCH (n)-[:NoteOf]->(:User) } RETURN count(n);
----------------
| COUNT(n._ID) |
----------------
| 0 |
----------------
(1 tuple)
(1 column)
Time: 3.65ms (compiling), 803.85ms (executing)
However, after exporting NoteOf using:
COPY (MATCH (n:Note)-[:NoteOf]->(u:User) RETURN n.uid, u.uid) TO 'note_of.csv';
it turned out that only 999990 of the 1000000 relationships had been exported.
At first I thought it was a problem with the export, but I imported the exported .csv files into a new database and found the IDs of the notes missing from the export:
[743642,853676,684672,853633,992666,992667,602411,755386,831275,643211]
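A minimal sketch of one way such missing notes could be located outside the database, shown only as an illustration and not as the exact procedure used here. It assumes the exported file has two columns (note uid, user uid) and no header row, and that a full list of note uids is available separately; the function name and the sample data are hypothetical.

```python
import csv
import io


def notes_missing_from_export(all_note_uids, export_csv_text):
    """Return the note uids that have no row in the exported NoteOf CSV.

    Assumes each CSV row is `note_uid,user_uid` with no header line.
    """
    reader = csv.reader(io.StringIO(export_csv_text))
    # The first column of every non-empty row is the uid of an exported note.
    exported = {row[0] for row in reader if row}
    return sorted(uid for uid in set(all_note_uids) if uid not in exported)


# Tiny made-up sample: three notes, only two of them appear in the export.
all_uids = ["a", "b", "c"]
export_text = "a,u1\nc,u2\n"
print(notes_missing_from_export(all_uids, export_text))  # ['b']
```

The resulting uids can then be mapped back to n.id values with an ordinary MATCH on Note.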
I used them (plus 1, 2 and 3 as known "good" IDs for comparison) in the following queries against the original database:
kuzu> MATCH (n:Note) WHERE n.id in [1, 2, 3, 743642,853676,684672,853633,992666,992667,602411,755386,831275,643211] RETURN n.uid, n.id;
-------------------------------------------------
| n.uid | n.id |
-------------------------------------------------
| a017c165-98ce-4f17-9761-cb822b8b2dfe | 684672 |
-------------------------------------------------
| 85f0807e-a588-4db6-866c-d683befd34cc | 755386 |
-------------------------------------------------
| e2bb50df-ccc6-471f-a7e0-a50a17341164 | 743642 |
-------------------------------------------------
| 89dd8aa3-117a-4328-b316-25866d5f4b2b | 831275 |
-------------------------------------------------
| bdf820b4-b188-41b3-b336-f9f1eb94d662 | 853633 |
-------------------------------------------------
| c6c4c6bc-22a7-446f-a838-ab65497866aa | 853676 |
-------------------------------------------------
| 5e8c1059-533c-467b-9bfd-ab7bdcf76896 | 1 |
-------------------------------------------------
| ba05b55b-3d3b-4873-a37b-397581d45e8d | 2 |
-------------------------------------------------
| 2c24f177-ebe0-4695-91f9-5a6d84f379f3 | 3 |
-------------------------------------------------
| 48c27540-c017-4bba-a86b-ac6bc8ff1ae5 | 992666 |
-------------------------------------------------
| bd199b09-53f4-402b-bd56-b9478b35c849 | 992667 |
-------------------------------------------------
| 2b25ecf7-b705-4335-b3bd-4ba867b85be6 | 602411 |
-------------------------------------------------
| 6f0e78d7-c1af-434b-b4c9-cc1da87c7ad8 | 643211 |
-------------------------------------------------
(13 tuples)
(2 columns)
Time: 0.47ms (compiling), 49.18ms (executing)
The query returned all the specified notes as expected. But the next query:
kuzu> MATCH (n:Note)-[:NoteOf]->(u:User) WHERE n.id in [1, 2, 3, 743642,853676,684672,853633,992666,992667,602411,755386,831275,643211] RETURN n.uid, n.id, u.id;
------------------------------------------------------
| n.uid | n.id | u.id |
------------------------------------------------------
| 5e8c1059-533c-467b-9bfd-ab7bdcf76896 | 1 | 176 |
------------------------------------------------------
| ba05b55b-3d3b-4873-a37b-397581d45e8d | 2 | 954 |
------------------------------------------------------
| 2c24f177-ebe0-4695-91f9-5a6d84f379f3 | 3 | 176 |
------------------------------------------------------
(3 tuples)
(3 columns)
Time: 1.87ms (compiling), 18.75ms (executing)
returned only the "good" IDs.
Summarizing:
MATCH (n:Note) WHERE NOT EXISTS { MATCH (n)-[:NoteOf]->(:User) } RETURN count(n);
reports there are no unconnected notes, but:
MATCH (n:Note)-[:NoteOf]->(u:User) WHERE n.id in [1, 2, 3, 743642,853676,684672,853633,992666,992667,602411,755386,831275,643211] RETURN n.uid, n.id, u.id;
reveals that some notes are in fact unconnected.
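The contradiction can be stated as simple arithmetic: every note must either have a NoteOf edge or not, so the two groups should add up to the total number of notes. The sketch below is illustrative only; the function name is hypothetical, and it assumes each note has at most one NoteOf relationship, so that the number of exported relationships equals the number of notes with an edge.

```python
def unaccounted_notes(total_notes, notes_without_edge, notes_with_edge):
    """Return how many notes fall into neither group.

    In a consistent database the result is 0, because the two groups
    partition the set of all notes.
    """
    return total_notes - notes_without_edge - notes_with_edge


# Figures taken from this report: 1000000 notes, 0 reported as unassigned,
# 999990 relationships exported (assumed to be one per note).
gap = unaccounted_notes(1_000_000, 0, 999_990)
print(gap)  # 10
```

The gap of 10 matches the number of IDs listed above that match as Note nodes but return no NoteOf relationship.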
I used kuzu_shell and kuzujs.node built from source on 23 March 21:13 (source code was pulled right before the build).
(This is not the first time I have noticed inconsistencies. I also had a case where the same query returned different results on different runs, each looking like a random subset of a larger set. Unfortunately, I did not keep the binary files.)