Lockfile (*.wal) is not cleaned up right after connection.close #10002
@StefanJannCH thanks for raising this issue. Please
This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 30 days.

This issue was closed because it has been stale for 30 days with no activity.
I am facing the same issue. I am creating a cron job that creates a new DB.

dbCreator.js
+1... Happening on a quick sequence of operations involving dropping a table and re-creating it with the same name. This makes the database completely inaccessible. I'm using it from Node.js. Edit: same
This looks to be connected to node-api, possibly a misuse of the C API; we'll take a look. Unsure if this is best placed here or in https://github.com/duckdb/duckdb-node-neo
@carlopi I'll tag this issue on the duck-node repo too. Thanks..
@freakynit: could you possibly share an end-to-end reproduction?
@carlopi Will try to get reproducible code and share..
@carlopi Unable to reproduce with test code... will share relevant logs if I encounter the issue again.
Even a more complete description of what's happening would help in getting started diagnosing this. Note that this might be something that needs fixing in the code itself, or might call for improving error messages to provide a way out, or for documenting the expected usage of the API better.
@carlopi Got it.. lemme share relevant log lines and existing code (will expand in a further comment):

Code:

```js
async function run6(connection) {
  const dataFilePaths = [
    '/Users/nitinbansal/Downloads/mix data/generated.csv',
    '/Users/nitinbansal/Downloads/hackernews_10k_dec 2024.csv'
  ];
  const tableNames = ['table_1', 'table_2'];
  let counts = 0;
  for (let i = 0; i < 3; i++) {
    console.log(`iteration: ${i + 1}`);
    for (let j = 0; j < dataFilePaths.length; j++) {
      await useConnection(DB_FILE_PATH, async (connection) => {
        await dropTables(connection, true);
        console.log(`Tables dropped. Loading data and getting count...`);
        let result = await loadFile(connection, dataFilePaths[j], tableNames[j], 3);
        let totalRecords = await getRecordCount(connection, tableNames[j]);
        counts = totalRecords;
      });
    }
  }
}
```
Code for the methods used above (almost full code):

```js
async function initConnection(dbFilePath) {
  const instance = await DuckDBInstance.create(dbFilePath);
  return await instance.connect();
}

async function useConnection(dbFilePath, callback) {
  let connection = null;
  try {
    connection = await initConnection(dbFilePath);
    return await callback(connection); // Return the result of the callback
  } catch (error) {
    console.error("Error while using connection:", error);
    throw error;
  } finally {
    connection = null; // todo: prod: use close() if available
  }
}

async function dropTables(connection, dropAllTables, tablesToDrop) {
  if (dropAllTables) {
    const query = `SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'`;
    const result = await runAndGetResults(connection, query, -1, true);
    tablesToDrop = result.rows.flat();
  }
  if (tablesToDrop.length === 0) {
    console.log('No tables to drop.');
    return;
  }
  console.log(`Going to drop following tables`, tablesToDrop);
  for (const tableName of tablesToDrop) {
    console.log(`dropping table: ${tableName}`);
    try {
      await runAndGetResults(connection, `DROP TABLE IF EXISTS ${tableName}`, -1, false);
      console.log(`dropped table: ${tableName}`);
    } catch (err) {
      console.error(`error dropping table ${tableName}:`, err);
    }
  }
}

async function runAndGetResults(connection, query, maxRowsToFetch = 2048, jsonParseEachRow = false, returnIterator = false) {
  const result = await connection.run(query);
  // ...
  return {
    header: columnNames,
    schema: schema,
    rows: collectedRows
  };
}

async function loadFile(connection, filepath, tableName, sampleRowsToGet = 3) {
  // Runs `CREATE TABLE ${tableName} AS SELECT * FROM read_csv_auto('${filepath}')`
  // ...
  return {
    "table": tableName,
    "filepath": filepath,
    "schema": queryResult.schema,
    "sampleRows": queryResult.rows
  };
}

async function getRecordCount(connection, tableName) {
  const query = `select count(1) as total_count from ${tableName}`;
  const result = await runAndGetResults(connection, query, -1, true, false);
  if (result.rows.length > 0) {
    return result.rows[0][0];
  } else {
    return 0;
  }
}
```

cc: @carlopi
Looks like you're creating a new DuckDBInstance for each operation. I believe it is a known problem that having multiple DuckDB instances in the same process writing to the same database files can cause corruption. Currently, my recommendation is to have only a single DuckDBInstance. Use the
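For reference, a minimal sketch of that single-instance pattern, using the same `DuckDBInstance.create` / `instance.connect` calls already shown in the code above; the `cachedInstance` variable and `getConnection` helper are illustrative names, not part of the library:

```js
import { DuckDBInstance } from '@duckdb/node-api';

// One instance per process; connections can still be short-lived.
let cachedInstance = null;

// Create the DuckDBInstance once and reuse it for every operation,
// so only a single instance ever writes to the database file.
async function getConnection(dbFilePath) {
  if (!cachedInstance) {
    cachedInstance = await DuckDBInstance.create(dbFilePath);
  }
  return await cachedInstance.connect();
}
```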
@jraymakers Thanks.. one question on using the same connection only: I'll be consuming only the first 2048 rows from the result set, leaving the remaining rows unconsumed as-is. I can't use
How much memory a query uses depends on the details. But if you're just scanning a data source and streaming the results, that should minimize the memory usage. Note that I released
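A hedged sketch of that streaming approach, assuming `connection.stream`, `result.fetchChunk`, and `chunk.getRows` from @duckdb/node-api behave as documented; `fetchFirstRows` is a hypothetical helper, not part of the library:

```js
// Stream a query and materialize only the first maxRows rows;
// remaining chunks are never fetched, keeping memory usage low.
async function fetchFirstRows(connection, query, maxRows = 2048) {
  const result = await connection.stream(query);
  const rows = [];
  while (rows.length < maxRows) {
    const chunk = await result.fetchChunk();
    if (!chunk || chunk.rowCount === 0) break; // result set exhausted
    rows.push(...chunk.getRows());
  }
  return rows.slice(0, maxRows);
}
```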
@jraymakers Thanks a lot for the clarification. I'll be sure to check out the new release and the associated PR. Also, I did a simple streaming load test to gauge its impact on RAM... whether it keeps increasing, or moves up and down as memory gets reclaimed. It seems to work fine as of now; the memory doesn't keep piling up. Here's the test code:

```js
async function run6(connection) {
  const dataFilePaths = [
    '/Users/nitinbansal/Downloads/mix data/generated.csv',
    '/Users/nitinbansal/Downloads/hackernews_10k_dec 2024.csv'
  ];
  const tableNames = ['table_1', 'table_2'];
  let counts = 0;
  await useConnection(DB_FILE_PATH, async (connection) => {
    await dropTables(connection, true);
    console.log(`Tables dropped. Loading data and getting count...`);
    let result = await loadFile(connection, dataFilePaths[0], tableNames[0], 3);
    result = await loadFile(connection, dataFilePaths[1], tableNames[1], 3);
    for (let k = 0; k < 200; k++) {
      console.log(`iteration: ${k + 1}`);
      result = await runAndGetResults(connection, `select * from table_1`, 2048, false, false);
      console.log(`selected row count ${result?.rows?.length}`);
      result = await runAndGetResults(connection, `select * from table_2`, 2048, false, false);
      console.log(`selected row count ${result?.rows?.length}`);
    }
  });
}
```

The DB file size is roughly 1GB. Thanks again..
What happens?
When using the Java API (duckdb_jdbc 0.9.2) under Windows 11, the lock file (*.wal) is not cleaned up by a connection.close call. This leads to an error if another connection is created (java.sql.SQLException: IO Error: Cannot open file ...).
To Reproduce
Open a connection, then close it. The WAL file is still there. Open a connection again from another thread: error!
Sometimes, after GC runs one or more times, the WAL file goes away.
OS: Windows 11
DuckDB Version: 0.9.2
DuckDB Client: Java
Full Name: Stefan Jann
Affiliation: none
Have you tried this on the latest main branch?
I have not tested with any build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?