Lockfile (*.wal) is not cleaned up right after connection.close #10002

Open

StefanJannCH opened this issue Dec 15, 2023 · 17 comments

Comments

@StefanJannCH

What happens?

When using the Java API (duckdb_jdbc 0.9.2) on Windows 11, the lock file (*.wal) is not cleaned up by a connection.close call. This leads to an error if another connection is created (java.sql.SQLException: IO Error: Cannot open file ...).

To Reproduce

Open a connection, then close it. The WAL file is still there. Open a connection again from another thread: error!
Sometimes, when the GC is run one or more times, the WAL file goes away.

OS:

Windows 11

DuckDB Version:

0.9.2

DuckDB Client:

Java

Full Name:

Stefan Jann

Affiliation:

none

Have you tried this on the latest main branch?

I have not tested with any build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@szarnyasg
Collaborator

@StefanJannCH thanks for raising this issue. Please

  1. provide a reproducible example
  2. check that all objects related to the database (e.g., ResultSet, PreparedStatement) are closed. The database is only cleaned up when every such object has been closed.


This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Mar 18, 2024

github-actions bot commented Jul 2, 2024

This issue was closed because it has been stale for 30 days with no activity.

@github-actions github-actions bot closed this as not planned Jul 2, 2024
@AyushSingal22

AyushSingal22 commented Nov 19, 2024

I am facing the same issue. I am creating a cron job that creates a new DB.
The databases are being created, but the .wal [lock] file is not removed, which makes the database inaccessible to other processes.
I am sharing the code.
dailyWorker.js

const cron = require("node-cron");
const uuid = require('uuid');
const {deleteFile , fileExists , makeDir, copyFile} = require("./utils");


let lastDb = "";
const DATA_TEMP_STORE_PATH = "datastore/temp/";
const DATA_STORE_PATH = "datastore/store/";

function getDataFilePath(name) {
    return DATA_TEMP_STORE_PATH + name;
}

makeDir(DATA_STORE_PATH);
makeDir(DATA_TEMP_STORE_PATH);

cron.schedule("*/10 * * * * *", async function() {
    const creator = require("./dbCreator");
    console.log("Running db creator: " + new Date());
    let dbName = uuid.v4();
    let dbFile = getDataFilePath(dbName)
    await creator(dbFile);
    await copyFile(dbFile, DATA_STORE_PATH + dbName);
    lastDb = dbFile;
});

dbCreator.js

const creator = async (dbName) => {
    const duckdb = require("duckdb");
    const db = new duckdb.Database(dbName);

    async function f(val) {
        const con = db.connect();
        con.run('CREATE TABLE a (i INTEGER)');
        const stmt = con.prepare('INSERT INTO a VALUES (?)');
        for (let i = 0; i < 10000000; i++) {
            stmt.run(val);
        }
        stmt.finalize();
        con.close();
    };
    await f(1);
    db.close();
}
module.exports = creator;

@freakynit

freakynit commented Dec 26, 2024

+1...

This happens during a quick sequence of operations involving dropping a table and re-creating it with the same name. It makes the database completely inaccessible.

I'm using it from Node.js: @duckdb/node-api.

Edit:

Same issue with the WAL file not getting cleaned up.
OS: Apple M1 Sonoma 14.5
Node: 20.11.1
Driver: 1.1.3-alpha.7

@carlopi carlopi reopened this Dec 26, 2024
@carlopi
Contributor

carlopi commented Dec 26, 2024

This looks to be connected to node-api, possibly a misuse of the C API; we'll take a look. Unsure if this is best placed here or in https://github.com/duckdb/duckdb-node-neo

@freakynit

@carlopi I'll tag this issue on the duckdb-node repo too.

Thanks..

@carlopi
Contributor

carlopi commented Dec 26, 2024

@freakynit: could you possibly share an end-to-end reproduction?

@freakynit

@carlopi Will try to get reproducible code and share..

@freakynit

@carlopi Unable to reproduce with test code... will share relevant log if I encounter the issue again.

@carlopi
Contributor

carlopi commented Dec 26, 2024

To get started, even a more complete description of what's happening would help in diagnosing this.

Note that this might be something that needs fixing in the code itself, by improving error messages to provide a way out, or by documenting the expected usage of the API better.

@freakynit

freakynit commented Dec 26, 2024

@carlopi Got it.. let me share relevant log lines and existing code (will expand in a further comment):

iteration: 1
Going to drop following tables [ 'table_2' ]
dropping table: table_2
dropped table: table_2
Tables dropped. Loading data and getting count...
Going to drop following tables [ 'table_1' ]
dropping table: table_1
dropped table: table_1
Tables dropped. Loading data and getting count...
iteration: 2
Going to drop following tables [ 'table_2' ]
dropping table: table_2
dropped table: table_2
Tables dropped. Loading data and getting count...
Error while using connection: [Error: FATAL Error: Failed to create checkpoint because of error: Could not remove file "./db.bin.wal": No such file or directory]
[Error: FATAL Error: Failed to create checkpoint because of error: Could not remove file "./db.bin.wal": No such file or directory]

Code:

async function run6(connection) {
    const dataFilePaths = [
        '/Users/nitinbansal/Downloads/mix data/generated.csv',
        '/Users/nitinbansal/Downloads/hackernews_10k_dec 2024.csv'
    ];
    const tableNames = ['table_1', 'table_2'];

    let counts = 0;

    for(let i = 0; i < 3; i++) {
        console.log(`iteration: ${i+1}`);

        for(let j = 0; j < dataFilePaths.length; j++) {
            await useConnection(DB_FILE_PATH, async (connection) => {
                await dropTables(connection, true);

                console.log(`Tables dropped. Loading data and getting count...`);

                let result = await loadFile(connection, dataFilePaths[j], tableNames[j], 3);
                let totalRecords = await getRecordCount(connection, tableNames[j]);

                counts = totalRecords;
            });
        }
    }
}

@freakynit

freakynit commented Dec 26, 2024

Code for methods used above (almost full code):

async function initConnection(dbFilePath) {
    const instance = await DuckDBInstance.create(dbFilePath);
    return await instance.connect();
}


async function useConnection(dbFilePath, callback) {
    let connection = null;
    try {
        connection = await initConnection(dbFilePath);
        return await callback(connection); // Return the result of the callback
    } catch (error) {
        console.error("Error while using connection:", error);
        throw error;
    } finally {
        connection = null; // todo: prod: use close() if available
    }
}


async function dropTables(connection, dropAllTables, tablesToDrop) {
    if(dropAllTables) {
        const query = `SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'`;
        const result = await runAndGetResults(connection, query, -1, true);
        tablesToDrop = result.rows.flat();
    }

    if (tablesToDrop.length === 0) {
        console.log('No tables to drop.');
        return;
    }

    console.log(`Going to drop following tables`, tablesToDrop);

    for(const tableName of tablesToDrop) {
        console.log(`dropping table: ${tableName}`);

        try {
            await runAndGetResults(connection, `DROP TABLE IF EXISTS ${tableName}`, -1, false);
            console.log(`dropped table: ${tableName}`);
        } catch (err) {
            console.error(`error dropping table ${tableName}:`, err);
        }
    }
}


async function runAndGetResults(connection, query, maxRowsToFetch = 2048, jsonParseEachRow = false, returnIterator = false) {
    const result = await connection.run(query);
    ...
    ...
    return {
        header: columnNames,
        schema: schema,
        rows: collectedRows
    };
}


async function loadFile(connection, filepath, tableName, sampleRowsToGet = 3) {
    // Runs `CREATE TABLE ${tableName} AS SELECT * FROM read_csv_auto('${filepath}')`
    ...
    ...
    return {
        "table": tableName,
        "filepath": filepath,
        "schema": queryResult.schema,
        "sampleRows": queryResult.rows
    }
}


async function getRecordCount(connection, tableName) {
    const query = `select count(1) as total_count from ${tableName}`;
    const result = await runAndGetResults(connection, query, -1, true, false);

    if(result.rows.length > 0) {
        return result.rows[0][0];
    } else {
        return 0;
    }
}

cc: @carlopi

@jraymakers
Contributor

Looks like you're creating a new DuckDBInstance for each operation. I believe it is a known problem that having multiple DuckDB instances in the same process writing to the same database files can cause corruption.

Currently, @duckdb/node-api doesn't provide a way to explicitly clean up an instance. It gets automatically cleaned up a short amount of time after the reference is released, but creating many instances in quick succession, and attaching them to the same database files, could result in multiple instances attached to, and writing to, the same database files at the same time.

My recommendation is to only have a single DuckDBInstance. Use the ATTACH and DETACH commands to load and unload database files.
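
A minimal sketch of that single-instance pattern, using the DuckDBInstance.create / connect / run calls already shown in this thread (the ':memory:' path, file name, and alias below are placeholders):

const { DuckDBInstance } = require('@duckdb/node-api');

async function main() {
    // One instance and one connection for the lifetime of the process.
    const instance = await DuckDBInstance.create(':memory:');
    const connection = await instance.connect();

    // Load a database file, work with it, then unload it,
    // instead of creating a new DuckDBInstance per operation.
    await connection.run(`ATTACH 'datastore/store/daily.db' AS daily`);
    await connection.run(`CREATE TABLE daily.a (i INTEGER)`);
    await connection.run(`INSERT INTO daily.a VALUES (1)`);
    await connection.run(`DETACH daily`);
}

main();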

@carlopi carlopi removed the stale label Dec 26, 2024
@freakynit

@jraymakers Thanks.. one question about using the same connection:

If I repeatedly select large amounts of data (like millions of rows) using chunked streaming (not the full result, but partial, 2048 rows at a time), can that lead to OOM?

I'll be consuming only the first 2048 rows from the result set, leaving the rest unconsumed. I can't use LIMIT for this case, since I need to run the query on the full dataset and only show the first 2048 rows. Or are further rows actually fetched only on the next chunk call?

@jraymakers
Contributor

How much memory a query uses depends on the details. But if you're just scanning a data source, and streaming the results, that should minimize the memory usage.

Note that I released @duckdb/node-api@1.1.3-alpha.8 a couple of days ago, which implements true streaming. For your use case, you should be sure to use that; otherwise the results could get materialized eagerly in the background. This PR demonstrates some of the different patterns and compares some aspects of their performance.
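
For reference, a rough sketch of the "consume only the first N rows of a streamed result" pattern; the names connection.stream(), result.fetchChunk(), chunk.rowCount, and chunk.getRows() are my assumptions about the alpha streaming API, so check the release notes and the linked PR for the exact methods:

async function firstRows(connection, query, maxRows = 2048) {
    // Assumed streaming entry point: the result is not materialized eagerly.
    const result = await connection.stream(query);
    const rows = [];
    while (rows.length < maxRows) {
        const chunk = await result.fetchChunk(); // assumed chunk-at-a-time fetch
        if (!chunk || chunk.rowCount === 0) {
            break;                               // end of stream
        }
        rows.push(...chunk.getRows());           // assumed row accessor
    }
    // Remaining chunks of the stream are simply left unconsumed.
    return rows.slice(0, maxRows);
}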

@freakynit

@jraymakers Thanks a lot for the clarification. I'll surely check out the new release and the associated PR.

Also, I did a simple streaming load test to gauge its impact on RAM... whether it keeps increasing, or moves up and down as memory gets reclaimed. It seems to work fine as of now. The memory doesn't keep piling up.

Here's test code:

async function run6(connection) {
    const dataFilePaths = [
        '/Users/nitinbansal/Downloads/mix data/generated.csv',
        '/Users/nitinbansal/Downloads/hackernews_10k_dec 2024.csv'
    ];
    const tableNames = ['table_1', 'table_2'];

    let counts = 0;

    await useConnection(DB_FILE_PATH, async (connection) => {
        await dropTables(connection, true);

        console.log(`Tables dropped. Loading data and getting count...`);

        let result = await loadFile(connection, dataFilePaths[0], tableNames[0], 3);
        result = await loadFile(connection, dataFilePaths[1], tableNames[1], 3);

        for(let k = 0; k < 200; k++) {
            console.log(`iteration: ${k+1}`);

            result = await runAndGetResults(connection, `select * from table_1`, 2048, false, false);
            console.log(`selected row count ${result?.rows?.length}`)

            result = await runAndGetResults(connection, `select * from table_2`, 2048, false, false);
            console.log(`selected row count ${result?.rows?.length}`)
        }
    });
}

The DB file size is roughly 1GB.

Thanks again..
