Attempted to dereference unique_ptr that is NULL when inserting many rows #10745

Closed
@Miksu82

Description

What happens?

Inserting a large number of rows (more than 80000 in my case) fails with "duckdb.duckdb.FatalException: FATAL Error: Failed to create checkpoint because of error: Attempted to dereference unique_ptr that is NULL!". I first encountered this when updating 2000 rows like this:

duckdb.sql("CREATE TABLE temp AS SELECT * FROM df")
duckdb.sql("UPDATE target SET col = temp.col FROM temp WHERE target.id = temp.id")
duckdb.sql("DROP TABLE temp")

I got the same error at DROP TABLE. I cannot share that code because it is proprietary, but I'll try to create something else that reproduces it. Below is a code snippet that triggers the crash when inserting.

To Reproduce

import duckdb
import random
import string

# Step 1: Generate random strings (90000 by default), each at most 10 characters long
def generate_random_strings(n=90000, max_length=10):
    random_strings = []
    for _ in range(n):
        length = random.randint(1, max_length)  # Random length up to 10
        random_str = ''.join(random.choices(string.ascii_letters + string.digits, k=length))
        random_strings.append(random_str)
    return random_strings

random_strings = generate_random_strings()

# Step 2: Connect to DuckDB. Passing a file name persists the database to disk.
# Calling duckdb.connect() with no arguments would create an in-memory database instead.
con = duckdb.connect(database='test.db', read_only=False)

# Step 3: Create the table (id is a plain INTEGER supplied by the insert below; emb is left NULL)
con.execute("""
    CREATE TABLE test_table (
        id INTEGER,
        random_string VARCHAR,
        emb FLOAT[]
    );
""")

con.execute("")

# Insert the strings into the table
insert_query = "INSERT INTO test_table (id, random_string) VALUES (?, ?)"

# DuckDB's executemany is used for batch insertion.
con.executemany(insert_query, [(i, s,) for i, s in enumerate(random_strings)])

# Verify the insertion
result = con.execute("SELECT COUNT(*) FROM test_table").fetchall()
print(f"Inserted rows: {result[0][0]}")

# Optional: Display some inserted rows to verify
sample_result = con.execute("SELECT * FROM test_table LIMIT 5;").fetchall()
print("Sample inserted rows:")
for row in sample_result:
    print(row)

# Close the connection (in case of a persistent database)
con.close()
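
The generator above produces different strings on every run. In case it helps anyone trying to reproduce this, here is a minimal sketch of a seeded variant that always yields the same data. The name generate_seeded_strings and the seed parameter are hypothetical additions, not part of my original script, and I have not checked whether the crash depends on the actual string contents.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def generate_seeded_strings(n=90000, max_length=10, seed=0):
    """Return n alphanumeric strings, each 1..max_length characters long.

    The same (n, max_length, seed) always gives the same list.
    """
    # Private generator, so the global random state is left untouched.
    rng = random.Random(seed)
    return [
        "".join(rng.choices(ALPHABET, k=rng.randint(1, max_length)))
        for _ in range(n)
    ]


# Same (id, random_string) shape as the parameter list passed to executemany above.
rows = list(enumerate(generate_seeded_strings()))
```

The rows list can be passed to con.executemany(insert_query, rows) in place of the list comprehension in the script.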

OS:

macOS M3 Pro 14.1

DuckDB Version:

v0.10.0 20b1486

DuckDB Client:

python 3.12.1

Full Name:

Mika Ristimäki

Affiliation:

Private

Have you tried this on the latest nightly build?

I have tested with a nightly build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
