Skip to content

import + pivot + export creates un-importable dump  #14601

Closed
duckdb/duckdb-r
#572
@meh2135

Description

What happens?

If you create a table from a pivot command, export the db, import it again elsewhere, do another pivot, then export again, the exported database will not be importable because of a bug in the schema.sql wherein there are multiple lines that start with CREATE TYPE __pivot_enum_0_0 AS ENUM

To Reproduce

# demonstrate bug
import numpy as np
import pandas as pd
import duckdb
from pathlib import Path

TEMP_PATH0 = "/tmp/duckdemo0"
TEMP_PATH1 = "/tmp/duckdemo1"
N = 1000
df = pd.DataFrame({"a":np.random.choice(["u", "v", "w"], N, True),
                   "b":np.random.choice(["x", "y", "z"], N, True),
                   "c":np.random.randn(N),})
conn0 = duckdb.connect(":memory:0")

ddf = conn0.from_df(df)
ddf.create("input_data")

xx = conn0.query("pivot input_data on a using max(c) group by b;")
xx.create("xx")
conn0.execute(f"EXPORT DATABASE '{TEMP_PATH0}'  (FORMAT PARQUET);")
conn1 = duckdb.connect(":memory:1")
conn1.execute(f"IMPORT DATABASE '{TEMP_PATH0}';")
yy = conn1.query("pivot input_data on b using max(c) group by a;")
yy.create("yy")
conn1.execute(f"EXPORT DATABASE '{TEMP_PATH1}'  (FORMAT PARQUET);")
with (Path(TEMP_PATH1) / "schema.sql").open("r") as f:
    print(f.read())

# CREATE TYPE __pivot_enum_0_0 AS ENUM ( 'x', 'y', 'z' );
# CREATE TYPE __pivot_enum_0_0 AS ENUM ( 'u', 'v', 'w' );

# CREATE TABLE input_data(a VARCHAR, b VARCHAR, c DOUBLE);
# CREATE TABLE xx(b VARCHAR, u DOUBLE, v DOUBLE, w DOUBLE);
# CREATE TABLE yy(a VARCHAR, x DOUBLE, y DOUBLE, z DOUBLE);

# Renames the pivot type. Breaks importing even though it's never used.

conn2 = duckdb.connect(":memory:2")
conn2.execute(f"IMPORT DATABASE '{TEMP_PATH1}';")
# Fails with message:
# CatalogException: Catalog Error: Type with name "__pivot_enum_0_0" already exists!

OS:

Ubuntu 22.04.5 LTS x86_64

DuckDB Version:

1.1.1

DuckDB Client:

python

Hardware:

No response

Full Name:

Michael Hankin

Affiliation:

Mothball Labs

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions