
Error accessing an Iceberg table hosted in Azure Blob Storage #194

Open
@jhatcher1

Description


What happens?

We are trying to connect to an Iceberg table in Azure Blob Storage using the Iceberg foreign data wrapper. When we create the foreign table, we get the following error:

ERROR:  Invalid Input Error: The provided connection string does not match the storage account named iceberg@mystorageaccount

We do not see this error when we use DuckDB to query the Iceberg table directly, or when we use the Parquet foreign data wrapper to read the Iceberg table's Parquet files directly.

To Reproduce

These are the steps performed to reproduce the error:

$ docker run --name paradedb -e POSTGRES_PASSWORD=password paradedb/paradedb
$ docker exec -it paradedb psql -U postgres
CREATE FOREIGN DATA WRAPPER iceberg_wrapper HANDLER iceberg_fdw_handler VALIDATOR iceberg_fdw_validator;
CREATE SERVER iceberg_server FOREIGN DATA WRAPPER iceberg_wrapper;

CREATE USER MAPPING FOR postgres
SERVER iceberg_server
OPTIONS (
  type 'AZURE',
  connection_string 'DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<access-key>;EndpointSuffix=core.windows.net'
);

CREATE FOREIGN TABLE iceberg_table ()
SERVER iceberg_server
OPTIONS (
    files 'abfss://iceberg/path/to/table/metadata/<id>.metadata.json',
    skip_schema_inference 'true'
);
ERROR:  Invalid Input Error: The provided connection string does not match the storage account named iceberg@mystorageaccount
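One possible reading of this message, which is an unconfirmed assumption, is that `iceberg@mystorageaccount` is the `container@account` portion of a fully qualified URL such as `abfss://iceberg@mystorageaccount.dfs.core.windows.net/path/to/table` (for example, a path recorded inside the Iceberg metadata) being treated as a bare account name. The Python sketch below is a hypothetical diagnostic, not DuckDB's or ParadeDB's parsing code: the helpers `account_from_connection_string` and `account_from_abfss_url` are illustrative names that extract the account from the connection string used above and from both `abfss://` URL forms, so the values can be compared by hand.

```python
# Hypothetical diagnostic sketch (NOT DuckDB's or ParadeDB's parsing logic):
# extract the storage account from an Azure connection string and from an
# abfss:// URL so the two can be compared.
from typing import Optional
from urllib.parse import urlparse


def account_from_connection_string(conn: str) -> Optional[str]:
    """Return AccountName from a 'key=value;key=value' connection string."""
    for part in conn.split(";"):
        # partition() splits on the first '=' only, so base64 keys
        # ending in '==' are kept intact in `value`.
        key, sep, value = part.partition("=")
        if sep and key.strip() == "AccountName":
            return value.strip()
    return None


def account_from_abfss_url(url: str) -> Optional[str]:
    """Return the account embedded in an abfss:// URL, or None.

    'abfss://container/path' names no account, so it returns None.
    'abfss://container@account.dfs.core.windows.net/path' returns 'account'.
    """
    netloc = urlparse(url).netloc
    if "@" not in netloc:
        return None
    host = netloc.split("@", 1)[1]
    return host.split(".", 1)[0]


if __name__ == "__main__":
    conn = (
        "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
        "AccountKey=<access-key>;EndpointSuffix=core.windows.net"
    )
    print(account_from_connection_string(conn))  # mystorageaccount
    print(account_from_abfss_url(
        "abfss://iceberg/path/to/table/metadata/<id>.metadata.json"))  # None
    print(account_from_abfss_url(
        "abfss://iceberg@mystorageaccount.dfs.core.windows.net/path/to/table"))  # mystorageaccount
```

If the account extracted from the fully qualified form matches the connection string, while the error reports `iceberg@mystorageaccount`, that would be consistent with the `container@account` prefix not being split before the comparison; the sketch cannot confirm where that happens.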

We do not see this error when using the DuckDB CLI directly:

$ duckdb
INSTALL azure;
LOAD azure;

INSTALL iceberg;
LOAD iceberg;

CREATE SECRET mysecret (
    TYPE AZURE,
    CONNECTION_STRING 'DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<access-key>;EndpointSuffix=core.windows.net'
);

SELECT *
FROM iceberg_scan(
    'abfss://iceberg/path/to/table/metadata/<id>.metadata.json',
    skip_schema_inference = true
);
-- Results from the table are displayed

We also do not see this error when using the Parquet foreign data wrapper:

CREATE FOREIGN DATA WRAPPER parquet_wrapper HANDLER parquet_fdw_handler VALIDATOR parquet_fdw_validator;
CREATE SERVER parquet_server FOREIGN DATA WRAPPER parquet_wrapper;

CREATE USER MAPPING FOR postgres
SERVER parquet_server
OPTIONS (
  type 'AZURE',
  connection_string 'DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<access-key>;EndpointSuffix=core.windows.net'
);

CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (files 'abfss://iceberg/path/to/table/data/*.parquet');
SELECT * FROM parquet_table;
-- Results from the table are displayed

OS:

macOS (aarch64)

ParadeDB Version:

v0.13.2

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB Docker Image

Full Name:

Jordan Hatcher

Affiliation:

MindBridge AI

Did you include all relevant data sets for reproducing the issue?

N/A - The reproduction does not require a data set

Did you include the code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), priority-low (Low priority issue), user-request (This issue was directly requested by a user)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
