CSV reader tries to convert quoted string column to BIGINT?

### What happens?

As discussed in https://github.com/posit-dev/positron/discussions/5707, a large tab-delimited text file has a column of quoted strings that contain all numeric digits for the first couple million lines and then start including non-numeric digits after that. Since the values in the column are all quoted, DuckDB should consider them strings without trying to infer a numeric data type. 

### To Reproduce

Download and unzip this file: https://s3.amazonaws.com/data.patentsview.org/download/g_patent.tsv.zip

The following error is generated when trying to scan this file:

```sql
create table patent as from 'g_patent.tsv';
```
```console
Conversion Error: CSV Error on Line: 8123649
Original Line: "D1000263"	"design"	"2023-10-03"	"Connector for expanded cell confinement web with polygon handle"	"S1"	1	0	"ipg231003.xml"
Error when converting column "patent_id". Could not convert string "D1000263" to 'BIGINT'

Column patent_id is being converted as type BIGINT
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g. types={'patent_id': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g. sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.

  file = /home/wesm/Downloads/g_patent.tsv
  delimiter = 	 (Auto-Detected)
  quote = " (Auto-Detected)
  escape = " (Auto-Detected)
  new_line = \n (Auto-Detected)
  header = true (Auto-Detected)
  skip_rows = 0 (Auto-Detected)
  comment = \0 (Auto-Detected)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding = 0
  sample_size = 20480
  ignore_errors = false
  all_varchar = 0

```

This works however:

```sql
create table patent as from read_csv('g_patent.tsv', delim='\t', sample_size=-1);
```

The nightly build (dev2261) has a different error:

```sql
create table patent as from 'g_patent.tsv';
```

```console
Invalid Input Error:
CSV Error on Line: 2156093
Original Line: "4001958"	"utility"	"1977-01-11"	"""""""Take-one"""" display card"""	"A"	2	0	"pftaps19770111_wk02.zip"
Value with unterminated quote found.

Possible fixes:
* Enable ignore errors (ignore_errors=true) to skip this row
* Set quote to empty or to a different value (e.g., quote='')

  file = /home/wesm/Downloads/g_patent.tsv
  delimiter = 	 (Auto-Detected)
  quote = " (Auto-Detected)
  escape = \ (Auto-Detected)
  new_line = \n (Auto-Detected)
  header = true (Auto-Detected)
  skip_rows = 0 (Auto-Detected)
  comment = \0 (Auto-Detected)
  rfc_4180 = true (Auto-Detected)
  date_format =  (Auto-Detected)
  timestamp_format =  (Auto-Detected)
  null_padding = 0
  sample_size = 20480
  ignore_errors = false
  all_varchar = 0
```

This may be the intended behavior, but I couldn't find an explanation for how quoting is used/not used in the documentation (it seems that quoting is only being used for delimiting, but has no bearing on the type inference logic). It may be that a new option could limit type inference for quoted fields only to non-numeric data types.

### OS:

duckdb-wasm, duckdb-cli

### DuckDB Version:

1.1.3

### DuckDB Client:

duckdb-cli, duckdb-wasm

### Hardware:

_No response_

### Full Name:

Wes McKinney

### Affiliation:

Posit

### What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

### Did you include all relevant data sets for reproducing the issue?

Yes

### Did you include all code required to reproduce the issue?

- [X] Yes, I have

### Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

- [X] Yes, I have

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CSV reader tries to convert quoted string column to BIGINT? #15376

What happens?

To Reproduce

OS:

DuckDB Version:

DuckDB Client:

Hardware:

Full Name:

Affiliation:

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

Did you include all relevant data sets for reproducing the issue?

Did you include all code required to reproduce the issue?

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development