GROUP BY on "reusable alias" returns wrong result if there is also a column with same name as that alias #15766
Description
What happens?
A SELECT that creates an alias with the same name as a column returns an unexpected result when using that alias in the GROUP BY: the SELECT returns the values of the SELECT expression, but the GROUP BY is applied on the column, rather than the value of the expression in the SELECT.
One may argue that the GROUP BY conceptually "comes before" the SELECT, but if the alias is chosen differently than any of the column names, the GROUP BY does appear to be applied to the aliased expression from the SELECT.
To Reproduce
SELECT substring(col2, 1, 1) col2
FROM (VALUES(1,'aa'),(1,'ab'),(1,'ac')) as t(col1,col2)
GROUP BY col2
Result is:
┌─────────┐
│ col2 │
│ varchar │
├─────────┤
│ a │
│ a │
│ a │
└─────────┘
This was unexpected - a single row 'a' was expected.
Renaming the alias (and reference to the alias in the GROUP BY list) from col2 to col3 gives the expected result:
SELECT substring(col2, 1, 1) col3
FROM (VALUES(1,'aa'),(1,'ab'),(1,'ac')) as t(col1,col2)
GROUP BY col3
Result as expected:
┌─────────┐
│ col3 │
│ varchar │
├─────────┤
│ a │
└─────────┘
The behavior raises a question about exactly how scope and column alias references are supposed to work. Maybe this is not a bug per se but a documentation issue.
OS:
windows11
DuckDB Version:
1.1.3 GA, 1.1.4-dev4679
DuckDB Client:
CLI
Hardware:
Windows 11
Full Name:
Roland Bouman
Affiliation:
EPAM Systems BV
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have