-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49565][SQL] Improve auto-generated expression aliases with pipe SQL operators #49245
Conversation
Shall we make |
@cloud-fan I tried this, but the PipeExpression needs to check whether its child expression tree contains any aggregate functions, and this needs its child tree to be fully resolved first (which may not have happened yet by the time we run the |
@dtenedor after a few iterations,
|
@cloud-fan I looked again and I was able to make this work in |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AliasResolution.scala
Outdated
Show resolved
Hide resolved
@dtenedor there are more test failures in the golden files. |
respond to code review comments
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
@@ -503,7 +504,8 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor | |||
*/ | |||
object ResolveAliases extends Rule[LogicalPlan] { | |||
def apply(plan: LogicalPlan): LogicalPlan = | |||
plan.resolveOperatorsUpWithPruning(_.containsPattern(UNRESOLVED_ALIAS), ruleId) { | |||
plan.resolveOperatorsUpWithPruning( | |||
_.containsAnyPattern(UNRESOLVED_ALIAS, PIPE_EXPRESSION), ruleId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, reverted this change.
@@ -538,6 +540,13 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor | |||
if c.child.resolved && AliasResolution.hasUnresolvedAlias(c.metrics) => | |||
c.copy(metrics = AliasResolution.assignAliases(c.metrics)) | |||
} | |||
|
|||
private def removePipeExpressions(exprs: Seq[NamedExpression]): Seq[NamedExpression] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, reverted this change.
case node: LogicalPlan => | ||
node.resolveExpressions { | ||
case p: PipeExpression if p.child.resolved => | ||
p.checkInvariantsAndRemove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we remove this lazy val and inline the validation code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, done.
thanks, merging to master! |
What changes were proposed in this pull request?
This RP improves auto-generated expression aliases with pipe SQL operators.
For example, consider the pipe SQL syntax query:
Previously, the analyzed plan was:
After this PR, it is:
Note that the output aliases visible in the resulting DataFrame for the query derive from the
AS <alias>
part of the analyzed plans shown.Why are the changes needed?
This improves the user experience with pipe SQL syntax.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
Existing golden file tests update to show the improved aliases.
Was this patch authored or co-authored using generative AI tooling?
No.