Move PIVOT/UNPIVOT internals to internals page

duckdb · Nov 11, 2024 · 98e4a53 · 98e4a53
1 parent 42fe812
commit 98e4a53
Show file tree

Hide file tree

Showing 4 changed files with 141 additions and 120 deletions.
diff --git a/_data/menu_docs_dev.json b/_data/menu_docs_dev.json
@@ -1601,6 +1601,10 @@
             {
               "page": "Execution Format",
               "url": "vector"
+            },
+            {
+              "page": "Pivot",
+              "url": "pivot"
             }
           ]
         }

diff --git a/docs/internals/pivot.md b/docs/internals/pivot.md
@@ -0,0 +1,127 @@
+---
+layout: docu
+title: Pivot Internals
+---
+
+## `PIVOT`
+
+[Pivoting]({% link docs/sql/statements/pivot.md %}) is implemented as a combination of SQL query re-writing and a dedicated `PhysicalPivot` operator for higher performance.
+Each `PIVOT` is implemented as set of aggregations into lists and then the dedicated `PhysicalPivot` operator converts those lists into column names and values.
+Additional pre-processing steps are required if the columns to be created when pivoting are detected dynamically (which occurs when the `IN` clause is not in use).
+
+DuckDB, like most SQL engines, requires that all column names and types be known at the start of a query.
+In order to automatically detect the columns that should be created as a result of a `PIVOT` statement, it must be translated into multiple queries.
+[`ENUM` types]({% link docs/sql/data_types/enum.md %}) are used to find the distinct values that should become columns.
+Each `ENUM` is then injected into one of the `PIVOT` statement's `IN` clauses.
+
+After the `IN` clauses have been populated with `ENUM`s, the query is re-written again into a set of aggregations into lists.
+
+For example:
+
+```sql
+PIVOT cities
+ON year
+USING sum(population);
+```
+
+is initially translated into:
+
+```sql
+CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (
+    SELECT DISTINCT
+        year::VARCHAR
+    FROM cities
+    ORDER BY
+        year
+    );
+PIVOT cities
+ON year IN __pivot_enum_0_0
+USING sum(population);
+```
+
+and finally translated into:
+
+```sql
+SELECT country, name, list(year), list(population_sum)
+FROM (
+    SELECT country, name, year, sum(population) AS population_sum
+    FROM cities
+    GROUP BY ALL
+)
+GROUP BY ALL;
+```
+
+This produces the result:
+
+<div class="narrow_table"></div>
+
+| country |     name      |    list("year")    | list(population_sum) |
+|---------|---------------|--------------------|----------------------|
+| NL      | Amsterdam     | [2000, 2010, 2020] | [1005, 1065, 1158]   |
+| US      | Seattle       | [2000, 2010, 2020] | [564, 608, 738]      |
+| US      | New York City | [2000, 2010, 2020] | [8015, 8175, 8772]   |
+
+The `PhysicalPivot` operator converts those lists into column names and values to return this result:
+
+<div class="narrow_table"></div>
+
+| country |     name      | 2000 | 2010 | 2020 |
+|---------|---------------|-----:|-----:|-----:|
+| NL      | Amsterdam     | 1005 | 1065 | 1158 |
+| US      | Seattle       | 564  | 608  | 738  |
+| US      | New York City | 8015 | 8175 | 8772 |
+
+## `UNPIVOT`
+
+### Internals
+
+Unpivoting is implemented entirely as rewrites into SQL queries.
+Each `UNPIVOT` is implemented as set of `unnest` functions, operating on a list of the column names and a list of the column values.
+If dynamically unpivoting, the `COLUMNS` expression is evaluated first to calculate the column list.
+
+For example:
+
+```sql
+UNPIVOT monthly_sales
+ON jan, feb, mar, apr, may, jun
+INTO
+    NAME month
+    VALUE sales;
+```
+
+is translated into:
+
+```sql
+SELECT
+    empid,
+    dept,
+    unnest(['jan', 'feb', 'mar', 'apr', 'may', 'jun']) AS month,
+    unnest(["jan", "feb", "mar", "apr", "may", "jun"]) AS sales
+FROM monthly_sales;
+```
+
+Note the single quotes to build a list of text strings to populate `month`, and the double quotes to pull the column values for use in `sales`.
+This produces the same result as the initial example:
+
+<div class="narrow_table"></div>
+
+| empid |    dept     | month | sales |
+|------:|-------------|-------|------:|
+| 1     | electronics | jan   | 1     |
+| 1     | electronics | feb   | 2     |
+| 1     | electronics | mar   | 3     |
+| 1     | electronics | apr   | 4     |
+| 1     | electronics | may   | 5     |
+| 1     | electronics | jun   | 6     |
+| 2     | clothes     | jan   | 10    |
+| 2     | clothes     | feb   | 20    |
+| 2     | clothes     | mar   | 30    |
+| 2     | clothes     | apr   | 40    |
+| 2     | clothes     | may   | 50    |
+| 2     | clothes     | jun   | 60    |
+| 3     | cars        | jan   | 100   |
+| 3     | cars        | feb   | 200   |
+| 3     | cars        | mar   | 300   |
+| 3     | cars        | apr   | 400   |
+| 3     | cars        | may   | 500   |
+| 3     | cars        | jun   | 600   |
diff --git a/docs/sql/statements/pivot.md b/docs/sql/statements/pivot.md
@@ -11,6 +11,8 @@ The values within those new columns are calculated using an aggregate function o
 DuckDB implements both the SQL Standard `PIVOT` syntax and a simplified `PIVOT` syntax that automatically detects the columns to create while pivoting.
 `PIVOT_WIDER` may also be used in place of the `PIVOT` keyword.
 
+For details on how the `PIVOT` statement is implemented, see the [Pivot Internals site]({% link docs/internals/pivot.md %}#pivot).
+
 > The [`UNPIVOT` statement]({% link docs/sql/statements/unpivot.md %}) is the inverse of the `PIVOT` statement.
 
 ## Simplified `PIVOT` Syntax
@@ -49,6 +51,7 @@ INSERT INTO cities VALUES
 ```
 
 ```sql
+SELECT *
 FROM cities;
 ```
 
@@ -273,6 +276,7 @@ Each `PIVOT` can be treated as if it were a `SELECT` node, so they can be joined
 For example, if two `PIVOT` statements share the same `GROUP BY` expression, they can be joined together using the columns in the `GROUP BY` clause into a wider pivot.
 
 ```sql
+SELECT *
 FROM (PIVOT cities ON year USING sum(population) GROUP BY country) year_pivot
 JOIN (PIVOT cities ON name USING sum(population) GROUP BY country) name_pivot
 USING (country);
@@ -285,74 +289,6 @@ USING (country);
 | NL      | 1005 | 1065 | 1158 | 3228      | NULL          | NULL    |
 | US      | 8579 | 8783 | 9510 | NULL      | 24962         | 1910    |
 
-## Internals
-
-Pivoting is implemented as a combination of SQL query re-writing and a dedicated `PhysicalPivot` operator for higher performance.
-Each `PIVOT` is implemented as set of aggregations into lists and then the dedicated `PhysicalPivot` operator converts those lists into column names and values.
-Additional pre-processing steps are required if the columns to be created when pivoting are detected dynamically (which occurs when the `IN` clause is not in use).
-
-DuckDB, like most SQL engines, requires that all column names and types be known at the start of a query.
-In order to automatically detect the columns that should be created as a result of a `PIVOT` statement, it must be translated into multiple queries.
-[`ENUM` types]({% link docs/sql/data_types/enum.md %}) are used to find the distinct values that should become columns.
-Each `ENUM` is then injected into one of the `PIVOT` statement's `IN` clauses.
-
-After the `IN` clauses have been populated with `ENUM`s, the query is re-written again into a set of aggregations into lists.
-
-For example:
-
-```sql
-PIVOT cities
-ON year
-USING sum(population);
-```
-
-is initially translated into:
-
-```sql
-CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (
-    SELECT DISTINCT
-        year::VARCHAR
-    FROM cities
-    ORDER BY
-        year
-    );
-PIVOT cities
-ON year IN __pivot_enum_0_0
-USING sum(population);
-```
-
-and finally translated into:
-
-```sql
-SELECT country, name, list(year), list(population_sum)
-FROM (
-    SELECT country, name, year, sum(population) AS population_sum
-    FROM cities
-    GROUP BY ALL
-)
-GROUP BY ALL;
-```
-
-This produces the result:
-
-<div class="narrow_table"></div>
-
-| country |     name      |    list("year")    | list(population_sum) |
-|---------|---------------|--------------------|----------------------|
-| NL      | Amsterdam     | [2000, 2010, 2020] | [1005, 1065, 1158]   |
-| US      | Seattle       | [2000, 2010, 2020] | [564, 608, 738]      |
-| US      | New York City | [2000, 2010, 2020] | [8015, 8175, 8772]   |
-
-The `PhysicalPivot` operator converts those lists into column names and values to return this result:
-
-<div class="narrow_table"></div>
-
-| country |     name      | 2000 | 2010 | 2020 |
-|---------|---------------|-----:|-----:|-----:|
-| NL      | Amsterdam     | 1005 | 1065 | 1158 |
-| US      | Seattle       | 564  | 608  | 738  |
-| US      | New York City | 8015 | 8175 | 8772 |
-
 ## Simplified `PIVOT` Full Syntax Diagram
 
 Below is the full syntax diagram of the `PIVOT` statement.
@@ -364,6 +300,7 @@ Below is the full syntax diagram of the `PIVOT` statement.
 The full syntax diagram is below, but the SQL Standard `PIVOT` syntax can be summarized as:
 
 ```sql
+SELECT *
 FROM ⟨dataset⟩
 PIVOT (
     ⟨values⟩
@@ -374,6 +311,7 @@ PIVOT (
     GROUP BY ⟨rows⟩
 );
 ```
+
 Unlike the simplified syntax, the `IN` clause must be specified for each column to be pivoted.
 If you are interested in dynamic pivoting, the simplified syntax is recommended.
 
@@ -384,6 +322,7 @@ Note that no commas separate the expressions in the `FOR` clause, but that `valu
 This example uses a single value expression, a single column expression, and a single row expression:
 
 ```sql
+SELECT *
 FROM cities
 PIVOT (
     sum(population)
@@ -403,6 +342,7 @@ PIVOT (
 This example is somewhat contrived, but serves as an example of using multiple value expressions and multiple columns in the `FOR` clause.
 
 ```sql
+SELECT *
 FROM cities
 PIVOT (
     sum(population) AS total,

diff --git a/docs/sql/statements/unpivot.md b/docs/sql/statements/unpivot.md
@@ -12,6 +12,8 @@ DuckDB implements both the SQL Standard `UNPIVOT` syntax and a simplified `UNPIV
 Both can utilize a [`COLUMNS` expression]({% link docs/sql/expressions/star.md %}#columns) to automatically detect the columns to unpivot.
 `PIVOT_LONGER` may also be used in place of the `UNPIVOT` keyword.
 
+For details on how the `UNPIVOT` statement is implemented, see the [Pivot Internals site]({% link docs/internals/pivot.md %}#unpivot).
+
 > The [`PIVOT` statement]({% link docs/sql/statements/pivot.md %}) is the inverse of the `UNPIVOT` statement.
 
 ## Simplified `UNPIVOT` Syntax
@@ -212,58 +214,6 @@ UNPIVOT
 | col1 | 84    |
 | col2 | woot  |
 
-### Internals
-
-Unpivoting is implemented entirely as rewrites into SQL queries.
-Each `UNPIVOT` is implemented as set of `unnest` functions, operating on a list of the column names and a list of the column values.
-If dynamically unpivoting, the `COLUMNS` expression is evaluated first to calculate the column list.
-
-For example:
-
-```sql
-UNPIVOT monthly_sales
-ON jan, feb, mar, apr, may, jun
-INTO
-    NAME month
-    VALUE sales;
-```
-
-is translated into:
-
-```sql
-SELECT
-    empid,
-    dept,
-    unnest(['jan', 'feb', 'mar', 'apr', 'may', 'jun']) AS month,
-    unnest(["jan", "feb", "mar", "apr", "may", "jun"]) AS sales
-FROM monthly_sales;
-```
-
-Note the single quotes to build a list of text strings to populate `month`, and the double quotes to pull the column values for use in `sales`.
-This produces the same result as the initial example:
-
-<div class="narrow_table"></div>
-
-| empid |    dept     | month | sales |
-|------:|-------------|-------|------:|
-| 1     | electronics | jan   | 1     |
-| 1     | electronics | feb   | 2     |
-| 1     | electronics | mar   | 3     |
-| 1     | electronics | apr   | 4     |
-| 1     | electronics | may   | 5     |
-| 1     | electronics | jun   | 6     |
-| 2     | clothes     | jan   | 10    |
-| 2     | clothes     | feb   | 20    |
-| 2     | clothes     | mar   | 30    |
-| 2     | clothes     | apr   | 40    |
-| 2     | clothes     | may   | 50    |
-| 2     | clothes     | jun   | 60    |
-| 3     | cars        | jan   | 100   |
-| 3     | cars        | feb   | 200   |
-| 3     | cars        | mar   | 300   |
-| 3     | cars        | apr   | 400   |
-| 3     | cars        | may   | 500   |
-| 3     | cars        | jun   | 600   |
 
 ### Simplified `UNPIVOT` Full Syntax Diagram