doc/user: update docs + release notes for 0.3.1 and 0.4.0
This commit polishes the documentation and release notes for the 0.3.1
and 0.4.0 releases. A bit late, as both of those releases have shipped,
but better late than never.

The big new items of documentation include:

  * basic documentation for record types
  * documentation of escape string literals
  * documentation of typed string literals

I've additionally updated the release notes for 0.3.1 and 0.4.0 to match
the style guide, added many missing release notes, grouped things by
theme where possible, marked backwards incompatible changes, and
resorted several notes that were in the wrong release. These are the
standards I'd like to hold release notes to, and I'm happy to do the
legwork to make that so.
benesch committed Jul 30, 2020
1 parent 3870c7c commit 2140443
Showing 10 changed files with 536 additions and 36 deletions.
133 changes: 133 additions & 0 deletions doc/user/content/operations/_index.md
@@ -0,0 +1,133 @@
---
title: "Monitoring and Operations"
description: "Find details about running your Materialize instances"
menu: "main"
weight: 80
---

_This page is a work in progress and will have more detail in the coming months.
If you have specific questions, feel free to [file a GitHub
issue](https://github.com/MaterializeInc/materialize/issues/new?labels=C-feature&template=feature.md)._

Materialize supports integration with monitoring tools using HTTP endpoints.

### Quick monitoring dashboard

Materialize provides a recommended Grafana dashboard and an all-inclusive Docker image
preconfigured to run the dashboard as [`materialize/dashboard`][simplemon-hub].

The only configuration required to get started with the Docker image is the
`MATERIALIZED_URL=<host>:<port>` environment variable.

As an example, if you are running `materialized` in a cloud instance at the IP address
`172.16.0.0`, you can launch the dashboard by running this command and
opening `http://localhost:3000` in your web browser:

```shell
# expose the dashboard's port and point the container at Materialize
$ docker run -d -p 3000:3000 -e MATERIALIZED_URL=172.16.0.0:6875 materialize/dashboard
```

See [Observing local Materialize](#observing-local-materialize) below if you want to run
the dashboard on the same machine on which you are running Materialize.

The `materialize/dashboard` Docker image bundles Prometheus and Grafana together to make
getting insight into Materialize's performance easy. It is not especially configurable,
and it is not designed to handle large metric volumes or long uptimes: it starts
truncating metrics history after about 1GB of storage, which corresponds to roughly
3 days of data at the fine-grained collection rate used inside the container.

The dashboard is therefore provided as a convenience and should not be relied on for
production monitoring. That said, if you would like to persist metrics across restarts
of the container, you can mount a Docker volume onto `/prometheus`:

```console
$ docker run -d \
-v /tmp/prom-data:/prometheus -u "$(id -u):$(id -g)" \
-p 3000:3000 -e MATERIALIZED_URL=172.16.0.0:6875 \
materialize/dashboard
```

### Health check

Materialize supports a minimal health check endpoint at `<materialized
host>/status`.

### Prometheus

Materialize exposes [Prometheus](https://prometheus.io/) metrics at the default
path, `<materialized host>/metrics`.

Materialize broadly publishes the following types of data there:

- Materialize-specific data with an `mz_*` prefix. For example,
`rate(mz_responses_sent_total[10s])` shows the number of responses
averaged over 10-second windows.
- Standard process metrics with a `process_*` prefix. For example, `process_cpu`.
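
If you run your own Prometheus, a minimal scrape configuration might look like
the following sketch (the job name is arbitrary, and the target assumes the
default port 6875):

```yaml
scrape_configs:
  - job_name: materialized   # arbitrary job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:6875"]   # assumes the default materialized port
```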

### Grafana

Materialize provides a [recommended dashboard][dashboard-json] that you can [import into
Grafana][graf-import]. It assumes you have configured Prometheus to scrape
`materialized`.

### Datadog

Materialize metrics can be sent to Datadog via the
[OpenMetrics agent check](https://www.datadoghq.com/blog/monitor-prometheus-metrics/),
which requires Datadog Agent 6 or later. Configure `prometheus_url` (e.g.,
`http://<materialized host>/metrics`), a namespace, and the metrics to collect
(e.g., `mz*`) in `openmetrics.d/conf.yaml`.

## Other Setups

Even if you aren't running materialized at web scale, you can still use our web-scale
tools to observe it.

### Observing local Materialize

#### Inside Docker Compose or Kubernetes

Local schedulers like Docker Compose (which we use for our demos) or Kubernetes will
typically expose running containers to each other using their service name as a public
DNS hostname, but _only_ within the network that they are running in.

The easiest way to use the dashboard inside a scheduler is to tell the scheduler to run
it. [Here is an example][dc-example] of configuring Docker Compose to run the dashboard.

#### On macOS, with Materialize running outside of Docker

On Docker for Mac, `localhost` inside a container refers to the container
itself, not to the Mac's network, so the dashboard cannot reach a
`materialized` running on the host via `localhost`. Use `host.docker.internal`
instead:

```shell
docker run -p 3000:3000 -e MATERIALIZED_URL=host.docker.internal:6875 materialize/dashboard
```

#### On Linux, with Materialize running outside of Docker

Docker containers use a different network than their host by default, but that is easy to
get around using the `--network` flag. Using the host network means that ports will be
allocated from the host, so the `-p` flag is no longer necessary:

```shell
docker run --network host -e MATERIALIZED_URL=localhost:6875 materialize/dashboard
```

[simplemon-hub]: https://hub.docker.com/repository/docker/materialize/dashboard
[dashboard-json]: https://github.com/MaterializeInc/materialize/blob/main/misc/monitoring/dashboard/conf/grafana/dashboards/overview.json
[graf-import]: https://grafana.com/docs/grafana/latest/reference/export_import/#importing-a-dashboard
[dc-example]: https://github.com/MaterializeInc/materialize/blob/d793b112758c840c1240eefdd56ca6f7e4f484cf/demo/billing/mzcompose.yml#L60-L70

## Memory

Materialize stores the majority of its state in memory and works best when the streamed
data can be reduced in some way. For example, if you know that only a subset of your rows
and columns are relevant for your queries, it helps to express this to the system before
materializing sources or views: Materialize can then avoid stashing the irrelevant data,
which in some cases dramatically reduces the memory footprint.

To minimize the chances that Materialize runs out of memory in a production environment,
we recommend you make additional memory available to Materialize via an SSD-backed
swap file or swap partition.
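
For example, on Linux you might provision a swap file along these lines (the
16GiB size is an arbitrary illustration; size it for your workload):

```shell
# Create and enable a 16GiB swap file on an SSD-backed filesystem.
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```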
193 changes: 157 additions & 36 deletions doc/user/content/release-notes.md
@@ -49,57 +49,178 @@ Wrap your release notes at the 80 character mark.
<span id="v0.4.1"></span>
## v0.4.1 (Unreleased)

No release notes yet.

<span id="v0.4.0"></span>
## v0.4.0

- Rename the `--threads` command-line option to [`--workers`](/cli/#worker-threads),
since it controls only the number of dataflow workers that Materialize will
start, not the total number of threads that Materialize may use. The short
form of this option, `-w`, remains unchanged.
**Backwards-incompatible change.**
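
For example (the worker count here is purely illustrative):

```shell
# Start materialized with 16 timely dataflow workers.
materialized --workers 16
```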

- Add the `--experimental` command-line option to enable a new [experimental
mode](/cli/#experimental-mode), which grants access to experimental features
at the risk of compromising stability and backwards compatibility. Forthcoming
features that require experimental mode will be marked as such in their
documentation.

- Support [SASL PLAIN authentication for Kafka sources](/sql/create-source/avro-kafka/#connecting-to-a-kafka-broker-using-sasl-plain-authentication).
Notably, this allows Materialize to connect to Kafka clusters hosted by
Confluent Cloud.

- Do not require [Kafka Avro sources](/sql/create-source/avro-kafka/) that use
`ENVELOPE NONE` or `ENVELOPE DEBEZIUM` to have key schemas whose fields are a
subset of the value schema {{% gh 3677 %}}.

- Teach Kafka sinks to emit Debezium style [consistency
metadata](/sql/create-sink/#consistency-metadata) if the new `consistency`
option is enabled. The consistency metadata is emitted to a Kafka topic
alongside the data topic; the combination of these two topics is considered
the Materialize change data capture (CDC) format.

- Introduce the [`AS OF`](/sql/create-sink/#as-of) and
[`WITH SNAPSHOT`](/sql/create-sink/#with-snapshot-or-without-snapshot) options
for `CREATE SINK` to provide more control over what data the sink will
produce.

- Change the default [`TAIL` snapshot behavior](/sql/tail/#with-snapshot-or-without-snapshot)
from `WITHOUT SNAPSHOT` to `WITH SNAPSHOT`. **Backwards-incompatible change.**

- Actively shut down [Kafka sinks](https://materialize.io/docs/sql/create-sink/#kafka-sinks)
that encounter an unrecoverable error, rather than attempting to produce data
until the sink is dropped {{% gh 3419 %}}.

- Improve the performance, stability, and standards compliance of Avro encoding
and decoding {{% gh 3397 3557 3568 3579 3583 3584 3585 %}}.

- Support [record types](/sql/types/record), which permit the representation of
nested data in SQL. Avro sources also gain support for decoding nested
records, which were previously disallowed, into this new SQL record type.
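
A minimal sketch of constructing a record value directly in SQL (the column
alias is arbitrary):

```sql
-- ROW(...) constructs a record with anonymous fields.
SELECT ROW(1, 'a') AS r;
```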

- Allow dropping databases with cross-schema dependencies {{% gh 3558 %}}.

- Avoid crashing if [`date_trunc('week', ...)`](/sql/functions/#time-func) is
called on a date that is in the first week of a month {{% gh 3651 %}}.

- Ensure the built-in `mz_avro_ocf_sinks`, `mz_catalog_names`, and
`mz_kafka_sinks` views always reflect the latest state of the system
{{% gh 3682 %}}. Previously these views could contain stale data that did not
reflect the results of recent `CREATE` or `DROP` statements.

- Introduce several new SQL statements:

- [`ALTER RENAME`](/sql/alter-rename) renames an index, sink, source, or view.

- [`SHOW CREATE INDEX`](/sql/show-create-index/) displays information about
an index.

- [`EXPLAIN <statement>`](/sql/explain) is shorthand for
`EXPLAIN OPTIMIZED PLAN FOR <statement>`.

- `SHOW TRANSACTION ISOLATION LEVEL` displays a dummy transaction isolation
level, `serializable`, in order to satisfy various PostgreSQL tools that
depend upon this statement {{% gh 800 %}}.
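
The last of these can be exercised directly; per the note above, it always
reports the same dummy level:

```sql
SHOW TRANSACTION ISOLATION LEVEL;
-- reports: serializable
```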

- Adjust the semantics of several SQL expressions to match PostgreSQL's
semantics:

- Consider `NULL < ANY(...)` to be false and `NULL < ALL (...)` to be true
when the right-hand side is the empty set {{% gh 3319 %}}.
**Backwards-incompatible change.**

- Change the meaning of ordinal references in a `GROUP BY` clause, as in
`SELECT ... GROUP BY 1`, to refer to columns in the target list, rather than
columns in the input set of tables {{% gh 3686 %}}; see the sketch after
this list. **Backwards-incompatible change.**

- When casting from `numeric` or `float` to `int`, round to the nearest
integer rather than discarding the fractional component {{% gh 3700 %}}.
**Backwards-incompatible change.**

- Allow expressions in `GROUP BY` to refer to output columns, not just input
columns, to match PostgreSQL. In the case of ambiguity, the input column
takes precedence {{% gh 1673 %}}.

- Permit expressions in `ORDER BY` to refer to input columns that are not
selected for output, as in `SELECT rel.a FROM rel ORDER BY rel.b`
{{% gh 3645 %}}.
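
A sketch of the new `GROUP BY` ordinal semantics, using hypothetical table
and column names:

```sql
-- GROUP BY 1 now refers to the first column of the target list (the
-- expression a + b), matching PostgreSQL, rather than the first column
-- of the input table t.
SELECT a + b AS total, count(*) FROM t GROUP BY 1;
```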

<span id="v0.3.1"></span>
## v0.3.1

- Improve the ingestion speed of Kafka sources with multiple partitions by
sharding responsibility for each partition across the available worker
threads {{% gh 3190 %}}.

- Improve JSON decoding performance when casting a `text` column to `json`, as
in `SELECT text_col::json` {{% gh 3195 %}}.

- Simplify converting non-materialized views into materialized views with
[`CREATE DEFAULT INDEX ON foo`](/sql/create-index). This creates the same
[index](/overview/api-components/#indexes) on a view that would have been
created if you had used [`CREATE MATERIALIZED VIEW`](/sql/create-materialized-view).
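
For example, with hypothetical view and source names:

```sql
-- A plain (non-materialized) view...
CREATE VIEW active_users AS SELECT * FROM users WHERE active;
-- ...becomes materialized once it gains a default index.
CREATE DEFAULT INDEX ON active_users;
```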

- Permit control over the timestamp selection logic on a per-Kafka-source basis
via three new [`WITH` options](https://materialize.io/docs/sql/create-source/avro-kafka/#with-options):
- `timestamp_frequency_ms`
- `max_timestamp_batch_size`
- `topic_metadata_refresh_interval_ms`

- Support assigning aliases for column names when referencing a relation
in a `SELECT` query, as in:

```sql
SELECT col1_alias, col2_alias FROM rel AS rel_alias (col1_alias, col2_alias)
```

- Add the [`abs`](/sql/functions/#numbers-func) function for the
[`numeric`](/sql/types/numeric/) type.

- Improve the [string function](/sql/functions/#string-func) suite:
- Add the trim family of functions to trim characters from the start and/or
end of strings. The new functions are `btrim`, `ltrim`, `rtrim`, and `trim`.
- Add the SQL standard length functions `char_length`, `octet_length`, and
`bit_length`.
- Improve the `length` function's PostgreSQL compatibility by accepting
`bytea` as the first argument, rather than `text`, when getting the length
of encoded bytes.
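
A quick sketch exercising a few of the new functions:

```sql
SELECT btrim('  materialize  ') AS trimmed,
       octet_length('materialize') AS octets,
       bit_length('materialize') AS bits;
```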

- Enhance compatibility with PostgreSQL string literals:
- Allow the [`TYPE 'string'` syntax](/sql/functions/cast#signatures) to
explicitly specify the type of a string literal. This syntax is equivalent
to `CAST('string' AS TYPE)` and `'string'::TYPE`.
- Support [escape string literals](/sql/types/text/#escape) of the form
`E'hello\nworld'`, which permit C-style escapes for several special
characters.
- Automatically coerce string literals to the appropriate type, as required
by their usage in calls to functions and operators {{% gh 481 %}}.
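
A brief sketch of the new literal forms:

```sql
-- An escape string literal with a C-style newline escape...
SELECT E'hello\nworld';
-- ...and a typed string literal, equivalent to CAST('2020-07-30' AS date).
SELECT DATE '2020-07-30';
```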

- Produce runtime errors in several new situations:
- When multiplication operations overflow {{% gh 3354 %}}. Previously
multiplication overflow would result in silent wraparound.
- When casting from string to any other data type {{% gh 3156 %}}. Previously
failed casts would return `NULL`.

- Fix several misplanned queries:
- Ensure `CASE` statements do not trigger errors from unselected
branches {{% gh 3395 %}}.
- Prevent the optimizer from crashing on some queries involving the
`date_trunc` function {{% gh 3403 %}}.
- Handle joins nested with non-default associativity correctly
{{% gh 3427 %}}.

- Fix several bugs related to negative intervals:
- Ensure the `EXTRACT` function-like operator returns a negative result when
its input is negative {{% gh 2800 %}}.
- Do not distinguish negative and positive zero {{% gh 2812 %}}.

- Expose [monitoring metrics](/monitoring/) for Kafka sinks {{% gh 3336 %}}.

<span id="v0.3.0"></span>
## v0.3.0

Read the [Release Announcement](https://materialize.io/release-materialize-0-3/) for more
details.

- Support [temporary views](/sql/create-view/#temporary-views).

- Improve the reliability and performance of Kafka sources, especially when the
16 changes: 16 additions & 0 deletions doc/user/content/sql/functions/cast.md
@@ -19,6 +19,10 @@ Parameter | Type | Description
_val_ | [Any](../../types) | The value you want to convert.
_type_ | [Typename](../../types) | The return value's type.

The following special syntax is permitted if _val_ is a string literal:

{{< diagram "lit-cast.svg" >}}

### Return value

`cast` returns the value with the type specified by the _type_ parameter.
@@ -51,6 +55,17 @@ Source type | Return type

## Examples

```sql
SELECT INT '4';
```
```nofmt
?column?
----------
4
```

<hr>

```sql
SELECT CAST (CAST (100.21 AS decimal(10, 2)) AS float) AS dec_to_float;
```
@@ -59,6 +74,7 @@
--------------
100.21
```

<hr/>

```sql
1 change: 1 addition & 0 deletions doc/user/content/sql/types/_index.md
@@ -22,6 +22,7 @@ Type | Aliases | Use | Size (bytes) | Syntax
[`integer`](integer) | `int4`, `int` | Signed integer | 4 | `123`
[`interval`](interval) | | Duration of time | 32 | `INTERVAL '1-2 3 4:5:6.7'`
[`jsonb`](jsonb) | `json` | JSON | Variable | `'{"1":2,"3":4}'::jsonb`
[`record`](record) | | Tuple with arbitrary contents | Variable | `ROW($expr, ...)`
[`text`](text) | `string` | Unicode string | Variable | `'foo'`
[`time`](time) | | Time without date | 4 | `TIME '01:23:45'`
[`timestamp`](timestamp) | | Date and time | 8 | `TIMESTAMP '2007-02-01 15:04:05'`