doc/user: update docs + release notes for 0.3.1 and 0.4.0
This commit polishes the documentation and release notes for the 0.3.1
and 0.4.0 releases. A bit late, as both of those releases have shipped,
but better late than never.

The big new items of documentation include:

  * basic documentation for record types
  * documentation of escape string literals
  * documentation of typed string literals

I've additionally updated the release notes for 0.3.1 and 0.4.0 to match
the style guide, added many missing release notes, grouped things by
theme where possible, marked backwards incompatible changes, and
resorted several notes that were in the wrong release. These are the
standards I'd like to hold release notes to, and I'm happy to do the
legwork to make that so.
benesch committed Jul 30, 2020
1 parent 3870c7c commit 2140443
Showing 10 changed files with 536 additions and 36 deletions.
133 changes: 133 additions & 0 deletions doc/user/content/operations/_index.md
@@ -0,0 +1,133 @@
---
title: "Monitoring and Operations"
description: "Find details about running your Materialize instances"
menu: "main"
weight: 80
---

_This page is a work in progress and will have more detail in the coming months.
If you have specific questions, feel free to [file a GitHub
issue](https://github.com/MaterializeInc/materialize/issues/new?labels=C-feature&template=feature.md)._

Materialize supports integration with monitoring tools using HTTP endpoints.

### Quick monitoring dashboard

Materialize provides a recommended Grafana dashboard and an all-inclusive Docker image
preconfigured to run the dashboard as [`materialize/dashboard`][simplemon-hub].

The only configuration required to get started with the Docker image is the
`MATERIALIZED_URL=<host>:<port>` environment variable.

As an example, if you are running `materialized` in a cloud instance at the IP address
`172.16.0.0`, you can launch the dashboard by running this command and
opening `http://localhost:3000` in your web browser:

```shell
# expose the dashboard's port and point the container at Materialize
$ docker run -d -p 3000:3000 -e MATERIALIZED_URL=172.16.0.0:6875 materialize/dashboard
```

See [Observing local Materialize](#observing-local-materialize) below if you want to run
the dashboard on the same machine on which you are running Materialize.

The `materialize/dashboard` Docker image bundles Prometheus and Grafana together to make
getting insight into Materialize's performance easy. It is not especially configurable,
and it is not designed to handle large metric volumes or long uptimes: it starts
truncating metrics history after about 1GB of storage, which corresponds to roughly
3 days of data at the fine-grained collection rate used inside the container.

The dashboard is therefore provided as a convenience and should not be relied on for
production monitoring. That said, if you would like to persist metrics across restarts
of the container, you can mount a Docker volume onto `/prometheus`:

```console
$ docker run -d \
-v /tmp/prom-data:/prometheus -u "$(id -u):$(id -g)" \
-p 3000:3000 -e MATERIALIZED_URL=172.16.0.0:6875 \
materialize/dashboard
```

### Health check

Materialize supports a minimal health check endpoint at `<materialized
host>/status`.

### Prometheus

Materialize exposes [Prometheus](https://prometheus.io/) metrics at the default
path, `<materialized host>/metrics`.

Materialize broadly publishes the following types of data there:

- Materialize-specific data with an `mz_*` prefix. For example,
`rate(mz_responses_sent_total[10s])` shows the number of responses
averaged over 10-second windows.
- Standard process metrics with a `process_*` prefix. For example, `process_cpu`.
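
If you run your own Prometheus, a minimal scrape configuration might look like
the following sketch (the job name is arbitrary, and the target assumes the
default port 6875):

```yaml
scrape_configs:
  - job_name: materialized   # arbitrary job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:6875"]   # assumes the default materialized port
```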

### Grafana

Materialize provides a [recommended dashboard][dashboard-json] that you can [import into
Grafana][graf-import]. It assumes you have configured Prometheus to scrape
`materialized`.

### Datadog

Materialize metrics can be sent to Datadog via the
[OpenMetrics agent check](https://www.datadoghq.com/blog/monitor-prometheus-metrics/),
which requires Datadog Agent 6 or later. Configure `prometheus_url` (e.g.,
`http://<materialized host>/metrics`), a namespace, and the metrics to collect
(e.g., `mz*`) in `openmetrics.d/conf.yaml`.

## Other Setups

Even if you aren't running materialized at web scale, you can still use our web-scale
tools to observe it.

### Observing local Materialize

#### Inside Docker Compose or Kubernetes

Local schedulers like Docker Compose (which we use for our demos) or Kubernetes will
typically expose running containers to each other using their service name as a public
DNS hostname, but _only_ within the network that they are running in.

The easiest way to use the dashboard inside a scheduler is to tell the scheduler to run
it. [Here is an example][dc-example] of configuring Docker Compose to run the dashboard.

#### On macOS, with Materialize running outside of Docker

On Docker for Mac, `localhost` inside a container refers to the container
itself, not to the Mac's network, so the dashboard cannot reach a
`materialized` running on the host via `localhost`. Use `host.docker.internal`
instead:

```shell
docker run -p 3000:3000 -e MATERIALIZED_URL=host.docker.internal:6875 materialize/dashboard
```

#### On Linux, with Materialize running outside of Docker

Docker containers use a different network than their host by default, but that is easy to
get around using the `--network` flag. Using the host network means that ports will be
allocated from the host, so the `-p` flag is no longer necessary:

```shell
docker run --network host -e MATERIALIZED_URL=localhost:6875 materialize/dashboard
```

[simplemon-hub]: https://hub.docker.com/repository/docker/materialize/dashboard
[dashboard-json]: https://github.com/MaterializeInc/materialize/blob/main/misc/monitoring/dashboard/conf/grafana/dashboards/overview.json
[graf-import]: https://grafana.com/docs/grafana/latest/reference/export_import/#importing-a-dashboard
[dc-example]: https://github.com/MaterializeInc/materialize/blob/d793b112758c840c1240eefdd56ca6f7e4f484cf/demo/billing/mzcompose.yml#L60-L70

## Memory

Materialize stores the majority of its state in memory and works best when the streamed
data can be reduced in some way. For example, if you know that only a subset of your rows
and columns are relevant for your queries, it helps to express this to the system before
materializing sources or views: Materialize can then avoid stashing the irrelevant data,
which in some cases dramatically reduces the memory footprint.

To minimize the chances that Materialize runs out of memory in a production environment,
we recommend you make additional memory available to Materialize via an SSD-backed
swap file or swap partition.
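
For example, on Linux you might provision a swap file along these lines (the
16GiB size is an arbitrary illustration; size it for your workload):

```shell
# Create and enable a 16GiB swap file on an SSD-backed filesystem.
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```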
193 changes: 157 additions & 36 deletions doc/user/content/release-notes.md
@@ -49,57 +49,178 @@ Wrap your release notes at the 80 character mark.
<span id="v0.4.1"></span>
## v0.4.1 (Unreleased)

No release notes yet.

<span id="v0.4.0"></span>
## v0.4.0

- Rename the `--threads` command-line option to [`--workers`](/cli/#worker-threads),
since it controls only the number of dataflow workers that Materialize will
start, not the total number of threads that Materialize may use. The short
form of this option, `-w`, remains unchanged.
**Backwards-incompatible change.**
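
For example (the worker count here is purely illustrative):

```shell
# Start materialized with 16 timely dataflow workers.
materialized --workers 16
```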

- Add the `--experimental` command-line option to enable a new [experimental
mode](/cli/#experimental-mode), which grants access to experimental features
at the risk of compromising stability and backwards compatibility. Forthcoming
features that require experimental mode will be marked as such in their
documentation.

- Support [SASL PLAIN authentication for Kafka sources](/sql/create-source/avro-kafka/#connecting-to-a-kafka-broker-using-sasl-plain-authentication).
Notably, this allows Materialize to connect to Kafka clusters hosted by
Confluent Cloud.

- Do not require [Kafka Avro sources](/sql/create-source/avro-kafka/) that use
`ENVELOPE NONE` or `ENVELOPE DEBEZIUM` to have key schemas whose fields are a
subset of the value schema {{% gh 3677 %}}.

- Teach Kafka sinks to emit Debezium style [consistency
metadata](/sql/create-sink/#consistency-metadata) if the new `consistency`
option is enabled. The consistency metadata is emitted to a Kafka topic
alongside the data topic; the combination of these two topics is considered
the Materialize change data capture (CDC) format.

- Introduce the [`AS OF`](/sql/create-sink/#as-of) and
[`WITH SNAPSHOT`](/sql/create-sink/#with-snapshot-or-without-snapshot) options
for `CREATE SINK` to provide more control over what data the sink will
produce.

- Change the default [`TAIL` snapshot behavior](/sql/tail/#with-snapshot-or-without-snapshot)
from `WITHOUT SNAPSHOT` to `WITH SNAPSHOT`. **Backwards-incompatible change.**

- Actively shut down [Kafka sinks](https://materialize.io/docs/sql/create-sink/#kafka-sinks)
that encounter an unrecoverable error, rather than attempting to produce data
until the sink is dropped {{% gh 3419 %}}.

- Improve the performance, stability, and standards compliance of Avro encoding
and decoding {{% gh 3397 3557 3568 3579 3583 3584 3585 %}}.

- Support [record types](/sql/types/record), which permit the representation of
nested data in SQL. Avro sources also gain support for decoding nested
records, which were previously disallowed, into this new SQL record type.
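
A minimal sketch of constructing a record value directly in SQL (the column
alias is arbitrary):

```sql
-- ROW(...) constructs a record with anonymous fields.
SELECT ROW(1, 'a') AS r;
```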

- Allow dropping databases with cross-schema dependencies {{% gh 3558 %}}.

- Avoid crashing if [`date_trunc('week', ...)`](/sql/functions/#time-func) is
called on a date that is in the first week of a month {{% gh 3651 %}}.

- Ensure the built-in `mz_avro_ocf_sinks`, `mz_catalog_names`, and
`mz_kafka_sinks` views always reflect the latest state of the system
{{% gh 3682 %}}. Previously these views could contain stale data that did not
reflect the results of recent `CREATE` or `DROP` statements.

- Introduce several new SQL statements:

- [`ALTER RENAME`](/sql/alter-rename) renames an index, sink, source, or view.

- [`SHOW CREATE INDEX`](/sql/show-create-index/) displays information about
an index.

- [`EXPLAIN <statement>`](/sql/explain) is shorthand for
`EXPLAIN OPTIMIZED PLAN FOR <statement>`.

- `SHOW TRANSACTION ISOLATION LEVEL` displays a dummy transaction isolation
level, `serializable`, in order to satisfy various PostgreSQL tools that
depend upon this statement {{% gh 800 %}}.
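
The last of these can be exercised directly; per the note above, it always
reports the same dummy level:

```sql
SHOW TRANSACTION ISOLATION LEVEL;
-- reports: serializable
```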

- Adjust the semantics of several SQL expressions to match PostgreSQL's
semantics:

- Consider `NULL < ANY(...)` to be false and `NULL < ALL (...)` to be true
when the right-hand side is the empty set {{% gh 3319 %}}.
**Backwards-incompatible change.**

- Change the meaning of ordinal references in a `GROUP BY` clause, as in
`SELECT ... GROUP BY 1`, to refer to columns in the target list, rather than
columns in the input set of tables {{% gh 3686 %}}; see the sketch after
this list. **Backwards-incompatible change.**

- When casting from `numeric` or `float` to `int`, round to the nearest
integer rather than discarding the fractional component {{% gh 3700 %}}.
**Backwards-incompatible change.**

- Allow expressions in `GROUP BY` to refer to output columns, not just input
columns, to match PostgreSQL. In the case of ambiguity, the input column
takes precedence {{% gh 1673 %}}.

- Permit expressions in `ORDER BY` to refer to input columns that are not
selected for output, as in `SELECT rel.a FROM rel ORDER BY rel.b`
{{% gh 3645 %}}.
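
A sketch of the new `GROUP BY` ordinal semantics, using hypothetical table
and column names:

```sql
-- GROUP BY 1 now refers to the first column of the target list (the
-- expression a + b), matching PostgreSQL, rather than the first column
-- of the input table t.
SELECT a + b AS total, count(*) FROM t GROUP BY 1;
```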

<span id="v0.3.1"></span>
## v0.3.1

- Improve the ingestion speed of Kafka sources with multiple partitions by
sharding responsibility for each partition across the available worker
threads {{% gh 3190 %}}.

- Improve JSON decoding performance when casting a `text` column to `json`, as
in `SELECT text_col::json` {{% gh 3195 %}}.

- Simplify converting non-materialized views into materialized views with
[`CREATE DEFAULT INDEX ON foo`](/sql/create-index). This creates the same
[index](/overview/api-components/#indexes) on a view that would have been
created if you had used [`CREATE MATERIALIZED VIEW`](/sql/create-materialized-view).
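
For example, with hypothetical view and source names:

```sql
-- A plain (non-materialized) view...
CREATE VIEW active_users AS SELECT * FROM users WHERE active;
-- ...becomes materialized once it gains a default index.
CREATE DEFAULT INDEX ON active_users;
```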

- Permit control over the timestamp selection logic on a per-Kafka-source basis
via three new [`WITH` options](https://materialize.io/docs/sql/create-source/avro-kafka/#with-options):
- `timestamp_frequency_ms`
- `max_timestamp_batch_size`
- `topic_metadata_refresh_interval_ms`

- Support assigning aliases for column names when referencing a relation
in a `SELECT` query, as in:

```sql
SELECT col1_alias, col2_alias FROM rel AS rel_alias (col1_alias, col2_alias)
```

- Add the [`abs`](/sql/functions/#numbers-func) function for the
[`numeric`](/sql/types/numeric/) type.

- Improve the [string function](/sql/functions/#string-func) suite:
- Add the trim family of functions to trim characters from the start and/or
end of strings. The new functions are `btrim`, `ltrim`, `rtrim`, and `trim`.
- Add the SQL standard length functions `char_length`, `octet_length`, and
`bit_length`.
- Improve the `length` function's PostgreSQL compatibility by accepting
`bytea` as the first argument, rather than `text`, when getting the length
of encoded bytes.
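
A quick sketch exercising a few of the new functions:

```sql
SELECT btrim('  materialize  ') AS trimmed,
       octet_length('materialize') AS octets,
       bit_length('materialize') AS bits;
```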

- Enhance compatibility with PostgreSQL string literals:
- Allow the [`TYPE 'string'` syntax](/sql/functions/cast#signatures) to
explicitly specify the type of a string literal. This syntax is equivalent
to `CAST('string' AS TYPE)` and `'string'::TYPE`.
- Support [escape string literals](/sql/types/text/#escape) of the form
`E'hello\nworld'`, which permit C-style escapes for several special
characters.
- Automatically coerce string literals to the appropriate type, as required
by their usage in calls to functions and operators {{% gh 481 %}}.
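
A brief sketch of the new literal forms:

```sql
-- An escape string literal with a C-style newline escape...
SELECT E'hello\nworld';
-- ...and a typed string literal, equivalent to CAST('2020-07-30' AS date).
SELECT DATE '2020-07-30';
```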

- Produce runtime errors in several new situations:
- When multiplication operations overflow {{% gh 3354 %}}. Previously
multiplication overflow would result in silent wraparound.
- When casting from string to any other data type {{% gh 3156 %}}. Previously
failed casts would return `NULL`.

- Fix several misplanned queries:
- Ensure `CASE` statements do not trigger errors from unselected
branches {{% gh 3395 %}}.
- Prevent the optimizer from crashing on some queries involving the
`date_trunc` function {{% gh 3403 %}}.
- Handle joins nested with non-default associativity correctly
{{% gh 3427 %}}.

- Fix several bugs related to negative intervals:
- Ensure the `EXTRACT` function-like operator returns a negative result when
its input is negative {{% gh 2800 %}}.
- Do not distinguish negative and positive zero {{% gh 2812 %}}.

- Expose [monitoring metrics](/monitoring/) for Kafka sinks {{% gh 3336 %}}.

<span id="v0.3.0"></span>
## v0.3.0

Read the [Release Announcement](https://materialize.io/release-materialize-0-3/) for more
details.

- Support [temporary views](/sql/create-view/#temporary-views).

- Improve the reliability and performance of Kafka sources, especially when the
16 changes: 16 additions & 0 deletions doc/user/content/sql/functions/cast.md
@@ -19,6 +19,10 @@ Parameter | Type | Description
_val_ | [Any](../../types) | The value you want to convert.
_type_ | [Typename](../../types) | The return value's type.

The following special syntax is permitted if _val_ is a string literal:

{{< diagram "lit-cast.svg" >}}

### Return value

`cast` returns the value with the type specified by the _type_ parameter.
@@ -51,6 +55,17 @@ Source type | Return type

## Examples

```sql
SELECT INT '4';
```
```nofmt
?column?
----------
4
```

<hr>

```sql
SELECT CAST (CAST (100.21 AS decimal(10, 2)) AS float) AS dec_to_float;
```
@@ -59,6 +74,7 @@
--------------
100.21
```

<hr/>

```sql
1 change: 1 addition & 0 deletions doc/user/content/sql/types/_index.md
@@ -22,6 +22,7 @@ Type | Aliases | Use | Size (bytes) | Syntax
[`integer`](integer) | `int4`, `int` | Signed integer | 4 | `123`
[`interval`](interval) | | Duration of time | 32 | `INTERVAL '1-2 3 4:5:6.7'`
[`jsonb`](jsonb) | `json` | JSON | Variable | `'{"1":2,"3":4}'::jsonb`
[`record`](record) | | Tuple with arbitrary contents | Variable | `ROW($expr, ...)`
[`text`](text) | `string` | Unicode string | Variable | `'foo'`
[`time`](time) | | Time without date | 4 | `TIME '01:23:45'`
[`timestamp`](timestamp) | | Date and time | 8 | `TIMESTAMP '2007-02-01 15:04:05'`