Skip to content

Commit

Permalink
Merge pull request #4574 from szarnyasg/nits-20240114b
Browse files Browse the repository at this point in the history
Nits 20240114b
  • Loading branch information
szarnyasg authored Jan 14, 2025
2 parents 414d905 + c89caa2 commit 866184b
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 10 deletions.
2 changes: 1 addition & 1 deletion _posts/2024-11-22-runtime-extensible-parsers.md
Original file line number Diff line number Diff line change
Expand Up @@ -223,4 +223,4 @@ We plan to switch DuckDB's parser, which started as a fork of the Postgres YACC

## Acknowledgments

We would like to thank [**Torsten Grust**](https://db.cs.uni-tuebingen.de/team/members/torsten-grust/), [**Gábor Szárnyas**](https://szarnyasg.github.io) and [**Daniël ten Wolde**](https://www.cwi.nl/en/people/daniel-ten-wolde/) for their valuable suggestions. We would also like to thank [**Carlo Piovesan**](https://github.com/carlopi) for his translation of the Postgres YACC grammar to PEG.
We would like to thank [**Torsten Grust**](https://db.cs.uni-tuebingen.de/team/members/torsten-grust/), [**Gábor Szárnyas**](https://szarnyasg.org/) and [**Daniël ten Wolde**](https://www.cwi.nl/en/people/daniel-ten-wolde/) for their valuable suggestions. We would also like to thank [**Carlo Piovesan**](https://github.com/carlopi) for his translation of the Postgres YACC grammar to PEG.
13 changes: 4 additions & 9 deletions docs/sql/expressions/collations.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ Collations provide rules for how text should be sorted or compared in the execut

By default, the `BINARY` collation is used. That means that strings are ordered and compared based only on their binary contents. This makes sense for standard ASCII characters (i.e., the letters A-Z and numbers 0-9), but generally does not make much sense for special unicode characters. It is, however, by far the fastest method of performing ordering and comparisons. Hence it is recommended to stick with the `BINARY` collation unless required otherwise.

> The `BINARY` collation is also available under the aliases `C` and `POSIX`.
> Warning Collation support in DuckDB has [some known limitations](https://github.com/duckdb/duckdb/issues?q=is%3Aissue+is%3Aopen+collation+) and has [several planned improvements](https://github.com/duckdb/duckdb/issues/604).
## Using Collations
Expand Down Expand Up @@ -147,22 +149,15 @@ WHERE names.name COLLATE NOACCENT.NOCASE = other_names.name COLLATE NOACCENT.NOC

The collations we have seen so far are not region-dependent, and do not follow any specific regional rules. If you wish to follow the rules of a specific region or language, you will need to use one of the ICU collations. For that, you need to [load the ICU extension]({% link docs/extensions/icu.md %}#installing-and-loading).

If you are using the C++ API, you may find the extension in the `extension/icu` folder of the DuckDB project. Using the C++ API, the extension can be loaded as follows:

```cpp
DuckDB db;
db.LoadExtension<ICUExtension>();
```

Loading this extension will add a number of language and region specific collations to your database. These can be queried using `PRAGMA collations` command, or by querying the `pragma_collations` function.

```sql
PRAGMA collations;
SELECT * FROM pragma_collations();
SELECT list(collname) FROM pragma_collations();
```

```text
[af, am, ar, as, az, be, bg, bn, bo, bs, bs, ca, ceb, chr, cs, cy, da, de, de_AT, dsb, dz, ee, el, en, en_US, en_US, eo, es, et, fa, fa_AF, fi, fil, fo, fr, fr_CA, ga, gl, gu, ha, haw, he, he_IL, hi, hr, hsb, hu, hy, id, id_ID, ig, is, it, ja, ka, kk, kl, km, kn, ko, kok, ku, ky, lb, lkt, ln, lo, lt, lv, mk, ml, mn, mr, ms, mt, my, nb, nb_NO, ne, nl, nn, om, or, pa, pa, pa_IN, pl, ps, pt, ro, ru, se, si, sk, sl, smn, sq, sr, sr, sr_BA, sr_ME, sr_RS, sr, sr_BA, sr_RS, sv, sw, ta, te, th, tk, to, tr, ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, zh, zh, zh_CN, zh_SG, zh, zh_HK, zh_MO, zh_TW, zu]
[af, am, ar, ar_sa, as, az, be, bg, bn, bo, br, bs, ca, ceb, chr, cs, cy, da, de, de_at, dsb, dz, ee, el, en, en_us, eo, es, et, fa, fa_af, ff, fi, fil, fo, fr, fr_ca, fy, ga, gl, gu, ha, haw, he, he_il, hi, hr, hsb, hu, hy, icu_noaccent, id, id_id, ig, is, it, ja, ka, kk, kl, km, kn, ko, kok, ku, ky, lb, lkt, ln, lo, lt, lv, mk, ml, mn, mr, ms, mt, my, nb, nb_no, ne, nfc, nl, nn, noaccent, nocase, om, or, pa, pa_in, pl, ps, pt, ro, ru, sa, se, si, sk, sl, smn, sq, sr, sr_ba, sr_me, sr_rs, sv, sw, ta, te, th, tk, to, tr, ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, yue, yue_cn, zh, zh_cn, zh_hk, zh_mo, zh_sg, zh_tw, zu]
```

These collations can then be used as the other collations would be used before. They can also be combined with the `NOCASE` collation. For example, to use the German collation rules you could use the following code snippet:
Expand Down

0 comments on commit 866184b

Please sign in to comment.