GBNF (GGML BNF) is a format for defining formal grammars to constrain model outputs in llama.cpp
. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in examples/main
and examples/server
.
Bakus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.
In GBNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence...
.
Before going deeper, let's look at some of the features demonstrated in grammars/chess.gbnf
, a small chess notation grammar:
# `root` specifies the pattern for the overall output
root ::= (
# it must start with the characters "1. " followed by a sequence
# of characters that match the `move` rule, followed by a space, followed
# by another move, and then a newline
"1. " move " " move "\n"
# it's followed by one or more subsequent moves, numbered with one or two digits
([1-9] [0-9]? ". " move " " move "\n")+
)
# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?
pawn ::= ...
nonpawn ::= ...
castle ::= ...
Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move
, castle
, or check-mate
.
Terminals are actual characters (code points). They can be specified as a sequence like "1"
or "O-O"
or as ranges like [1-9]
or [NBKQR]
.
Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ]
, or with escapes: 8-bit (\xXX
), 16-bit (\uXXXX
) or 32-bit (\UXXXXXXXX
).
Character ranges can be negated with ^
:
single-line ::= [^\n]+ "\n"`
The order of symbols in a sequence matters. For example, in "1. " move " " move "\n"
, the "1. "
must come before the first move
, etc.
Alternatives, denoted by |
, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle
, move
can be a pawn
move, a nonpawn
move, or a castle
.
Parentheses ()
can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.
*
after a symbol or sequence means that it can be repeated zero or more times (equivalent to{0,}
).+
denotes that the symbol or sequence should appear one or more times (equivalent to{1,}
).?
makes the preceding symbol or sequence optional (equivalent to{0,1}
).{m}
repeats the precedent symbol or sequence exactlym
times{m,}
repeats the precedent symbol or sequence at leastm
times{m,n}
repeats the precedent symbol or sequence at betweenm
andn
times (included){0,n}
repeats the precedent symbol or sequence at mostn
times (included)
Comments can be specified with #
:
# defines optional whitespace
ws ::= [ \t\n]+
Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker |
will continue the current rule, even outside of parentheses.
In a full grammar, the root
rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.
# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
This guide provides a brief overview. Check out the GBNF files in this directory (grammars/
) for examples of full grammars. You can try them out with:
./llama-cli -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'
llama.cpp
can also convert JSON schemas to grammars either ahead of time or at each request, see below.
Grammars currently have performance gotchas (see ggerganov/llama.cpp#4218).
A common pattern is to allow repetitions of a pattern x
up to N times.
While semantically correct, the syntax x? x? x?.... x?
(with N repetitions) may result in extremely slow sampling. Instead, you can write x{0,N}
(or (x (x (x ... (x)?...)?)?)?
w/ N-deep nesting in earlier llama.cpp versions).
You can use GBNF grammars:
- In llama-server's completion endpoints, passed as the
grammar
body field - In llama-cli, passed as the
--grammar
&--grammar-file
flags - With llama-gbnf-validator tool, to test them against strings.
llama.cpp
supports converting a subset of https://json-schema.org/ to GBNF grammars:
- In llama-server:
- For any completion endpoints, passed as the
json_schema
body field - For the
/chat/completions
endpoint, passed inside theresult_format
body field (e.g.{"type", "json_object", "schema": {"items": {}}}
)
- For any completion endpoints, passed as the
- In llama-cli, passed as the
--json
/-j
flag - To convert to a grammar ahead of time:
- in CLI, with examples/json_schema_to_grammar.py
- in JavaScript with json-schema-to-grammar.mjs (this is used by the server's Web UI)
Take a look at tests to see which features are likely supported (you'll also find usage examples in ggerganov/llama.cpp#5978, ggerganov/llama.cpp#6659 & ggerganov/llama.cpp#6555).
Here is also a non-exhaustive list of unsupported features:
additionalProperties
: to be fixed in ggerganov/llama.cpp#7840minimum
,exclusiveMinimum
,maximum
,exclusiveMaximum
integer
constraints to be implemented in ggerganov/llama.cpp#7797
- Remote
$ref
s in the C++ version (Python & JavaScript versions fetch https refs) - Mixing
properties
w/anyOf
/oneOf
in the same type (ggerganov/llama.cpp#7703) string
formatsuri
,email
contains
/minContains
uniqueItems
$anchor
(cf. dereferencing)not
- Conditionals
if
/then
/else
/dependentSchemas
patternProperties