lexy is a parser combinator library for C++17 and onwards.
It allows you to write a parser by specifying it in a convenient C++ DSL,
which gives you all the flexibility and control of a handwritten parser without all the manual work.
Documentation: lexy.foonathan.net
#include <cstdint>

#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

// Parse an IPv4 address into a `std::uint32_t`.
struct ipv4_address
{
    // What is being matched.
    static constexpr auto rule = []{
        // Match a sequence of (decimal) digits and convert it into a std::uint8_t.
        auto octet = dsl::integer<std::uint8_t>(dsl::digits<>);

        // Match four of them separated by periods.
        return dsl::times<4>(octet, dsl::sep(dsl::period)) + dsl::eof;
    }();

    // How the matched output is being stored:
    // each octet is shifted into the accumulated 32-bit result.
    static constexpr auto value
        = lexy::fold<std::uint32_t>(0u, [](std::uint32_t result, std::uint8_t octet) {
              return (result << 8) | octet;
          });
};
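To make the folding step concrete, here is a minimal sketch of the same arithmetic in plain standard C++, without lexy. The name `fold_octets` is hypothetical and only stands in for what the `value` callback above computes once the four octets have been matched.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the fold above (not part of lexy):
// start from 0 and, for each octet, shift the accumulated value
// left by 8 bits and OR in the new octet.
constexpr std::uint32_t fold_octets(const std::array<std::uint8_t, 4>& octets)
{
    std::uint32_t result = 0u;
    for (std::uint8_t octet : octets)
        result = (result << 8) | octet;
    return result;
}

// 192.168.0.1 becomes 0xC0A80001: the first octet ends up in the highest byte.
static_assert(fold_octets(std::array<std::uint8_t, 4>{192, 168, 0, 1}) == 0xC0A80001u);
```

The first octet lands in the most significant byte, so the result is the address in host-order integer form.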
See examples/ for more examples, such as a fully conforming JSON parser, a (subset of) XML parser, or an interactive REPL for a bash-like language, among others; play with the parsing on the online playground; or jump directly to the tutorial to learn how to write your own grammars.
- Describe the parser, not some abstract grammar
-
Unlike parser generators that use some table-driven magic for parsing, lexy’s grammar is just syntactic sugar for a hand-written recursive descent parser. The parsing algorithm does exactly what you’ve instructed it to do. No more ambiguities or weird shift/reduce errors!
- A pure C++ DSL
-
No need to use an external grammar file; embed the DSL directly in your C++ using operator overloading and functions.
- No implicit backtracking or lookahead
-
It will only backtrack when you say it should, and only look ahead when and how far you want it to. Don’t worry about rules that have side-effects: they won’t be executed unnecessarily thanks to the user-specified lookahead conditions.
- Bring your own data structures
-
The input is parsed into the data structures you’ve provided. It will not do heap allocations to store output unless you’ve instructed it to do so. You can even evaluate the input on the fly, without storing anything.
- Good error reporting
-
On a parse error, it will invoke a user-defined callback with information about what went wrong and during which production. Custom error messages can be injected using the special dsl::error, dsl::require and dsl::prevent rules. Write parse rules that detect common mistakes and issue appropriate diagnostics!
- Error recovery (experimental)
-
Log an error, recover, and continue parsing!
- Unicode support
-
You can parse UTF-8, UTF-16, or UTF-32. lexy takes care of code point encoding and decoding as necessary, as well as endianness and byte-order marks. Want to match a string literal containing arbitrary Unicode code points or escape sequences such as \u21D4 and store the result in a std::string? You can do so out of the box.
- Keyword and identifier parsing
-
Reserve a set of keywords that won’t be matched as regular identifiers. Try it online.
- Fully constexpr parsing
-
You want to parse a string literal at compile-time? You can do so.
- Minimal standard library dependencies
-
The core parsing library only depends on the required headers such as <type_traits> or <cstddef>. Some input classes require <cstdio>.
- Header-only core library
-
By necessity, not by choice: it’s constexpr after all.
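To illustrate what several of the points above mean in practice (recursive descent, explicit lookahead, no heap allocation, constexpr evaluation), here is a rough sketch of a hand-written parser for the IPv4 example in plain standard C++. It does not use lexy; the names `parse_octet` and `parse_ipv4` are hypothetical, and the code is only an approximation of the kind of parser a lexy grammar describes, not what the library generates.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical hand-written counterpart to the `octet` rule.
// Consumes one decimal number from the front of `input`; fails if there is
// no digit or the value does not fit into 8 bits.
constexpr std::optional<std::uint8_t> parse_octet(std::string_view& input)
{
    if (input.empty() || input.front() < '0' || input.front() > '9')
        return std::nullopt;

    unsigned value = 0;
    while (!input.empty() && input.front() >= '0' && input.front() <= '9')
    {
        value = value * 10 + static_cast<unsigned>(input.front() - '0');
        if (value > 255)
            return std::nullopt;
        input.remove_prefix(1);
    }
    return static_cast<std::uint8_t>(value);
}

// Four octets separated by periods, followed by the end of input.
// The result is built on the fly; nothing is stored on the heap.
constexpr std::optional<std::uint32_t> parse_ipv4(std::string_view input)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i)
    {
        // A separator is only expected between octets.
        if (i > 0)
        {
            if (input.empty() || input.front() != '.')
                return std::nullopt;
            input.remove_prefix(1);
        }

        auto octet = parse_octet(input);
        if (!octet)
            return std::nullopt;
        result = (result << 8) | *octet;
    }

    // Anything left over means the input was not just an address.
    if (!input.empty())
        return std::nullopt;
    return result;
}

// Because everything is constexpr, a string literal can be parsed at compile time.
static_assert(parse_ipv4("127.0.0.1") == 0x7F000001u);
static_assert(!parse_ipv4("127.0.0"));
```

Every decision in this sketch (when to expect a separator, when to give up) is written out by hand; the DSL is meant to express the same control flow declaratively.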
The following features are in various stages of development and will be added before the 1.0.0 release.
- Debug facility
-
Figure out why the grammar isn’t working the way you want it to.
- Operator parsing
-
Parse operators with different precedences using Pratt parsing.
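As background on the technique named above, the following is a minimal sketch of Pratt parsing (precedence climbing) in plain standard C++. It is not lexy's API, which is still in development; the names `binding_power`, `parse_expr` and `evaluate` are hypothetical, and whitespace, parentheses and error handling are left out to keep it short.

```cpp
#include <string_view>

// Hypothetical helper: binding power of a binary operator,
// or 0 if `op` is not an operator this sketch knows about.
constexpr int binding_power(char op)
{
    switch (op)
    {
    case '+':
    case '-':
        return 1;
    case '*':
        return 2;
    default:
        return 0;
    }
}

// Parses and evaluates an expression, consuming only operators that bind
// more tightly than `min_power`.
constexpr int parse_expr(std::string_view& input, int min_power = 0)
{
    // Operand: a run of decimal digits.
    int lhs = 0;
    while (!input.empty() && input.front() >= '0' && input.front() <= '9')
    {
        lhs = lhs * 10 + (input.front() - '0');
        input.remove_prefix(1);
    }

    // Keep consuming operators as long as they bind tighter than the caller's.
    while (!input.empty())
    {
        char op    = input.front();
        int  power = binding_power(op);
        if (power <= min_power)
            break;
        input.remove_prefix(1);

        // Passing `power` here makes operators of equal precedence left-associative.
        int rhs = parse_expr(input, power);
        switch (op)
        {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        }
    }
    return lhs;
}

constexpr int evaluate(std::string_view input)
{
    return parse_expr(input);
}

static_assert(evaluate("1+2*3") == 7); // multiplication binds tighter
static_assert(evaluate("8-3-2") == 3); // subtraction is left-associative
```

The single `min_power` parameter is what replaces a separate grammar rule per precedence level.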
Note: lexy is under active development, and features that are currently undocumented (such as the internal rule interface) are especially subject to change.
All breaking changes are documented in the change log together with migration strategies, and big breaks are announced in advance as an issue.
- Why should I use lexy over XYZ?
-
lexy is closest to other PEG parsers. However, they usually do more implicit backtracking, which can hurt performance, and you need to be very careful with rules that have side-effects. This is not the case for lexy, where backtracking is controlled using branch conditions. lexy also gives you a lot of control over error reporting and supports error recovery.
- Boost.Spirit
-
The main difference: lexy is not a Boost library. Otherwise, it is just a different implementation with a different flavor. Use lexy if you like lexy more.
- PEGTL
-
PEGTL is very similar and was a big inspiration. The biggest difference is that lexy uses an operator-based DSL instead of inheriting from templated classes as PEGTL does; depending on your preference this can be an advantage or disadvantage.
- Handwritten Parsers
-
Writing a handwritten parser is more manual work and more error-prone. lexy automates that away without having to sacrifice control. You can use it to quickly prototype a parser and then slowly replace more and more with a handwritten parser over time.
- How bad are the compilation times?
-
They’re not as bad as you might expect (in debug mode, that is).
Compiling the example JSON parser with all lexy-specific parts removed, i.e. just the data structure built using std::variant and std::map, takes about one second on my machine. The entire parser takes about two seconds if you disable force inline on the parse productions. With force inline, it takes about five seconds.
Compile time benchmarks and optimizations are planned. Keep in mind that you can fully isolate lexy in a single translation unit that only needs to be touched when you change the parser.
- How bad are the C++ error messages if you mess something up?
-
They’re certainly worse than the error messages lexy gives you. The big problem here is that the first line gives you the error, followed by dozens of template instantiations, which end at your lexy::parse call. Besides providing an external tool to filter those error messages, there is nothing I can do about that.
- How fast is it?
-
Benchmarks are available in the benchmarks/ directory. A sample result of the JSON validator benchmark, which compares the example JSON parser with various other implementations, is available here.
- Why is it called lexy?
-
I previously had a tokenizer library called foonathan/lex. I tried adding a parser to it, but found that the line between pure tokenization and parsing became increasingly blurred. lexy is a re-imagination of the parser I added to foonathan/lex, and I’ve simply kept a similar name.
The library uses CMake as its build system.
Simply put it somewhere and use add_subdirectory() to make the following targets available:
- foonathan::lexy::core
-
This target is required. It is an INTERFACE target that sets the required include path and C++ standard flags.
- foonathan::lexy::file
-
Link to this library if you want to use the (not header-only) lexy::read_file() functionality.
- foonathan::lexy
-
Umbrella target that links to all other targets.
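As a rough sketch of how the targets above might be consumed, the following CMake fragment assumes the lexy sources were placed in a directory called external/lexy and that your project builds an executable named my_parser from main.cpp. The path, the target name and the source file are all hypothetical placeholders.

```cmake
# Hypothetical location of the lexy sources inside your project.
add_subdirectory(external/lexy)

# Hypothetical executable that contains your grammar.
add_executable(my_parser main.cpp)

# Link to the umbrella target; use foonathan::lexy::core instead
# if you only need the header-only core library.
target_link_libraries(my_parser PRIVATE foonathan::lexy)
```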
Configuration is supported by providing a lexy_user_config.hpp somewhere in the include search path, or setting the LEXY_USER_CONFIG_HEADER CMake option to a header path.
This header can then override many of the detections in lexy/_detail/config.hpp. Refer to that header for details.
The library is continuously tested on GCC 7 or higher, clang 6 or higher, as well as MSVC and clang-cl. It requires C++17 support, but works best with C++20.