Releases · pyparsing/pyparsing

Updated generated railroad diagrams to make non-terminal elements links to their related sub-diagrams. This greatly improves navigation of the diagram, especially for large, complex parsers.
Simplified railroad diagrams emitted for parsers using infix_notation, by hiding lookahead terms. Renamed internally generated expressions for clarity, and improved diagramming.
Improved performance of cpp_style_comment, c_style_comment, common.fnumber and common.ieee_float Regex expressions. PRs submitted by Gabriel Gerlero, nice work, thanks!
Added missing type annotations to match_only_at_col, replace_with, remove_quotes, with_attribute, and with_class. Issue #585 reported by rafrafrek.
Added generated diagrams for many of the examples.
Replaced old examples/0README.html file with examples/README.md file.

Version 3.2.0 - October, 2024

Discontinued support for Python 3.6, 3.7, and 3.8. Adopted new Python features from Python versions 3.7-3.9:
- Updated type annotations to use built-in container types instead of names imported from the typing module (e.g., list[str] vs List[str]).
- Reworked portions of the packrat cache to leverage insertion-preserving ordering in dicts (including removal of uses of OrderedDict).
- Changed pdb.set_trace() call in ParserElement.set_break() to breakpoint().
- Converted typing.NamedTuple to dataclasses.dataclass in railroad diagramming code.
- Added from __future__ import annotations to clean up some type annotations. (with assistance from ISyncWithFoo, issue #535, thanks for the help!)
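The language features listed above can be illustrated with a small standard-library sketch. It is not pyparsing's packrat cache; `BoundedCache` and `CacheStats` are hypothetical names used only to show built-in generic annotations, a dataclass, and eviction that relies on dict insertion order instead of OrderedDict.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Hit/miss counters (a dataclass where a NamedTuple might have been used)."""
    hits: int = 0
    misses: int = 0


class BoundedCache:
    """FIFO-bounded cache that relies on dicts preserving insertion order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.stats = CacheStats()
        # built-in generic annotation (dict[str, int]) instead of typing.Dict
        self._entries: dict[str, int] = {}

    def get(self, key: str, default: int = -1) -> int:
        if key in self._entries:
            self.stats.hits += 1
            return self._entries[key]
        self.stats.misses += 1
        return default

    def put(self, key: str, value: int) -> None:
        if key not in self._entries and len(self._entries) >= self.capacity:
            # the first key yielded by iteration is the oldest one inserted
            oldest = next(iter(self._entries))
            del self._entries[oldest]
        self._entries[key] = value

    def keys(self) -> list[str]:
        return list(self._entries)


cache = BoundedCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)  # evicts "a", the oldest entry
print(cache.keys(), cache.get("a"), cache.get("c"), cache.stats)
```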
POSSIBLE BREAKING CHANGES

The following bugfixes may result in subtle changes in the results returned or exceptions raised by pyparsing.
- Fixed code in ParseElementEnhance subclasses that replaced detailed exception messages raised in contained expressions with a less-specific and less-informative generic exception message and location.
  
  If your code has conditional logic based on the message content in raised ParseExceptions, this bugfix may require changes in your code.
- Fixed bug in transform_string() where whitespace in the input string was not properly preserved in the output string.
  
  If your code uses transform_string, this bugfix may require changes in your code.
- Fixed bug where an IndexError raised in a parse action was incorrectly handled as an IndexError raised as part of the ParserElement parsing methods, and reraised as a ParseException. Now an IndexError that is raised inside a parse action will properly propagate out as an IndexError. (Issue #573, reported by August Karlstedt, thanks!)
  
  If your code raises IndexErrors in parse actions, this bugfix may require changes in your code.
FIXES AND NEW FEATURES
- Added type annotations to remainder of pyparsing package, and added mypy run to tox.ini, so that type annotations are now checked as part of pyparsing's CI. Addresses Issue #373, raised by Iwan Aucamp, thanks!
- Exception message format can now be customized, by overriding ParseBaseException.formatted_message:
```
from pyparsing import ParseBaseException

def custom_exception_message(exc) -> str:
    # report as "line:column message", plus the offending text if known
    found_phrase = f", found {exc.found}" if exc.found else ""
    return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"

ParseBaseException.formatted_message = custom_exception_message
```
  (PR #571 submitted by Odysseyas Krystalakos, nice work!)
- run_tests now detects if an exception is raised in a parse action, and will report it with an enhanced error message, with the exception type, string, and parse action name.
- QuotedString now handles translation of escaped integer, hex, octal, and Unicode sequences to their corresponding characters.
- Fixed the displayed output of Regex terms to deduplicate repeated backslashes, for easier reading in debugging, printing, and railroad diagrams.
- Fixed (or at least reduced) elusive bug when generating railroad diagrams, where some diagram elements were just empty blocks. Fix submitted by RoDuth, thanks a ton!
- Fixed railroad diagrams that get generated with a parser containing a Regex element defined using a verbose pattern - the pattern gets flattened and comments removed before creating the corresponding diagram element.
- Defined a more performant regular expression used internally by common_html_entity.
- Regex instances can now be created using a callable that takes no arguments and just returns a string or a compiled regular expression, so that creating complex regular expression patterns can be deferred until they are actually used for the first time in the parser.
- Added optional flatten Boolean argument to ParseResults.as_list(), to return the parsed values in a flattened list.
- Added indent and base_1 arguments to pyparsing.testing.with_line_numbers. When using with_line_numbers inside a parse action, set base_1=False, since the reported loc value is 0-based. indent can be a leading string (typically of spaces or tabs) to indent the numbered string passed to with_line_numbers. Added while working on #557, reported by Bernd Wechner.
NEW/ENHANCED EXAMPLES
- Added query syntax to mongodb_query_expression.py with:
  - better support for array fields ("contains", "contains all", "contains any", and "contains none")
  - "like" and "not like" operators to support SQL "%" wildcard matching and "=~" operator to support regex matching
  - text search using "search for"
  - dates and datetimes as query values
  - a[0] style array referencing
- Added lox_parser.py example, a parser for the Lox language used as a tutorial in Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/). With helpful corrections from RoDuth.
- Added complex_chemical_formulas.py example, to add parsing capability for formulas such as "3(C₆H₅OH)₂".
- Updated tag_emitter.py to use new Tag class, introduced in pyparsing 3.1.3.

Changes since 3.2.0b3:

Fixed handling of IndexError raised in a parse action.
QuotedString parser now handles \xnn, \ooo, and \unnnn characters when convert_whitespace_escapes is True.
Reformatted CHANGES file for final release.

(This is the final beta release before 3.2.0.)

QuotedString now handles translation of escaped integer, hex, octal, and Unicode sequences to their corresponding characters.

Added type annotations to remainder of pyparsing package, and added mypy run to tox.ini, so that type annotations are now checked as part of pyparsing's CI. Addresses Issue #373, raised by Iwan Aucamp, thanks!

Exception message format can now be customized, by overriding ParseBaseException.formatted_message:

  from pyparsing import ParseBaseException

  def custom_exception_message(exc) -> str:
      found_phrase = f", found {exc.found}" if exc.found else ""
      return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"

  ParseBaseException.formatted_message = custom_exception_message

(PR #571 submitted by Odysseyas Krystalakos, nice work!)

POSSIBLE BREAKING CHANGE: Fixed bug in transform_string() where whitespace in the input string was not properly preserved in the output string.

If your code uses transform_string, this bugfix may require changes in your code.
Fixed railroad diagrams that get generated with a parser containing a Regex element defined using a verbose pattern - the pattern gets flattened and comments removed before creating the corresponding diagram element.
Defined a more performant regular expression used internally by common_html_entity.
Regex instances can now be created using a callable that takes no arguments and just returns a string or a compiled regular expression, so that creating complex regular expression patterns can be deferred until they are actually used for the first time in the parser.
Added optional flatten Boolean argument to ParseResults.as_list(), to return the parsed values in a flattened list.

Discontinued support for Python 3.6, 3.7, and 3.8. Adopted new Python features from Python versions 3.7-3.9:
- Updated type annotations to use built-in container types instead of names imported from the typing module (e.g., list[str] vs List[str]).
- Reworked portions of the packrat cache to leverage insertion-preserving ordering in dicts.
- Changed pdb.set_trace() call in ParserElement.set_break() to breakpoint().
- Converted typing.NamedTuple to dataclasses.dataclass in railroad diagramming code.
- Added from __future__ import annotations to clean up some type annotations.
POSSIBLE BREAKING CHANGE: Fixed code in ParseElementEnhance subclasses that replaced detailed exception messages raised in contained expressions with a less-specific and less-informative generic exception message and location.

If your code has conditional logic based on the message content in raised ParseExceptions, this bugfix may require changes in your code.
Fixed the displayed output of Regex terms to deduplicate repeated backslashes, for easier reading in debugging, printing, and railroad diagrams.
Fixed (or at least reduced) elusive bug when generating railroad diagrams, where some diagram elements were just empty blocks. Fix submitted by RoDuth, thanks a ton!
Added indent and base_1 arguments to pyparsing.testing.with_line_numbers. When using with_line_numbers inside a parse action, set base_1=False, since the reported loc value is 0-based. indent can be a leading string (typically of spaces or tabs) to indent the numbered string passed to with_line_numbers. Added while working on #557, reported by Bernd Wechner.
Added query syntax to mongodb_query_expression.py with better support for array fields ("contains", "contains all", "contains any", and "contains none"); and "like" and "not like" operators to support SQL "%" wildcard matching and "=~" operator to support regex matching. Also:
- added support for dates and datetimes as query values
- added support for a[0] style array referencing
Added lox_parser.py example, a parser for the Lox language used as a tutorial in Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/). With helpful corrections from RoDuth.
Added complex_chemical_formulas.py example, to add parsing capability for formulas such as "3(C₆H₅OH)₂".

Fixed a regression introduced in pyparsing 3.1.3, addition of a type annotation that referenced re.Pattern. Since this type was introduced in Python 3.7, using this type definition broke Python 3.6 installs of pyparsing 3.1.3. PR submitted by Felix Fontein, nice work!

Added new Tag ParserElement, for inserting metadata into the parsed results. This allows a parser to add metadata or annotations to the parsed tokens. The Tag element also accepts an optional value parameter, defaulting to True. See the new tag_metadata.py example in the examples directory.

Example:

  from pyparsing import Tag, Word, alphas

  # add tag indicating mood
  end_punc = "." | ("!" + Tag("enthusiastic"))
  greeting = "Hello" + Word(alphas) + end_punc

  result = greeting.parse_string("Hello World.")
  print(result.dump())

  result = greeting.parse_string("Hello World!")
  print(result.dump())

prints:

  ['Hello', 'World', '.']

  ['Hello', 'World', '!']
  - enthusiastic: True

Added example mongodb_query_expression.py, which converts human-readable infix query expressions (such as a==100 and b>=200) into the equivalent query argument for the pymongo package ({'$and': [{'a': 100}, {'b': {'$gte': 200}}]}). Supports many equality and inequality operators - see the docstring for the transform_query function for more examples.
Fixed issue where PEP8 compatibility names for ParserElement static methods were not themselves defined as staticmethods. When called using a ParserElement instance, this resulted in a TypeError exception. Reported by eylenburg (#548).
To address a compatibility issue in RDFLib, added a property setter for the ParserElement.name property, to call ParserElement.set_name.
Modified ParserElement.set_name() to accept a None value, to clear the defined name and corresponding error message for a ParserElement.
Updated railroad diagram generation for ZeroOrMore and OneOrMore expressions with stop_on expressions, while investigating #558, reported by user Gu_f.
Added <META> tag to HTML generated for railroad diagrams to force UTF-8 encoding with older browsers, to better display Unicode parser characters.
Fixed some cosmetics/bugs in railroad diagrams:
- fixed groups being shown even when show_groups=False
- show results names as quoted strings when show_results_names=True
- only use integer loop counter if repetition > 2
Some type annotations added for parse action related methods, thanks August Karlstedt (#551).
Added exception type to trace_parse_action exception output, while investigating SO question posted by medihack.
Added set_name calls to internal expressions generated in infix_notation, for improved railroad diagramming.
delta_time, lua_parser, decaf_parser, and roman_numerals examples cleaned up to use latest PEP8 names and add minor enhancements.
Fixed bug (and corresponding test code) in delta_time example that did not handle weekday references in time expressions (like "Monday at 4pm") when the weekday was the same as the current weekday.
Minor performance speedup in trim_arity, to benefit any parsers using parse actions.
Added early testing support for Python 3.13 with JIT enabled.

Added support for Python 3.13.
Added ieee_float expression to pyparsing.common, which parses float values, plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (#538).
Updated pep8 synonym wrappers for better type checking compatibility. PR submitted by Ricardo Coccioli (#507).
Fixed empty error message bug, PR submitted by InSync (#534). This should return pyparsing's exception messages to a former, more helpful form. If you have code that parses the exception messages returned by pyparsing, this may require some code changes.
Added unit tests to test for exception message contents, with enhancement to pyparsing.testing.assertRaisesParseException to accept an expected exception message.
Updated example select_parser.py to use PEP8 names and added Groups for better retrieval of parsed values from multiple SELECT clauses.
Added example email_address_parser.py, as suggested by John Byrd (#539).
Added example directx_x_file_parser.py to parse DirectX template definitions, and generate a Pyparsing parser from a template to parse .x files.
Some code refactoring to reduce code nesting, PRs submitted by InSync.
All internal string expressions using '%' string interpolation and str.format() converted to f-strings.

Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue #502)
Fixed bug that caused bad exception messages to be raised by Forward expressions. PR submitted by Kyle Sunden, thanks for your patience and collaboration on this (#493).
Fixed regression in SkipTo, where ignored expressions were not checked when looking for the target expression. Reported by catcombo, Issue #500.
Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue #498)
Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)

Releases: pyparsing/pyparsing

- Pyparsing 3.2.1
- pyparsing 3.2.0
- pyparsing 3.2.0rc1
- pyparsing 3.2.0b3
- pyparsing 3.2.0b2
- Pyparsing 3.2.0b1
- Pyparsing 3.1.4
- Pyparsing 3.1.3
- Pyparsing 3.1.2
- Pyparsing 3.1.1