-
-
Notifications
You must be signed in to change notification settings - Fork 285
/
Copy pathCHANGES
4462 lines (3344 loc) · 182 KB
/
CHANGES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
==========
Change Log
==========
RELEASE PLANNING NOTES:
In the pyparsing release 3.3.0, use of many of the pre-PEP8 methods (such as
`ParserElement.parseString`) will start to raise `DeprecationWarnings`. I plan to
completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release
until some time in 2025. So there is plenty of time to convert existing parsers to
the new function names before the old functions are completely removed. (Big help from
Devin J. Pohly in structuring the code to enable this peaceful transition.)
Version 3.2.2 - under development
---------------------------------
- Fixed bug in `nested_expr` where nested contents were stripped of whitespace when
the default whitespace characters were cleared (raised in this StackOverflow
question https://stackoverflow.com/questions/79327649 by Ben Alan). Also addressed
bug in resolving PEP8 compliant argument name and legacy argument name.
- Better exception message for `MatchFirst` and `Or` expressions, showing all alternatives
rather than just the first one. Fixes Issue #592, reported by Focke, thanks!
- Added return type annotation of "-> None" for all `__init__()` methods, to satisfy
`mypy --strict` type checking. PR submitted by FeRD, thank you!
- Added optional argument `show_hidden` to create_diagram` to show
elements that are used internally by pyparsing, but are not part of the actual
parser grammar. For instance, the `Tag` class can insert values into the parsed
results but it does not actually parse any input, so by default it is not included
in a railroad diagram. By calling `create_diagram` with `show_hidden` = `True`,
these internal elements will be included. (You can see this in the tag_metadata.py
script in the examples directory.)
- Added support for Python 3.14.
Version 3.2.1 - December, 2024
------------------------------
- Updated generated railroad diagrams to make non-terminal elements links to their related
sub-diagrams. This _greatly_ improves navigation of the diagram, especially for
large, complex parsers.
- Simplified railroad diagrams emitted for parsers using `infix_notation`, by hiding
lookahead terms. Renamed internally generated expressions for clarity, and improved
diagramming.
- Improved performance of `cpp_style_comment`, `c_style_comment`, `common.fnumber`
and `common.ieee_float` Regex expressions. PRs submitted by Gabriel Gerlero,
nice work, thanks!
- Add missing type annotations to `match_only_at_col`, `replace_with`, `remove_quotes`,
`with_attribute`, and `with_class`. Issue #585 reported by rafrafrek.
- Added generated diagrams for many of the examples.
- Replaced old examples/0README.html file with examples/README.md file.
Version 3.2.0 - October, 2024
-------------------------------
- Discontinued support for Python 3.6, 3.7, and 3.8. Adopted new Python features from
Python versions 3.7-3.9:
- Updated type annotations to use built-in container types instead of names
imported from the `typing` module (e.g., `list[str]` vs `List[str]`).
- Reworked portions of the packrat cache to leverage insertion-preserving ordering
in dicts (including removal of uses of `OrderedDict`).
- Changed `pdb.set_trace()` call in `ParserElement.set_break()` to `breakpoint()`.
- Converted `typing.NamedTuple` to `dataclasses.dataclass` in railroad diagramming
code.
- Added `from __future__ import annotations` to clean up some type annotations.
(with assistance from ISyncWithFoo, issue #535, thanks for the help!)
- POSSIBLE BREAKING CHANGES
The following bugfixes may result in subtle changes in the results returned or
exceptions raised by pyparsing.
- Fixed code in `ParseElementEnhance` subclasses that
replaced detailed exception messages raised in contained expressions with a
less-specific and less-informative generic exception message and location.
If your code has conditional logic based on the message content in raised
`ParseExceptions`, this bugfix may require changes in your code.
- Fixed bug in `transform_string()` where whitespace
in the input string was not properly preserved in the output string.
If your code uses `transform_string`, this bugfix may require changes in
your code.
- Fixed bug where an `IndexError` raised in a parse action was
incorrectly handled as an `IndexError` raised as part of the `ParserElement`
parsing methods, and reraised as a `ParseException`. Now an `IndexError`
that raises inside a parse action will properly propagate out as an `IndexError`.
(Issue #573, reported by August Karlstedt, thanks!)
If your code raises `IndexError`s in parse actions, this bugfix may require
changes in your code.
- FIXES AND NEW FEATURES
- Added type annotations to remainder of `pyparsing` package, and added `mypy`
run to `tox.ini`, so that type annotations are now run as part of pyparsing's CI.
Addresses Issue #373, raised by Iwan Aucamp, thanks!
- Exception message format can now be customized, by overriding
`ParseBaseException.format_message`:
def custom_exception_message(exc) -> str:
found_phrase = f", found {exc.found}" if exc.found else ""
return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"
ParseBaseException.formatted_message = custom_exception_message
(PR #571 submitted by Odysseyas Krystalakos, nice work!)
- `run_tests` now detects if an exception is raised in a parse action, and will
report it with an enhanced error message, with the exception type, string,
and parse action name.
- `QuotedString` now handles translation of escaped integer, hex, octal, and
Unicode sequences to their corresponding characters.
- Fixed the displayed output of `Regex` terms to deduplicate repeated backslashes,
for easier reading in debugging, printing, and railroad diagrams.
- Fixed (or at least reduced) elusive bug when generating railroad diagrams,
where some diagram elements were just empty blocks. Fix submitted by RoDuth,
thanks a ton!
- Fixed railroad diagrams that get generated with a parser containing a Regex element
defined using a verbose pattern - the pattern gets flattened and comments removed
before creating the corresponding diagram element.
- Defined a more performant regular expression used internally by `common_html_entity`.
- `Regex` instances can now be created using a callable that takes no arguments
and just returns a string or a compiled regular expression, so that creating complex
regular expression patterns can be deferred until they are actually used for the first
time in the parser.
- Added optional `flatten` Boolean argument to `ParseResults.as_list()`, to
return the parsed values in a flattened list.
- Added `indent` and `base_1` arguments to `pyparsing.testing.with_line_numbers`. When
using `with_line_numbers` inside a parse action, set `base_1`=False, since the
reported `loc` value is 0-based. `indent` can be a leading string (typically of
spaces or tabs) to indent the numbered string passed to `with_line_numbers`.
Added while working on #557, reported by Bernd Wechner.
- NEW/ENHANCED EXAMPLES
- Added query syntax to `mongodb_query_expression.py` with:
- better support for array fields ("contains all",
"contains any", and "contains none")
- "like" and "not like" operators to support SQL "%" wildcard matching
and "=~" operator to support regex matching
- text search using "search for"
- dates and datetimes as query values
- `a[0]` style array referencing
- Added `lox_parser.py` example, a parser for the Lox language used as a tutorial in
Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/).
With helpful corrections from RoDuth.
- Added `complex_chemical_formulas.py` example, to add parsing capability for
formulas such as "3(C₆H₅OH)₂".
- Updated `tag_emitter.py` to use new `Tag` class, introduced in pyparsing
3.1.3.
Version 3.1.4 - August, 2024
----------------------------
- Fixed a regression introduced in pyparsing 3.1.3, addition of a type annotation that
referenced `re.Pattern`. Since this type was introduced in Python 3.7, using this type
definition broke Python 3.6 installs of pyparsing 3.1.3. PR submitted by Felix Fontein,
nice work!
Version 3.1.3 - August, 2024
----------------------------
- Added new `Tag` ParserElement, for inserting metadata into the parsed results.
This allows a parser to add metadata or annotations to the parsed tokens.
The `Tag` element also accepts an optional `value` parameter, defaulting to `True`.
See the new `tag_metadata.py` example in the `examples` directory.
Example:
# add tag indicating mood
end_punc = "." | ("!" + Tag("enthusiastic")))
greeting = "Hello" + Word(alphas) + end_punc
result = greeting.parse_string("Hello World.")
print(result.dump())
result = greeting.parse_string("Hello World!")
print(result.dump())
prints:
['Hello', 'World', '.']
['Hello', 'World', '!']
- enthusiastic: True
- Added example `mongodb_query_expression.py`, to convert human-readable infix query
expressions (such as `a==100 and b>=200`) and transform them into the equivalent
query argument for the pymongo package (`{'$and': [{'a': 100}, {'b': {'$gte': 200}}]}`).
Supports many equality and inequality operators - see the docstring for the
`transform_query` function for more examples.
- Fixed issue where PEP8 compatibility names for `ParserElement` static methods were
not themselves defined as `staticmethods`. When called using a `ParserElement` instance,
this resulted in a `TypeError` exception. Reported by eylenburg (#548).
- To address a compatibility issue in RDFLib, added a property setter for the
`ParserElement.name` property, to call `ParserElement.set_name`.
- Modified `ParserElement.set_name()` to accept a None value, to clear the defined
name and corresponding error message for a `ParserElement`.
- Updated railroad diagram generation for `ZeroOrMore` and `OneOrMore` expressions with
`stop_on` expressions, while investigating #558, reported by user Gu_f.
- Added `<META>` tag to HTML generated for railroad diagrams to force UTF-8 encoding
with older browsers, to better display Unicode parser characters.
- Fixed some cosmetics/bugs in railroad diagrams:
- fixed groups being shown even when `show_groups`=False
- show results names as quoted strings when `show_results_names`=True
- only use integer loop counter if repetition > 2
- Some type annotations added for parse action related methods, thanks August
Karlstedt (#551).
- Added exception type to `trace_parse_action` exception output, while investigating
SO question posted by medihack.
- Added `set_name` calls to internal expressions generated in `infix_notation`, for
improved railroad diagramming.
- `delta_time`, `lua_parser`, `decaf_parser`, and `roman_numerals` examples cleaned up
to use latest PEP8 names and add minor enhancements.
- Fixed bug (and corresponding test code) in `delta_time` example that did not handle
weekday references in time expressions (like "Monday at 4pm") when the weekday was
the same as the current weekday.
- Minor performance speedup in `trim_arity`, to benefit any parsers using parse actions.
- Added early testing support for Python 3.13 with JIT enabled.
Version 3.1.2 - March, 2024
---------------------------
- Added `ieee_float` expression to `pyparsing.common`, which parses float values,
plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (#538).
- Updated pep8 synonym wrappers for better type checking compatibility. PR submitted
by Ricardo Coccioli (#507).
- Fixed empty error message bug, PR submitted by InSync (#534). This _should_ return
pyparsing's exception messages to a former, more helpful form. If you have code that
parses the exception messages returned by pyparsing, this may require some code
changes.
- Added unit tests to test for exception message contents, with enhancement to
`pyparsing.testing.assertRaisesParseException` to accept an expected exception message.
- Updated example `select_parser.py` to use PEP8 names and added Groups for better retrieval
of parsed values from multiple SELECT clauses.
- Added example `email_address_parser.py`, as suggested by John Byrd (#539).
- Added example `directx_x_file_parser.py` to parse DirectX template definitions, and
generate a Pyparsing parser from a template to parse .x files.
- Some code refactoring to reduce code nesting, PRs submitted by InSync.
- All internal string expressions using '%' string interpolation and `str.format()`
converted to f-strings.
Version 3.1.1 - July, 2023
--------------------------
- Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue #502)
- Fixed bug in bad exception messages raised by Forward expressions. PR submitted
by Kyle Sunden, thanks for your patience and collaboration on this (#493).
- Fixed regression in SkipTo, where ignored expressions were not checked when looking
for the target expression. Reported by catcombo, Issue #500.
- Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue #498)
- Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)
Version 3.1.0 - June, 2023
--------------------------
- Added `tag_emitter.py` to examples. This example demonstrates how to insert
tags into your parsed results that are not part of the original parsed text.
Version 3.1.0b2 - May, 2023
---------------------------
- Updated `create_diagram()` code to be compatible with railroad-diagrams package
version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars),
reported by Sam Morley-Short.
- Fixed bug in `NotAny`, where parse actions on the negated expr were not being run.
This could cause `NotAny` to incorrectly fail if the expr would normally match,
but would fail to match if a condition used as a parse action returned False.
Fixes Issue #482, raised by byaka, thank you!
- Fixed `create_diagram()` to accept keyword args, to be passed through to the
`template.render()` method to generate the output HTML (PR submitted by Aussie Schnore,
good catch!)
- Fixed bug in `python_quoted_string` regex.
- Added `examples/bf.py` Brainf*ck parser/executor example. Illustrates using
a pyparsing grammar to parse language syntax, and attach executable AST nodes to
the parsed results.
Version 3.1.0b1 - April, 2023
-----------------------------
- Added support for Python 3.12.
- API CHANGE: A slight change has been implemented when unquoting a quoted string
parsed using the `QuotedString` class. Formerly, when unquoting and processing
whitespace markers such as \t and \n, these substitutions would occur first, and
then any additional '\' escaping would be done on the resulting string. This would
parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed
in a single pass working left to right, so the quoted string "\\n" would get unquoted
to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq,
thanks!
- Added named field "url" to `pyparsing.common.url`, returning the entire
parsed URL string.
- Fixed bug when parse actions returned an empty string for an expression that
had a results name, that the results name was not saved. That is:
expr = Literal("X").add_parse_action(lambda tokens: "")("value")
result = expr.parse_string("X")
print(result["value"])
would raise a `KeyError`. Now empty strings will be saved with the associated
results name. Raised in Issue #470 by Nicco Kunzmann, thank you.
- Fixed bug in `SkipTo` where ignore expressions were not properly handled while
scanning for the target expression. Issue #475, reported by elkniwt, thanks
(this bug has been there for a looooong time!).
- Updated `ci.yml` permissions to limit default access to source - submitted by Joyce
Brum of Google. Thanks so much!
- Updated the `lucene_grammar.py` example (better support for '*' and '?' wildcards)
and corrected the test cases - brought to my attention by Elijah Nicol, good catch!
Version 3.1.0a1 - March, 2023
-----------------------------
- API ENHANCEMENT: `Optional(expr)` may now be written as `expr | ""`
This will make this code:
"{" + Optional(Literal("A") | Literal("a")) + "}"
writable as:
"{" + (Literal("A") | Literal("a") | "") + "}"
Some related changes implemented as part of this work:
- `Literal("")` now internally generates an `Empty()` (and no longer raises an exception)
- `Empty` is now a subclass of `Literal`
Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.
- Added new class property `identifier` to all Unicode set classes in `pyparsing.unicode`,
using the class's values for `cls.identchars` and `cls.identbodychars`. Now Unicode-aware
parsers that formerly wrote:
ppu = pyparsing.unicode
ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
can now write:
ident = ppu.Greek.identifier
# or
# ident = ppu.Ελληνικά.identifier
- `ParseResults` now has a new method `deepcopy()`, in addition to the current
`copy()` method. `copy()` only makes a shallow copy - any contained `ParseResults`
are copied as references - changes in the copy will be seen as changes in the original.
In many cases, a shallow copy is sufficient, but some applications require a deep copy.
`deepcopy()` makes a deeper copy: any contained `ParseResults` or other mappings or
containers are built with copies from the original, and do not get changed if the
original is later changed. Addresses issue #463, reported by Bryn Pickering.
- Reworked `delimited_list` function into the new `DelimitedList` class.
`DelimitedList` has the same constructor interface as `delimited_list`, and
in this release, `delimited_list` changes from a function to a synonym for
`DelimitedList`. `delimited_list` and the older `delimitedList` method will be
deprecated in a future release, in favor of `DelimitedList`.
- Error messages from `MatchFirst` and `Or` expressions will try to give more details
if one of the alternatives matches better than the others, but still fails.
Question raised in Issue #464 by msdemlei, thanks!
- Added new class method `ParserElement.using_each`, to simplify code
that creates a sequence of `Literals`, `Keywords`, or other `ParserElement`
subclasses.
For instance, to define suppressible punctuation, you would previously
write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
You can now write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
`using_each` will also accept optional keyword args, which it will
pass through to the class initializer. Here is an expression for
single-letter variable names that might be used in an algebraic
expression:
algebra_var = MatchFirst(
Char.using_each(string.ascii_lowercase, as_keyword=True)
)
- Added new builtin `python_quoted_string`, which will match any form
of single-line or multiline quoted strings defined in Python. (Inspired
by discussion with Andreas Schörgenhumer in Issue #421.)
- Extended `expr[]` notation for repetition of `expr` to accept a
slice, where the slice's stop value indicates a `stop_on`
expression:
test = "BEGIN aaa bbb ccc END"
BEGIN, END = Keyword.using_each("BEGIN END".split())
body_word = Word(alphas)
expr = BEGIN + Group(body_word[...:END]) + END
# equivalent to
# expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
print(expr.parse_string(test))
Prints:
['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
- `ParserElement.validate()` is deprecated. It predates the support for left-recursive
parsers, and was prone to false positives (warning that a grammar was invalid when
it was in fact valid). It will be removed in a future pyparsing release. In its
place, developers should use debugging and analytical tools, such as `ParserElement.set_debug()`
and `ParserElement.create_diagram()`.
(Raised in Issue #444, thanks Andrea Micheli!)
- Added bool `embed` argument to `ParserElement.create_diagram()`.
When passed as True, the resulting diagram will omit the `<DOCTYPE>`,
`<HEAD>`, and `<BODY>` tags so that it can be embedded in other
HTML source. (Useful when embedding a call to `create_diagram()` in
a PyScript HTML page.)
- Added `recurse` argument to `ParserElement.set_debug` to set the
debug flag on an expression and all of its sub-expressions. Requested
by multimeric in Issue #399.
- Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.
- Fixed bug in `Word` when `max=2`. Also added performance enhancement
when specifying `exact` argument. Reported in issue #409 by
panda-34, nice catch!
- `Word` arguments are now validated if `min` and `max` are both
given, that `min` <= `max`; raises `ValueError` if values are invalid.
- Fixed bug in srange, when parsing escaped '/' and '\' inside a
range set.
- Fixed exception messages for some `ParserElements` with custom names,
which instead showed their contained expression names.
- Fixed bug in pyparsing.common.url, when input URL is not alone
on an input line. Fixes Issue #459, reported by David Kennedy.
- Multiple added and corrected type annotations. With much help from
Stephen Rosen, thanks!
- Some documentation and error message clarifications on pyparsing's
keyword logic, cited by Basil Peace.
- General docstring cleanup for Sphinx doc generation, PRs submitted
by Devin J. Pohly. A dirty job, but someone has to do it - much
appreciated!
- `invRegex.py` example renamed to `inv_regex.py` and updated to PEP-8
variable and method naming. PR submitted by Ross J. Duff, thanks!
- Removed examples `sparser.py` and `pymicko.py`, since each included its
own GPL license in the header. Since this conflicts with pyparsing's
MIT license, they were removed from the distribution to avoid
confusion among those making use of them in their own projects.
Version 3.0.9 - May, 2022
-------------------------
- Added Unicode set `BasicMultilingualPlane` (may also be referenced
as `BMP`) representing the Basic Multilingual Plane (Unicode
characters up to code point 65535). Can be used to parse
most language characters, but omits emojis, wingdings, etc.
Raised in discussion with Dave Tapley (issue #392).
- To address mypy confusion of `pyparsing.Optional` and `typing.Optional`
resulting in `error: "_SpecialForm" not callable` message
reported in issue #365, fixed the import in `exceptions.py`. Nice
sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you!
(Removed definitions of `OptionalType`, `DictType`, and `IterableType`
and replaced them with `typing.Optional`, `typing.Dict`, and
`typing.Iterable` throughout.)
- Fixed typo in jinja2 template for railroad diagrams, thanks for the
catch Nioub (issue #388).
- Removed use of deprecated `pkg_resources` package in
railroad diagramming code (issue #391).
- Updated `bigquery_view_parser.py` example to parse examples at
https://cloud.google.com/bigquery/docs/reference/legacy-sql
Version 3.0.8 - April, 2022
---------------------------
- API CHANGE: modified `pyproject.toml` to require Python version
3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6
fail in evaluating the `version_info` class (implemented using
`typing.NamedTuple`). If you are using an earlier version of Python
3.6, you will need to use pyparsing 2.4.7.
- Improved pyparsing import time by deferring regex pattern compiles.
PR submitted by Anthony Sottile to fix issue #362, thanks!
- Updated build to use flit, PR by Michał Górny, added `BUILDING.md`
doc and removed old Windows build scripts - nice cleanup work!
- More type-hinting added for all arithmetic and logical operator
methods in `ParserElement`. PR from Kazantcev Andrey, thank you.
- Fixed `infix_notation`'s definitions of `lpar` and `rpar`, to accept
parse expressions such that they do not get suppressed in the parsed
results. PR submitted by Philippe Prados, nice work.
- Fixed bug in railroad diagramming with expressions containing `Combine`
elements. Reported by Jeremy White, thanks!
- Added `show_groups` argument to `create_diagram` to highlight grouped
elements with an unlabeled bounding box.
- Added `unicode_denormalizer.py` to the examples as a demonstration
of how Python's interpreter will accept Unicode characters in
identifiers, but normalizes them back to ASCII so that identifiers
`print` and `𝕡𝓻ᵢ𝓃𝘁` and `𝖕𝒓𝗂𝑛ᵗ` are all equivalent.
- Removed imports of deprecated `sre_constants` module for catching
exceptions when compiling regular expressions. PR submitted by
Serhiy Storchaka, thank you.
Version 3.0.7 - January, 2022
-----------------------------
- Fixed bug #345, in which delimitedList changed expressions in place
using `expr.streamline()`. Reported by Kim Gräsman, thanks!
- Fixed bug #346, when a string of word characters was passed to WordStart
or `WordEnd` instead of just taking the default value. Originally posted
as a question by Parag on StackOverflow, good catch!
- Fixed bug #350, in which `White` expressions could fail to match due to
unintended whitespace-skipping. Reported by Fu Hanxi, thank you!
- Fixed bug #355, when a `QuotedString` is defined with characters in its
quoteChar string containing regex-significant characters such as ., *,
?, [, ], etc.
- Fixed bug in `ParserElement.run_tests` where comments would be displayed
using `with_line_numbers`.
- Added optional "min" and "max" arguments to `delimited_list`. PR
submitted by Marius, thanks!
- Added new API change note in `whats_new_in_pyparsing_3_0_0`, regarding
a bug fix in the `bool()` behavior of `ParseResults`.
Prior to pyparsing 3.0.x, the `ParseResults` class implementation of
`__bool__` would return `False` if the `ParseResults` item list was empty,
even if it contained named results. In 3.0.0 and later, `ParseResults` will
return `True` if either the item list is not empty *or* if the named
results dict is not empty.
# generate an empty ParseResults by parsing a blank string with
# a ZeroOrMore
result = Word(alphas)[...].parse_string("")
print(result.as_list())
print(result.as_dict())
print(bool(result))
# add a results name to the result
result["name"] = "empty result"
print(result.as_list())
print(result.as_dict())
print(bool(result))
Prints:
[]
{}
False
[]
{'name': 'empty result'}
True
In previous versions, the second call to `bool()` would return `False`.
- Minor enhancement to Word generation of internal regular expression, to
emit consecutive characters in range, such as "ab", as "ab", not "a-b".
- Fixed character ranges for search terms using non-Western characters
in booleansearchparser, PR submitted by tc-yu, nice work!
- Additional type annotations on public methods.
Version 3.0.6 - November, 2021
------------------------------
- Added `suppress_warning()` method to individually suppress a warning on a
specific ParserElement. Used to refactor `original_text_for` to preserve
internal results names, which, while undocumented, had been adopted by
some projects.
- Fix bug when `delimited_list` was called with a str literal instead of a
parse expression.
Version 3.0.5 - November, 2021
------------------------------
- Added return type annotations for `col`, `line`, and `lineno`.
- Fixed bug when `warn_ungrouped_named_tokens_in_collection` warning was raised
when assigning a results name to an `original_text_for` expression.
(Issue #110, would raise warning in packaging.)
- Fixed internal bug where `ParserElement.streamline()` would not return self if
already streamlined.
- Changed `run_tests()` output to default to not showing line and column numbers.
If line numbering is desired, call with `with_line_numbers=True`. Also fixed
minor bug where separating line was not included after a test failure.
Version 3.0.4 - October, 2021
-----------------------------
- Fixed bug in which `Dict` classes did not correctly return tokens as nested
`ParseResults`, reported by and fix identified by Bu Sun Kim, many thanks!!!
- Documented API-changing side-effect of converting `ParseResults` to use `__slots__`
to pre-define instance attributes. This means that code written like this (which
was allowed in pyparsing 2.4.7):
result = Word(alphas).parseString("abc")
result.xyz = 100
now raises this Python exception:
AttributeError: 'ParseResults' object has no attribute 'xyz'
To add new attribute values to ParseResults object in 3.0.0 and later, you must
assign them using indexed notation:
result["xyz"] = 100
You will still be able to access this new value as an attribute or as an
indexed item.
- Fixed bug in railroad diagramming where the vertical limit would count all
expressions in a group, not just those that would create visible railroad
elements.
Version 3.0.3 - October, 2021
-----------------------------
- Fixed regex typo in `one_of` fix for `as_keyword=True`.
- Fixed a whitespace-skipping bug, Issue #319, introduced as part of the revert
of the `LineStart` changes. Reported by Marc-Alexandre Côté,
thanks!
- Added header column labeling > 100 in `with_line_numbers` - some input lines
are longer than others.
Version 3.0.2 - October, 2021
-----------------------------
- Reverted change in behavior with `LineStart` and `StringStart`, which changed the
interpretation of when and how `LineStart` and `StringStart` should match when
a line starts with spaces. In 3.0.0, the `xxxStart` expressions were not
really treated like expressions in their own right, but as modifiers to the
following expression when used like `LineStart() + expr`, so that if there
were whitespace on the line before `expr` (which would match in versions prior
to 3.0.0), the match would fail.
3.0.0 implemented this by automatically promoting `LineStart() + expr` to
`AtLineStart(expr)`, which broke existing parsers that did not expect `expr` to
necessarily be right at the start of the line, but only be the first token
found on the line. This was reported as a regression in Issue #317.
In 3.0.2, pyparsing reverts to the previous behavior, but will retain the new
`AtLineStart` and `AtStringStart` expression classes, so that parsers can chose
whichever behavior applies in their specific instance. Specifically:
# matches expr if it is the first token on the line
# (allows for leading whitespace)
LineStart() + expr
# matches only if expr is found in column 1
AtLineStart(expr)
- Performance enhancement to `one_of` to always generate an internal `Regex`,
even if `caseless` or `as_keyword` args are given as `True` (unless explicitly
disabled by passing `use_regex=False`).
- `IndentedBlock` class now works with `recursive` flag. By default, the
results parsed by an `IndentedBlock` are grouped. This can be disabled by constructing
the `IndentedBlock` with `grouped=False`.
Version 3.0.1 - October, 2021
-----------------------------
- Fixed bug where `Word(max=n)` did not match word groups less than length 'n'.
Thanks to Joachim Metz for catching this!
- Fixed bug where `ParseResults` accidentally created recursive contents.
Joachim Metz on this one also!
- Fixed bug where `warn_on_multiple_string_args_to_oneof` warning is raised
even when not enabled.
Version 3.0.0 - October, 2021
-----------------------------
- A consolidated list of all the changes in the 3.0.0 release can be found in
`docs/whats_new_in_3_0_0.rst`.
(https://github.com/pyparsing/pyparsing/blob/master/docs/whats_new_in_3_0_0.rst)
Version 3.0.0.final - October, 2021
-----------------------------------
- Added support for python `-W` warning option to call `enable_all_warnings`() at startup.
Also detects setting of `PYPARSINGENABLEALLWARNINGS` environment variable to any non-blank
value. (If using `-Wd` for testing, but wishing to disable pyparsing warnings, add
`-Wi:::pyparsing`.)
- Fixed named results returned by `url` to match fields as they would be parsed
using `urllib.parse.urlparse`.
- Early response to `with_line_numbers` was positive, with some requested enhancements:
. added a trailing "|" at the end of each line (to show presence of trailing spaces);
can be customized using `eol_mark` argument
. added expand_tabs argument, to control calling str.expandtabs (defaults to True
to match `parseString`)
. added mark_spaces argument to support display of a printing character in place of
spaces, or Unicode symbols for space and tab characters
. added mark_control argument to support highlighting of control characters using
'.' or Unicode symbols, such as "␍" and "␊".
- Modified helpers `common_html_entity` and `replace_html_entity()` to use the HTML
entity definitions from `html.entities.html5`.
- Updated the class diagram in the pyparsing docs directory, along with the supporting
.puml file (PlantUML markup) used to create the diagram.
- Added global method `autoname_elements()` to call `set_name()` on all locally
defined `ParserElements` that haven't been explicitly named using `set_name()`, using
their local variable name. Useful for setting names on multiple elements when
creating a railroad diagram.
a = pp.Literal("a")
b = pp.Literal("b").set_name("bbb")
pp.autoname_elements()
`a` will get named "a", while `b` will keep its name "bbb".
Version 3.0.0rc2 - October, 2021
--------------------------------
- Added `url` expression to `pyparsing_common`. (Sample code posted by Wolfgang Fahl,
very nice!)
This new expression has been added to the `urlExtractorNew.py` example, to show how
it extracts URL fields into separate results names.
- Added method to `pyparsing_test` to help debugging, `with_line_numbers`.
Returns a string with line and column numbers corresponding to values shown
when parsing with expr.set_debug():
data = """\
A
100"""
expr = pp.Word(pp.alphanums).set_name("word").set_debug()
print(ppt.with_line_numbers(data))
expr[...].parseString(data)
prints:
1
1234567890
1: A
2: 100
Match word at loc 3(1,4)
A
^
Matched word -> ['A']
Match word at loc 11(2,7)
100
^
Matched word -> ['100']
- Added new example `cuneiform_python.py` to demonstrate creating a new Unicode
range, and writing a Cuneiform->Python transformer (inspired by zhpy).
- Fixed issue #272, reported by PhasecoreX, when `LineStart`() expressions would match
input text that was not necessarily at the beginning of a line.
As part of this fix, two new classes have been added: AtLineStart and AtStringStart.
The following expressions are equivalent:
LineStart() + expr and AtLineStart(expr)
StringStart() + expr and AtStringStart(expr)
[`LineStart` and `StringStart` changes reverted in 3.0.2.]
- Fixed `ParseFatalExceptions` failing to override normal exceptions or expression
matches in `MatchFirst` expressions. Addresses issue #251, reported by zyp-rgb.
- Fixed bug in which `ParseResults` replaces a collection type value with an invalid
type annotation (as a result of changed behavior in Python 3.9). Addresses issue #276, reported by
Rob Shuler, thanks.
- Fixed bug in `ParseResults` when calling `__getattr__` for special double-underscored
methods. Now raises `AttributeError` for non-existent results when accessing a
name starting with '__'. Addresses issue #208, reported by Joachim Metz.
- Modified debug fail messages to include the expression name to make it easier to sync
up match vs success/fail debug messages.
Version 3.0.0rc1 - September, 2021
----------------------------------
- Railroad diagrams have been reformatted:
. creating diagrams is easier - call
expr.create_diagram("diagram_output.html")
create_diagram() takes 3 arguments:
. the filename to write the diagram HTML
. optional 'vertical' argument, to specify the minimum number of items in a path
to be shown vertically; default=3
. optional 'show_results_names' argument, to specify whether results name
annotations should be shown; default=False
. every expression that gets a name using `setName()` gets separated out as
a separate subdiagram
. results names can be shown as annotations to diagram items
. `Each`, `FollowedBy`, and `PrecededBy` elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND]
annotations
. removed annotations for Suppress elements
. some diagram cleanup when a grammar contains Forward elements
. check out the examples make_diagram.py and railroad_diagram_demo.py
- Type annotations have been added to most public API methods and classes.
- Better exception messages to show full word where an exception occurred.
Word(alphas, alphanums)[...].parseString("ab1 123", parseAll=True)
Was:
pyparsing.ParseException: Expected end of text, found '1' (at char 4), (line:1, col:5)
Now:
pyparsing.exceptions.ParseException: Expected end of text, found '123' (at char 4), (line:1, col:5)
- Suppress can be used to suppress text skipped using "...".
source = "lead in START relevant text END trailing text"
start_marker = Keyword("START")
end_marker = Keyword("END")
find_body = Suppress(...) + start_marker + ... + end_marker
print(find_body.parseString(source).dump())
Prints:
['START', 'relevant text ', 'END']
- _skipped: ['relevant text ']
- New string constants `identchars` and `identbodychars` to help in defining identifier Word expressions
Two new module-level strings have been added to help when defining identifiers, `identchars` and `identbodychars`.
Instead of writing::
import pyparsing as pp
identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
you will be able to write::
identifier = pp.Word(pp.identchars, pp.identbodychars)
Those constants have also been added to all the Unicode string classes::
import pyparsing as pp
ppu = pp.pyparsing_unicode
cjk_identifier = pp.Word(ppu.CJK.identchars, ppu.CJK.identbodychars)
greek_identifier = pp.Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
- Added a caseless parameter to the `CloseMatch` class to allow for casing to be
ignored when checking for close matches. (Issue #281) (PR by Adrian Edwards, thanks!)
- Fixed bug in Located class when used with a results name. (Issue #294)
- Fixed bug in `QuotedString` class when the escaped quote string is not a
repeated character. (Issue #263)
- `parseFile()` and `create_diagram()` methods now will accept `pathlib.Path`
arguments.
Version 3.0.0b3 - August, 2021
------------------------------
- PEP-8 compatible names are being introduced in pyparsing version 3.0!
All methods such as `parseString` have been replaced with the PEP-8
compliant name `parse_string`. In addition, arguments such as `parseAll`
have been renamed to `parse_all`. For backward-compatibility, synonyms for
all renamed methods and arguments have been added, so that existing
pyparsing parsers will not break. These synonyms will be removed in a future
release.
In addition, the Optional class has been renamed to Opt, since it clashes
with the common typing.Optional type specifier that is used in the Python
type annotations. A compatibility synonym is defined for now, but will be
removed in a future release.
- HUGE NEW FEATURE - Support for left-recursive parsers!
Following the method used in Python's PEG parser, pyparsing now supports
left-recursive parsers when left recursion is enabled.
import pyparsing as pp
pp.ParserElement.enable_left_recursion()
# a common left-recursion definition
# define a list of items as 'list + item | item'
# BNF:
# item_list := item_list item | item
# item := word of alphas
item_list = pp.Forward()
item = pp.Word(pp.alphas)
item_list <<= item_list + item | item
item_list.run_tests("""\
To parse or not to parse that is the question
""")
Prints:
['To', 'parse', 'or', 'not', 'to', 'parse', 'that', 'is', 'the', 'question']
Great work contributed by Max Fischer!
- `delimited_list` now supports an additional flag `allow_trailing_delim`,
to optionally parse an additional delimiter at the end of the list.
Contributed by Kazantcev Andrey, thanks!
- Removed internal comparison of results values against b"", which
raised a `BytesWarning` when run with `python -bb`. Fixes issue #271 reported
by Florian Bruhin, thank you!
- Fixed STUDENTS table in sql2dot.py example, fixes issue #261 reported by
legrandlegrand - much better.
- Python 3.5 will not be supported in the pyparsing 3 releases. This will allow
for future pyparsing releases to add parameter type annotations, and to take
advantage of dict key ordering in internal results name tracking.
Version 3.0.0b2 - December, 2020