Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex construction: optimizations #134

Merged
merged 6 commits into from
Apr 26, 2024
Merged

Regex construction: optimizations #134

merged 6 commits into from
Apr 26, 2024

Conversation

vhavlena
Copy link
Collaborator

@vhavlena vhavlena commented Apr 17, 2024

This PR optimizes working with regexes:

  • new rewriting rules for special cases of re.loop
  • wrapper for Alphabet handling
  • block-wise construction of loop NFA

@vhavlena
Copy link
Collaborator Author

# of formulae: 103318
--------------------------------------------------
tool                                               ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
-----------------------------------------------  ----  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-b0899c4-d95fe13                          0   1365   16198.75   0.16   0.02   1.75      0        0        638    658        69        0
z3-noodler-ec4ce36-d95fe13                          0   2939   57016.87   0.57   0.01   5.76      0        0        639   2230        70        0
cvc5-1.1.2                                          0   3619   76478.24   0.77   0.00   6.77      0        0          2   3608         9        0
z3-4.13.0                                           0   5631  132036.99   1.35   0.01   8.11      0        0        210   4960       461        0
ostrich-5dd2e10ca                                   0  16986  782639.00   9.06   1.75  19.20      0        0          0  16962        12       12
z3-4.13.0+cvc5-1.1.2                                0   2562   60604.78   0.60   0.00   6.00      0        0       2562      0         0        0
z3-noodler-b0899c4-d95fe13+z3-4.13.0+cvc5-1.1.2     0    147   11206.03   0.11   0.00   1.66      0        0        147      0         0        0
--------------------------------------------------

image

@vhavlena vhavlena marked this pull request as ready for review April 25, 2024 12:35
@vhavlena vhavlena requested a review from jurajsic April 25, 2024 12:35
@vhavlena
Copy link
Collaborator Author

Particular impact on regex benchmarks

Benchmark automatark
# of formulae: 15995
--------------------------------------------------
tool                                               ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
-----------------------------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-b0899c4-d95fe13                          0    32   1017.08   0.06   0.01   0.67      0        0          0    32         0        0
z3-noodler-ec4ce36-d95fe13                          0    60   1977.23   0.12   0.01   2.56      0        0          0    60         0        0
cvc5-1.1.2                                          0    93   2216.97   0.14   0.00   2.83      0        0          0    85         8        0
z3-4.13.0                                           0   125   6425.06   0.40   0.02   3.90      0        0          0   124         1        0
ostrich-5dd2e10ca                                   0    48  24332.97   1.53   1.41   1.89      0        0          0    48         0        0
z3-4.13.0+cvc5-1.1.2                                0    26    543.82   0.03   0.00   1.22      0        0         26     0         0        0
z3-noodler-b0899c4-d95fe13+z3-4.13.0+cvc5-1.1.2     0     0    102.99   0.01   0.00   0.18      0        0          0     0         0        0
--------------------------------------------------
Benchmark redos
# of formulae: 3287
--------------------------------------------------
tool                                               ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
-----------------------------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-b0899c4-d95fe13                          0    30   5331.42   1.64   0.87   6.43      0        0          0    30         0        0
z3-noodler-ec4ce36-d95fe13                          0  1558  45753.35  26.46   9.68  33.23      0        0          0  1558         0        0
cvc5-1.1.2                                          0  2217  27894.06  26.07   0.03  38.26      0        0          0  2217         0        0
z3-4.13.0                                           0  2399    326.46   0.37   0.01   2.85      0        0          0  2099       300        0
ostrich-5dd2e10ca                                   0  1572  55284.56  32.24  33.17  36.98      0        0          0  1572         0        0
z3-4.13.0+cvc5-1.1.2                                0  1873  27009.57  19.10   0.02  34.70      0        0       1873     0         0        0
z3-noodler-b0899c4-d95fe13+z3-4.13.0+cvc5-1.1.2     0    18   4379.87   1.34   0.34   5.53      0        0         18     0         0        0
--------------------------------------------------

Copy link
Member

@jurajsic jurajsic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really great! Do you know why we get TO for those few benchmarks in redos?

Comment on lines +37 to +40
// bound for loop (above this number an optimized construction is used)
const unsigned LOOP_BOUND = 5000;
// simulation reduction bound in states (bigger automata are not reduced)
const unsigned RED_BOUND = 1000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try also different values?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No

@vhavlena
Copy link
Collaborator Author

Looks really great! Do you know why we get TO for those few benchmarks in redos?

It seems that it is caused by a combination of a big base NFA and a large number of repetitions inside loop (80000). We can later check if a bigger timeout helps. We can also implement a better block-wise concatenation (with exponentially increasing concatenation steps).

@jurajsic
Copy link
Member

Ok, we can merge then.

@vhavlena vhavlena merged commit b90edac into devel Apr 26, 2024
@vhavlena vhavlena deleted the regex-loop branch April 26, 2024 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants