Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple membership heuristic improvement #145

Merged
merged 6 commits into from
May 22, 2024
Merged

Multiple membership heuristic improvement #145

merged 6 commits into from
May 22, 2024

Conversation

jurajsic
Copy link
Member

This PR changes the way that multiple membership heuristic work.
We have
L_1 ∩ ... ∩ L_m ∩ ¬L_{m+1} ∩ ... ∩ ¬L_n
and we want to know if this intersection is empty (unsat).
We did it by sorting the regexes by their expected complexity (with negated regexes at the end), and then iteratively computing the NFAs and intersection, which could help with unsat instances, as the intersection could become empty and the more complex regexes would not be transformed to NFA (see #139).

The bottleneck is usually computing complement. We can however use de Morgan to get
L_1 ∩ ... ∩ L_m ∩ ¬L_{m+1} ∩ ... ∩ ¬L_n = L_1 ∩ ... ∩ L_m ∩ ¬(L_{m+1} ∪ ... ∪ L_n)
and then compute two NFAs:
L = L_1 ∩ ... ∩ L_m (if m=0, we set L to be universal automaton)
L' = L_{m+1} ∪ ... ∪ L_n (if m=n, we set L' = ∅)
We are now asking whether L ∩ ¬L' = ∅ which is the same as asking if L ⊆ L'. Therefore we only compute the inclusion, which is usually easier than complement.

@jurajsic
Copy link
Member Author

Benchmark automatark
# of formulae: 15995
--------------------------------------------------
tool                           ✅    ❌     time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-5d9f410-d95fe13  15990     5   543.27   0.03   0.01   0.30  10477     5513          0     5         0        0
z3-noodler-b7b8ad5-d95fe13  15979    16   567.33   0.04   0.01   0.36  10466     5513          0    16         0        0
cvc5-1.1.2                  15902    93  2216.97   0.14   0.00   2.83  10460     5442          0    85         8        0
z3-4.13.0                   15870   125  6425.06   0.40   0.02   3.90  10469     5401          0   124         1        0
--------------------------------------------------

image

@jurajsic
Copy link
Member Author

There is some problem with redos benchmark:

Benchmark redos
# of formulae: 3287
--------------------------------------------------
tool                          ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-59f141d-d95fe13  2657   630   2743.91   1.03   0.02   4.89   1817      840          0    54       576        0
z3-noodler-b7b8ad5-d95fe13  3284     3   2463.36   0.75   0.03   3.60   2444      840          0     3         0        0
cvc5-1.1.2                  1070  2217  27894.06  26.07   0.03  38.26    574      496          0  2217         0        0
z3-4.13.0                    888  2399    326.46   0.37   0.01   2.85     48      840          0  2099       300        0
--------------------------------------------------

@jurajsic
Copy link
Member Author

Current results:

# of formulae: 103388
--------------------------------------------------
tool                            ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
--------------------------  ------  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-0ce8a69-d95fe13  102285   1103   13988.52   0.14   0.01   1.65  62864    39421        628    425        50        0
z3-noodler-b7b8ad5-d95fe13  102274   1114   12938.26   0.13   0.01   1.63  62853    39421        628    435        51        0
cvc5-1.1.2                   99769   3619   76478.24   0.77   0.00   6.77  60868    38901          2   3608         9        0
z3-4.13.0                    97687   5701  132036.99   1.35   0.01   8.11  58874    38813        210   4960       461       70
ostrich-e386836db            86606  16782  729217.84   8.42   1.64  18.69  47790    38816          0  16767         0       15
--------------------------------------------------

image

Selected benchmarks:

Benchmark automatark
# of formulae: 15995
--------------------------------------------------
tool                           ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-0ce8a69-d95fe13  15990     5    718.04   0.04   0.01   0.48  10477     5513          0     5         0        0
z3-noodler-b7b8ad5-d95fe13  15979    16    567.33   0.04   0.01   0.36  10466     5513          0    16         0        0
cvc5-1.1.2                  15902    93   2216.97   0.14   0.00   2.83  10460     5442          0    85         8        0
z3-4.13.0                   15870   125   6425.06   0.40   0.02   3.90  10469     5401          0   124         1        0
ostrich-e386836db           15947    48  24033.70   1.51   1.39   1.84  10440     5507          0    48         0        0
--------------------------------------------------

Benchmark redos
# of formulae: 3287
--------------------------------------------------
tool                          ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-0ce8a69-d95fe13  3284     3   3378.95   1.03   0.03   3.83   2444      840          0     3         0        0
z3-noodler-b7b8ad5-d95fe13  3284     3   2463.36   0.75   0.03   3.60   2444      840          0     3         0        0
cvc5-1.1.2                  1070  2217  27894.06  26.07   0.03  38.26    574      496          0  2217         0        0
z3-4.13.0                    888  2399    326.46   0.37   0.01   2.85     48      840          0  2099       300        0
ostrich-e386836db           1717  1570  55074.14  32.08  33.04  37.27   1053      664          0  1570         0        0
--------------------------------------------------

@jurajsic
Copy link
Member Author

There is a slow down in redos, I think I know why (I am doing unnecessary first intersection + reduction, which is not needed).

@jurajsic jurajsic marked this pull request as ready for review May 21, 2024 18:24
@jurajsic jurajsic requested a review from vhavlena May 21, 2024 18:24
@jurajsic
Copy link
Member Author

There is a slow down in redos, I think I know why (I am doing unnecessary first intersection + reduction, which is not needed).

Fixed, I was creating nfa from regex twice and redos has really big automata.

Benchmark redos
# of formulae: 3287
--------------------------------------------------
tool                          ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-11a800e-d95fe13  3284     3   2560.71   0.78   0.03   3.61   2444      840          0     3         0        0
z3-noodler-b7b8ad5-d95fe13  3284     3   2463.36   0.75   0.03   3.60   2444      840          0     3         0        0
cvc5-1.1.2                  1070  2217  27894.06  26.07   0.03  38.26    574      496          0  2217         0        0
z3-4.13.0                    888  2399    326.46   0.37   0.01   2.85     48      840          0  2099       300        0
ostrich-e386836db           1717  1570  55074.14  32.08  33.04  37.27   1053      664          0  1570         0        0
--------------------------------------------------

image

Copy link
Collaborator

@vhavlena vhavlena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice.

@jurajsic jurajsic merged commit 4bcf006 into devel May 22, 2024
@jurajsic jurajsic deleted the regex_union branch May 22, 2024 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants