Multiple membership heuristic improvement #145

jurajsic · 2024-05-20T14:37:41Z

This PR changes the way that multiple membership heuristic work.
We have
L_1 ∩ ... ∩ L_m ∩ ¬L_{m+1} ∩ ... ∩ ¬L_n
and we want to know if this intersection is empty (unsat).
We did it by sorting the regexes by their expected complexity (with negated regexes at the end), and then iteratively computing the NFAs and intersection, which could help with unsat instances, as the intersection could become empty and the more complex regexes would not be transformed to NFA (see #139).

The bottleneck is usually computing complement. We can however use de Morgan to get
L_1 ∩ ... ∩ L_m ∩ ¬L_{m+1} ∩ ... ∩ ¬L_n = L_1 ∩ ... ∩ L_m ∩ ¬(L_{m+1} ∪ ... ∪ L_n)
and then compute two NFAs:
L = L_1 ∩ ... ∩ L_m (if m=0, we set L to be universal automaton)
L' = L_{m+1} ∪ ... ∪ L_n (if m=n, we set L' = ∅)
We are now asking whether L ∩ ¬L' = ∅ which is the same as asking if L ⊆ L'. Therefore we only compute the inclusion, which is usually easier than complement.

jurajsic · 2024-05-20T14:39:34Z

Benchmark automatark
# of formulae: 15995
--------------------------------------------------
tool                           ✅    ❌     time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-5d9f410-d95fe13  15990     5   543.27   0.03   0.01   0.30  10477     5513          0     5         0        0
z3-noodler-b7b8ad5-d95fe13  15979    16   567.33   0.04   0.01   0.36  10466     5513          0    16         0        0
cvc5-1.1.2                  15902    93  2216.97   0.14   0.00   2.83  10460     5442          0    85         8        0
z3-4.13.0                   15870   125  6425.06   0.40   0.02   3.90  10469     5401          0   124         1        0
--------------------------------------------------

jurajsic · 2024-05-20T22:02:45Z

There is some problem with redos benchmark:

Benchmark redos
# of formulae: 3287
--------------------------------------------------
tool                          ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-59f141d-d95fe13  2657   630   2743.91   1.03   0.02   4.89   1817      840          0    54       576        0
z3-noodler-b7b8ad5-d95fe13  3284     3   2463.36   0.75   0.03   3.60   2444      840          0     3         0        0
cvc5-1.1.2                  1070  2217  27894.06  26.07   0.03  38.26    574      496          0  2217         0        0
z3-4.13.0                    888  2399    326.46   0.37   0.01   2.85     48      840          0  2099       300        0
--------------------------------------------------

jurajsic · 2024-05-21T18:20:24Z

Current results:

# of formulae: 103388
--------------------------------------------------
tool                            ✅     ❌       time    avg    med    std    sat    unsat    unknown     TO    MO+ERR    other
--------------------------  ------  -----  ---------  -----  -----  -----  -----  -------  ---------  -----  --------  -------
z3-noodler-0ce8a69-d95fe13  102285   1103   13988.52   0.14   0.01   1.65  62864    39421        628    425        50        0
z3-noodler-b7b8ad5-d95fe13  102274   1114   12938.26   0.13   0.01   1.63  62853    39421        628    435        51        0
cvc5-1.1.2                   99769   3619   76478.24   0.77   0.00   6.77  60868    38901          2   3608         9        0
z3-4.13.0                    97687   5701  132036.99   1.35   0.01   8.11  58874    38813        210   4960       461       70
ostrich-e386836db            86606  16782  729217.84   8.42   1.64  18.69  47790    38816          0  16767         0       15
--------------------------------------------------

Selected benchmarks:

Benchmark automatark
# of formulae: 15995
--------------------------------------------------
tool                           ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-0ce8a69-d95fe13  15990     5    718.04   0.04   0.01   0.48  10477     5513          0     5         0        0
z3-noodler-b7b8ad5-d95fe13  15979    16    567.33   0.04   0.01   0.36  10466     5513          0    16         0        0
cvc5-1.1.2                  15902    93   2216.97   0.14   0.00   2.83  10460     5442          0    85         8        0
z3-4.13.0                   15870   125   6425.06   0.40   0.02   3.90  10469     5401          0   124         1        0
ostrich-e386836db           15947    48  24033.70   1.51   1.39   1.84  10440     5507          0    48         0        0
--------------------------------------------------

Benchmark redos
# of formulae: 3287
--------------------------------------------------
tool                          ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-0ce8a69-d95fe13  3284     3   3378.95   1.03   0.03   3.83   2444      840          0     3         0        0
z3-noodler-b7b8ad5-d95fe13  3284     3   2463.36   0.75   0.03   3.60   2444      840          0     3         0        0
cvc5-1.1.2                  1070  2217  27894.06  26.07   0.03  38.26    574      496          0  2217         0        0
z3-4.13.0                    888  2399    326.46   0.37   0.01   2.85     48      840          0  2099       300        0
ostrich-e386836db           1717  1570  55074.14  32.08  33.04  37.27   1053      664          0  1570         0        0
--------------------------------------------------

jurajsic · 2024-05-21T18:23:33Z

There is a slow down in redos, I think I know why (I am doing unnecessary first intersection + reduction, which is not needed).

jurajsic · 2024-05-21T18:46:39Z

There is a slow down in redos, I think I know why (I am doing unnecessary first intersection + reduction, which is not needed).

Fixed, I was creating nfa from regex twice and redos has really big automata.

Benchmark redos
# of formulae: 3287
--------------------------------------------------
tool                          ✅    ❌      time    avg    med    std    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  --------  -----  -----  -----  -----  -------  ---------  ----  --------  -------
z3-noodler-11a800e-d95fe13  3284     3   2560.71   0.78   0.03   3.61   2444      840          0     3         0        0
z3-noodler-b7b8ad5-d95fe13  3284     3   2463.36   0.75   0.03   3.60   2444      840          0     3         0        0
cvc5-1.1.2                  1070  2217  27894.06  26.07   0.03  38.26    574      496          0  2217         0        0
z3-4.13.0                    888  2399    326.46   0.37   0.01   2.85     48      840          0  2099       300        0
ostrich-e386836db           1717  1570  55074.14  32.08  33.04  37.27   1053      664          0  1570         0        0
--------------------------------------------------

vhavlena

Really nice.

jurajsic added 4 commits May 20, 2024 15:19

multiple membership heuristic computes inclusion

e03ee20

fix

df77d80

simplify

5d9f410

comments

59f141d

reduce only if the automaton is not too big

0ce8a69

jurajsic marked this pull request as ready for review May 21, 2024 18:24

jurajsic requested a review from vhavlena May 21, 2024 18:24

remove double nfa creation

11a800e

vhavlena approved these changes May 22, 2024

View reviewed changes

jurajsic merged commit 4bcf006 into devel May 22, 2024

jurajsic deleted the regex_union branch May 22, 2024 10:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple membership heuristic improvement #145

Multiple membership heuristic improvement #145

jurajsic commented May 20, 2024

jurajsic commented May 20, 2024

jurajsic commented May 20, 2024

jurajsic commented May 21, 2024

jurajsic commented May 21, 2024

jurajsic commented May 21, 2024

vhavlena left a comment

Multiple membership heuristic improvement #145

Multiple membership heuristic improvement #145

Conversation

jurajsic commented May 20, 2024

jurajsic commented May 20, 2024

jurajsic commented May 20, 2024

jurajsic commented May 21, 2024

jurajsic commented May 21, 2024

jurajsic commented May 21, 2024

vhavlena left a comment

Choose a reason for hiding this comment