-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple membership heuristic improvement #145
Conversation
|
There is some problem with
|
Current results:
Selected benchmarks:
|
There is a slow down in |
Fixed, I was creating nfa from regex twice and
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice.
This PR changes the way that multiple membership heuristic work.
We have
L_1 ∩ ... ∩ L_m ∩ ¬L_{m+1} ∩ ... ∩ ¬L_n
and we want to know if this intersection is empty (unsat).
We did it by sorting the regexes by their expected complexity (with negated regexes at the end), and then iteratively computing the NFAs and intersection, which could help with unsat instances, as the intersection could become empty and the more complex regexes would not be transformed to NFA (see #139).
The bottleneck is usually computing complement. We can however use de Morgan to get
L_1 ∩ ... ∩ L_m ∩ ¬L_{m+1} ∩ ... ∩ ¬L_n = L_1 ∩ ... ∩ L_m ∩ ¬(L_{m+1} ∪ ... ∪ L_n)
and then compute two NFAs:
L = L_1 ∩ ... ∩ L_m
(ifm=0
, we setL
to be universal automaton)L' = L_{m+1} ∪ ... ∪ L_n
(ifm=n
, we setL' = ∅
)We are now asking whether
L ∩ ¬L' = ∅
which is the same as asking ifL ⊆ L'
. Therefore we only compute the inclusion, which is usually easier than complement.