Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Underapproximating to_int/from_int #117

Merged
merged 18 commits into from
Feb 13, 2024
Merged

Underapproximating to_int/from_int #117

merged 18 commits into from
Feb 13, 2024

Conversation

jurajsic
Copy link
Member

@jurajsic jurajsic commented Feb 1, 2024

This PR is mostly for stringfuzz benchmark.

It adds a possibility to generate underapproximating LIA formula for to_int/from_int, so that if we end up with non-finite language for the string variable of these two predicates, we still generate "something": we check all the words up to some length (right now 3).
It can lead to SAT in most of the cases.

For the UNSAT formulae, I added a new preprocessing step (conversions_validity), that can immediately tell that there are no valid inputs for to_int (i.e., its result is -1). This can also help in a lot of cases.

It seems that what would really help (and probably would solve nearly everything in stringfuzz), is to get information about input equations such as 5 = str.to_int(x).
We could then generate x ∈ 0*5 which would in most cases lead to UNSAT.
However, I did not find a way to get this information into noodler.

@jurajsic jurajsic changed the title To from int 3 Underapproximating to_int/from_int Feb 8, 2024
@jurajsic jurajsic marked this pull request as ready for review February 8, 2024 13:38
@jurajsic
Copy link
Member Author

jurajsic commented Feb 10, 2024

Impact on benchmarks containing to_int/from_int:

Benchmark stringfuzz/to120.csv
# of formulae: 11618
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-c69c980-2cddb2f  11572    46   0.03   0.01    313.64   6349     5223         42     4         0        0
cvc5-1.0.8                  10552  1066   2.61   0.02  27503.53   5777     4775          0  1065         1        0
z3-4.12.2                   11232   386   4.14   0.04  46483.27   6148     5084          0   367        19        0
cvc5-1.1.1                  10522  1096   2.65   0.00  27893.23   5756     4766          0  1095         1        0
z3-4.12.5                   11072   546   4.09   0.01  45283.01   5986     5086          0   421       125        0
z3-noodler-7421e81-2cddb2f  11347   271   0.03   0.01    291.77   6165     5182        266     5         0        0
--------------------------------------------------

Benchmark str_small_rw/to120.csv
# of formulae: 1880
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-c69c980-2cddb2f  1733   147   0.22   0.01  373.44      4     1729        110    24        13        0
cvc5-1.0.8                  1861    19   0.03   0.01   48.26     22     1839          2    17         0        0
z3-4.12.2                   1817    63   0.06   0.03  112.09     25     1792          0    63         0        0
cvc5-1.1.1                  1861    19   0.02   0.00   28.48     20     1841          2    17         0        0
z3-4.12.5                   1819    61   0.07   0.01  132.27     25     1794          0    61         0        0
z3-noodler-7421e81-2cddb2f  1731   149   0.10   0.01  173.57      4     1727        111    25        13        0
--------------------------------------------------

Benchmark full_str_int/to120.csv
# of formulae: 16968
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-c69c980-2cddb2f  14555  2413   0.14   0.01   2103.69    808    13747       1201  1064       148        0
cvc5-1.0.8                  16968     0   0.31   0.03   5258.46   2526    14442          0     0         0        0
z3-4.12.2                   16734   234   1.45   0.04  24231.84   2478    14256          0   234         0        0
cvc5-1.1.1                  16963     5   0.27   0.01   4657.42   2522    14441          0     5         0        0
z3-4.12.5                   16727   241   1.21   0.01  20316.56   2471    14256          0   241         0        0
z3-noodler-7421e81-2cddb2f  14505  2463   0.09   0.01   1346.08    758    13747       1635   813        15        0
--------------------------------------------------

There is no (or minimal) impact on other benchmarks.

@jurajsic jurajsic requested review from vhavlena and Adda0 February 10, 2024 20:41
Copy link
Collaborator

@vhavlena vhavlena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Just a few comments and ideas for a future work.

src/smt/theory_str_noodler/decision_procedure.cpp Outdated Show resolved Hide resolved
src/smt/theory_str_noodler/decision_procedure.cpp Outdated Show resolved Hide resolved
src/smt/theory_str_noodler/formula_preprocess.cpp Outdated Show resolved Hide resolved
STRACE("str-conversion", tout << "failing NFA:" << std::endl << *aut_valid_part << std::endl;);
// util::throw_error("cannot process to_int/from_int for automaton with infinite language");
res_precision = LenNodePrecision::UNDERAPPROX;
if (max_length_of_words > 3) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a future work, we could propagate the LIA solver here and check, if the formula subst_var < 100 is valid. In that case we have a precise solution and we could set res_precision to PRECISE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subst_var is a string var, so we would have to check something else, probably you meant |subst_var| < 3.
But I am not sure if it would work.
I was thinking about doing something a bit similar, returning multiple formulae (one underapproximation, one something like overapproximation), but it is quite complicated and it is not that clear how it should work.

Anyway, for stringfuzz, the best thing would be to transform n = str.to_int(x) on the input to n = str.to_int(x) & x ∈ 0*n (and also do that for lenghts, i.e. n = str.len(x) & x ∈ .{n}), I think that would solve everything.
So doing something with overapproximation or something similar would only be helpful for full_str_int, and it is questionable how helpful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. ok, and what about the same checking for from_int? Is it possible at all in our setting? The replacement of from_int by to_int is done in relevant_eh, right ?
  2. Agree, there is a good hope that this can be done in seq rewriter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I do not understand the question. What do you mean by same checking? Also from_int is not replaced with to_int, I do handle them in similar way during the LIA formula creation, but it is still important to know if I have to_int or from_int (for handling invalid cases).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wrote it confusedly. I though that the current problem (mainly for full_str_int) are constraints of the form w = from_int(x) where x is some integer variable. Because we do not know about the possible values of x, we are not currently able to solve it, right? But if we find out that x is always smaller than 100, we can enumerate it quite easily.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I understand now. It can probably be somehow applied for it, I'll have to think about it.

Anyway, there are only 363 formulae with from_int in full_str_int, so I don't think it's the main problem for this benchmark right now.
You can see that the underapproximation, which is useful only for SAT cases (see how number of UNSAT cases did not change at all), already helped a lot with unknowns, however, most of them moved to TO/MO, so I think we also need to focus on simplifying the formula somehow.

Copy link
Collaborator

@Adda0 Adda0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me overall.

src/smt/theory_str_noodler/decision_procedure.h Outdated Show resolved Hide resolved
src/smt/theory_str_noodler/formula_preprocess.cpp Outdated Show resolved Hide resolved
@jurajsic
Copy link
Member Author

Can I merge this?

@vhavlena
Copy link
Collaborator

Yes

@Adda0
Copy link
Collaborator

Adda0 commented Feb 13, 2024

I am fine with merging, too.

@jurajsic jurajsic merged commit a8d7bad into devel Feb 13, 2024
@jurajsic jurajsic deleted the to-from-int-3 branch February 13, 2024 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants