Underapproximating `to_int`/`from_int` #117

jurajsic · 2024-02-01T18:58:11Z

This PR is mostly for stringfuzz benchmark.

It adds a possibility to generate underapproximating LIA formula for to_int/from_int, so that if we end up with non-finite language for the string variable of these two predicates, we still generate "something": we check all the words up to some length (right now 3).
It can lead to SAT in most of the cases.

For the UNSAT formulae, I added a new preprocessing step (conversions_validity), that can immediately tell that there are no valid inputs for to_int (i.e., its result is -1). This can also help in a lot of cases.

It seems that what would really help (and probably would solve nearly everything in stringfuzz), is to get information about input equations such as 5 = str.to_int(x).
We could then generate x ∈ 0*5 which would in most cases lead to UNSAT.
However, I did not find a way to get this information into noodler.

jurajsic · 2024-02-10T20:41:07Z

Impact on benchmarks containing to_int/from_int:

Benchmark stringfuzz/to120.csv
# of formulae: 11618
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-c69c980-2cddb2f  11572    46   0.03   0.01    313.64   6349     5223         42     4         0        0
cvc5-1.0.8                  10552  1066   2.61   0.02  27503.53   5777     4775          0  1065         1        0
z3-4.12.2                   11232   386   4.14   0.04  46483.27   6148     5084          0   367        19        0
cvc5-1.1.1                  10522  1096   2.65   0.00  27893.23   5756     4766          0  1095         1        0
z3-4.12.5                   11072   546   4.09   0.01  45283.01   5986     5086          0   421       125        0
z3-noodler-7421e81-2cddb2f  11347   271   0.03   0.01    291.77   6165     5182        266     5         0        0
--------------------------------------------------

Benchmark str_small_rw/to120.csv
# of formulae: 1880
--------------------------------------------------
tool                          ✅    ❌    avg    med    time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  ----  ----  -----  -----  ------  -----  -------  ---------  ----  --------  -------
z3-noodler-c69c980-2cddb2f  1733   147   0.22   0.01  373.44      4     1729        110    24        13        0
cvc5-1.0.8                  1861    19   0.03   0.01   48.26     22     1839          2    17         0        0
z3-4.12.2                   1817    63   0.06   0.03  112.09     25     1792          0    63         0        0
cvc5-1.1.1                  1861    19   0.02   0.00   28.48     20     1841          2    17         0        0
z3-4.12.5                   1819    61   0.07   0.01  132.27     25     1794          0    61         0        0
z3-noodler-7421e81-2cddb2f  1731   149   0.10   0.01  173.57      4     1727        111    25        13        0
--------------------------------------------------

Benchmark full_str_int/to120.csv
# of formulae: 16968
--------------------------------------------------
tool                           ✅    ❌    avg    med      time    sat    unsat    unknown    TO    MO+ERR    other
--------------------------  -----  ----  -----  -----  --------  -----  -------  ---------  ----  --------  -------
z3-noodler-c69c980-2cddb2f  14555  2413   0.14   0.01   2103.69    808    13747       1201  1064       148        0
cvc5-1.0.8                  16968     0   0.31   0.03   5258.46   2526    14442          0     0         0        0
z3-4.12.2                   16734   234   1.45   0.04  24231.84   2478    14256          0   234         0        0
cvc5-1.1.1                  16963     5   0.27   0.01   4657.42   2522    14441          0     5         0        0
z3-4.12.5                   16727   241   1.21   0.01  20316.56   2471    14256          0   241         0        0
z3-noodler-7421e81-2cddb2f  14505  2463   0.09   0.01   1346.08    758    13747       1635   813        15        0
--------------------------------------------------

There is no (or minimal) impact on other benchmarks.

vhavlena

Nice work. Just a few comments and ideas for a future work.

src/smt/theory_str_noodler/decision_procedure.cpp

src/smt/theory_str_noodler/formula_preprocess.cpp

vhavlena · 2024-02-11T11:54:49Z

src/smt/theory_str_noodler/decision_procedure.cpp

+                STRACE("str-conversion", tout << "failing NFA:" << std::endl << *aut_valid_part << std::endl;);
+                // util::throw_error("cannot process to_int/from_int for automaton with infinite language");
+                res_precision = LenNodePrecision::UNDERAPPROX;
+                if (max_length_of_words > 3) {


As a future work, we could propagate the LIA solver here and check, if the formula subst_var < 100 is valid. In that case we have a precise solution and we could set res_precision to PRECISE.

subst_var is a string var, so we would have to check something else, probably you meant |subst_var| < 3.
But I am not sure if it would work.
I was thinking about doing something a bit similar, returning multiple formulae (one underapproximation, one something like overapproximation), but it is quite complicated and it is not that clear how it should work.

Anyway, for stringfuzz, the best thing would be to transform n = str.to_int(x) on the input to n = str.to_int(x) & x ∈ 0*n (and also do that for lenghts, i.e. n = str.len(x) & x ∈ .{n}), I think that would solve everything.
So doing something with overapproximation or something similar would only be helpful for full_str_int, and it is questionable how helpful.

ok, and what about the same checking for from_int? Is it possible at all in our setting? The replacement of from_int by to_int is done in relevant_eh, right ?

Agree, there is a good hope that this can be done in seq rewriter.

I do not understand the question. What do you mean by same checking? Also from_int is not replaced with to_int, I do handle them in similar way during the LIA formula creation, but it is still important to know if I have to_int or from_int (for handling invalid cases).

Sorry, I wrote it confusedly. I though that the current problem (mainly for full_str_int) are constraints of the form w = from_int(x) where x is some integer variable. Because we do not know about the possible values of x, we are not currently able to solve it, right? But if we find out that x is always smaller than 100, we can enumerate it quite easily.

Hmm, I think I understand now. It can probably be somehow applied for it, I'll have to think about it.

Anyway, there are only 363 formulae with from_int in full_str_int, so I don't think it's the main problem for this benchmark right now.
You can see that the underapproximation, which is useful only for SAT cases (see how number of UNSAT cases did not change at all), already helped a lot with unknowns, however, most of them moved to TO/MO, so I think we also need to focus on simplifying the formula somehow.

Adda0

Seems good to me overall.

src/smt/theory_str_noodler/decision_procedure.h

src/smt/theory_str_noodler/formula_preprocess.cpp

Co-authored-by: David Chocholatý <chocholaty.david@protonmail.com>

…-from-int-3

jurajsic · 2024-02-13T10:08:23Z

Can I merge this?

vhavlena · 2024-02-13T10:21:21Z

Yes

Adda0 · 2024-02-13T10:51:20Z

I am fine with merging, too.

jurajsic added 7 commits February 6, 2024 15:36

trying to-int underapprox

a3cb376

limit length of cases

fbc35f3

different limit

a2558ce

do not compute non_valid_part for from_int

f0f1018

faster unsat for invalid to_int using prep

cac9114

push/pop conversions (I have no idea what I am doing)

434888f

typo

c0490a0

jurajsic force-pushed the to-from-int-3 branch from b6da399 to c0490a0 Compare February 6, 2024 14:36

jurajsic added 4 commits February 6, 2024 17:21

comments, better prepocessing digit aut

4ece46d

fix checking non_valid aut for from_int

664b2e8

move trace

84c8ae7

approximation info is part of returning pair of get_lengts

624cfbe

jurajsic changed the title ~~To from int 3~~ Underapproximating to_int/from_int Feb 8, 2024

jurajsic marked this pull request as ready for review February 8, 2024 13:38

jurajsic added 2 commits February 9, 2024 14:29

fix remove_regular for conversions

a89cf58

fix remote_regular

c69c980

jurajsic requested review from vhavlena and Adda0 February 10, 2024 20:41

vhavlena approved these changes Feb 11, 2024

View reviewed changes

parametric underapproximation

595d742

Adda0 approved these changes Feb 12, 2024

View reviewed changes

src/smt/theory_str_noodler/decision_procedure.h Outdated Show resolved Hide resolved

src/smt/theory_str_noodler/formula_preprocess.cpp Outdated Show resolved Hide resolved

jurajsic and others added 4 commits February 12, 2024 10:49

typo

690c39d

Co-authored-by: David Chocholatý <chocholaty.david@protonmail.com>

moved digit automaton construction into AutAssignment

4b23813

Merge branch 'to-from-int-3' of github.com:VeriFIT/z3-noodler into to…

d13b34b

…-from-int-3

from magic numbers 48, 57 to magic constants

3b6cb6d

jurajsic merged commit a8d7bad into devel Feb 13, 2024

jurajsic deleted the to-from-int-3 branch February 13, 2024 10:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Underapproximating `to_int`/`from_int` #117

Underapproximating `to_int`/`from_int` #117

jurajsic commented Feb 1, 2024 •

edited

Loading

jurajsic commented Feb 10, 2024 •

edited

Loading

vhavlena left a comment

vhavlena Feb 11, 2024

jurajsic Feb 11, 2024

vhavlena Feb 12, 2024

jurajsic Feb 12, 2024

vhavlena Feb 12, 2024

jurajsic Feb 12, 2024

Adda0 left a comment

jurajsic commented Feb 13, 2024

vhavlena commented Feb 13, 2024

Adda0 commented Feb 13, 2024

Underapproximating to_int/from_int #117

Underapproximating to_int/from_int #117

Conversation

jurajsic commented Feb 1, 2024 • edited Loading

jurajsic commented Feb 10, 2024 • edited Loading

vhavlena left a comment

Choose a reason for hiding this comment

vhavlena Feb 11, 2024

Choose a reason for hiding this comment

jurajsic Feb 11, 2024

Choose a reason for hiding this comment

vhavlena Feb 12, 2024

Choose a reason for hiding this comment

jurajsic Feb 12, 2024

Choose a reason for hiding this comment

vhavlena Feb 12, 2024

Choose a reason for hiding this comment

jurajsic Feb 12, 2024

Choose a reason for hiding this comment

Adda0 left a comment

Choose a reason for hiding this comment

jurajsic commented Feb 13, 2024

vhavlena commented Feb 13, 2024

Adda0 commented Feb 13, 2024

Underapproximating `to_int`/`from_int` #117

Underapproximating `to_int`/`from_int` #117

jurajsic commented Feb 1, 2024 •

edited

Loading

jurajsic commented Feb 10, 2024 •

edited

Loading