Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebis-dev: Strings starting with a null byte aren't stored on the heap #2757

Open
adri326 opened this issue Jan 7, 2025 · 7 comments
Open

Comments

@adri326
Copy link
Contributor

adri326 commented Jan 7, 2025

The following query behaves incorrectly on the current head of rebis-dev:

?- A = "\x0\a".
    A = []. % invalid, expected A = "\x0\a".
@adri326
Copy link
Contributor Author

adri326 commented Jan 7, 2025

I'm currently working on fixing the odd logic in Heap::compute_pstr_size, Heap::allocate_pstr and build_functor!, but I still don't quite understand what the intended pstr layout should be.

@triska
Copy link
Contributor

triska commented Jan 7, 2025

I think this may be the reason for #2593.

Note that strings containing \x0\ are not correctly handled in master either in general (#2554).

@triska
Copy link
Contributor

triska commented Jan 7, 2025

what the intended pstr layout should be.

The string should be written directly into the heap in its UTF-8 representation (with the exception of 0-bytes, for which a brief excursion into a list with '\x0\' as its first element is needed, whose tail is again a partial string), described in #24 and most completely in #95 (comment).

The key challenge here is to structure the Rust code in such a way so that as few parts as possible need to be aware of this representation. We have seen many issues so far, mostly because currently (in master, and to a lesser extent also still in rebis-dev) too many parts need to take the representation explicitly into account, and it is easy to forget to handle the case.

@triska
Copy link
Contributor

triska commented Jan 7, 2025

In particular, in your example, the term stored on the heap should be for example:

----------
| '.'/2  |
----------    
| '\x0\' |     
----------    
|  PSTR  |---+
----------   |
|a0000000| <-+
----------
|  []/0  |
----------

Is it the case? Who knows: The mistake could be in writing the term, which is currently also written in Rust but should be written in Prolog to minimize the Rust parts that need to be aware of this representation.

@adri326
Copy link
Contributor Author

adri326 commented Jan 7, 2025

The mistake is in the current Rust implementation of Heap::allocate_pstr; I'm not sure how these rather direct accesses to the heap should be implemented in prolog instead, but I do agree that we desperately need proper encapsulation to limit the amount of code that manually handles this representation (both here and in other places).

@triska
Copy link
Contributor

triska commented Jan 7, 2025

how these rather direct accesses to the heap should be implemented in prolog instead

Prolog predicates will never "see" this representation, they will only "see" that a list must be printed and print it accordingly. Only the underlying Rust-based primitives (notably unification) must be aware of this representation, and ideally as localized as possible.

@adri326
Copy link
Contributor Author

adri326 commented Jan 8, 2025

Oop, I got a funnier one:

A = "abcdefgh\x0\\x0\\x0\".

This one straight up triggers UB :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants