Add `substr_range`, `elem_offset`, and `subslice_range` methods

# Proposal
[proposal]: #proposal

Add the following methods
```rust
impl str {
    fn substr_range(&self, item: &str) -> Option<Range<usize>>;
}
impl<T> [T] {
    fn subslice_range(&self, item: &[T]) -> Option<Range<usize>>;
    fn elem_offset(&self, item: &T) -> Option<usize>;
}
```

NOTE: These methods are completely different in both behavior and functionality from `str::find` and friends. They do not compare characters/items inside of the `str`/`slice`s. Instead they utilize pointer arithmetic to find where a subslice/item is located in the `str`/`slice`.

## Problem statement
[problem-statement]: #problem-statement

This change attempts to fix two distinct issue. The first of which is the inflexibility of `slice::split`, `str::split`, `str::matches`, and related methods. The second is for parse errors.

#### Problem 1
[problem-1]: #problem-1

`str::split` and its related functions are really convenient, but I find myself having to manually implement them if I want more complex logic.

For instance, lets say that I want to split up a string by commas. Specifically, I want to do something like `str::split_inclusive`, but I want to include the separators at the beginning instead of the end.

You cannot do this with `str::split`
```rust
let foo = "a,bc,d";
for sub in foo.split(",") {
    // Now I have the split up string, but it doesn't have the comma at the beginning :-(
}
```
Instead, you have to manually do it yourself
```rust
    let mut foo = "a,bc,d";

    while let Some(comma_idx) = foo.get(1..).map(|f| 1+f.find(",").unwrap_or(f.len())) {
        let sub = &foo[..comma_idx];
        foo = &foo[comma_idx..];

        // sub now has leading comma
    }

```

#### Problem 2
[problem-2]: #problem-2

Say I have a function for parsing and a helper function. The `Range` denotes where in the string the error occurs. We can use that `Range` to tell the user where the error is if one occurs.
```rust
pub fn parse<'a>(in: &'a str) -> Result<usize, Range<usize>> {
    // omitted //
    let a = parse_helper(&in[x..y])?;
    let b = parse_helper(&in[y+1..])?;
   // omitted //
}
fn parse_helper<'a>(in: &'a str) -> Result<usize, Range<usize>> { /* omitted */ }
```
The issue with this is that the range error from `parse_helper` is relative to the substring passed to it. This means that we have to manually offset the `Range` inside of `parse`.
```rust
pub fn parse<'a>(in: &'a str) -> Result<usize, Range<usize>> {
    // omitted //
    let a = parse_helper(&in[x..y]).map_err(|e| e.start + x..e.end + x)?;
    let b = parse_helper(&in[y+1..]).map_err(|e| e.start + y+1..e.end + y+1)?;
   // omitted //
}
// ... //
```

We could also pass an offset to `parse_helper` and use that to adjust the returned `Range`, but that would just move the complexity to the `parse_helper` function.

 This gets even worse if instead of a `Range`, we have an enum that contains `Range`s. Say
```rust
enum ParseError {
   Error1(Range<usize>),
   Error2(Range<usize>, Range<usize>),
}
```


## Motivating examples or use cases



### Split and similar methods

`subslice_offset` would allow using indices to extend methods like `str::split`. We can implement the aforementioned inclusive `str::split` like so:
```rust
let foo = "a,bc,d,";
for sub in foo.split(",") {
    let idx = foo.substr_range(sub).unwrap();
    let sub = &foo[idx.start.saturating_sub(1)..idx.end];
    // Now `sub` is "a", ",bc", ",d", and ","
}
```

### Error handling with string input data
We can also use this method to remove complexity from the code described in **Problem 2**.
```rust
pub fn parse<'a>(in: &'a str) -> Result<usize, &'a str> {
    // omitted //
    let a = parse_helper(&in[x..y])?;
    let b = parse_helper(&in[y+1..])?;
   // omitted //
}
fn parse_helper<'a>(in: &'a str) -> Result<usize, &'a str> { /* omitted */ }
```

Instead of returning `Range`s, we can return `&'a str`. Then, if a caller of `parse` wants to find where the error occurred, they can do
```rust
let input = "9+10-21";
match parse(input) {
    Ok(val) => { /* omitted */ },
    Err(err) => {
        display_error(input.substr_range(err).unwrap());
    },
```

## Solution sketch



```rust
impl str {
    fn substr_range(&self, substr: &str) -> Option<Range<usize>> {
        self.as_bytes().subslice_range(substr.as_bytes())
    }
}

impl<T> [T] {
    fn subslice_range(&self, subslice: &[T]) -> Option<Range<usize>> {
        if core::mem::size_of::<T>() == 0 {
            panic!();
        }

        let self_start = self.as_ptr() as usize;
        let subslice_start = subslice.as_ptr() as usize;

        let byte_start = subslice_start.wrapping_sub(self_start);
        let start = byte_start / core::mem::size_of::<T>();
        let end = start.wrapping_add(subslice.len());

        if byte_start % core::mem::size_of::<T>() != 0 {
            return None;
        }

        if start <= self.len() && end <= self.len() {
            Some(start..end)
        } else {
            None
        }
    }

    fn elem_offset(&self, element: &T) -> Option<usize> {
        if core::mem::size_of::<T>() == 0 {
            panic!();
        }

        let self_start = self.as_ptr() as usize;
        let elem_start = element as *const T as usize;

        let byte_offset = elem_start.wrapping_sub(self_start);
        let offset = byte_offset / core::mem::size_of::<T>();

        if byte_offset % core::mem::size_of::<T>() != 0 {
            return None;
        }

        if offset < self.len() {
            Some(offset)
        } else {
            None
        }
    }
}
```
The following code demonstrates the above behavior
```rust
    let foo = "mississippi".to_owned();
    let is = &foo[4..6];

    assert_eq!(is, "is");
    assert_eq!(foo.substr_offset(is), Some(4..6));

    assert_eq!(foo.substr_offset("is"), None); 
```

## Alternatives



- [subslice_offset](https://docs.rs/subslice-offset/latest/subslice_offset/index.html) crate
- [str::find](https://doc.rust-lang.org/std/primitive.str.html#method.find) (significantly less efficient and has the aforementioned issue)

## Links and related work



[subslice_offset](https://docs.rs/subslice-offset/latest/subslice_offset/index.html) crate

Old, deprecated [`str::subslice_offset`](https://github.com/rust-lang/rust/blob/b3aa1a6d4ac88f68e036a05fdf19be63b522b65d/src/libcore/str/mod.rs#L1610). ([deprecated here](https://github.com/rust-lang/rust/commit/b3aa1a6d4ac88f68e036a05fdf19be63b522b65d#diff-cc3d2c2b93569a4e3ce58a7a53e3bb443e77a9cd2e4015af53d0152f2f8e3183R1648)).
This was deprecated with `str::find` being listed as its replacement, but as mentioned before, this has different functionality.

[Original `subslice_offset` PR](https://github.com/rust-lang/rust/pull/5823)

https://github.com/rust-lang/rfcs/pull/2796 (an RFC similar to this but only for slices). The PR for this was abandoned because the author did not have time to make suggested changes.

https://github.com/wr7/rfcs/blob/substr_range/text/3648-substr-range.md - An RFC that I wrote for this

https://stackoverflow.com/questions/50781561/how-to-find-the-starting-offset-of-a-string-slice-of-another-string

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `substr_range`, `elem_offset`, and `subslice_range` methods #382

Proposal

Problem statement

Problem 1

Problem 2

Motivating examples or use cases

Split and similar methods

Error handling with string input data

Solution sketch

Alternatives

Links and related work

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add substr_range, elem_offset, and subslice_range methods #382

Description