[R] stringr binding for str_sub() silently mishandles negative start/stop values #43960
Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
I noticed some unusual behavior when using negative start/end values (i.e. counting from the end of the string) with str_sub() in arrow. I've included a few examples below, contrasting how str_sub() behaves with tibbles in R and with arrow tables:
library(arrow)
library(tidyverse)
library(reprex)
str_tbl <- tibble(my_string = 'abcde')
str_tbl_a <- as_arrow_table(str_tbl)
# example #1: extract first through second-to-last characters from string
# works fine with a tibble
str_tbl %>% mutate(my_substring = str_sub(my_string, start = 1, end = -2))
#> # A tibble: 1 × 2
#> my_string my_substring
#> <chr> <chr>
#> 1 abcde abcd
# but with arrow: ruh-roh -- missing all characters
str_tbl_a %>% mutate(my_substring = str_sub(my_string, start = 1, end = -2)) %>% collect()
#> # A tibble: 1 × 2
#> my_string my_substring
#> <chr> <chr>
#> 1 abcde ""
# example #2: extract third-to-last through second-to-last characters from string
# works fine with a tibble
str_tbl %>% mutate(my_substring = str_sub(my_string, start = -3, end = -2))
#> # A tibble: 1 × 2
#> my_string my_substring
#> <chr> <chr>
#> 1 abcde cd
# but with arrow: ruh-roh -- missing a character
str_tbl_a %>% mutate(my_substring = str_sub(my_string, -3, -2)) %>% collect()
#> # A tibble: 1 × 2
#> my_string my_substring
#> <chr> <chr>
#> 1 abcde c
# example #3: extract third-to-last through last characters from string
# works fine with a tibble
str_tbl %>% mutate(my_substring = str_sub(my_string, start = -3, end = -1))
#> # A tibble: 1 × 2
#> my_string my_substring
#> <chr> <chr>
#> 1 abcde cde
# but with arrow: bizarrely, this is also fine
str_tbl_a %>% mutate(my_substring = str_sub(my_string, -3, -1)) %>% collect()
#> # A tibble: 1 × 2
#> my_string my_substring
#> <chr> <chr>
#> 1 abcde cde
Created on 2024-09-05 with reprex v2.1.1
Note: the above reprex was created on an Ubuntu 22.04 system running R 4.4.1 and Arrow 16.1.0
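For reference, the tibble results above match stringr's documented rule that negative positions count backwards from the last character, with both bounds inclusive. Below is a minimal base R sketch of that rule, assuming scalar start and end values. expected_sub() is a hypothetical helper written only for illustration; it is not a stringr or arrow function and says nothing about how the arrow binding is implemented.

```r
# Hypothetical helper (not part of stringr or arrow): resolves negative
# positions the way stringr documents them, then defers to base substr().
# Assumes `start` and `end` are single numbers, not vectors.
expected_sub <- function(string, start = 1L, end = -1L) {
  n <- nchar(string)
  # A negative position p refers to character n + p + 1, so -1 is the last one
  if (start < 0) start <- n + start + 1L
  if (end < 0) end <- n + end + 1L
  # substr() treats both bounds as inclusive and 1-based
  substr(string, start, end)
}

expected_sub("abcde", 1, -2)   # "abcd", same as example #1 with a tibble
expected_sub("abcde", -3, -2)  # "cd",   same as example #2 with a tibble
expected_sub("abcde", -3, -1)  # "cde",  same as example #3 with a tibble
```

These are the values I would expect the arrow queries in examples #1 and #2 to return as well.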
Component(s)
R