Infinite loop with tar -s/re/repl/g when re matches the empty string #2438
Description
Minor, as it is unlikely to happen in practice, but:
(cd /etc && tar cf - issue) | bsdtar -'s/^/x/g' -xpf -
Or:
(cd /etc && tar cf - issue) | bsdtar -'s/i*/<~>/g' -xpf -
Or:
(cd /etc && tar cf - issue) | bsdtar -'s/\</</g' -xpf -
These run into infinite loops, presumably because after the first substitution, bsdtar tries again where the previous match ended, which for an empty match is the exact same place.
Things like sed 's/re/repl/g' avoid that problem by advancing by one character when there was a match for the empty string:
$ echo issue | sed 's/i*/<&>/g'
<i>s<>s<>u<>e<>
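For illustration, here is a minimal sketch in C of a global substitution loop that advances past empty matches, using the POSIX regex API. It is not libarchive's or sed's actual code: subst_global and append are hypothetical names, the replacement syntax is reduced to a bare & for the matched text (no ~, no backslash escapes), and output is silently truncated to the buffer size. It also skips an empty match that sits right where the previous match ended, which is what the sed output above suggests (there is no <> between <i> and the first s).

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: append n bytes of src to the NUL-terminated
 * buffer out of capacity cap, truncating silently if it does not fit. */
static void append(char *out, size_t cap, const char *src, size_t n)
{
    size_t len = strlen(out);
    if (n > cap - 1 - len)
        n = cap - 1 - len;
    memcpy(out + len, src, n);
    out[len + n] = '\0';
}

/* Hypothetical sketch of s/re/repl/g on one subject string.
 * Returns 0 on success, -1 if re does not compile or cap is 0. */
static int subst_global(const char *re, const char *repl,
                        const char *subject, char *out, size_t cap)
{
    regex_t preg;
    regmatch_t m;
    size_t pos = 0, len = strlen(subject);
    long prev_end = -1;          /* end of the previous match, -1 if none */

    if (cap == 0 || regcomp(&preg, re, 0) != 0)   /* 0 = basic regex */
        return -1;
    out[0] = '\0';

    for (;;) {
        /* Past offset 0 we are no longer at the beginning of the subject. */
        int flags = pos > 0 ? REG_NOTBOL : 0;
        if (regexec(&preg, subject + pos, 1, &m, flags) != 0) {
            append(out, cap, subject + pos, len - pos);   /* copy the rest */
            break;
        }
        size_t so = pos + (size_t)m.rm_so;
        size_t eo = pos + (size_t)m.rm_eo;
        int empty = (so == eo);

        append(out, cap, subject + pos, so - pos);   /* text before the match */

        /* Skip an empty match located exactly where the last match ended. */
        if (!(empty && (long)so == prev_end)) {
            for (const char *p = repl; *p != '\0'; p++) {
                if (*p == '&')
                    append(out, cap, subject + so, eo - so);
                else
                    append(out, cap, p, 1);
            }
        }
        prev_end = (long)eo;
        pos = eo;

        if (empty) {
            /* The step that prevents the infinite loop: copy one character
             * and move past it, or stop at the end of the subject. */
            if (pos >= len)
                break;
            append(out, cap, subject + pos, 1);
            pos++;
        }
    }
    regfree(&preg);
    return 0;
}
```

With a leftmost-longest POSIX regex implementation, calling subst_global("i*", "<&>", "issue", buf, sizeof buf) should leave <i>s<>s<>u<>e<> in buf, and subst_global("^", "x", "issue", buf, sizeof buf) should leave xissue. Note that this sketch does nothing about the \< problem described below.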
There is a separate issue in that:
$ (cd /etc && tar cf - issue) | bsdtar -'s/\<./<~/gp' -xpf -
issue >> <i<s<s<u<e
Where word boundaries are found where they shouldn't be.
Some sed implementations have the same problem, an issue hard to address with the POSIX regexp API (\< not being a regexp operator except in ex/vi) where there's no equivalent of REG_NOTBOL for \<.
REG_NOTBOL, which btw should probably be used in:
$ (cd /etc && tar cf - issue) | bsdtar -'s/i/I/' -'s/^s/<~/p' -xf -
issue >> I<ssue
(similar issue as above where the beginning of the subject is found where it shouldn't be).
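A small sketch of what REG_NOTBOL changes, assuming matching is resumed on a pointer into the middle of the name (here "ssue", the tail of "issue"). matches_tail is a hypothetical helper written for this illustration, not a libarchive function.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if re matches tail, 0 if it does not,
 * -1 if re fails to compile. tail is assumed to point past the start of
 * a longer subject; not_bol selects whether REG_NOTBOL is passed. */
static int matches_tail(const char *re, const char *tail, int not_bol)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, re, REG_NOSUB) != 0)
        return -1;
    /* With REG_NOTBOL, ^ no longer matches at the start of tail. */
    rc = regexec(&preg, tail, 0, NULL, not_bol ? REG_NOTBOL : 0);
    regfree(&preg);
    return rc == 0;
}
```

With subject pointing at "issue", matches_tail("^s", subject + 1, 0) reports a match, which corresponds to the I<ssue result above, while matches_tail("^s", subject + 1, 1) does not.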
That's with:
$ bsdtar --version
bsdtar 3.7.4 - libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6
On Debian.