Infinite loop with tar -s/re/repl/g when re matches the empty string #2438

Open
@stephane-chazelas

Description

Minor as unlikely to happen in practice, but

(cd /etc && tar cf - issue) | bsdtar -'s/^/x/g' -xpf -

Or:

(cd /etc && tar cf - issue) | bsdtar -'s/i*/<~>/g' -xpf -

Or:

(cd /etc && tar cf - issue) | bsdtar -'s/\</</g' -xpf -

All three run into an infinite loop, presumably because after the first substitution, matching resumes where the previous match ended, which for an empty match is the exact same place.

Things like sed 's/re/repl/g' avoid that problem by advancing by one character when there was a match for the empty string.

$ echo issue | sed 's/i*/<&>/g'
<i>s<>s<>u<>e<>
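
For illustration, here is a minimal sketch of that "advance by one character" rule. It is not libarchive's or sed's actual code: the function name `gsub` is hypothetical, Python's `re` stands in for the POSIX regexp engine, and `&` in the replacement is expanded naively (no escaping), as a stand-in for sed's `&` or bsdtar's `~`.

```python
import re

def gsub(pattern, repl, subject):
    """Global substitution that always makes progress on empty matches.

    Hypothetical helper for illustration only; '&' in repl is replaced
    by the matched text, without any escape handling.
    """
    rx = re.compile(pattern)
    out = []
    pos = 0         # offset where the next search starts
    last_end = -1   # end offset of the previous accepted match
    while pos <= len(subject):
        m = rx.search(subject, pos)
        if m is None:
            break
        empty = m.start() == m.end()
        # An empty match right where the previous match ended is skipped,
        # which is what yields "<i>s<>..." rather than "<i><>s<>...".
        if not (empty and m.start() == last_end):
            out.append(subject[pos:m.start()])
            out.append(repl.replace("&", m.group(0)))
            last_end = m.end()
            pos = m.end()
        if empty:
            # Copy one character through unchanged and step past it,
            # so the next search cannot return the same empty match.
            out.append(subject[pos:pos + 1])
            pos += 1
    out.append(subject[pos:])
    return "".join(out)

print(gsub("i*", "<&>", "issue"))  # <i>s<>s<>u<>e<>
print(gsub("^", "x", "issue"))     # xissue
```

Without the `if empty:` step, `pos` would never move past an empty match and the loop would not terminate, which is the behaviour reported above.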

There is a separate issue in that:

$ (cd /etc && tar cf - issue) | bsdtar -'s/\<./<~/gp' -xpf -
issue >> <i<s<s<u<e

Here, word boundaries are found where they shouldn't be: every character is treated as the start of a word.

Some sed implementations have the same problem. It is hard to address with the POSIX regexp API, where there is no equivalent of REG_NOTBOL for \< (\< not being a POSIX regexp operator, except in ex/vi).

Speaking of REG_NOTBOL, it should probably be used in:

$ (cd /etc && tar cf - issue) | bsdtar -'s/i/I/' -'s/^s/<~/p' -xf -
issue >> I<ssue

(similar issue as above where the beginning of the subject is found where it shouldn't be).
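
Both symptoms are consistent with the regexp being re-run on the remainder of the name as if it were a fresh string. The sketch below reproduces that effect under stated assumptions: Python's `re` stands in for the POSIX engine, `\b\w` stands in for `\<.`, the function names are hypothetical, and the `pos` argument of `search` is only an analogy for "keep the original context" (POSIX `regexec` offers REG_NOTBOL for `^` but, as noted above, nothing for `\<`).

```python
import re

def gsub_sliced(pattern, repl, subject):
    """Restart matching on the remaining suffix, losing all left context."""
    rx = re.compile(pattern)
    out, rest = [], subject
    while rest:
        m = rx.search(rest)
        # Stop on no match, or on an empty match at offset 0
        # (the latter only keeps this sketch from looping).
        if m is None or m.end() == 0:
            break
        out.append(rest[:m.start()])
        out.append(repl.replace("&", m.group(0)))
        rest = rest[m.end():]
    out.append(rest)
    return "".join(out)

def gsub_in_context(pattern, repl, subject):
    """Restart matching at an offset within the original string."""
    rx = re.compile(pattern)
    out, pos = [], 0
    while pos < len(subject):
        m = rx.search(subject, pos)
        # Same guard as above: give up on an empty match at pos.
        if m is None or m.end() == pos:
            break
        out.append(subject[pos:m.start()])
        out.append(repl.replace("&", m.group(0)))
        pos = m.end()
    out.append(subject[pos:])
    return "".join(out)

print(gsub_sliced(r"\b\w", "<&", "issue"))      # <i<s<s<u<e
print(gsub_in_context(r"\b\w", "<&", "issue"))  # <issue

# The same loss of context affects '^' once a prefix has been consumed:
print(re.compile("^s").search("issue"[1:]) is not None)  # True
print(re.compile("^s").search("issue", 1) is not None)   # False
```

The sliced variant gives the same result as the bsdtar output shown above, while the in-context variant only matches at the real start of the name.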

All of the above is with:

$ bsdtar --version
bsdtar 3.7.4 - libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6

On Debian.
