Skip to content

interactive shells embedded inside non-interactive shells can become deadlocked #14762

Open
@dead10ck

Description

Describe the bug

When a non-interactive instance of nushell tries to run an interactive instance of nushell, both the child process and the parent can become deadlocked waiting for each other.

I noticed this with the tests added in 4884894, which have since been fixed with a workaround, but I still think this is a legitimate bug in and of itself that should be handled more gracefully.

After some digging, I think I found the source of the deadlock. When a non-interactive instance tries to write input to a child process through a byte stream, it will copy all input into the child process's pipe, and then simply wait for the child process to die.

ByteStreamSource::Child(mut child) => {
// All `OutDest`s except `OutDest::PipeSeparate` will cause `stderr` to be `None`.
// Only `save`, `tee`, and `complete` set the stderr `OutDest` to `OutDest::PipeSeparate`,
// and those commands have proper simultaneous handling of stdout and stderr.
debug_assert!(child.stderr.is_none(), "stderr should not exist");
if let Some(stdout) = child.stdout.take() {
match stdout {
ChildPipe::Pipe(pipe) => {
copy_with_signals(pipe, dest, span, signals)?;
}
ChildPipe::Tee(tee) => {
copy_with_signals(tee, dest, span, signals)?;
}
}
}
child.wait()?;
}

Meanwhile, the child nushell instance, in interactive mode, will attempt to acquire the terminal.

nushell/src/terminal.rs

Lines 93 to 115 in b60f91f

for _ in 0..4096 {
match unistd::tcgetpgrp(unsafe { nu_system::stdin_fd() }) {
Ok(owner_pgid) if owner_pgid == shell_pgid => {
// success
return owner_pgid;
}
Ok(owner_pgid) if owner_pgid == Pid::from_raw(0) => {
// Zero basically means something like "not owned" and we can just take it
let _ = unistd::tcsetpgrp(unsafe { nu_system::stdin_fd() }, shell_pgid);
}
Err(Errno::ENOTTY) => {
eprintln!("ERROR: no TTY for interactive shell");
std::process::exit(1);
}
_ => {
// fish also has other heuristics than "too many attempts" for the orphan check, but they're optional
if killpg(shell_pgid, Signal::SIGTTIN).is_err() {
eprintln!("ERROR: failed to SIGTTIN ourselves");
std::process::exit(1);
}
}
}
}

If it so happens that another process in the same process group currently owns the terminal, it will freeze itself with SIGTTIN, which stops the process.

I can see that the code is using the GNU documentation for implementing job control as a reference, and indeed, the code seems to follow this manual pretty closely.

However, what's not made clear is how the child process is expected to be woken up. It seems like one of the parent processes is supposed to be giving the child process the terminal, so perhaps it is supposed to wake up the child process as well?

How to reproduce

I was able to come up with a test case that triggers this bug pretty easily:

1..20 | par-each -t 10 { print $"start ($in)"; nu --no-std-lib --no-config-file -c $"nu -i -c 'print ($in)'"; print $"end ($in)" }

Expected behavior

I'm really not sure what the correct behavior is, since the parent is non-interactive itself, and therefore does not have the terminal.

Configuration

key value
version 0.101.0
major 0
minor 101
patch 0
branch
commit_hash
build_os linux-x86_64
build_target aarch64-linux-android
rust_version rustc 1.81.0 (eeb90cda1 2024-09-04)
rust_channel 1.81.0-x86_64-unknown-linux-gnu
cargo_version cargo 1.81.0 (2dbb1af80 2024-08-20)
build_time 2024-12-23 00:36:54 +00:00
build_rust_channel release
allocator mimalloc
features default, sqlite, trash
installed_plugins dns 3.0.7-alpha.1, polars 0.98.0

Metadata

Assignees

No one assigned

    Labels

    🐛 bugSomething isn't workingpipelinesignal-cancelProblems with how executions react to signals or attempts at cancellation (CTRL-C)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions