debug and improve auto_stop for triton server
kpedro88 committed Feb 10, 2021
1 parent ba877ae commit 7bf9345
Showing 3 changed files with 19 additions and 5 deletions.
1 change: 1 addition & 0 deletions HeterogeneousCore/SonicTriton/README.md
@@ -94,6 +94,7 @@ The script has two operations (`start` and `stop`) and the following options:
* `-m [dir]`: specific model directory (can be given more than one)
* `-n [name]`: name of container instance, also used for hidden temporary dir (default: triton_server_instance)
* `-P [port]`: base port number for services (-1: automatically find an unused port range) (default: 8000)
+* `-p [pid]`: automatically shut down server when process w/ specified PID ends (-1: use parent process PID)
* `-p`: automatically shut down server when parent process ends
* `-r [num]`: number of retries when starting container (default: 3)
* `-s [dir]`: Singularity sandbox directory (default: /cvmfs/unpacked.cern.ch/registry.hub.docker.com/fastml/triton-torchgeo:20.09-py3-geometric)
20 changes: 16 additions & 4 deletions HeterogeneousCore/SonicTriton/scripts/cmsTriton
@@ -36,7 +36,7 @@ usage() {
$ECHO "-m [dir] \t specific model directory (can be given more than one)"
$ECHO "-n [name] \t name of container instance, also used for default hidden temporary dir (default: ${SERVER})"
$ECHO "-P [port] \t base port number for services (-1: automatically find an unused port range) (default: ${BASEPORT})"
-$ECHO "-p \t automatically shut down server when parent process ends"
+$ECHO "-p [pid] \t automatically shut down server when process w/ specified PID ends (-1: use parent process PID)"
$ECHO "-r [num] \t number of retries when starting container (default: ${RETRIES})"
$ECHO "-s [dir] \t Singularity sandbox directory (default: ${SANDBOX})"
$ECHO "-t [dir] \t non-default hidden temporary dir"
@@ -56,7 +56,7 @@ if [ -e /run/shm ]; then
SHM=/run/shm
fi

-while getopts "cDdfgi:M:m:n:P:pr:s:t:vw:h" opt; do
+while getopts "cDdfgi:M:m:n:P:p:r:s:t:vw:h" opt; do
case "$opt" in
c) CLEANUP=""
;;
@@ -78,7 +78,7 @@ while getopts "cDdfgi:M:m:n:P:pr:s:t:vw:h" opt; do
;;
P) if [ "$OPTARG" -eq -1 ]; then AUTOPORT=true; else BASEPORT="$OPTARG"; fi
;;
-p) PARENTPID="$PPID"
+p) if [ "$OPTARG" -eq -1 ]; then PARENTPID="$PPID"; else PARENTPID="$OPTARG"; fi
;;
r) RETRIES="$OPTARG"
;;
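
The hunk above changes `-p` from a bare flag (`p` in the getopts string) to an option that takes an argument (`p:`), with `-1` kept as a sentinel for the parent PID. A minimal stand-alone sketch of that parsing pattern follows; the function name `resolve_watch_pid` is hypothetical and not part of `cmsTriton`.

```shell
#!/bin/bash
# Hypothetical helper (not in cmsTriton): work out which PID "-p" refers to.
resolve_watch_pid() {
    local OPTIND=1 opt PARENTPID=""
    # "p:" (with the colon) tells getopts that -p requires an argument
    while getopts "p:" opt; do
        case "$opt" in
            # -1 means "use this shell's parent"; any other value is taken as the PID itself
            p) if [ "$OPTARG" -eq -1 ]; then PARENTPID="$PPID"; else PARENTPID="$OPTARG"; fi
            ;;
            *) return 1
            ;;
        esac
    done
    echo "$PARENTPID"
}

resolve_watch_pid -p 12345   # echoes the explicit PID
resolve_watch_pid -p -1      # echoes the parent PID of the current shell
```

Because getopts takes the next word as the option argument even when it begins with a dash, `-p -1` is read as `OPTARG=-1` rather than as a separate option.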
@@ -297,13 +297,25 @@ auto_stop(){
PARENTPID="$2"

if [ -n "$PARENTPID" ]; then
+if [ -n "$VERBOSE" ]; then
+echo "watching PID $PARENTPID"
+ps
+fi
PCOUNTER=0
PMAX=5
while [ "$PCOUNTER" -le "$PMAX" ]; do
if ! kill -0 $PARENTPID >& /dev/null; then
PCOUNTER=$((PCOUNTER+1))
+if [ -n "$VERBOSE" ]; then
+echo "trigger $PCOUNTER:"
+ps
+fi
else
-# must get 5 in a row, otherwise reset
+# must get N in a row, otherwise reset
+if [ "$PCOUNTER" -gt 0 ] && [ -n "$VERBOSE" ]; then
+echo "reset:"
+ps
+fi
PCOUNTER=0
fi
sleep 1
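
The `auto_stop` loop above polls the watched PID with `kill -0` once per second and only gives up after several consecutive failed checks, resetting the counter whenever the process is seen again. The sketch below isolates that debounce pattern, assuming nothing about the rest of `auto_stop`; the function name `wait_for_exit` and its arguments are hypothetical. As in the diff, the loop condition uses `-le`, so it returns after one more consecutive miss than the stated maximum.

```shell
#!/bin/bash
# Hypothetical stand-alone version of the watch loop; names are illustrative only.
# Returns once the PID has failed the kill -0 check more than max_misses times in a row.
wait_for_exit() {
    local pid="$1" max_misses="${2:-5}" interval="${3:-1}"
    local misses=0
    while [ "$misses" -le "$max_misses" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            # process not visible (or not signalable): count a miss
            misses=$((misses+1))
        else
            # process seen again: misses must be consecutive, so start over
            misses=0
        fi
        sleep "$interval"
    done
}

# Demo: start a short-lived process and reap it so its PID is really gone,
# then confirm the watcher returns. Interval 0 just keeps the demo fast.
sleep 1 &
target=$!
wait "$target"
wait_for_exit "$target" 2 0
echo "process $target is gone"
```

Requiring consecutive misses guards against a single spurious failure of the check, at the cost of delaying shutdown by a few polling intervals.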
3 changes: 2 additions & 1 deletion HeterogeneousCore/SonicTriton/src/TritonService.cc
@@ -18,6 +18,7 @@
#include <filesystem>
#include <utility>
#include <tuple>
+#include <unistd.h>

namespace ni = nvidia::inferenceserver;
namespace nic = ni::client;
@@ -207,7 +208,7 @@ void TritonService::preBeginJob(edm::PathsAndConsumesOfModulesBase const&, edm::
}

//assemble server start command
-std::string command("cmsTriton -p -P -1");
+std::string command("cmsTriton -P -1 -p " + std::to_string(::getpid()));
if (fallbackOpts_.debug)
command += " -c";
if (fallbackOpts_.verbose)