Minor improvement to TritonService #32861
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32861/21088
A new Pull Request was created by @kpedro88 (Kevin Pedro) for master. It involves the following packages: HeterogeneousCore/SonicTriton. @makortel, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks.
please test
7bf9345 to 7d47581
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32861/21089
please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b7e177/12815/summary.html
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
While testing an algorithm with a significantly longer inference time than the one used for the SonicTriton unit test, I re-encountered the issue with the fallback server shutting down too early.

Adding some debugging info to `auto_stop`, I found that using `$PPID` for the fallback server did not actually get the PID of the `cmsRun` process, but rather the `sh` process spawned by the `popen` call. Apparently, this `sh` process hangs around long enough for the unit test to complete, if `auto_stop` is delayed by a few seconds (in the case of Singularity reading from cvmfs, which is slightly slower than a local read). However, this is not reliable or general.

Instead, I now pass the `cmsRun` PID directly when starting the fallback server. This works to avoid the previous failure (tested by setting `PMAX=1` in `cmsTriton`). I've retained the value `PMAX=5` in this PR just in case some other instability might arise.

PR validation:
Reran stress tests from #32576.
@makortel @silviodonato @qliphy it would be nice to get this into pre3 if the deadline has not passed (it's really just a minor bug fix).
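
For readers unfamiliar with the `$PPID` pitfall described in the PR description, the following is a minimal sketch of the general idea, not the actual TritonService or `cmsTriton` code. `popen` launches its command through an intermediate `sh`, so a script started that way may see the shell, rather than the calling process, as its parent. Passing the caller's PID as an explicit argument sidesteps this. The helper names `firstLineOf` and `commandWithPid` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unistd.h>

// Run a command through popen (which starts it via an intermediate `sh -c`)
// and return the first line of its standard output without the trailing newline.
std::string firstLineOf(const std::string& cmd) {
  std::string out;
  FILE* pipe = popen(cmd.c_str(), "r");
  if (pipe == nullptr)
    return out;  // could not start the command: return an empty string
  char buf[256];
  if (fgets(buf, sizeof(buf), pipe) != nullptr)
    out = buf;
  pclose(pipe);
  while (!out.empty() && (out.back() == '\n' || out.back() == '\r'))
    out.pop_back();
  return out;
}

// Hypothetical helper: append a PID to a command line as an explicit argument,
// so the launched script does not have to rely on $PPID to find its owner.
std::string commandWithPid(const std::string& script, pid_t pid) {
  assert(pid > 0);
  return script + " " + std::to_string(pid);
}
```

As a usage example, `firstLineOf(commandWithPid("echo", getpid()))` returns the caller's own PID as text, because the value travels on the command line. By contrast, `firstLineOf("sh -c 'echo $PPID'; true")` reports the parent of the nested shell, which is typically the intermediate `sh` started by `popen` rather than the calling process.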