Review graceful shutdown #1557

tcompa · 2024-06-06T13:56:07Z

I'm not sure that #1547 is working as expected, for a server running as

gunicorn \
	fractal_server.main:app --workers 2 \
	--worker-class fractal_server.gunicorn_fractal.FractalWorker \
	--logger-class fractal_server.gunicorn_fractal.FractalGunicornLogger \
	--bind 0.0.0.0:8000 --access-logfile - --error-logfile -

Let's review it in a real-life example.

The text was updated successfully, but these errors were encountered:

tcompa · 2024-06-06T14:00:09Z

Here is an example if I start the server and send a kill -6 signal to a worker pid

2024-06-06 13:58:23,856 - uvicorn.error - INFO - Application startup complete.
2024-06-06 13:58:23,879 - fractal_server.lifespan - INFO - End application startup
2024-06-06 13:58:23,879 - uvicorn.error - INFO - Application startup complete.

...

2024-06-06 13:58:39,462 - uvicorn.error - INFO - [FractalWorker.custom_handle_abort - pid=364635] Now send SIGTERM to process.
2024-06-06 13:58:39,506 - uvicorn.error - INFO - Shutting down
2024-06-06 13:58:39,607 - uvicorn.error - INFO - Waiting for application shutdown.
2024-06-06 13:58:39,607 - uvicorn.error - INFO - Application shutdown complete.
2024-06-06 13:58:39,608 - uvicorn.error - INFO - Finished server process [364635]
2024-06-06 13:58:39,4   - gunicorn.error - ERROR - [pid 364633] - Worker (pid:364635) was sent SIGTERM!


2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364638] - Booting worker with pid: 364638
2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364638] - [FractalWorker.init_signals - pid=364638] Set `custom_handle_abort` for SIGABRT
2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364633] - Handling signal: winch
2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364633] - Handling signal: winch
2024-06-06 13:58:40,4   - gunicorn.error - INFO - [pid 364638] - Started server process [364638]
2024-06-06 13:58:40,4   - gunicorn.error - INFO - [pid 364638] - Waiting for application startup.
2024-06-06 13:58:40,820 - fractal_server.lifespan - INFO - Start application startup
2024-06-06 13:58:40,821 - fractal_server.app.db - WARNING - SQLite is supported (for version >=3.37) but discouraged in production. Given its partial support for ForeignKey constraints, database consistency cannot be guaranteed.
2024-06-06 13:58:40,867 - fractal_server.lifespan - INFO - End application startup
2024-06-06 13:58:40,867 - uvicorn.error - INFO - Application startup complete.

tcompa · 2024-06-06T14:05:13Z

Got it: we are just not logging what is going on.

This has to do with

    config_uvicorn_loggers()
    logger.info("End application startup")
    reset_logger_handlers(logger)
    yield
    logger = get_logger("fractal_server.lifespan")
    logger.info("Start application shutdown")

Since we have a reset-handlers call, the next time we call get-logger there are no associated handlers.

A quick switch to set_logger leads to

2024-06-06 14:01:32,705 - fractal_server.lifespan - INFO - End application startup
2024-06-06 14:01:32,706 - uvicorn.error - INFO - Application startup complete.
2024-06-06 14:01:32,738 - fractal_server.lifespan - INFO - End application startup
2024-06-06 14:01:32,738 - uvicorn.error - INFO - Application startup complete.


2024-06-06 14:01:38,186 - uvicorn.error - INFO - [FractalWorker.custom_handle_abort - pid=364875] Now send SIGTERM to process.
2024-06-06 14:01:38,247 - uvicorn.error - INFO - Shutting down
2024-06-06 14:01:38,348 - uvicorn.error - INFO - Waiting for application shutdown.
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - Start application shutdown
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - Current worker with pid 364875 is shutting down. Current jobs: app.state.jobsV1=[], app.state.jobsV2=[]
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - Not running `cleanup_after_shutdown`
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - End application shutdown
2024-06-06 14:01:38,349 - uvicorn.error - INFO - Application shutdown complete.
2024-06-06 14:01:38,349 - uvicorn.error - INFO - Finished server process [364875]
2024-06-06 14:01:38,4   - gunicorn.error - ERROR - [pid 364873] - Worker (pid:364875) was sent SIGTERM!


2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364878] - Booting worker with pid: 364878
2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364878] - [FractalWorker.init_signals - pid=364878] Set `custom_handle_abort` for SIGABRT
2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364873] - Handling signal: winch
2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364873] - Handling signal: winch
2024-06-06 14:01:39,4   - gunicorn.error - INFO - [pid 364878] - Started server process [364878]
2024-06-06 14:01:39,4   - gunicorn.error - INFO - [pid 364878] - Waiting for application startup.
2024-06-06 14:01:39,546 - fractal_server.lifespan - INFO - Start application startup

tcompa · 2024-06-19T08:22:19Z

The missing logs are now included - as of #1569

tcompa · 2024-07-09T13:06:59Z

Today I observed a reentrant call inside <_io.BufferedWriter name='<stderr>'> upon ctrl-c. For the moment I fixed it by removing --worker-class fractal_server.gunicorn_fractal.FractalWorker, but we should review this.

If I re-add the worker class, I cannot always reproduce the error.

This is part of the error stack

    self.error_log.error(msg, *args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1506, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1218, in emit
    StreamHandler.emit(self, record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1108, in emit
    self.handleError(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1021, in handleError
    sys.stderr.write('--- Logging error ---\n')
RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'>
Call stack:
  File "/somewhere/venv/bin/gunicorn", line 8, in <module>
    sys.exit(run())
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]", prog=prog).run()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 236, in run
    super().run()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 224, in run
    handler()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 261, in handle_int
    self.stop(False)
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 396, in stop
    time.sleep(0.1)
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
    self.reap_workers()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 554, in reap_workers
    self.log.error(msg)
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/glogging.py", line 272, in error
    self.error_log.error(msg, *args, **kwargs)
Message: 'Worker (pid:192744) was sent SIGINT!'

tcompa added testing testing App&Server App life cycle, workers, gunicorn, uvicorn labels Jun 6, 2024

jluethi added this to Fractal Project Management Jun 6, 2024

github-project-automation bot moved this to TODO in Fractal Project Management Jun 6, 2024

tcompa assigned mfranzon Jun 12, 2024

tcompa added the maintenance label Jul 17, 2024

tcompa added the runner label Aug 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review graceful shutdown #1557

Review graceful shutdown #1557

tcompa commented Jun 6, 2024

tcompa commented Jun 6, 2024

tcompa commented Jun 6, 2024 •

edited

Loading

tcompa commented Jun 19, 2024

tcompa commented Jul 9, 2024

Review graceful shutdown #1557

Review graceful shutdown #1557

Comments

tcompa commented Jun 6, 2024

tcompa commented Jun 6, 2024

tcompa commented Jun 6, 2024 • edited Loading

tcompa commented Jun 19, 2024

tcompa commented Jul 9, 2024

tcompa commented Jun 6, 2024 •

edited

Loading