Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review graceful shutdown #1557

Open
tcompa opened this issue Jun 6, 2024 · 4 comments
Open

Review graceful shutdown #1557

tcompa opened this issue Jun 6, 2024 · 4 comments
Assignees
Labels
App&Server App life cycle, workers, gunicorn, uvicorn maintenance runner testing testing

Comments

@tcompa
Copy link
Collaborator

tcompa commented Jun 6, 2024

I'm not sure that #1547 is working as expected, for a server running as

gunicorn \
	fractal_server.main:app --workers 2 \
	--worker-class fractal_server.gunicorn_fractal.FractalWorker \
	--logger-class fractal_server.gunicorn_fractal.FractalGunicornLogger \
	--bind 0.0.0.0:8000 --access-logfile - --error-logfile -

Let's review it in a real-life example.

@tcompa tcompa added testing testing App&Server App life cycle, workers, gunicorn, uvicorn labels Jun 6, 2024
@tcompa
Copy link
Collaborator Author

tcompa commented Jun 6, 2024

Here is an example if I start the server and send a kill -6 signal to a worker pid

2024-06-06 13:58:23,856 - uvicorn.error - INFO - Application startup complete.
2024-06-06 13:58:23,879 - fractal_server.lifespan - INFO - End application startup
2024-06-06 13:58:23,879 - uvicorn.error - INFO - Application startup complete.

...

2024-06-06 13:58:39,462 - uvicorn.error - INFO - [FractalWorker.custom_handle_abort - pid=364635] Now send SIGTERM to process.
2024-06-06 13:58:39,506 - uvicorn.error - INFO - Shutting down
2024-06-06 13:58:39,607 - uvicorn.error - INFO - Waiting for application shutdown.
2024-06-06 13:58:39,607 - uvicorn.error - INFO - Application shutdown complete.
2024-06-06 13:58:39,608 - uvicorn.error - INFO - Finished server process [364635]
2024-06-06 13:58:39,4   - gunicorn.error - ERROR - [pid 364633] - Worker (pid:364635) was sent SIGTERM!


2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364638] - Booting worker with pid: 364638
2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364638] - [FractalWorker.init_signals - pid=364638] Set `custom_handle_abort` for SIGABRT
2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364633] - Handling signal: winch
2024-06-06 13:58:39,4   - gunicorn.error - INFO - [pid 364633] - Handling signal: winch
2024-06-06 13:58:40,4   - gunicorn.error - INFO - [pid 364638] - Started server process [364638]
2024-06-06 13:58:40,4   - gunicorn.error - INFO - [pid 364638] - Waiting for application startup.
2024-06-06 13:58:40,820 - fractal_server.lifespan - INFO - Start application startup
2024-06-06 13:58:40,821 - fractal_server.app.db - WARNING - SQLite is supported (for version >=3.37) but discouraged in production. Given its partial support for ForeignKey constraints, database consistency cannot be guaranteed.
2024-06-06 13:58:40,867 - fractal_server.lifespan - INFO - End application startup
2024-06-06 13:58:40,867 - uvicorn.error - INFO - Application startup complete.

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 6, 2024

Got it: we are just not logging what is going on.

This has to do with

    config_uvicorn_loggers()
    logger.info("End application startup")
    reset_logger_handlers(logger)
    yield
    logger = get_logger("fractal_server.lifespan")
    logger.info("Start application shutdown")

Since we have a reset-handlers call, the next time we call get-logger there are no associated handlers.


A quick switch to set_logger leads to

2024-06-06 14:01:32,705 - fractal_server.lifespan - INFO - End application startup
2024-06-06 14:01:32,706 - uvicorn.error - INFO - Application startup complete.
2024-06-06 14:01:32,738 - fractal_server.lifespan - INFO - End application startup
2024-06-06 14:01:32,738 - uvicorn.error - INFO - Application startup complete.


2024-06-06 14:01:38,186 - uvicorn.error - INFO - [FractalWorker.custom_handle_abort - pid=364875] Now send SIGTERM to process.
2024-06-06 14:01:38,247 - uvicorn.error - INFO - Shutting down
2024-06-06 14:01:38,348 - uvicorn.error - INFO - Waiting for application shutdown.
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - Start application shutdown
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - Current worker with pid 364875 is shutting down. Current jobs: app.state.jobsV1=[], app.state.jobsV2=[]
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - Not running `cleanup_after_shutdown`
2024-06-06 14:01:38,348 - fractal_server.lifespan - INFO - End application shutdown
2024-06-06 14:01:38,349 - uvicorn.error - INFO - Application shutdown complete.
2024-06-06 14:01:38,349 - uvicorn.error - INFO - Finished server process [364875]
2024-06-06 14:01:38,4   - gunicorn.error - ERROR - [pid 364873] - Worker (pid:364875) was sent SIGTERM!


2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364878] - Booting worker with pid: 364878
2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364878] - [FractalWorker.init_signals - pid=364878] Set `custom_handle_abort` for SIGABRT
2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364873] - Handling signal: winch
2024-06-06 14:01:38,4   - gunicorn.error - INFO - [pid 364873] - Handling signal: winch
2024-06-06 14:01:39,4   - gunicorn.error - INFO - [pid 364878] - Started server process [364878]
2024-06-06 14:01:39,4   - gunicorn.error - INFO - [pid 364878] - Waiting for application startup.
2024-06-06 14:01:39,546 - fractal_server.lifespan - INFO - Start application startup

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 19, 2024

The missing logs are now included - as of #1569

@tcompa
Copy link
Collaborator Author

tcompa commented Jul 9, 2024

Today I observed a reentrant call inside <_io.BufferedWriter name='<stderr>'> upon ctrl-c. For the moment I fixed it by removing --worker-class fractal_server.gunicorn_fractal.FractalWorker, but we should review this.

If I re-add the worker class, I cannot always reproduce the error.

This is part of the error stack

    self.error_log.error(msg, *args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1506, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1218, in emit
    StreamHandler.emit(self, record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1108, in emit
    self.handleError(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1021, in handleError
    sys.stderr.write('--- Logging error ---\n')
RuntimeError: reentrant call inside <_io.BufferedWriter name='<stderr>'>
Call stack:
  File "/somewhere/venv/bin/gunicorn", line 8, in <module>
    sys.exit(run())
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]", prog=prog).run()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 236, in run
    super().run()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 224, in run
    handler()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 261, in handle_int
    self.stop(False)
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 396, in stop
    time.sleep(0.1)
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
    self.reap_workers()
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 554, in reap_workers
    self.log.error(msg)
  File "/somewhere/venv/lib/python3.10/site-packages/gunicorn/glogging.py", line 272, in error
    self.error_log.error(msg, *args, **kwargs)
Message: 'Worker (pid:192744) was sent SIGINT!'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
App&Server App life cycle, workers, gunicorn, uvicorn maintenance runner testing testing
Projects
Development

No branches or pull requests

2 participants