EOF occurred in violation of protocol #719

Closed
muatik opened this issue Jan 5, 2016 · 26 comments
Labels: PyVer: python3 (affects Python 3); Type: Bug (identified as a bug; needs a code change to fix)
Milestone

Comments

@muatik

muatik commented Jan 5, 2016

I am trying to run a socket server with SSL support. For this purpose I am using flask-socketio, which uses gevent behind the scenes. I think flask-socketio has nothing to do with the SSL options; it just passes the SSL configuration through to gevent's init.

The socket server with SSL support used to work, but after reinstalling all the pip packages I started getting the following error.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gevent/server.py", line 102, in wrap_socket_and_handle
    ssl_socket = self.wrap_socket(client_socket, **self.ssl_args)
  File "/usr/local/lib/python2.7/dist-packages/gevent/_ssl2.py", line 410, in wrap_socket
    ciphers=ciphers)
  File "/usr/local/lib/python2.7/dist-packages/gevent/_ssl2.py", line 93, in __init__
    self.do_handshake()
  File "/usr/local/lib/python2.7/dist-packages/gevent/_ssl2.py", line 310, in do_handshake
    return self._sslobj.do_handshake()
SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol
<Greenlet at 0x7fbc02c4a9b0: <bound method WSGIServer.wrap_socket_and_handle of <WSGIServer at 0x7fbc03b9b110 fileno=9 address=0.0.0.0:5000>>(<socket at 0x7fbc02bf7590 fileno=76 sock=10.122.97, ('41.234.232.59', 40471))> failed with SSLError

Is there any solution?

@twall

twall commented Jan 12, 2016

I've seen this as well, although the error seems sporadic (I don't see it on all SSL requests).

@muatik
Author

muatik commented Jan 12, 2016

Yes, I agree: it does not occur for all SSL requests. I can reproduce the error by very quickly refreshing the page that makes the socket connection. I guess that in this case the requests do not complete, and consequently the error occurs.

I worked around this problem by placing an nginx server in front as a proxy to receive the SSL requests.

@maryokhin

Upgraded from rc1 to rc3 and started receiving this error.

@jamadden
Member

@maryokhin What version of Python are you on? Do you experience this issue in rc2?

What Python/gevent versions are others experiencing this on?

The sporadic nature of this makes it difficult to debug. A self-contained failing testcase would be very helpful.

@maryokhin

@jamadden Python 3.5 + rc1 passes; rc2 and rc3 fail. Tested on whatever Ubuntu version Travis uses.

@jamadden
Member

> Tested on whatever Ubuntu version Travis uses.

@maryokhin Tested how? Is it separable?

@maryokhin

@jamadden: It was a Travis build running a Docker container of Django 1.9.1 on gunicorn 19.4.5 using a Python 3.5 interpreter 😬 I understand that's not very reproducible or useful, but rc1 works under the same conditions and rc2/rc3 don't. I upgraded and downgraded back and forth.

@jamadden
Member

Thanks, that is helpful, especially because you see this on Python 3, because the only change to the SSL module for Python 3 between rc1 and rc2 was in the exceptions that get raised on timeouts. This traded one loop for another apparently equivalent loop, whose only apparent difference is that it properly respects the total timeout.
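To make "respects the total timeout" concrete, here is a minimal sketch of a send loop with one overall deadline. It is not gevent's code: the function name and structure are hypothetical, and it only assumes a socket-like object with `settimeout()` and `send()`.

```python
import time


def sendall_with_deadline(sock, data, timeout):
    """Send all of `data`, allowing `timeout` seconds for the whole call."""
    view = memoryview(data)
    deadline = time.monotonic() + timeout
    sent = 0
    # Nothing is handed to the socket at all when `data` is empty.
    while sent < len(view):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("timed out after sending %d bytes" % sent)
        # Each attempt only gets what is left of the overall budget,
        # instead of a fresh full timeout per send() call.
        sock.settimeout(remaining)
        sent += sock.send(view[sent:])
    return sent
```

The point of the design is that a slow peer cannot stretch the call to N times the timeout by accepting a few bytes per attempt.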

Posting the traceback you get on Python 3 might also be helpful.

@maryokhin

Sure. I'm guessing this happens as soon as the first SSL request is made:

Traceback (most recent call last):
  File "/usr/src/app/manage.py", line 19, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.5/site-packages/django/core/management/__init__.py", line 327, in execute
    django.setup()
  File "/usr/local/lib/python3.5/site-packages/django/__init__.py", line 18, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/usr/local/lib/python3.5/site-packages/django/apps/registry.py", line 108, in populate
    app_config.import_models(all_models)
  File "/usr/local/lib/python3.5/site-packages/django/apps/config.py", line 202, in import_models
    self.models_module = import_module(models_module_name)
  File "/usr/local/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 662, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/usr/src/app/socore/models.py", line 2250, in <module>
    class ScheduledPushMessage(models.Model):
  File "/usr/src/app/socore/models.py", line 2268, in ScheduledPushMessage
    topic = models.CharField(max_length=50, choices=get_sns_topics())
  File "/usr/src/app/socore/models.py", line 2220, in get_sns_topics
    topics = nb.list_topics()
  File "/usr/src/app/sonotifications/sns_backend.py", line 33, in list_topics
    sns_topics = self.connection.get_all_topics()
  File "/usr/local/lib/python3.5/site-packages/boto/sns/connection.py", line 118, in get_all_topics
    return self._make_request('ListTopics', params)
  File "/usr/local/lib/python3.5/site-packages/boto/sns/connection.py", line 757, in _make_request
    path=path, params=params)
  File "/usr/local/lib/python3.5/site-packages/boto/connection.py", line 1116, in make_request
    return self._mexe(http_request)
  File "/usr/local/lib/python3.5/site-packages/boto/connection.py", line 1030, in _mexe
    raise ex
  File "/usr/local/lib/python3.5/site-packages/boto/connection.py", line 943, in _mexe
    request.body, request.headers)
  File "/usr/local/lib/python3.5/http/client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "/usr/local/lib/python3.5/http/client.py", line 1128, in _send_request
    self.endheaders(body)
  File "/usr/local/lib/python3.5/http/client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "/usr/local/lib/python3.5/http/client.py", line 913, in _send_output
    self.send(message_body)
  File "/usr/local/lib/python3.5/http/client.py", line 885, in send
    self.sock.sendall(data)
  File "/usr/local/lib/python3.5/site-packages/gevent/_ssl3.py", line 278, in sendall
    return socket.sendall(self, data, flags)
  File "/usr/local/lib/python3.5/site-packages/gevent/_socket3.py", line 350, in sendall
    data_sent += self.send(data_memory[data_sent:], flags, timeout=timeleft)
  File "/usr/local/lib/python3.5/site-packages/gevent/_ssl3.py", line 241, in send
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1846)

jamadden added the "Type: Bug" and "PyVer: python3" labels Feb 3, 2016
jamadden added this to the 1.1 milestone Feb 3, 2016

@jamadden
Member

jamadden commented Feb 3, 2016

The error in wrap_socket_and_handle/_do_handshake is easily reproducible: You get that exact error if the client closes the connection before the handshake has completed. That's not that unlikely a scenario in a browser making multiple requests to a possibly loaded server.
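The disconnect-before-handshake case can be sketched with the standard library alone. This is a rough illustration, not gevent's server code: the function name is made up, and the server context deliberately has no certificate loaded, on the assumption that none is needed because the client goes away before any TLS bytes are exchanged. The exact exception subclass and message may vary with the Python and OpenSSL versions.

```python
import socket
import ssl


def handshake_after_client_close():
    """Return the error a TLS server sees when the client closes first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    try:
        # A client that completes the TCP connect and immediately goes away,
        # like a browser abandoning a request during a quick page refresh.
        client = socket.create_connection(listener.getsockname())
        client.close()

        conn, _ = listener.accept()
        conn.settimeout(5)
        # Assumption: no certificate is loaded; the handshake never gets
        # far enough to need one in this scenario.
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        try:
            # wrap_socket() performs the handshake on an already-connected socket.
            tls = context.wrap_socket(conn, server_side=True)
        except ssl.SSLError as exc:
            return exc
        finally:
            conn.close()
        tls.close()
        return None
    finally:
        listener.close()


print(repr(handshake_after_client_close()))
```

A server that expects such disconnects can catch `ssl.SSLError` around the handshake and drop the connection quietly instead of logging a traceback.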

The handshake process is also entirely in C and bypasses the Python timeout issue that was changed, so the sendall issue seems unrelated.

I'm having difficulties reproducing the sendall issue.

@jamadden
Member

jamadden commented Feb 3, 2016

Continuing on the sendall error:

The Python-level SSLEOFError with that message is raised from the C module _ssl.c:PySSL_SetError when the OpenSSL object reports an SSL_ERROR_SYSCALL from SSL_get_error with an error number of 0. OpenSSL documents that it does that under exactly the circumstances the error string says (an EOF occurred in violation of the protocol). However, that's not entirely clear from the code...
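On the Python side, these cases surface as different exception classes, which is usually easier to branch on than the message text. The helper below is a hypothetical sketch (its name and return strings are made up); it relies only on the `ssl` module's public exception hierarchy, in which `SSLEOFError` and `SSLSyscallError` are subclasses of `SSLError`. The `[Errno 8]` in the first traceback of this thread matches `ssl.SSL_ERROR_EOF`.

```python
import ssl


def describe_ssl_error(exc):
    """Classify an exception raised by a TLS read, write, or handshake."""
    if isinstance(exc, ssl.SSLEOFError):
        # errno is ssl.SSL_ERROR_EOF: the peer closed without a TLS shutdown.
        return "peer closed the connection abruptly"
    if isinstance(exc, ssl.SSLSyscallError):
        # errno is ssl.SSL_ERROR_SYSCALL: a system call under OpenSSL failed.
        return "system call failed underneath the TLS layer"
    if isinstance(exc, ssl.SSLError):
        return "other TLS error"
    return "not a TLS error"


example = ssl.SSLEOFError(ssl.SSL_ERROR_EOF, "EOF occurred in violation of protocol")
print(describe_ssl_error(example))
```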

Looking more closely at the implementation of SSL_get_error, there are two basic times when SSL_ERROR_SYSCALL can be returned:

  1. We're trying to read/write, and the IO object reports that it needs to do "special" IO work, but it doesn't describe what that work is; the only two types of special work handled are a socket needing to connect or accept.
  2. Any other random unhandled case.

Let's consider branch (1) for now, both because branch (2) is a catch-all and not that helpful, and because we're probably trying to write, based on the location of the error. The only place I can find that puts the IO object into the "special" state is part of the handshake process. Either it needs a connect (which appears to be propagated along, I hope) or it needs to do a certificate lookup (which is not propagated in any way). The connect case is handled by Python with a distinct error message.

So that leaves us with the certificate case during a handshake (which I think can be re-negotiated at almost any time?), or "any unknown error" (specifically, we didn't need to read or write or explicitly do a cert lookup---that's checked after the read/write cases---and the socket isn't being closed).

I haven't been able to reproduce this using either a Python server or the openssl s_server, sending data of various sizes from small to very large under OS X or a recent Ubuntu.

Is anyone else able to reproduce this reliably? @maryokhin can you still reproduce it (perhaps it was a transient SNS issue)? Can you reproduce it outside of the travis environment, or hitting a different availability zone?

Other thoughts?

@kurtbrose

Not sure if this is helpful, but I've dealt with OpenSSL a bit.

The most common way (in my experience) you get the "eof in violation of protocol" message is when the client fails to send an SSL shutdown, but just closes the socket when communication is complete.

There is an option, "suppress_ragged_eofs", that controls how this is reported.

@kurtbrose

From SSL module docs:

The parameter suppress_ragged_eofs specifies how the SSLSocket.read() method should signal unexpected EOF from the other end of the connection. If specified as True (the default), it returns a normal EOF (an empty bytes object) in response to unexpected EOF errors raised from the underlying socket; if False, it will raise the exceptions back to the caller.
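For reference, a minimal sketch of where that parameter is passed when wrapping a socket by hand. The helper name is hypothetical; `suppress_ragged_eofs` and `do_handshake_on_connect` are keyword arguments of the standard library's `SSLContext.wrap_socket`. The socket here is never connected, so nothing touches the network.

```python
import socket
import ssl


def wrap_strict_eof(sock, hostname):
    """Wrap `sock` so an abrupt EOF on read raises instead of returning b''."""
    context = ssl.create_default_context()
    return context.wrap_socket(
        sock,
        server_hostname=hostname,
        do_handshake_on_connect=False,  # the caller runs do_handshake() later
        suppress_ragged_eofs=False,     # the default is True
    )


raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tls = wrap_strict_eof(raw, "example.org")
print(tls.suppress_ragged_eofs)
tls.close()
```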

@jamadden
Member

jamadden commented Feb 5, 2016

Thanks. But as the quoted docs note, that parameter defaults to True, and I don't see anywhere in the http module where that is changed, or even an easy way for a higher level client to do so (short of subclassing and monkey-patching).

@jamadden
Member

jamadden commented Feb 5, 2016

In the stdlib (and in gevent's modified copy of it), suppress_ragged_eofs is also only used by socket.read; socket.write and socket.send ignore that parameter.

@kurtbrose

Nuts.

Were you able to reproduce the error in 61c6a5a?

(If so I can grab a packet capture.)

@jamadden
Member

jamadden commented Feb 5, 2016

Sadly, I have not been able to reproduce the error with that code or numerous variations of it, including using an OpenSSL server, various data sizes and timeouts, etc.

@kurtbrose

I've seen this happen "around" high level APIs by failing to call shutdown() on the underlying SSL structure.

Basically, imagine a test script that uses urllib / httplib to send some HTTP request, then immediately exits. Not only is this a problem for SSL, I believe in TCP cases the request may also have data loss.

(I guess HTTP insulates from having to understand this somewhat since it follows a model where client controls connection set up and tear down, and also initiates all requests.)

I've never looked too closely at the standard library ssl module before, but this looks kind of bad: https://hg.python.org/cpython/file/2.7/Lib/ssl.py#l789
To guard against data loss, every SSL socket should have SSL_shutdown called on it before the socket is closed (https://www.openssl.org/docs/manmaster/ssl/SSL_shutdown.html). But it looks like the standard library only does this when unwrap() is called.
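A sketch of what an explicit graceful close could look like from Python, assuming a blocking `ssl.SSLSocket`. The helper name is hypothetical; `unwrap()` is the standard library method mentioned above, which performs the TLS shutdown and returns the underlying plain socket.

```python
def close_tls_gracefully(tls_sock):
    """Perform the TLS shutdown before closing; return True if it succeeded."""
    try:
        # unwrap() runs the TLS shutdown handshake and hands back the
        # plain socket underneath.
        plain_sock = tls_sock.unwrap()
    except (OSError, ValueError):
        # The peer is already gone (ssl.SSLError is an OSError), or there is
        # no TLS layer left to shut down: just release the socket.
        tls_sock.close()
        return False
    plain_sock.close()
    return True
```

Calling this instead of a bare `close()` gives the peer a clean end-of-stream rather than an abrupt EOF.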

So, that may be what the original reporter was doing. (I've left a line comment on the test commit -- I can try reproducing later today maybe.)

@kurtbrose

Hmm... wait, maybe not, because the stack trace had that in the middle of do_handshake, so this must be something different.

@jamadden
Member

jamadden commented Feb 5, 2016

I think the original do_handshake case is pretty adequately explained by a client disconnect. But the second sendall case is unexplained and thus more worrying to me (all I can get a client disconnect to do there is raise the expected Broken Pipe error).

@tsavola

tsavola commented Feb 18, 2016

The following two code snippets trigger the bug every time with Python 3.5. Tested on Ubuntu 15.10 (amd64).

Buggy code using send

import gevent.monkey
gevent.monkey.patch_all()

import socket, ssl
sock = socket.create_connection(("example.org", 443))
sock = ssl.wrap_socket(sock)
sock.send(b"")

Traceback (most recent call last):
  File "bug1.py", line 7, in <module>
    sock.send(b"")
  File "/home/user/bug/geventinstall/lib/python3.5/site-packages/gevent-1.1rc4-py3.5-linux-x86_64.egg/gevent/_ssl3.py", line 337, in send
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1847)
  • It still fails if a timeout is set when creating the connection (may be interesting regarding the second code snippet).
  • It works if the last line is changed to send at least one byte.

Buggy code using sendall with timeout

import gevent.monkey
gevent.monkey.patch_all()

import socket, ssl
sock = socket.create_connection(("example.org", 443), timeout=60)
sock = ssl.wrap_socket(sock)
sock.sendall(b"")

Traceback (most recent call last):
  File "bug2.py", line 7, in <module>
    sock.sendall(b"")
  File "/home/user/bug/geventinstall/lib/python3.5/site-packages/gevent-1.1rc4-py3.5-linux-x86_64.egg/gevent/_ssl3.py", line 374, in sendall
    return socket.sendall(self, data, flags)
  File "/home/user/bug/geventinstall/lib/python3.5/site-packages/gevent-1.1rc4-py3.5-linux-x86_64.egg/gevent/_socket3.py", line 353, in sendall
    data_sent += self.send(data_memory[data_sent:], flags, timeout=timeleft)
  File "/home/user/bug/geventinstall/lib/python3.5/site-packages/gevent-1.1rc4-py3.5-linux-x86_64.egg/gevent/_ssl3.py", line 337, in send
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1847)
  • It works if the timeout is removed.
  • It works if the last line is changed to send at least one byte.

@tsavola

tsavola commented Feb 18, 2016

Unsurprisingly, the monkey patching is unnecessary. It works in the same way if gevent.socket and gevent.ssl are used directly.

@tsavola

tsavola commented Feb 18, 2016

Looking at Python 3.5's _ssl__SSLSocket_write_impl in Modules/_ssl.c and reading the SSL_write manpage, it seems that gevent's SSLObject.send implementation should never pass empty data to the self._sslobj.write call, as it does now.
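Until a fix is released, calling code can apply the same guard itself. This is a hypothetical wrapper for illustration, not the change that went into gevent: it simply refuses to hand an empty buffer to the TLS layer.

```python
def send_nonempty(tls_sock, data):
    """Send `data` over a TLS socket, skipping the call for empty payloads."""
    if len(data) == 0:
        # Nothing to write; report zero bytes sent without touching TLS.
        return 0
    return tls_sock.send(data)
```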

tsavola pushed a commit to ninchat/gevent that referenced this issue Feb 18, 2016
@jamadden
Member

Thanks, that was the missing piece! I'm adding test cases and fixes to gevent now.

Note that SSLSocket.send(b'') raises SSLEOFError when you use the standard library, whereas SSLSocket.sendall(b'') does not. gevent's send method currently raises the SSLEOFError, so only sendall is broken (compared to the standard library).

$ python3.5
Python 3.5.1 (default, Dec 11 2015, 09:20:00)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket, ssl
>>> sock = socket.create_connection(("example.org", 443))
>>> sock = ssl.wrap_socket(sock)
>>> sock.send(b"")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 856, in send
    return self._sslobj.write(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 581, in write
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1846)
>>> sock = socket.create_connection(("example.org", 443), timeout=60)
>>> sock = ssl.wrap_socket(sock)
>>> sock.sendall(b"")
0
>>>

@SuperMasterBlasterLaser

This problem still persists.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kombu/message.py", line 128, in ack_log_error
    self.ack(multiple=multiple)
  File "/usr/local/lib/python3.10/site-packages/kombu/message.py", line 123, in ack
    self.channel.basic_ack(self.delivery_tag, multiple=multiple)
  File "/usr/local/lib/python3.10/site-packages/amqp/channel.py", line 1407, in basic_ack
    return self.send_method(
  File "/usr/local/lib/python3.10/site-packages/amqp/abstract_channel.py", line 70, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/usr/local/lib/python3.10/site-packages/amqp/method_framing.py", line 186, in write_frame
    write(buffer_store.view[:offset])
  File "/usr/local/lib/python3.10/site-packages/amqp/transport.py", line 347, in write
    self._write(s)
  File "/usr/local/lib/python3.10/site-packages/amqp/transport.py", line 595, in _write
    n = write(s)
  File "/usr/local/lib/python3.10/site-packages/gevent/_ssl3.py", line 420, in write
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2396)

This happens while using Celery + gevent + RabbitMQ.

I have added a lot of logging. The interesting thing is that the code inside the task finishes, but the gevent worker itself hangs and does not send the success result to RabbitMQ.

@tommasofavaron1

> Celery

I have the same Celery + gevent + RabbitMQ problem. Did you manage to fix it?

8 participants