Closed
Description
Describe the bug
RaspberryMatic seems not accept new connections, as long as the old one have not been terminated.
To Reproduce
Steps to reproduce the behavior:
- connect RaspberryMatic to e.g. Home Assistant
- configure some sensors to be monitored by your control software
- restart your RaspberryMatic Software
Expected behavior
A reconnect should be successful.
System information (please complete the following information):
- Version RaspberryMatic 3.51.6.20200420
- Hardware RaspberryPi3 B+
- for reference: Homematic Version 0.110.1
Additional context
This issue has also been discussed in https://community.home-assistant.io/t/homematic-reconnect-does-not-work/161853/18
Activity
danielperna84 commentedon May 22, 2020
To clarify, this is about XML-RPC. But only for HmIP. There are no problems with the old wired and wireless protocols.
That being said, I don't believe RaspberryMatic can do anything about this.
n0l0cale commentedon May 22, 2020
thought maybe we can come a step further when more guys are involved.....
jens-maus commentedon May 22, 2020
@n0l0cale Please provide adequate example XMLRPC communication output between home assistent and the CCU/RaspberryMatic/HmIPServer where the issue is demonstrated. Only with detailed information you will have any chance that this issue will get fixed by eQ3.
n0l0cale commentedon May 22, 2020
@danielperna84 - could you please give me assistance with providing such a sample? I think you are deeper in this than I am....
danielperna84 commentedon May 22, 2020
@n0l0cale
I'll see if I can come up with a script that continuously reproduces the error. Maybe over the weekend.
danielperna84 commentedon May 22, 2020
Ok, so I have created the following test-script: https://gist.github.com/danielperna84/28653b1ebf760064cdbecf5f4828f5f3
Adjust the variables at the top (the IP addresses mainly), then execute it with
python3 hmip-xml-rpc-bug.py
. Don't forget to allow the incoming traffic on the specified port.On the CCU / RaspberryMatic you'll get something the following when the second
init
request has been sent:This only happens for HmIP. Doing the same with port 2001 shows no errors in the
hmserver.log
(although I don't get ANY logs for 2001).At the bottom of the script the wait times could be increased. What will be observed is, that after the first
init
we receive events from the devices (if we wait long enough for events to happen obviously). After the secondinit
we don't receive any events anymore. I assume this is a result of the exception.That being said, the reason for this behavior seems to be, that we don't just return an empty list to the CCU after the
init
- it already has the device descriptions from the firstinit
. If aclearDevices()
is added after the firsttime.sleep(5)
, we don't get any errors on the CCU during the secondinit
. This fits the observation in Home Assistant, which needs to be restarted to receive events again. The restart of Home Assistant clears the local devices, which is why upon the next startup it will be in the state like this script is during the firstinit
.So to sum this up: the HMIPServer on the CCU fails to accept init-requests if the
listDevices
-call it makes to our server returns more than just an empty list. With the old RF and wired devices this works without any issues.From my point of view this definitely is a bug, because there would be no point in asking the remote server for the known devices if the presence of devices leads to a failure. On top of that, the CCU should not require the remote-end to clear its knowledge about present devices and set them up all the way from the beginning.
As far as I am aware of, the XML-RPC API documentation does not state, that the format of the response to the
listDevices
call has to match a specific format. Hence copying what the CCU itself returns when callinglistDevices
(that's essentially what the script is doing) should not lead to such an error. And again, this works for the old wired and RF interfaces. So for HmIP it should either behave the same way, or the limitations need to be communicated clearly.Edit:
One more detail I found while looking at this on RaspberryMatic with strace is, that the response of our server always gets cut to
8192
bytes. The (redacted) trace (done on a different machine) looks like this:Obviously the XML is bad since it ends abruptly. So the exception makes sense. The question is why the CCU only receives the first 8192 bytes. I also traced the other end, and the data that gets sent as a response is not truncated. This leads me to believe, that on the CCU there seems to be some buffer-limitation for incoming data.
n0l0cale commentedon Jun 3, 2020
@jens-maus hi man - should I support you in writing a bug report to the upstream project? Do you have any contact details or do you want to do it by your own? Do you need maybe any further information?
jens-maus commentedon Jun 3, 2020
@n0l0cale Nope. here is the correct place for this kind of bug report on HmIPServer because eq3 ist more or less listening/reading here when I point them at this ticket (will do). So thanks for the infos here. As soon as I had time to reproduce it I will flag it as reproduced and then its' eq3 task to investigate+fix.
jens-maus commentedon Jun 3, 2020
@danielperna84 I finally tried to reproduce the issue using your
hmip-xml-rpc-bug.py
script and here are my results:I was able to reproduce these error outputs in
hmserver.log
as soon as an listDevices() answer is sent to HmIPServer with a non-empty device list. So yes, there indeed seems to be an issue in dealing with receiving a list of devices with a listDevices() rpc execution in HmIPServer.Please note that the BidCos-RF service on Port 2001 is not provided by HmIPServer but the
rfd
service which in fact in a C++ tool and output its debug/error output directly to syslog, hence you have to look into/var/log/messages
for potential error messages for the rfd/BidCos-RF service on Port 2001.This I could not reproduce here. In fact, even after the second init() I perfectly receive events, at least for the virtual devices I could still trigger in the WebUI.
Can you please provide example output of your script showing the exact payload of the error? Would really help if one can see that after the second init() no events are received anymore. As said, I can perfectly use the WebUI here and do a
PRESS_SHORT
orPRESS_LONG
on the internal/virtual devices the WebUI provides and then I can see these events in the python script, even after the second init.I fully agree with you that it should be perfectly able to directly provide a device list in the listDevices() answer and the XMLRPC service of HmIPServer should then just diff the provided devices and just call newDevices() with the devices missing. However, please note that the XMLRPC Service in HmIPServer is unfortunately quite legacy and has some shortcomings/bugs. But I will try to get eQ3 informed and see that this issue will be resolved. As a workaround (until the issue is fixed), I would suggest that you change the initialization in home assistent that no device list is provided in the listDevices() execution for the HmIP service query and that you do the diffing (old vs new devices) on your own. AFAIK ioBroker and other third-party services are doing the same and do not rely on the XMLRPC service for doing the listDevices() based diffing as one would expect. See here, for example:
https://github.com/ioBroker/ioBroker.hm-rpc/blob/master/hm-rpc.js#L1013
There, ioBroker.hm-rpc is doing the same for the HmIP service and I think for the exact same reason.
Nope, the format should be perfectly fine as you provided it, at least AFAICS.
Thanks for that one. This indeed looks like there is a buffer limiting factor here that could be one of the main reasons why using listDevices() with the HmIPServer XMLRPC service is not working correctly ATM. I will forward these observations to my eQ3 contacts and hope they will fix it in the next release.
16 remaining items