Best-effort subscription not working between two computers #30
We're running through an unmanaged switch. Both computers are plugged into it, and there's no gateway.
That configuration seems like it should work.
What configuration are you running?
The same.
I've emailed you a link to our build.
Your build also works for me. Would it be possible to get a network packet capture taken on one of the two hosts? Start the capture, run the two test programs, wait for a bit (30 seconds?), then stop the capture.

    ctucker@ubuntu_2:~/asi_ros2$ ros2 topic echo /chatter std_msgs/String
    data: 'Hello World: 3'
    data: 'Hello World: 4'
    data: 'Hello World: 5'
    data: 'Hello World: 6'
    data: 'Hello World: 7'
on_listener_computer.pcapng.tar.gz

It doesn't look like I was seeing any of the autodiscovery from the other computer, but maybe I was just looking at it wrong.
Yep. Can you take a capture on the other computer?
test_two.tar.gz
In that last set of captures, it looks like discovery completed successfully, and I can see that there was a match on the /chatter topic. However, no DATA messages show up at all.
The built-in ufw is the only firewall that I know of, and it's disabled on both computers. Messages do get through if we subscribe as reliable; it's just the best-effort subscription (echo) that doesn't work.
Hmmm. I get very different captures when I run the two programs:
They create only a single DDS DataWriter / DataReader on the "/chatter" topic, and none of the others that I see in your capture[s] (for example, "/talker/get_parametersReply", "/talker/get_parameter_typesReply", etc.). Are you running a different test?
Ah. The other computer was accidentally running the C++ talker, which includes parameter services; the Python nodes don't. We could take another capture without it if that helps.
OK, that explains it. I just wanted to make sure I was looking at the right thing.
I still can't reproduce this locally...
Then, set the DDS_DEBUG environment variable to 7, and run the test:
And, for completeness, you could do the same on the 'echo' side. I would expect the log to look a little like this:
Clark, I've been working with Bryant on this issue. Here are the logs:
We really appreciate your help on this. Let me know if there is anything else we can do to help resolve this. Thanks.
OK. That's very helpful. I can verify that the talker is sending samples in both scenarios. They are sent over multicast (and apparently not received).

When matched with the listener (reliable), we also send a heartbeat (multicast + unicast). This allows the listener to NACK the missing sample, which is then [re]sent via unicast. When matched with echo (best effort), the sample is sent over multicast only and, as in the listener scenario, it is not received.

So the question is: why are the multicast 'chatter' samples not being received on the listener/echo machine? (The earlier captures show that at least some of the discovery data is transferred successfully.) Could you rerun the echo scenario with an additional debug setting:
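To make the reliable/best-effort asymmetry concrete, here is a toy Python model of the delivery path described above. It is a deliberate simplification, not CoreDX's or the RTPS wire protocol's actual implementation: `deliver` is a hypothetical function, and the only recovery path it models is heartbeat, then NACK, then unicast resend.

```python
from enum import Enum


class Reliability(Enum):
    RELIABLE = "reliable"
    BEST_EFFORT = "best_effort"


def deliver(reliability: Reliability, multicast_ok: bool, unicast_ok: bool) -> bool:
    """Return True if the reader ends up with the sample (toy model only).

    multicast_ok: whether the initial multicast DATA reaches the reader.
    unicast_ok:   whether unicast traffic (heartbeat, NACK, resend) gets through.
    """
    # In both scenarios the writer first sends the sample over multicast.
    if multicast_ok:
        return True
    # Best effort: no heartbeat and no NACK, so a lost multicast sample is gone.
    if reliability is Reliability.BEST_EFFORT:
        return False
    # Reliable: the heartbeat tells the reader a sample is missing, the reader
    # NACKs it, and the writer resends it over unicast.
    return unicast_ok
```

With the conditions seen in this thread (multicast DATA lost, unicast working), `deliver(Reliability.RELIABLE, False, True)` is True while `deliver(Reliability.BEST_EFFORT, False, True)` is False, which matches "listener works, echo doesn't".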
And a slightly different grep:
This should show us specifically which interface[s] CoreDX is trying to write to.
Here you go. Thank you for the quick response! Also, for what it's worth, the talker is running on the 172.31.255.112 computer, and the listener is running on the 172.31.255.103 computer.
Cool, thanks. Could you send the 'talker' side as well?
My bad. We ran both talker and echo again.
I think I've got it. Because the two computers share a 'common' IP address [172.17.0.1], we are incorrectly(?) inferring that the two applications (talker + echo) are hosted on the same computer. This impacts how we write multicast packets, resulting in the observed behavior.
So I'm confused about this 'common' IP address. In all the logs that we've sent you, all other NICs were disabled, leaving only the connection on the 172.31.255.1/24 subnet. Where is this 172.17.0.1 address coming from? Is it the UDP multicast address? Thanks for helping me understand.
So setting the
CoreDX queries the OS for all the 'up' network interfaces, and, by default, we will make use of all of them. I'm glad to hear that setting COREDX_IP_ADDR worked.
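A rough Python sketch of the interface selection described above. The interface list is passed in rather than queried from the OS, the interface names are illustrative, and the override semantics (COREDX_IP_ADDR holding a single IPv4 address that restricts the participant to the matching interface) are an assumption; the CoreDX documentation is the authority on the variable's actual format. `Interface` and `select_interfaces` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Interface:
    name: str
    ipv4: str
    is_up: bool


def select_interfaces(
    interfaces: Sequence[Interface],
    env: Optional[Mapping[str, str]] = None,
) -> List[Interface]:
    """Pick the interfaces a participant would use (hypothetical sketch)."""
    env = os.environ if env is None else env
    # Only interfaces that are 'up' are candidates.
    up = [iface for iface in interfaces if iface.is_up]
    # Assumed semantics: COREDX_IP_ADDR pins the participant to one address.
    override = env.get("COREDX_IP_ADDR")
    if override:
        return [iface for iface in up if iface.ipv4 == override]
    # Default: every 'up' interface, including a docker0 bridge if present.
    return up
```

On a machine like the ones in this thread, the default picks up both the 172.31.255.x interface and docker0 (172.17.0.1); setting the override to the 172.31.255.x address leaves only the physical interface.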
So we do both have Docker installed, and it uses that 172.17.0.1 IP address. Let me try disabling that network interface and trying again. Do you have Docker installed on your two test machines as well?
Nope. Just a single interface.
We just removed the Docker IP interface, and everything appears to be working correctly. Even with Docker installed on one of the computers, CoreDX works fine. If I understand correctly (correct me if I'm wrong), CoreDX checks the IP addresses of the publisher and subscriber to determine whether they are on the same computer. However, when Docker is installed on both machines, CoreDX will always assume that the publisher and subscriber are on the same machine. Could it be changed to use something more unique, like a MAC address, instead? Thank you for your help!
In general, I think your analysis is correct. However, I would say it slightly differently, to indicate that it is not really tied to Docker and that the behavior is not mandatory: Each CoreDX participant checks the IP addresses of each discovered peer participant to determine whether they are on the same computer. When identical IP addresses are detected, CoreDX will, by default, assume that the two participants are on the same machine. This default behavior can be disabled with the

Concerning using the MAC address for this test: the only information we are guaranteed to have about a peer is its IP address. We don't have any information about the MAC addresses of discovered peers; otherwise, that might be a better test.
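A minimal Python sketch of the same-host test as described above, assuming each participant knows its own addresses and the addresses advertised by a discovered peer. `infer_same_host` and its `enabled` flag are hypothetical: the flag stands in for the CoreDX setting that disables the default behavior, and treating a peer as remote when the check is off is an assumption. The addresses are the ones from this thread: each machine's 172.31.255.x address plus Docker's default bridge address, 172.17.0.1.

```python
from typing import Iterable


def infer_same_host(local_addrs: Iterable[str], peer_addrs: Iterable[str],
                    enabled: bool = True) -> bool:
    """Return True if the peer is assumed to run on this machine.

    Any identical IP address counts as evidence of a shared host. With
    enabled=False the check is skipped and the peer is treated as remote.
    """
    if not enabled:
        return False
    return bool(set(local_addrs) & set(peer_addrs))


talker_host = ["172.31.255.112", "172.17.0.1"]  # physical NIC + docker0
echo_host = ["172.31.255.103", "172.17.0.1"]    # physical NIC + docker0

print(infer_same_host(talker_host, echo_host))                 # True (wrong guess)
print(infer_same_host(talker_host, echo_host, enabled=False))  # False
print(infer_same_host(["172.31.255.112"], ["172.31.255.103"])) # False
```

The shared 172.17.0.1 is enough to make two separate machines look like one, which is why removing the Docker interface (or restricting the interfaces CoreDX uses) made the problem go away.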
Okay. I understand. Thanks again for your help and quick replies!
OK, thanks for your patience and help as we worked through this! I really appreciate it!
What works:

    ros2 run demo_nodes_py talker

on one computer, and

    ros2 run demo_nodes_py listener

on another computer. This seems to work because the talker publishes as reliable and the listener subscribes as reliable.

What doesn't work:

    ros2 run demo_nodes_py talker

on one computer, and

    ros2 topic echo /chatter std_msgs/String

on another computer. This failure seems to be related to the fact that ros2 topic echo subscribes as best effort. If both are run on a single computer, everything works fine, but something about best-effort subscriptions isn't working between computers.
This is a major roadblock that will keep us from updating to Bouncy.
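For context on why the best-effort echo should match the reliable talker at all: in DDS, RELIABILITY is a requested/offered QoS policy, and a reader is matched with a writer as long as the writer offers at least the reliability the reader requests (BEST_EFFORT < RELIABLE). A small Python sketch of that rule follows; `is_compatible` is an illustrative helper, not part of the ROS 2 or CoreDX API.

```python
from enum import IntEnum


class Reliability(IntEnum):
    # Ordered so that the stronger guarantee compares greater.
    BEST_EFFORT = 1
    RELIABLE = 2


def is_compatible(offered: Reliability, requested: Reliability) -> bool:
    """DDS requested/offered rule for the RELIABILITY QoS policy."""
    return offered >= requested


# The talker offers RELIABLE; the listener requests RELIABLE, echo BEST_EFFORT.
assert is_compatible(Reliability.RELIABLE, Reliability.RELIABLE)
assert is_compatible(Reliability.RELIABLE, Reliability.BEST_EFFORT)
# The reverse combination would not match at all.
assert not is_compatible(Reliability.BEST_EFFORT, Reliability.RELIABLE)
```

Since both pairings are compatible (and the captures in this thread show the /chatter match completing), the failure lies in sample delivery rather than in QoS matching.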