Stop trying to "preserve" iptables counters that are always 0 #110328
Conversation
@danwinship: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
(force-pushed from 58ea852 to af23807)
I have the feeling that the parsing will be "less solid" now 😬

It seems strange to worry that the new code will be less correct than the old code, given that the old code was not even being tested against valid data...
/retest-required
ah, it's a conclusion based on observation; we didn't have any bugs with the old code... at least, none that we know of :)
Discussion from https://bugzilla.redhat.com/show_bug.cgi?id=2058444 which is unfortunately not public:

Dan Winship 2022-05-31 10:45:54 EDT, Comment 45:

Phil, can you confirm the results of these experiments against the actual code to make sure I'm not missing anything subtle?

ie, the rule has non-0 counters, but the chain always shows [0:0].

(kube-proxy does "iptables-save" for two reasons: (a) so it can set counters correctly on restore, and (b) so it can see if there are stale KUBE chains lying around that need to be deleted. But it seems like (a) actually never really worked, and we don't need to do (b) on every sync...)

Phil Sutter 2022-05-31 16:30:39 EDT, Comment 46:
There is no such thing as "chain-level counts". In iptables nomenclature, these values are "policy counters" and are therefore relevant only with chains that support a policy - user-defined ones of course do not. The iptables-restore code is clear there: counter values are parsed only if '
This was true for
Yes, I can confirm this and it seems to be a bug in libiptc.
IIRC, '
So (a) requires non-flushing restore to leave other chains' counters alone, and that is broken with iptables-legacy?

Dan Winship 2022-06-01 08:44:09 EDT, Comment 47:
No, kube-proxy / kubernetes itself doesn't care about the counters. It's just trying to be a good citizen and not break the counters in case the user was using them for something else. At least... I think that's what it's doing? I haven't done the git archaeology to figure out exactly what the original intent of using "

But regardless of what the intent was, the code is not actually doing anything useful; our iptables-restore input only ever contains user-defined chains, not built-in ones, so we never actually preserve any counters. We just do a bunch of extra work to keep track of the fact that all of our chains have "[0:0]" counters.

(The bug that resets the counters with iptables-legacy might be something that annoys some user somewhere, but AFAICT no one ever filed a bug report about it...)
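To make the distinction above concrete, here is a hedged sketch of what iptables-save -c output might look like (the rules and all counter values are invented for illustration): the built-in PREROUTING chain carries real policy counters in its header line, while the user-defined chain headers always read [0:0] even though the per-rule counters in brackets are non-zero.

```
*nat
:PREROUTING ACCEPT [1234:98765]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-EXAMPLE - [0:0]
[57:3420] -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
[57:3420] -A KUBE-SERVICES -d 10.0.0.1/32 -p tcp -m tcp --dport 443 -j KUBE-SVC-EXAMPLE
COMMIT
```

Supplying non-zero values on the user-defined chain header lines in iptables-restore input would simply be ignored, which is why copying those headers around preserves nothing.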
(force-pushed from af23807 to 7624045)
LGTM
The iptables and ipvs proxies have code to try to preserve certain iptables counters when modifying chains via iptables-restore, but the counters in question only actually exist for the built-in chains (eg INPUT, FORWARD, PREROUTING, etc), which we never modify via iptables-restore (and in fact, *can't* safely modify via iptables-restore), so we are really just doing a lot of unnecessary work to copy the constant string "[0:0]" over from iptables-save output to iptables-restore input. So stop doing that. Also fix a confused error message when iptables-save fails.
The test was calling GetChainLines() on invalid pseudo-iptables-save output where most of the lines were indented. GetChainLines() happened to still parse this "correctly", but it would be better to be testing it on actually-correct data.
We don't need to parse out the counter values from the iptables-save output (since they are always 0 for the chains we care about). Just parse the chain names themselves. Also, all of the callers of GetChainLines() pass it input that contains only a single table, so just assume that, rather than carefully parsing only a single table's worth of the input.
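As a hedged illustration of the commit message above, here is a minimal sketch of chain-name-only parsing for a single table's worth of iptables-save output (this is not the actual GetChainLines implementation; the chainNames helper and the sample dump are invented):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// chainNames extracts chain names from a single table's worth of
// iptables-save output by looking only at the ":CHAIN ..." header lines,
// ignoring the policy and counter fields entirely.
func chainNames(save string) []string {
	var names []string
	scanner := bufio.NewScanner(strings.NewReader(save))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, ":") {
			continue
		}
		// ":KUBE-SERVICES - [0:0]" -> "KUBE-SERVICES"
		fields := strings.Fields(strings.TrimPrefix(line, ":"))
		if len(fields) > 0 {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	const save = `*nat
:PREROUTING ACCEPT [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-NODEPORTS - [0:0]
-A PREROUTING -j KUBE-SERVICES
COMMIT
`
	fmt.Println(chainNames(save)) // [PREROUTING KUBE-SERVICES KUBE-NODEPORTS]
}
```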
(force-pushed from 7624045 to 7c27cf0)
@danwinship seems we can stop passing
kubernetes/pkg/proxy/iptables/proxier.go, line 1504 in a3da8d7
yeah... I didn't bother because it doesn't simplify the code any (it makes it two characters longer!). I guess we could just remove the argument from the function entirely, but...
nah, I just found it; it can be a follow-up
/lgtm
I think that these changes are OK to merge
/lgtm
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
@thockin had mentioned a while back that he was pretty sure that all iptables counters are always 0 and we're doing pointless busywork pretending otherwise, and this turns out to be mostly correct.
Specifically, the chain-level counters that our code deals with (as opposed to the rule-level counters that we ignore) are only used for the built-in chains (INPUT, PREROUTING, etc) that have a default policy, and they count the number of packets/bytes that hit that default policy (ie, that fall off the end of the chain rather than matching a terminal rule). For user-created chains, these counters will always be 0, and can't even manually be set to any other value (ie, if you try to set them in the iptables-restore input, the values you provide will be ignored). This is true for both iptables-legacy and iptables-nft.

We never pass iptables-restore any input that refers to the built-in chains, so all of our efforts to preserve the counters are pointless. And we don't have to worry about "but what if we wanted to change things in the future so that we did refer to those chains?" because there's no way we could refer to the built-in chains in our iptables-restore input without creating race conditions that might accidentally overwrite concurrent modifications to those chains by other users. (This is why the code still uses ipt.EnsureChain / ipt.EnsureRule for some things rather than doing everything with ipt.Restore.)

Special notes for your reviewer:
(After this PR, the only thing we use the iptables-save output for is to figure out which service/endpoint chains we need to delete. We could theoretically figure that out without using iptables-save, but that would require us to more-or-less keep a copy of the current known iptables state in memory at all times. And we'd still need the iptables-save version at startup time anyway. A better optimization would be to change it to only do stale-chain cleanup periodically (eg, once a minute) rather than on every resync; the stale chains wouldn't have any rules pointing to them, so they wouldn't cause any problems beyond memory usage. #110334 does that.)

Does this PR introduce a user-facing change?
/sig network
/priority important-longterm
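Circling back to the "Special notes for your reviewer" section above, here is a rough, hedged sketch of the one remaining use of the iptables-save output: spotting service/endpoint chains that still exist in the kernel but are no longer wanted. The staleChains helper, the prefix checks, and the sample chain names are invented for illustration; this is not the actual proxier logic.

```go
package main

import (
	"fmt"
	"strings"
)

// staleChains returns the kube-proxy-style service/endpoint chains that are
// present in the current iptables state (as read from iptables-save output)
// but are no longer in the desired set, and therefore need to be deleted.
func staleChains(existing, desired []string) []string {
	want := make(map[string]bool, len(desired))
	for _, c := range desired {
		want[c] = true
	}
	var stale []string
	for _, c := range existing {
		isOwned := strings.HasPrefix(c, "KUBE-SVC-") || strings.HasPrefix(c, "KUBE-SEP-")
		if isOwned && !want[c] {
			stale = append(stale, c)
		}
	}
	return stale
}

func main() {
	existing := []string{"KUBE-SERVICES", "KUBE-SVC-AAAA", "KUBE-SEP-BBBB", "KUBE-SVC-GONE"}
	desired := []string{"KUBE-SERVICES", "KUBE-SVC-AAAA", "KUBE-SEP-BBBB"}
	fmt.Println(staleChains(existing, desired)) // [KUBE-SVC-GONE]
}
```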