Getting payloads when asked for them when not using postgres for payload storage #501
Description
When a proposer requests a payload, we check Redis, Memcached, and Postgres for it. If none of them has the payload, we check whether we ever received the bid. Normally, with all payloads stored in Postgres, this works fine (save for a small race condition): if we do find a bid, we have a problem; if we don't, we never received the bid and it's understandable that we don't have a payload.
However, if you turn off payload storage in Postgres, you're suddenly in a bind. Redis doesn't store every payload, only those that were at one time the relay's top bid. So if Redis doesn't have the payload, you cannot tell whether you never got the bid or managed to store the bid but not the payload.
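To make the ambiguity concrete, here is a minimal sketch of the decision as described above. It is not the relay's actual code: `MissingPayloadCause`, `ClassifyMissingPayload`, and both parameters are hypothetical names, and the function assumes every enabled store has already missed the payload.

```go
package main

import "fmt"

// MissingPayloadCause is a hypothetical classification of a failed payload lookup.
type MissingPayloadCause int

const (
	BidNeverReceived MissingPayloadCause = iota // no bid on record, so no payload is expected
	PayloadLost                                 // bid on record but payload missing: the critical case
	Ambiguous                                   // the two cases above cannot be told apart
)

func (c MissingPayloadCause) String() string {
	switch c {
	case BidNeverReceived:
		return "bid never received"
	case PayloadLost:
		return "payload lost"
	default:
		return "ambiguous"
	}
}

// ClassifyMissingPayload is meant to run only after all enabled stores missed the payload.
// bidFound reports whether a matching bid was found.
// postgresPayloads reports whether every payload is also written to Postgres.
func ClassifyMissingPayload(bidFound, postgresPayloads bool) MissingPayloadCause {
	if !postgresPayloads {
		// Assumption taken from the description: the remaining stores only hold
		// payloads that were once the top bid, so a miss proves nothing.
		return Ambiguous
	}
	if bidFound {
		return PayloadLost
	}
	return BidNeverReceived
}

func main() {
	for _, pg := range []bool{true, false} {
		for _, bid := range []bool{true, false} {
			fmt.Printf("postgresPayloads=%v bidFound=%v -> %s\n",
				pg, bid, ClassifyMissingPayload(bid, pg))
		}
	}
}
```

With Postgres payload storage enabled the two outcomes are distinguishable; with it disabled, both collapse into the ambiguous case, which is the situation this issue is about.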
See the commit I wrote in our fork for how we're currently dealing with this conundrum. We basically assume it's not the critical case, and keep an eye on Redis errors. Our general position is:
- Not great 😅, but not terrible. Our logs show Redis failing with a timeout eight times in the past month, and it is unclear how many of those failures, if any, happened at the critical point. Hitting the critical point would mean that a pipeline with a SET payload command before the SET bid command gets shipped and starts executing, but fails right at the border between the two.
- If you don't care about latency, you can just use Postgres and avoid the headache. If you do care about latency and want to improve this situation, there are multiple options. One is to store every payload in a faster store, such as a second Redis or Memcached. This is almost what Memcached currently does, as it skips whatever Redis skips (after we solve Fix false positive in boolean flag #500, to be precise). This is especially prudent if you'd like to serve payloads for bids that your relay didn't relay but others did, and that the proposer asks you for anyway. Another option is to not serve other relays' bids and to track which bids you've served. That way it is clear whether we should or should not have a payload, without relying on exactly how storage is handled.
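The second option, tracking served bids, could be sketched roughly as follows. Everything here is an assumption for illustration: `BidKey`, `ServedBids`, and their methods are hypothetical names, and the in-memory map stands in for whatever shared store a multi-instance relay would actually need.

```go
package main

import (
	"fmt"
	"sync"
)

// BidKey identifies a bid by slot, proposer public key, and block hash (hypothetical type).
type BidKey struct {
	Slot           uint64
	ProposerPubkey string
	BlockHash      string
}

// ServedBids records which bids this relay has handed to proposers.
type ServedBids struct {
	mu     sync.RWMutex
	served map[BidKey]struct{}
}

func NewServedBids() *ServedBids {
	return &ServedBids{served: make(map[BidKey]struct{})}
}

// MarkServed would be called when a bid is returned to a proposer.
func (s *ServedBids) MarkServed(k BidKey) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.served[k] = struct{}{}
}

// WasServed reports whether we served this bid, i.e. whether we must have its payload.
func (s *ServedBids) WasServed(k BidKey) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.served[k]
	return ok
}

// PruneBefore drops entries for slots older than the given slot to bound memory use.
func (s *ServedBids) PruneBefore(slot uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k := range s.served {
		if k.Slot < slot {
			delete(s.served, k)
		}
	}
}

func main() {
	tracker := NewServedBids()
	ours := BidKey{Slot: 100, ProposerPubkey: "0xproposer", BlockHash: "0xaaa"}
	theirs := BidKey{Slot: 100, ProposerPubkey: "0xproposer", BlockHash: "0xbbb"}
	tracker.MarkServed(ours)

	// A missing payload is critical only for bids we actually served.
	for _, k := range []BidKey{ours, theirs} {
		fmt.Printf("%s: missing payload is critical: %v\n", k.BlockHash, tracker.WasServed(k))
	}
}
```

The point of the design is that the "should we have this payload?" question is answered by what the relay served, not by which store happened to keep which payload.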
You can see the commit I wrote to have the logging and responses make more sense in our postgres-payloads-disabled case.
Sidenote: the code notes that more recent versions of mev-boost don't ask relays for payloads for bids they didn't receive from those relays, but this is naturally hard to rely on, given the wide and sensitive deployment of mev-boost. Never mind that operators may disagree with this stance and fork mev-boost to have more resilient get-payload logic.