Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Updating Counter Cache After Commit (to avoid deadlocks) #263

Closed
jbritten opened this issue Jul 1, 2019 · 14 comments
Closed

Support Updating Counter Cache After Commit (to avoid deadlocks) #263

jbritten opened this issue Jul 1, 2019 · 14 comments

Comments

@jbritten
Copy link

jbritten commented Jul 1, 2019

Support updating the counter cache after commit (outside the primary transaction) where the SQL UPDATE calls would be less susceptible to deadlocks.

Let's say we've got a Campaign model which can have many Subscribers. We store lots of campaign metrics in a separate CampaignMetrics summary table. When a subscriber confirms their email address we may see a MySQL transaction such as the following, which illustrates counter_culture updating various counts as well as additional proprietary work being performed:

BEGIN
SELECT 1 AS one FROM `subscribers` WHERE (`subscribers`.`email` = 'user@host.com' AND `subscribers`.`campaign_id` = 1001) LIMIT 1
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
UPDATE `subscribers` SET `confirmed_at` = '2019-07-01 16:34:04', `status` = 'confirmed', `updated_at` = '2019-07-01 16:34:04' WHERE `subscribers`.`id` = 2001
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `confirmed_subscribers_count` = COALESCE(`confirmed_subscribers_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `unconfirmed_subscribers_count` = COALESCE(`unconfirmed_subscribers_count`, 0) - 1 WHERE `campaign_metrics`.`id` = 1001
 ... (some additional proprietary work) ...
COMMIT

There are other actions a Subscriber could do which will also update data in the CampaignMetrics table. For example, opening an email received would update the email_opens_count on the CampaignMetrics table with a transaction such as the following:

BEGIN
SELECT 1 AS one FROM `subscribers` WHERE (`subscribers`.`email` = 'user@host.com' AND `subscribers`.`campaign_id` = 1001) LIMIT 1
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `email_opens_count` = COALESCE(`email_opens_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
 ... (some additional proprietary work) ...
COMMIT

Now, at scale when many concurrent activities are occurring, such as many subscribers confirming their email address, opening emails, clicking emails, etc. deadlocks such as the following can occur when updating the CampaignMetrics summary table:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-07-01 13:17:44 2b373c029700
*** (1) TRANSACTION:
TRANSACTION 2384728875, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2
MySQL thread id 1358, OS thread handle 0x2b34803cb700, query id 8039221 10.0.102.43 deploy updating
UPDATE `campaign_metrics` SET `confirmed_subscribers_count` = COALESCE(`confirmed_subscribers_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 663 page no 110 n bits 109 index `PRIMARY` of table `database_production`.`campaign_metrics` trx id 2384728875 lock_mode X locks rec but not gap waiting

Yes, SQL calls to update the counter cache are atomic; however, transactions must be short (lock fewer rows for the smallest amount of time as much as possible). Supporting an optional configuration flag to execute updating the counter cache after commit would yield the following SQL, which would be much less susceptible to deadlocks:

BEGIN
SELECT 1 AS one FROM `subscribers` WHERE (`subscribers`.`email` = 'user@host.com' AND `subscribers`.`campaign_id` = 1001) LIMIT 1
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
UPDATE `subscribers` SET `confirmed_at` = '2019-07-01 16:34:04', `status` = 'confirmed', `updated_at` = '2019-07-01 16:34:04' WHERE `subscribers`.`id` = 2001
 ... (some additional proprietary work) ...
COMMIT
BEGIN
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `confirmed_subscribers_count` = COALESCE(`confirmed_subscribers_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
COMMIT
@magnusvk
Copy link
Owner

magnusvk commented Jul 2, 2019

Cross-posting this from #120 to explain why this was removed:

Interestingly, calling the counter update after commit was first added because of deadlock issues with Postgres. It's somewhat ironic that it's now MySQL causing a similar issue.

In any case, the history of removing this comes from a bugfix to allow multiple saves in one transaction, see 81dfbf5. Plus, in the absence of issues like this with the database layer, I don't see why you'd ever want to push the counter cache update outside of the transaction. But then again, you are seeing database issues.

If you can figure out how to write a test for this behavior, I think I'd probably be down to re-add this functionality, given that the original code for this was quite straightforward.

This is the first I'm hearing of this issue, so not sure how widespread it is, but of course still would be good to figure out how to address this for you.

@magnusvk
Copy link
Owner

magnusvk commented Jul 2, 2019

I'm trying to think how we could add a test for this—probably with a test model with different callbacks? Have one after_save callback that runs before the commit and stores the current value of the counter cache in an ivar. Then we can test that ivar to make sure the counter doesn't change before the commit, and test that it changed after the commit normally in the test?

@xtagon
Copy link

xtagon commented Aug 27, 2019

I'm seeing similar deadlocks (Postgres) in my app. They're infrequent and I'm still trying to figure out how to reproduce in a test.

@magnusvk
Copy link
Owner

Just for the record—if we can come up with a sensible PR that adds this back as an option, I'm down to merge that. But I won't have time to work on it myself.

@avit
Copy link

avit commented Feb 3, 2021

I don't think this problem is unique to CounterCulture, or even caused by it. Still, this could be made an option I suppose, or deferred to before commit like touch: true does.

@jbritten, if you can make your association updates happen in a consistent order (generally, starting from the lowest model in your belonging hierarchy and going up) then you can eliminate those deadlocks. (I wrote a blog post about Transaction deadlocks on ActiveRecord associations in case it helps anyone else debug these.)

In your example transaction you have:

BEGIN
UPDATE `subscribers` ... WHERE `subscribers`.`id` = 2001   -- lock A
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001   -- lock B
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001
COMMIT

This is the order of the row locks, from when rows are first updated in the transaction (the third update doesn't matter, as it reuses an existing lock). You must have another transaction elsewhere that is updating these rows in the opposite order. e.g.

BEGIN
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001   -- lock B
UPDATE `subscribers` ... WHERE `subscribers`.`id` = 2001   -- lock A
COMMIT

I'm calling this second one "out of order", since if I'm assuming correctly, your belonging hierarchy starts from subscriber and belongs_to :campaign_metric. That's the common sequence you should follow for best results. You can fix this either by changing the update order if you can, or else by explicitly adding an earlier subscriber.lock! / Subscriber.lock.find(id) to claim a lock on that row first:

BEGIN
SELECT * FROM `subscribers` WHERE `subscribers`.`id` = 2001 FOR UPDATE   -- lock A
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001   -- lock B
UPDATE `subscribers` ... WHERE `subscribers`.`id` = 2001
COMMIT

ActiveRecord has a bug related to this when using touch: true, which also executes within the transaction but with several records in indeterminate order.

@lightyrs
Copy link

lightyrs commented Feb 4, 2021

@avit

Very interesting blog post and rails PR.
I'm wondering if the same logic of your PR could be applied specifically to the transaction logic of this gem.

@xhs345
Copy link

xhs345 commented Feb 26, 2021

I would also like to see the after_commit option being added back in, since this was for me the original reason to consider this library.

In our case we have the following deadlock scenario (based on my current understanding):

  1. Long running job A starts transaction
  2. Short job B starts to update some fields on users table (using update_all), but gets stuck because of A
  3. Job A inserts data which triggers an update on the users table counter cache.
  4. Deadlock occurs and in some cases B gets rolled back

@magnusvk
Copy link
Owner

magnusvk commented Mar 2, 2021

Hey guys—I took a quick stab at adding this back, see #309. Let's see if the tests go green on that, and I'd love some extra eyes on that PR, too.

@jbritten
Copy link
Author

jbritten commented Mar 3, 2021

Hey @magnusvk, I really appreciate you taking the time to add this back! I'm testing your 'execute-after-commit' branch and getting the following error:

! Unable to load application: LoadError: cannot load such file -- after_commit_action

Looks like a missing dependency.

@magnusvk
Copy link
Owner

magnusvk commented Mar 4, 2021

@jbritten so the thing is that you only need that gem if you set the execute_after_commit option to true, so I don't want to make it a straight-up gem dependency. I just pushed a commit that adds a more helpful error message. But the upshot is that you'll manually have to include the after_commit_action gem in your dependencies.

@jbritten
Copy link
Author

jbritten commented Mar 5, 2021

@magnusvk got it; thanks for the more helpful error message. I've been running this branch in staging for 2 days and seems to be working as expected.

@magnusvk
Copy link
Owner

@jbritten any problems with this branch? I’m thinking I should merge and release as it seems to be working.

@jbritten
Copy link
Author

@magnusvk I've had the branch running in our production app for over a week and haven't encountered issues yet. I'd say go ahead and merge and release.

@magnusvk
Copy link
Owner

Awesome, thanks for the update. Just released this as gem version 2.8.0. See documentation here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants