Fixes #733 by refactoring the tagged query #744

lachie · 2019-09-20T08:14:47Z

Also adds a simple spec for .tagging.

Also adds a simple spec for .tagging.

thomasdziedzic

Looks good, and thank you for adding a test!

I'm guessing the sql for this will probably look something like:

select distinct s.title
from stories s
join taggings t
on s.id = t.story_id
where s.score >= 0
and t.tag_id in (1, 2)

But another way of writing the above would be:

select s.title
from stories s
where s.score >= 0
and s.id in (select story_id from taggings where tag_id in (1, 2))

Do you know how the performance comparison would look like?

pushcx · 2019-09-21T15:54:44Z

I'm thrilled to see the spec, thanks especially for adding a test. And to add on to @thomasdziedzic's comment, I've slowly learned to be suspicious when I see distinct - when I've used it, I often later learn the hard way that I was papering over a misunderstanding in what the query was doing. I think the important thing about the second query is that it expresses a slightly different meaning. The first one is "give me the stories and their taggings", the second is "give me the stories in the set of stories with these tags". We have dupes because the first one is not exactly what we mean to do.

lachie · 2019-09-22T23:15:56Z

Hey,
Thanks for the feedback, I've switched it to the subquery version.

I didn't do any irl benchmarking, but here's a quick explain of both queries (the current query generated via #to_sql)

MariaDB [lobsters_dev]> explain SELECT DISTINCT `stories`.* FROM `stories` INNER JOIN `taggings` ON `taggings`.`story_id` = `stories`.`id` WHERE `stories`.`merged_story_id` IS NULL AND `stories`.`is_expired` = FALSE AND ((CAST(upvotes AS signed) - CAST(downvotes AS signed)) >= 0) AND `taggings`.`tag_id` IN (1, 2) ORDER BY `stories`.`created_at` DESC;
+------+-------------+----------+--------+-------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+--------------------------------------------------------+
| id   | select_type | table    | type   | possible_keys                                                                 | key                | key_len | ref                            | rows | Extra                                                  |
+------+-------------+----------+--------+-------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+--------------------------------------------------------+
|    1 | SIMPLE      | taggings | range  | story_id_tag_id,taggings_tag_id_fk                                            | taggings_tag_id_fk | 8       | NULL                           | 4    | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | stories  | eq_ref | PRIMARY,is_idxes,index_stories_on_is_expired,index_stories_on_merged_story_id | PRIMARY            | 8       | lobsters_dev.taggings.story_id | 1    | Using where                                            |
+------+-------------+----------+--------+-------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+--------------------------------------------------------+

MariaDB [lobsters_dev]> explain select s.title from stories s where (s.upvotes-s.downvotes) >= 0 and s.id in (select story_id from taggings where tag_id in (1, 2));
+------+--------------+-------------+--------+------------------------------------+--------------------+---------+--------------------------------+------+-----------------------+
| id   | select_type  | table       | type   | possible_keys                      | key                | key_len | ref                            | rows | Extra                 |
+------+--------------+-------------+--------+------------------------------------+--------------------+---------+--------------------------------+------+-----------------------+
|    1 | PRIMARY      | <subquery2> | ALL    | distinct_key                       | NULL               | NULL    | NULL                           | 4    |                       |
|    1 | PRIMARY      | s           | eq_ref | PRIMARY                            | PRIMARY            | 8       | lobsters_dev.taggings.story_id | 1    | Using where           |
|    2 | MATERIALIZED | taggings    | range  | story_id_tag_id,taggings_tag_id_fk | taggings_tag_id_fk | 8       | NULL                           | 4    | Using index condition |
+------+--------------+-------------+--------+------------------------------------+--------------------+---------+--------------------------------+------+-----------------------+
3 rows in set (0.001 sec)

I haven't used mysql in a minute so don't really have an intuition on which is better from eyeballing.

The subquery one is more elegant in any case, so here it is!

pushcx · 2019-09-24T02:49:35Z

Yep, they're basically equivalent... except the big difference is that distinct prompts MySQL to do an extra sort pass, that's what the Using filesort note is about.

This looks like it'll execute two queries, so there's an extra round-trip to the server. Can you express this in one, perhaps with ActiveRecord#merge?

lachie · 2019-09-24T04:26:10Z

Hi,
I did a #to_sql to verify that its one query which resembles the subquery suggestion:

# assuming submitter, tag1, tag2
repo = StoryRepository.new(submitter)
tagged = repo.tagged([tag1, tag2])
puts tagged.to_sql
#=>

SELECT `stories`.* FROM `stories` 
WHERE `stories`.`merged_story_id` IS NULL 
AND `stories`.`is_expired` = FALSE 
AND ((CAST(upvotes AS signed) - CAST(downvotes AS signed)) >= 0) 
AND `stories`.`id` IN (SELECT `taggings`.`story_id` FROM `taggings` WHERE `taggings`.`tag_id` IN (3, 4)) 
ORDER BY `stories`.`created_at` DESC

Sorry, is that what you meant by one query?

pushcx · 2019-10-01T02:10:01Z

Huh. Yeah, that's exactly what I meant, and I pulled your branch locally to test it out. I'm surprised that the Tagging.select doesn't trigger an independent query. Rails must be doing something very clever under the covers to recognize the query hasn't been evaluated yet and that it can fold it in.

Well, wonderful! Merging. Thanks so much for running this all the way down with a test.

Fixes lobsters#733 by adding .distinct to the tagged query

204da0c

Also adds a simple spec for .tagging.

thomasdziedzic approved these changes Sep 20, 2019

View reviewed changes

lachie added 2 commits September 23, 2019 09:05

switch to subquery version

c9dce9d

improve the variable name for understandability

0ed3d9a

lachie changed the title ~~Fixes #733 by adding .distinct to the tagged query~~ Fixes #733 by refactoring the tagged query Sep 23, 2019

pushcx merged commit 225900e into lobsters:master Oct 1, 2019

pushcx pushed a commit that referenced this pull request Jan 7, 2022

Fixes #733 by refactoring the tagged query (#744)

4919359

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #733 by refactoring the tagged query #744

Fixes #733 by refactoring the tagged query #744

lachie commented Sep 20, 2019

thomasdziedzic left a comment

pushcx commented Sep 21, 2019

lachie commented Sep 22, 2019

pushcx commented Sep 24, 2019

lachie commented Sep 24, 2019 •

edited

Loading

pushcx commented Oct 1, 2019

Fixes #733 by refactoring the tagged query #744

Fixes #733 by refactoring the tagged query #744

Conversation

lachie commented Sep 20, 2019

thomasdziedzic left a comment

Choose a reason for hiding this comment

pushcx commented Sep 21, 2019

lachie commented Sep 22, 2019

pushcx commented Sep 24, 2019

lachie commented Sep 24, 2019 • edited Loading

pushcx commented Oct 1, 2019

lachie commented Sep 24, 2019 •

edited

Loading