Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Indexable visibility columns #17757

Open
wants to merge 58 commits into
base: trunk
Choose a base branch
from
Open

Add Indexable visibility columns #17757

wants to merge 58 commits into from

Conversation

diedexx
Copy link
Member

@diedexx diedexx commented Dec 2, 2021

Context

  • We lack an easy way to determine whether archives have posts while querying Indexables. We need to know this when we generate XML Sitemaps with Indexables or when altering the noindex value for empty archives (empty user and term archives are now soft-404s, which are set to noindex)

Summary

This PR can be summarized in the following changelog entry:

  • Add new columns to the Indexable model: number_of_publicly_viewable_posts and is_publicly_viewable
  • Deprecate the use of is_public and has_public_posts on the Indexable model.

Relevant technical choices:

  • The return value of the adapter's select_one function is cast, because it returned an stdClass while all surrounding code depends on it being an array. (it wasn't used before)
  • The version number of most indexable builders has been increased to ensure that all relevant Indexables will be rebuilt and that they will receive values for the new columns.
  • I don't watch for changes in visibility for posttypes and taxonomies as there are no hooks for this and rarely happens. If this happens, a reindex is required for the is_publicly_viewable to be accurate for term, post-type and post Indexables.
  • I use as little NULL values as possible for the new columns. Sometimes we have to:
    • The date-archive indexable can't have a reasonable value for number_of_publicly_viewable_posts, as the number of posts varies based on which date query is used.
    • The author is_publicly_viewable changes depending on a global Yoast SEO setting (which disables author pages). If we were to include that in author indexables, we'd have to rebuild all authors when that setting is changed which may is a huge performance issue for larger sites.
  • I started including password protected posts from the aggregate queries that are used for getting the object_last_modified and object_published_at. The reason being that password protected posts do show up in the archives and do get indexed just like any other post.
  • I deprecate is_public instead of removing. The model is exposed through our surfaces, so removing the is_public property is a BC break. is_public is still actively being set when (re)indexing. But all related code has been deprecated.
  • When the has_public_posts property is used, the code shows a deprecation warning and returns the value for number_of_publicly_viewable_posts posts instead. This should yield the same result.
  • Add a new function in the Indexable_Repository: query_where_noindex. This is meant to be used for sitemaps later.
  • is_post_publicly_viewable has been copied to the post-builder because it was introduced in WP 5.7 and we still support 5.6.
  • I've introducedset_dreprecated_property and marked it with @internal so we can keep deprecated properties up to date during the deprecation period without throwing any deprecation warnings.

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

Prep

  • Login as a user who can author posts (author, editor or administrator).
  • Use a database client like TablePlus, SequelPro or MySQL Workbench to connect to the database.
  • Open the table that ends with _yoast_indexables.
  • Keep it open; you'll be referencing this a lot during testing.

Database migration

  • Check if the indexables table now has two new columns: number_of_publicly_viewable_posts and is_publicly_viewable.
  • Check that is_publicly_viewable is NULL for all indexables.
  • Check that number_of_publicly_viewable_posts is populated and has the correct value. It should be either NULL, 0 or 1:
    • 0 for the home page and system-page indexables (OR the acutal number of publicly viewable posts in the home page. This value is updated rather quickly)
    • 0 for author indexables that authored no posts or only posts that are not published or have robots set to noindex for that post or are password protected.
    • 1 for author indexables that authored at least one post that is published and has robots set to index for that post and isn't password protected.
    • NULL for author indexables that authored at posts that are published and have robots set to default for that posttype and aren't password protected.
    • NULL for post, term and post-type archive indexables
    • 0 for attachment posts that don't have a parent; the attachment isn't linked to a post.
    • 0 for attachment posts that are linked to a post that is not published or have robots set to noindex for that post or is password protected.
    • 1 for attachment posts that are linked to a post that is published and has robots set to index for that post and isn't password protected.
    • NULL for attachment posts that are linked to a post that is published and have robots set to default for that posttype and isn't password protected.

Posts

  • Publish a new post.
  • in the database, find the post indexable row.
  • Verify that the is_publicly_viewable is 1
  • Make the post private.
  • Verify that the is_publicly_viewable is 0
  • Find a posttype that doesn't isn't accessible from the frontend (e.g. WooCommerce orders). You can create your own with a plugin like CPT UI by selecting "false" for the "public" value.
  • Publish a new post of that private posttype.
  • Verify that the is_publicly_viewable is 0
  • Verify that number_of_public_posts is 0 for your new posts.

Date archives

  • Go to the Search appearance settings > Archives
  • Disable date archves and save
  • In the database, find the date-archive indexable row.
  • see that the is_publicly_viewable is set to 0
  • Reinable the date archives and see that the value is back to 1

System pages

  • Go to the Search appearance settings > Archives
  • Under "Special pages", adjust the titles for the 404 page.
  • In the database, find the system-page indexable row.
  • see that is_publicly_viewable is set to 1

Home page

  • Go to wp-admin/options-reading.php (Settings > Reading) and select "Your latest posts" next to "Your homepage displays" if that hasn't been the case already.
  • Publish a new post
  • in the database, find the home-page indexable row.
  • Verify that the number_of_publicly_viewable value now matches the number of posts that is visible in your home page for not logged in users (use an incognito window)
  • Make that post private
  • Verify that the number_of_publicly_viewable goes down and the object_last_modified reflects the moment you made the post private.

Authors

  • Publish a new post.
  • in the database, find the user you are logged in to.
  • Verify that the number_of_publicly_viewable value now matches the number of posts that is visible in your author archive for not logged in users (use an incognito window)
  • Make that post private
  • Verify that the number_of_publicly_viewable goes down and the object_last_modified reflects the moment you made the post private.

Post types

  • Publish a new post
  • in the database, find the post-type-archive indexable row.
  • Verify that the number_of_publicly_viewable value now matches the number of posts that is visible in your post type archive for not logged in users (use an incognito window)
  • Make that post private
  • Verify that the number_of_publicly_viewable goes down and the object_last_modified reflects the moment you made the post private.

Terms

  • Create a new category
  • in the database, find the term indexable row.
  • Verify that the number_of_publicly_viewable is 0.
  • Add your new category to a published post.
  • Verify that the number_of_publicly_viewable value now matches the number of posts that is visible in your term archive for not logged in users (use an incognito window). The value is 1.
  • Make the post private.
  • Verify that the number_of_publicly_viewable goes down and the object_last_modified reflects the moment you made the post private.

Reindexing

  • In the SEO > Tools admin screen, hit "Start SEO data optimization".
  • In the database, all indexables should be updated and match the same conditions as the test cases above.

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

  • N/A - covered in test instructions.

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects other environments and needs to be tested there.

Documentation

  • I have written documentation for this change.

Quality assurance

  • I have tested this code to the best of my abilities
  • I have added unittests to verify the code works as intended
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.

Fixes FIX-21
Fixes FIX-22

Related to

This caused issues with the rename_column, as column_info expects an array
If a replacement is given the ORM uses the replacement
automatically.
is_public is replaced by a different approach in querying data and
by is_publicly_viewable. If we would immediately replace is_public
with is_publicly_viewable, that would be a BC break.
Password protected posts still end up in the post-type- and termarchives
and are publicly queryable. The difference is that the content is password
protected. Because if this detail, these posts should be noindexed, but the
archives should still update their data because the protected posts are
still shown there
This prevents unsetting the last modified date on archive indexables when the last
post of the post-type or taxonomy is removed. Also prevents moving the date back
in time when performing a reindex action.
@diedexx diedexx added the changelog: non-user-facing Needs to be included in the 'Non-userfacing' category in the changelog label Dec 2, 2021
@diedexx diedexx added this to the 17.9 milestone Dec 2, 2021
Copy link
Contributor

@leonidasmi leonidasmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the has_public_posts property is used, the code shows a deprecation warning and returns the value for number_of_publicly_viewable_posts posts instead. This should yield the same result.

If that's what we want, then it's not currently happening, since has_public_posts was a boolean value and number_of_publicly_viewable_posts is an integer.

Also, if we want a match between the old has_public_posts value and the new number_of_publicly_viewable_posts one, I can imagine a case where these won't match and that is when we add images/attachments in a post.

Doing so, we will create an indexable for that attachment with number_of_publicly_viewable_posts = 0 as per the new code
but if I read the old code right, we would create the indexable for that attachment with has_public_posts = 1 as long as that post where the attachment was added is public.

Comment on lines 402 to 405
$post_type = get_post_type( $post );
$post_status = get_post_status( $post );

return is_post_type_viewable( $post_type ) && $this->is_post_status_viewable( $post_status );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefix those functions with \ likeso:

		$post_type   = \get_post_type( $post );
		$post_status = \get_post_status( $post );

		return \is_post_type_viewable( $post_type ) && $this->is_post_status_viewable( $post_status );


$replacements = \array_merge( $post_statuses, [ $author_id ] );
$replacements = \array_merge( $post_statuses, [ $author_id ], $post_types );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a check if $post_types is array here, before merging it to the $replacement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! that could've been a nasty bug. I added validation higher up in the get_author_post_types so we're sure that we get an array.

Comment on lines +112 to +114
$indexable->object_published_at = $aggregates->first_published_at;
$indexable->object_last_modified = max( $indexable->object_last_modified, $aggregates->most_recent_last_modified );
$indexable->number_of_publicly_viewable_posts = $aggregates->number_of_public_posts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is repeated 4 times throughout the builders.

Maybe create a separate function to avoid Code Duplication?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider it, but decided to not change more than I had to. I want to better abstract this when we revise the builders entirely. Do you think that's ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, for sure it's ok 👍 :)

@@ -18,13 +18,13 @@ class Indexable_Builder_Versions {
*/
protected $indexable_builder_versions_by_type = [
'date-archive' => self::DEFAULT_INDEXABLE_BUILDER_VERSION,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something here, but why don't we up the builder version for dates?

My thinking is that since we add the is_publicly_viewable property, we should up this too, but maybe I don't get it :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! I missed that one. Thanks for noticing!

@diedexx diedexx requested a review from leonidasmi December 10, 2021 13:31
Copy link
Contributor

@leonidasmi leonidasmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR: 👍

With these new changes, we also take care of this remark, except the last part about attachments' has_public_posts BC break, but as per our conversation with @diedexx that's ok.

lib/model.php Outdated Show resolved Hide resolved

$sql = "
SELECT MAX(p.post_modified_gmt) AS last_modified, MIN(p.post_date_gmt) AS published_at
SELECT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: how about getting post ID's for an author first and iterating over those - using primary keys seems more efficient than where-ing over a set of fields.

If we use the same version number for two separate model changes in
two separate releases, the later release won't receive automatic
model updates because the version hasn't changed as of the previous release.
Adding a comment of the version number, will cause a git merge conflict,
which forces you to look at what the correct version number should be.
@diedexx diedexx mentioned this pull request Dec 20, 2021
7 tasks
@diedexx
Copy link
Member Author

diedexx commented Dec 20, 2021

During acceptance together with @increddibelly we found a bug:
When you remove a term from a post, the number_of_publicly_viewable_posts of that term doesn't get updated.

@diedexx diedexx removed this from the 17.9 milestone Dec 21, 2021
@IreneStr
Copy link
Contributor

@diedexx Is this still relevant, or can it be closed? :)

@diedexx
Copy link
Member Author

diedexx commented Sep 23, 2022

Yes, this is still relevant for indexable sitemaps

Copy link

A merge conflict has been detected for the proposed code changes in this PR. Please resolve the conflict by either rebasing the PR or merging in changes from the base branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: non-user-facing Needs to be included in the 'Non-userfacing' category in the changelog merge conflict
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants