This repository has been archived by the owner on Oct 4, 2022. It is now read-only.

Keyword is highlighted more times than stated in analysis #2155

Open
dariaknl opened this issue Feb 20, 2019 · 2 comments

@dariaknl
Contributor

Tested with WP 5.0.3 and 10.1 beta2.

How can we reproduce this behavior?

  1. Enter the following text

Maecenas apple a auctor mi. Etiam sapien nulla, eleifend quis convallis sed, sodales vel felis. Vestibulum ante psum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

apple

Fusce malesuada lectus  bi sit ametapple cursus iaculis. Donec sit amet dignissim lectus. Aenean at ante quis quam eleifend volutpat vitae quis nulla. Vestibulum ultrices lectus nec lacus pulvappleinar lobortis. Pellentesque vehicula nulla eu interdum blandit. Curabitur eleifend nulla leo. Aliquam vehicula lacus id orci euismod, non rhoncus lacus cursus. Fusce vitae arcu quis felis ornare sagittis. Proin ultricies purus et molestie. Maecenas apple a auctor mi. Etiam sapien nulla, eleifend quis convallis sed, sodales vel felis. Vestibulum ante psum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

  2. Fill in the keyword "apple".
  3. Check the assessment:

screenshot 2019-02-20 at 13 41 23

  4. Toggle the eye marker:

screenshot 2019-02-20 at 13 42 00

Expected behavior: only the standalone word "apple" is highlighted, 3 times.

@moorscode transferred this issue from Yoast/wordpress-seo on Feb 20, 2019
@nataliashitova
Contributor

Interesting fact: the spurious highlights only occur when there is a heading that consists solely of "apple". If I remove the heading, or change it to "apple bla bla" or "Apple", only the correct instances of the keyphrase are highlighted.

@nataliashitova
Contributor

Mystery solved

The current mechanism of highlighting keyphrases is as follows.

  1. Search for all possible ways the words from the keyphrase occur in the text. In this example, this step results in just "apple".
  2. Take the sentences one by one and wrap the keyphrase words in them in highlight tags. In this example, this step results in:
[
   "Maecenas <yoastmark class='yoast-text-mark'>apple</yoastmark> a auctor mi.", 
   "<yoastmark class='yoast-text-mark'>apple</yoastmark>",
   "Maecenas <yoastmark class='yoast-text-mark'>apple</yoastmark> a auctor mi."
]
  3. Mark any instances of these sentences in the text. This is the moment when it goes wrong: the word "ametapple" contains the sentence "apple" as a substring, so that part of the word is also replaced with
    "<yoastmark class='yoast-text-mark'>apple</yoastmark>".

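To make the failure mode concrete, here is a minimal sketch of the three steps. It is not the actual YoastSEO.js code: `markSentence` and `applyMarks` are hypothetical names, and the plain substring replacement in step 3 is an assumption based on the description above.

```javascript
// Hypothetical simplification of the highlighting steps described above.
// None of these names are taken from the real code base.
const MARK_OPEN = "<yoastmark class='yoast-text-mark'>";
const MARK_CLOSE = "</yoastmark>";

// Step 2: wrap whole-word occurrences of the keyword inside one sentence.
// Assumes the keyword contains no regex special characters.
function markSentence(sentence, keyword) {
  const wholeWord = new RegExp(`\\b${keyword}\\b`, "g");
  return sentence.replace(wholeWord, `${MARK_OPEN}${keyword}${MARK_CLOSE}`);
}

// Step 3 (the faulty part): replace every occurrence of the original
// sentence in the text with its marked version, using substring matching.
function applyMarks(text, sentences, keyword) {
  let result = text;
  for (const sentence of new Set(sentences)) {
    const marked = markSentence(sentence, keyword);
    if (marked !== sentence) {
      result = result.split(sentence).join(marked);
    }
  }
  return result;
}

// A one-word heading followed by a sentence with "apple" buried in a word.
const demoText = "apple\n\nsit ametapple cursus.";
const demoSentences = ["apple", "sit ametapple cursus."];
const demoResult = applyMarks(demoText, demoSentences, "apple");
const demoMarkCount = demoResult.split(MARK_OPEN).length - 1;
console.log(demoMarkCount); // 2 marks, although only the heading should match
```

Step 2 on its own is correct here: "ametapple" is not marked as part of its own sentence. The extra highlight only appears in step 3, because the heading sentence "apple" is searched for as a raw substring of the whole text.
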
How bad is that?

The problem occurs if the following two conditions are fulfilled:

  1. There exists a sentence with keyphrase words in the text.
  2. There exists a word in the text that includes this sentence entirely (case-sensitive).
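
The two conditions can be checked with a rough sketch like the one below. `findRiskyWords` is a hypothetical helper, and splitting on whitespace is a naive stand-in for real tokenization, so punctuation stays attached to words.

```javascript
// Hypothetical check for the two conditions listed above.
function findRiskyWords(text, sentences, keyword) {
  const words = text.split(/\s+/).filter(Boolean);
  const risky = [];
  for (const sentence of sentences) {
    // Condition 1: the sentence contains the keyword as a word
    // (compared case-insensitively).
    const sentenceWords = sentence.toLowerCase().split(/\s+/);
    if (!sentenceWords.includes(keyword.toLowerCase())) continue;
    // Condition 2: some other word in the text includes the whole
    // sentence, compared case-sensitively.
    for (const word of words) {
      if (word !== sentence && word.includes(sentence)) risky.push(word);
    }
  }
  return risky;
}

const riskyText = "apple\n\nsit ametapple cursus pulvappleinar.";
const riskySentences = ["apple", "sit ametapple cursus pulvappleinar."];
console.log(findRiskyWords(riskyText, riskySentences, "apple"));
```

With a heading of "Apple" or "apple bla bla" instead, the second condition fails and the function returns an empty list, which matches the observation earlier in this thread.
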

This is unlikely to happen in regular text. However, if the keyphrase contains short (non-function) words and one or more of these words is used alone as a sentence (for instance, in a heading), there is a risk of running into the problem demonstrated in this issue.

We at Team Lingo believe it's an edge case, which is unlikely to bother a lot of users and which can wait for the new Tree Parser to solve it. @moorscode do you agree?
