Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

add Module API callback for user directory search #12318

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

Samonitari
Copy link

Signed-off-by: Krisztian Szegi k.git.hub@meszaros-szegi.cloud

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

Signed-off-by: Krisztian Szegi <k.git.hub@meszaros-szegi.cloud>
@Samonitari Samonitari requested a review from a team as a code owner March 29, 2022 06:40
@@ -97,6 +108,23 @@ async def search_users(
"""
results = await self.store.search_user_dir(user_id, search_term, limit)

merged_results = results["results"]
existing_user_ids = [user["user_id"] for user in merged_results]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merging the external user directory with synapse's works so that: if user_id match -> user match.
I don't know if it is realistic that the display name or avatar is different in the "backend", but this seems like the safe(r) thing to do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we'd want the external backend to have the last word here, so I think we can just call results["results"].update(external_users).

@Samonitari
Copy link
Author

I'd like this to be merged, so #12247 could be closed.
I could then add the callback to matrix-synapse-ldap3 auth backend, making the latter a 100% replacement for the dead ma1sd, at least in a LDAP based setup.

@dklimpel
Copy link
Contributor

Is there no docs needed for synapse/docs/modules/?

@Samonitari
Copy link
Author

Fair enough - I'll add the docs, and remove the commented out loop!

Copy link
Contributor

@babolivier babolivier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! I've left a few points I'd like to see addressed before we can merge this. I'd also like to see some tests for this new callback - see this snippet for a few examples of tests that involve module callbacks. I'd also like to see some documentation about this, but from the previous comments on this PR it looks like you're already on it.

@@ -28,6 +28,9 @@

logger = logging.getLogger(__name__)

# Types for callbacks to be registered via the module api
ON_SEARH_USERS_CALLBACK = Callable[[str, str, int], Awaitable[Optional[SearchResult]]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ON_SEARH_USERS_CALLBACK = Callable[[str, str, int], Awaitable[Optional[SearchResult]]]
ON_SEARCH_USERS_CALLBACK = Callable[[str, str, int], Awaitable[Optional[SearchResult]]]

@@ -28,6 +28,9 @@

logger = logging.getLogger(__name__)

# Types for callbacks to be registered via the module api
ON_SEARH_USERS_CALLBACK = Callable[[str, str, int], Awaitable[Optional[SearchResult]]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like this callback's name to better reflect what it does. on_action implies that the callback allows modules to react after action has happened (which is also the logic other most on_[...] callbacks follow, with an exception or two). In this case, it implies that this callback is called after a user directory search is complete.

In this case, maybe something like get_external_results would cut it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I didn't consider the sequential meaning on_[...] would bring
get_external_results feels a bit too general, and external potentially isn't the most correct either, as these users could be already mirrored to database.
SEARCH_USERS_CALLBACK simply?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_users feels to me like it can be a bit misleading as the callback is only for searching users that are on an separate platform (ie not Matrix users that are either local to the homeserver or share rooms with the server).
search_external_users could be a good middle ground, wdyt?

@@ -97,6 +108,23 @@ async def search_users(
"""
results = await self.store.search_user_dir(user_id, search_term, limit)

merged_results = results["results"]
existing_user_ids = [user["user_id"] for user in merged_results]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we'd want the external backend to have the last word here, so I think we can just call results["results"].update(external_users).

Comment on lines 114 to 120
external_users = await callback(user_id, search_term, limit)
if external_users is not None:
merged_results += [
user
for user in external_users["results"]
if user["user_id"] not in existing_user_ids
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see some error handling around this so that a faulty module doesn't completely disable user search. See this snippet for inspiration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, correcting

]

results["results"] = merged_results
results["limited"] = len(merged_results) > limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes much sense to update limited like this. We do this in search_user_dir because we request limit + 1 elements from the database (but even that is a bit meh), but this doesn't really apply here. I think we should just leave results["limited"] as it is.

Copy link
Author

@Samonitari Samonitari Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did only really understood the logic behind limited after I committed.
SQL query does not report if the limit was needed or not - so the +1 is a workaround so we can now if there are more users. And that is a bit "meh":
Maybe the workaround in synapse/storage/databases/main/user_directory could also strip the the last found result if query returned limit + 1, as: technically we maybe are falsely reporting limited when in fact, there is exactly limit+1 users.

OTOH, the correct merging would be maybe:

  • if limited is false before merge:
    • pass limit - database_result_size to callback
    • callback should also return limit - database_result_size + 1 if more users are found than requested
    • check the callback's returned result size, set the limited accordingly, and merge all or all - 1
  • if limited is true before merge, don't merge at all

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, this would mean that if the database query yielded a limited result set we would only show results in the external backend if their user ID appears in the result set from the database. That's assuming we change the merge logic as suggested in #12318 (comment). Otherwise it means we don't show results (and presumably don't call the callback at all) unless the database doesn't manage to provide a limit amount of results.

This I guess would make sense even though it's not a super obvious behaviour at first. Though it sounds to me like it would fill a rather niche case; in my experience it would be more logical to expect the other way around, with the custom external backend being the source of truth and falling back to Synapse's generic search to fill in the blanks. I think doing it this way would be more helpful.

@babolivier babolivier self-requested a review April 14, 2022 13:09
Copy link
Contributor

@babolivier babolivier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very sorry about the delay in my review, this had fallen off my radar somehow. I've added a few thoughts, though as mentioned before I'd also like to see some test for this. The callback will also need to be documented, see https://github.com/matrix-org/synapse/tree/master/docs/modules for reference.

@@ -28,6 +28,9 @@

logger = logging.getLogger(__name__)

# Types for callbacks to be registered via the module api
ON_SEARH_USERS_CALLBACK = Callable[[str, str, int], Awaitable[Optional[SearchResult]]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_users feels to me like it can be a bit misleading as the callback is only for searching users that are on an separate platform (ie not Matrix users that are either local to the homeserver or share rooms with the server).
search_external_users could be a good middle ground, wdyt?

]

results["results"] = merged_results
results["limited"] = len(merged_results) > limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, this would mean that if the database query yielded a limited result set we would only show results in the external backend if their user ID appears in the result set from the database. That's assuming we change the merge logic as suggested in #12318 (comment). Otherwise it means we don't show results (and presumably don't call the callback at all) unless the database doesn't manage to provide a limit amount of results.

This I guess would make sense even though it's not a super obvious behaviour at first. Though it sounds to me like it would fill a rather niche case; in my experience it would be more logical to expect the other way around, with the custom external backend being the source of truth and falling back to Synapse's generic search to fill in the blanks. I think doing it this way would be more helpful.

@janpawellek
Copy link

Cool, thank you for the PR 🥳

We also need this feature, thus your PR looks very promising. Keep up your good work!
Looking forward to get it into Synapse 😃

@DMRobertson DMRobertson added the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Aug 23, 2022
@clokep
Copy link
Member

clokep commented Oct 19, 2022

@Samonitari or @janpawellek -- do you wish to continue this PR? It looks like it needs some updates and the feedback addressed.

@Samonitari
Copy link
Author

Yes, but to be a bit more direct - and to not make any bloody estimation lies - sometime this year I'll do this.
I'll keep you updated, but the next two weeks I am consolidating two hosting servers into one - I am halfway there!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants