-
-
Notifications
You must be signed in to change notification settings - Fork 8.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance issues & some fixes #375
Comments
Cool, thanks for doing this investigation @dbiollo ! Are you intending to develop and submit PRs for any of these issues? I'm happy to provide advice on how to best address some of these... |
Here's some notes on these various items: (1) Case-insensitive index on UserProfile.email. I believe that the index in (2) Realm.get_admin_users* Zulip is not using the groups support of Guardian, so I think that would be a fine patch to add. (3) Filtering UserProfile on is_active, is_bot Those seem like reasonable indexes to add. (4) Caching entire company directory can break memcache Roughly how many users are required for this to become an issue? I think the places where this is used do require all the users in the company, so we'd either need to change the UI for larger realms or do something else reasonable (e.g. if this were the only issue we could get a multiplicative factor by compressing or encoding in something more efficient than JSON strings) (5) alert_words_in_realm fetches too much data alert_words is not a heavily used feature, so that approach would be reasonable. (6) DevAuth feature makes Login page too slow Capping the query at 100 seems very reasonable here. (7) json_events_register - unused?
(8) realm_user_count - replace python object count with DB count query I think we may want to be thoughtful about this one -- the current code doesn't talk to the database because of the memcached entry discussed in (4). I think you're right that regardless the current check is inefficient, but |
Also, @dbiollo if the way you're testing for these issues is something we could script and include in the automated tests that we run on every Zulip change, that would be awesome -- I always find it's best with performance work to have tools making sure issues don't regress in the future :) |
Hey @timabbott, my plan is to eventually plugin this into performance tests. No Pull Requests expected for a while; though I have submitted the CLA for my company to approve. Hopefully by January :/ Feel free to incorporate any of the non-controversial changes. RE: (3) - The indexes in 0003_custom_indexes are great but only apply to zerver_message & zerver_stream. The same case-insensitive index is needed for zerver_userprofile and probably zerver_realm, zerver_preregistrationuser given that there are case-insensitive queries of the form: "xxxx__iexact". RE: (4) - It seems to break around 10-12K users. Was wondering if you think that Streams & Realms should have some configurable or hard coded limits to subscription size / membership, at least until such time as the scale issues are all fixed? For example, a Realm with 10K users and some public streams with all of those 10K users will kill the system ... I've been adding LIMIT 1000 in a lot of places, which makes it responsive enough. |
OK, good to know; time permitting I'll probably get to several of these before January. Regarding limits, I definitely think having (configurable) limits would be a valuable feature to help prevent people from accidentally getting themselves into serious trouble! |
@dbiollo I opened #392 with versions of the fixes for issues 1, 2, 3, 5, 6, 7, and 8; I'd love it if you could take a quick look at it and test it in your environment! For (4), I think we may need that cache for performance reasons in an actually active realm. It could be valuable to look at whether it makes sense to e.g. gzip-compress the content; I should also note that if we merge #378, we won't have memcached's 1MB limit of object sizes. |
@dbiollo so looking at issue 4, I think there's a few things that would need to happen if we wanted to remove that cache: (A) I pushed a small commit to https://github.com/timabbott/zulip/tree/tabbott-user-dicts removing some of the users of get_active_user_dicts_for_realm (B) With that change, most of the zerver/lib/actions.py accesses can just use (C) The bugdown use of it is for matching users by short or long name in @-mentions. When we solve #374, that should resolve this issue since we shouldn't need to be doing a lookup through all full names. (D) The call in streams.py is legacy code predating the notifications_stream feature (instead sending a PM to every user!) that we should probably just remove at this point. (E) We'd still need to use the cache for I've posted #442 to handle (A) and (D) since they're unconditional improvements to the code that don't involve a complex transition. |
@dbiollo How are things going? :) In case you'd like to talk synchronously about this: We're hosting a livechat "office hour" at the open Zulip realm https://zulip.tabbott.net on Monday, August 15th, 10am-11am Pacific Time / 1pm-2pm Eastern Time / 10:30-11:30pm India time. Please come by if you want to talk about these fixes in realtime -- or about the experimental new desktop app, helping us figure out what new users need to know, replacing memcached with Redis, etc. We'd be happy to see you. |
I've been testing out zulip with large data sets (1000's of users, streams, etc). While some of the performance problems require UI design changes, there are a few optimizations that can be made directly on the backend. Here the small ones I've found so far:
1. Case-insensitive index on UserProfile.email.
Code like the following does a O(n) lookup on the UserProfile table:
All it needs for indexed-based DB look up is an index like what is in 0003_custom_indexes.py:
There might be other tables (prereg user?) that need this index. Alternatively, Postgres has a CITEXT column type for Case-Insensitive text.
2. Realm.get_admin_users*
Turns out that this authorization check is rather expensive when you have a lot of users. Primarily slow due to an OR with the group_permissions table. I could be wrong but it appears that Zulip doesn't use the group permissions of guardian. Best to check, but adding the "with_group_users=False" made this an order of magnitude faster:
3. Filtering UserProfile on is_action, is_bot
Indexing these columns helps performance in a few cases, as it is a common filter. Needs corresponding migration file.
4. Caching entire company directory can break memcache
Memcache has a reasonable 1MB size limit per cached object. When you have a company with 1000's of users, the following caching code breaks, getting error 37 from Memcache. It's more scalable to refactor the code to not fetch all company users in one go, but at least to fix the broken memcache, the cache on this function could simply be disabled:
5. alert_words_in_realm fetches too much data
The default behaviour of this function is to fetch all records for the realm, which is a lot of data for large companies. Assuming that usage of Alert Words is sparse, then filtering to exclude users who don't have any results in a much faster query.
6. DevAuth feature makes Login page too slow
DevAuthBackend is pretty handy, but the login page will be pretty much unusable once you get a few thousand UserProfiles in the DB. Makes sense to cap the query at 100? Also sorting by email can be done on the DB as a minor improvement.
7. json_events_register - unused?
This function might not be used? At least that's what my old comment says (in my local copy).
zulip/zerver/views/__init__.py
Line 1157 in e957399
8. realm_user_count - replace python object count with DB count query
This will be much faster for large realms.
The text was updated successfully, but these errors were encountered: