-
-
Notifications
You must be signed in to change notification settings - Fork 8.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
retention-period: Add archiving expired messages tool by retention pe… #2172
Conversation
Automated message from Dropbox CLA bot @kkanahin, it looks like you've already signed the Dropbox CLA. Thanks! |
ba76d27
to
542bc24
Compare
zerver/tests/test_retention.py
Outdated
ArchiveUserMessage.objects | ||
.filter(user_profile__realm=self.mit_realm) | ||
.order_by('id') | ||
.values_list('id', flat=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are kinda long; might be worth writing some more test helper functions to simplify repetitive stuff.
I feel like these tests use the pattern list(foo.order_by('id').values_list('id', flat=True))
where foo
is a query that maybe we should make the API here be e.g. sorted_ids(ArchiveUserMessage.objects .filter(user_profile__realm=self.mit_realm))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will split this tests
Thanks for working on this @kkanahin! I did a read through this, and it's looking really good; I posted one comment above on the tests; will review more closely later this week. To what extent do you have outstanding TODOs for this? The main things I notice are:
It's fine to work on those as added commits on top, though. |
@timabbott Your TODO items are correct and will be done in next iteration. |
ade8034
to
9c944ea
Compare
Should we use the Django ORM for the SQL queries here? |
@rishig Django orm doesn't support moving from one table to other in one query. It recieves all data then convert them to list (I mean bulk_create) and insert to the new table. In my methods it is done in one query withouth python processing. |
For this operation, which is probably going to be our single most expensive type of SQL operation that Zulip does, I think it totally makes sense to do it in SQL. |
Makes sense. Thanks! |
756ee71
to
d441c68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good @kkanahin! I posted several comments on the details; can you rebase past the recent models.py
comment changes to use typing.Text
instead of six.text_type
?
Also, it'd be great if you added a design document in docs/retention.md, with notes on the various design considerations that we've discussed; similar in level of detail to e.g. docs/conversion.md
.
MAILTO=root | ||
|
||
# Remove old archive data . Time is in UTC. | ||
0 2 * * * zulip cd /home/zulip/deployments/current && python manage.py remove_old_archive_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should put these two cron jobs in the same file; that makes it much more clear that they and related and the relative time offsets are important.
Also, can you update this to by compliant with 207cf63 (i.e. run ./manage.py
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
zerver/lib/retention.py
Outdated
arc_attachments.delete() | ||
|
||
|
||
def delete_archived_data(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should have a clearer name, e.g. delete_expired_archived_data
(and similarly for the attachments
one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
self.assertEqual( | ||
ArchiveAttachment.objects.count(), | ||
0 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where this verifies that the attachments are actually deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is checked do not deleting archive attachments if exists some messages(not archived) which is linked with this archive attachments. And deleting them if all related messages are archived and can be removed.
zproject/settings.py
Outdated
@@ -1075,3 +1075,4 @@ def get_secret(key): | |||
PROFILE_ALL_REQUESTS = False | |||
|
|||
CROSS_REALM_BOT_EMAILS = set(('feedback@zulip.com', 'notification-bot@zulip.com')) | |||
ARCHIVED_DATA_RETENTION_DAYS = 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be user-configurable and thus should be added to DEFAULT_SETTINGS rather than set unconditionally here; see http://zulip.readthedocs.io/en/latest/settings.html for details on why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
zerver/tests/test_retention.py
Outdated
@@ -251,3 +288,39 @@ def test_archive_message_tool(self): | |||
self._check_archive_messages_ids_by_realm( | |||
exp_msgs_ids_dict['mit_msgs_ids'], self.mit_realm) | |||
|
|||
def test_moving_attachments_to_archive_withouth_removing(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test name is mispelled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
zerver/lib/retention.py
Outdated
# type: () -> None | ||
removing_attachments = Attachment.objects.filter( | ||
messages__isnull=True, id__in=ArchiveAttachment.objects.all()) | ||
removing_attachments.delete() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this attachments_to_remove
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
dad56bf
to
b252053
Compare
@timabbott please review this pull request. I've added documentation page for retention feature and need your opinion about it. And could you review my answer for attachment test case. |
b252053
to
bfbd788
Compare
The document could use a pass for English grammar, but otherwise looks good; I can get another contributor to tweak that. I'll review the new code in detail in the next few days, but the one feature that I think this is still missing is a plan / tools for how we would unarchive some messages in the event that we discovered there had been a bug in this logic; it seems likely we'll eventually want to do that, and I'm thinking it might be wise to write the code+tests for it now rather than when we have incident. |
b5d760e
to
8b5f282
Compare
I've fixed English grammar. Regarding restoring tool I will develop it. |
2a49c81
to
09780c0
Compare
2f3bb49
to
870513d
Compare
@timabbott, I've rebased this PR. |
96730af
to
0159565
Compare
Now that we've merged the message deletion commit, can you rebase this to integrate with those changes @kkanahin? Thanks! |
@timabbott sure |
- Add tools for moving expired messages to archive tables. - Add tools for deleting expired messages from Message and UserMessage tables. - Add tests to check archiving and deleting tools. - Add manage command to archive expired messages. Fixes zulip#106
- Add archiving attachments to archiving tool. - Add tests. Fixes zulip#106
- Add retention period for archived data to django settings. - Add method for removing archived data. - Add manage command for removing old archived data - Add test to check removing archived data. Fixes zulip#106
- Add cron job for moving expired messages with attachments to archive tables. - Add cron job for removing old archived data from db and remove attachments files. Fixes zulip#106
- Add tool to restore archived messages. - Add tool to restore archived user_messages. - Add tool to restore archived attachments and archived manytomany related records. - Add management console command to restore archived data by realm. - Add documentation for restoring tools - Add tests for restoring tools. Fixes zulip#106
- Add additional comments. - Rephrase comments to make them more clear. - Fix typos. Fixes zulip#106
- Rewrite retention tools with `dry-run` option. Split queries to selection part and executable part. - Add `dry-run` option to retention managment commands. - Add tests for `dry-run` option. Fixes zulip#106
0159565
to
ce055a4
Compare
@timabbott rebased |
Heads up @kkanahin, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the |
Heads up @kkanahin, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the |
What's the current state here? |
This PR is mostly complete, but hasn't been integrated due to insufficient time for careful testing. It wouldn't be hard to rebase, but it's critical before we merge this that appropriate care is taken. The main concrete thing I know about is that we'll need to make an archive-version of the |
Closing in favor of the current version #12356. |
…riod
removing from original Message and UserMessage db tables.