Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retention-period: Add archiving expired messages tool by retention pe… #2172

Closed
wants to merge 8 commits into from

Conversation

kkanahin
Copy link
Collaborator

@kkanahin kkanahin commented Nov 1, 2016

…riod

  • Add models to store expired messages and expired user messages data after
    removing from original Message and UserMessage db tables.
  • Add django migrations to create tables for archive models.
  • Add tools for moving expired messages to archive tables.
  • Add tools for deleting expired messages from Message and UserMessage tables.
  • Add tests to check archiving and deliting tools.
  • Add manage command to archive expired messages.

@smarx
Copy link

smarx commented Nov 1, 2016

Automated message from Dropbox CLA bot

@kkanahin, it looks like you've already signed the Dropbox CLA. Thanks!

ArchiveUserMessage.objects
.filter(user_profile__realm=self.mit_realm)
.order_by('id')
.values_list('id', flat=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are kinda long; might be worth writing some more test helper functions to simplify repetitive stuff.

I feel like these tests use the pattern list(foo.order_by('id').values_list('id', flat=True)) where foo is a query that maybe we should make the API here be e.g. sorted_ids(ArchiveUserMessage.objects .filter(user_profile__realm=self.mit_realm))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will split this tests

@timabbott
Copy link
Member

Thanks for working on this @kkanahin! I did a read through this, and it's looking really good; I posted one comment above on the tests; will review more closely later this week.

To what extent do you have outstanding TODOs for this? The main things I notice are:

  • We still need to add the management command code for actually deleting old ArchivedFoo stuff, with support for a retention duration (would just make it a server setting in the zproject/settings.py defaults at 30 days or something; see docs/settings.md for details) of the ArchivedFoo tables.
  • We need to add ArchivedAttachment support (using Attachment.Messages to track whether the last message touching an attachment has been archived).

It's fine to work on those as added commits on top, though.

@kkanahin
Copy link
Collaborator Author

kkanahin commented Nov 2, 2016

@timabbott Your TODO items are correct and will be done in next iteration.

@kkanahin kkanahin force-pushed the retention_period branch 6 times, most recently from ade8034 to 9c944ea Compare November 7, 2016 10:56
@rishig
Copy link
Member

rishig commented Nov 8, 2016

Should we use the Django ORM for the SQL queries here?

@kkanahin
Copy link
Collaborator Author

kkanahin commented Nov 8, 2016

@rishig Django orm doesn't support moving from one table to other in one query. It recieves all data then convert them to list (I mean bulk_create) and insert to the new table. In my methods it is done in one query withouth python processing.

@timabbott
Copy link
Member

For this operation, which is probably going to be our single most expensive type of SQL operation that Zulip does, I think it totally makes sense to do it in SQL.

@rishig
Copy link
Member

rishig commented Nov 8, 2016

Makes sense. Thanks!

@kkanahin kkanahin force-pushed the retention_period branch 2 times, most recently from 756ee71 to d441c68 Compare November 9, 2016 15:50
Copy link
Member

@timabbott timabbott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good @kkanahin! I posted several comments on the details; can you rebase past the recent models.py comment changes to use typing.Text instead of six.text_type?

Also, it'd be great if you added a design document in docs/retention.md, with notes on the various design considerations that we've discussed; similar in level of detail to e.g. docs/conversion.md.

MAILTO=root

# Remove old archive data . Time is in UTC.
0 2 * * * zulip cd /home/zulip/deployments/current && python manage.py remove_old_archive_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should put these two cron jobs in the same file; that makes it much more clear that they and related and the relative time offsets are important.

Also, can you update this to by compliant with 207cf63 (i.e. run ./manage.py)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

arc_attachments.delete()


def delete_archived_data():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should have a clearer name, e.g. delete_expired_archived_data (and similarly for the attachments one)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

self.assertEqual(
ArchiveAttachment.objects.count(),
0
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where this verifies that the attachments are actually deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is checked do not deleting archive attachments if exists some messages(not archived) which is linked with this archive attachments. And deleting them if all related messages are archived and can be removed.

@@ -1075,3 +1075,4 @@ def get_secret(key):
PROFILE_ALL_REQUESTS = False

CROSS_REALM_BOT_EMAILS = set(('feedback@zulip.com', 'notification-bot@zulip.com'))
ARCHIVED_DATA_RETENTION_DAYS = 30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be user-configurable and thus should be added to DEFAULT_SETTINGS rather than set unconditionally here; see http://zulip.readthedocs.io/en/latest/settings.html for details on why.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -251,3 +288,39 @@ def test_archive_message_tool(self):
self._check_archive_messages_ids_by_realm(
exp_msgs_ids_dict['mit_msgs_ids'], self.mit_realm)

def test_moving_attachments_to_archive_withouth_removing(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test name is mispelled

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# type: () -> None
removing_attachments = Attachment.objects.filter(
messages__isnull=True, id__in=ArchiveAttachment.objects.all())
removing_attachments.delete()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this attachments_to_remove.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@kkanahin kkanahin force-pushed the retention_period branch 4 times, most recently from dad56bf to b252053 Compare December 1, 2016 15:29
@kkanahin
Copy link
Collaborator Author

kkanahin commented Dec 1, 2016

@timabbott please review this pull request. I've added documentation page for retention feature and need your opinion about it. And could you review my answer for attachment test case.
Also I've added front-end part for retention period in admin settings.

@timabbott
Copy link
Member

The document could use a pass for English grammar, but otherwise looks good; I can get another contributor to tweak that.

I'll review the new code in detail in the next few days, but the one feature that I think this is still missing is a plan / tools for how we would unarchive some messages in the event that we discovered there had been a bug in this logic; it seems likely we'll eventually want to do that, and I'm thinking it might be wise to write the code+tests for it now rather than when we have incident.

@kkanahin kkanahin force-pushed the retention_period branch 2 times, most recently from b5d760e to 8b5f282 Compare December 2, 2016 10:05
@kkanahin
Copy link
Collaborator Author

kkanahin commented Dec 2, 2016

I've fixed English grammar. Regarding restoring tool I will develop it.

@kkanahin kkanahin force-pushed the retention_period branch 2 times, most recently from 2a49c81 to 09780c0 Compare December 12, 2016 06:10
@kkanahin
Copy link
Collaborator Author

@timabbott, I've rebased this PR.

@kkanahin kkanahin force-pushed the retention_period branch 2 times, most recently from 96730af to 0159565 Compare May 4, 2017 06:55
@timabbott
Copy link
Member

Now that we've merged the message deletion commit, can you rebase this to integrate with those changes @kkanahin? Thanks!

@kkanahin
Copy link
Collaborator Author

@timabbott sure

kkanahin added 8 commits May 30, 2017 14:24
- Add tools for moving expired messages to archive tables.
- Add tools for deleting expired messages from Message and UserMessage tables.
- Add tests to check archiving and deleting tools.
- Add manage command to archive expired messages.

Fixes zulip#106
- Add archiving attachments to archiving tool.
- Add tests.

Fixes zulip#106
- Add retention period for archived data to django settings.
- Add method for removing archived data.
- Add manage command for removing old archived data
- Add test to check removing archived data.

Fixes zulip#106
- Add cron job for moving expired messages with attachments
  to archive tables.
- Add cron job for removing old archived data from db and
  remove attachments files.

Fixes zulip#106
- Add tool to restore archived messages.
- Add tool to restore archived user_messages.
- Add tool to restore archived attachments and
  archived manytomany related records.
- Add management console command to restore archived
  data by realm.
- Add documentation for restoring tools
- Add tests for restoring tools.

Fixes zulip#106
- Add additional comments.
- Rephrase comments to make them more clear.
- Fix typos.

Fixes zulip#106
- Rewrite retention tools with `dry-run` option. Split queries
  to selection part and executable part.
- Add `dry-run` option to retention managment commands.
- Add tests for `dry-run` option.

Fixes zulip#106
@kkanahin kkanahin force-pushed the retention_period branch from 0159565 to ce055a4 Compare May 30, 2017 09:20
@kkanahin
Copy link
Collaborator Author

@timabbott rebased

@zulipbot
Copy link
Member

Heads up @kkanahin, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

@zulipbot
Copy link
Member

zulipbot commented Nov 21, 2017

Heads up @kkanahin, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

@DoCode
Copy link

DoCode commented Aug 10, 2018

What's the current state here?

@timabbott
Copy link
Member

timabbott commented Aug 10, 2018

This PR is mostly complete, but hasn't been integrated due to insufficient time for careful testing. It wouldn't be hard to rebase, but it's critical before we merge this that appropriate care is taken.

The main concrete thing I know about is that we'll need to make an archive-version of the Reaction model.

@timabbott
Copy link
Member

Closing in favor of the current version #12356.

@timabbott timabbott closed this May 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants