
app: generic approach to prevent addition of duplicated api jobs #975

Merged: 7 commits merged into master from prevent-duplicate-jobs on May 28, 2020

Conversation

@redshiftzero commented Mar 21, 2020

Description

This is a generic solution for preventing the addition of duplicate jobs, which we need for increasing the sync speed later (ref #738).

The basic idea is:

  • All API jobs inherit from a new class SingleObjectApiJob, which has a uuid attribute holding the UUID that should be unique for that job: for source-related jobs like the source deletion job it's the source UUID, for the reply download job it's the reply UUID, and so on.
  • When adding a job to the queue, we check that the job is not already enqueued or currently executing.
  • We use signals to trigger jobs being added to the queue manager (see the sketch below this list).
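
To make this concrete, here is a minimal sketch of the deduplication building block. The names SingleObjectApiJob and uuid come from this PR; the method bodies are illustrative, not the exact implementation:

class ApiJob:
    '''Base class for all jobs processed by the network queue.'''


class SingleObjectApiJob(ApiJob):
    '''An API job that acts on exactly one object, identified by UUID.'''

    def __init__(self, uuid: str) -> None:
        super().__init__()
        # UUID of the item (source, reply, submission, etc.) that this
        # job corresponds to. We track this to prevent duplicate jobs.
        self.uuid = uuid

    def __eq__(self, other: object) -> bool:
        # Two jobs are duplicates when they are the same type of job
        # acting on the same object.
        return type(other) is type(self) and self.uuid == getattr(other, 'uuid', None)

With equality defined per job type and UUID, the duplicate check reduces to comparing a candidate job against the currently executing job and every job still waiting in the queue.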

Test Plan

  1. NUM_SOURCES=200 make dev (this number of sources is large enough that downloading all messages/replies takes more than 15 seconds, so duplicate jobs will be attempted during the next 15-second sync period)
  2. Sign into the client with debug-level logging (LOGLEVEL=DEBUG ./run.sh).

on master

Monitor the logs; you should see Added MessageDownloadJob('31ba1bc9-6687-4111-9826-776be899f45a', 5) to queue. During the next sync period, many more Added MessageDownloadJob lines will appear for UUIDs that have already been added.

on this branch

Monitor the logs; you should not see duplicate jobs get added. You'll see Added MessageDownloadJob('31ba1bc9-6687-4111-9826-776be899f45a', 5) to queue during the first sync; the next time that UUID appears, it should be either the download being processed or Duplicate job MessageDownloadJob('31ba1bc9-6687-4111-9826-776be899f45a', 5), skipping.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs, or network traffic (via the RPC service), Qubes testing in the staging environment is required. For fine-tuning of the graphical user interface, testing in any Qubes environment is required. Please check as applicable:

  • I have tested these changes in the appropriate Qubes environment
  • I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
  • These changes should not need testing in Qubes

If these changes add or remove files other than client code, packaging logic (e.g., the AppArmor profile) may need to be updated. Please check as applicable:

  • I have submitted a separate PR to the packaging repo
  • No update to the packaging logic (e.g., AppArmor profile) is required for these changes
  • I don't know and would appreciate guidance

@redshiftzero added this to the Post-Beta milestone Mar 21, 2020
@redshiftzero changed the title from "app: prevent addition of duplicated api jobs" to "app: generic approach to prevent addition of duplicated api jobs" Mar 21, 2020
@redshiftzero commented Mar 25, 2020

We also need this for the case where we have a very large number of sources or submissions: with the current logic, any jobs remaining in the queue when the next sync kicks off get added again, resulting in unnecessary repeated redownloads of items, potentially several times (with a 15-second sync period and 1000 sources with 4 items per source conversation, this happened multiple times).

@redshiftzero force-pushed the prevent-duplicate-jobs branch from 3ae6d99 to e214328 on March 26, 2020 15:10
@redshiftzero changed the title from "app: generic approach to prevent addition of duplicated api jobs" to "[wip] app: generic approach to prevent addition of duplicated api jobs" Mar 26, 2020
@redshiftzero commented Apr 2, 2020

@creviera made an important point in standup the other day (h/t to @rmol for pointing me to the relevant bit of code after standup): for download jobs, the actual downloads won't happen, due to this logic, even though the download jobs do get added (which is what I was seeing in my comment above).

As such, I was considering whether I should close or defer this PR, even though having duplicate jobs like this in the network queue is, imho, not great design. Before doing so, I thought I'd do a quick investigation to determine: just how many unnecessary jobs will get added, and how long does it take for them to be cleared out after the initial sync in a realistic scenario?

For 1000 sources with 2 submissions and 2 replies each, here's what happens (note: this is not over Tor, and this is using the 0.1.6 tag):

11:04:09,775 - I log in to the client with a clean database
11:04:23,093 - sync begins
11:05:08,105 - download jobs start getting added to the queue and begin to be processed
11:20:17,087 - all data is downloaded, now the duplicate jobs start getting processed
11:25:51,754 - duplicate jobs are finished processing, the queue is now available again for legitimate message/reply downloads

So the queue was blocked to message/reply downloads for ~5 minutes (in my test, 21,220 unnecessary jobs were processed) while all these duplicated jobs were cleared out. The queue would be blocked even longer with many more messages/replies, or if each object took longer to download: fewer jobs get processed per sync period, so more duplicate jobs are added. Both would be the case over Tor.

As such, we should still enforce the constraint of disallowing duplicate jobs; otherwise legitimate message/reply downloads can be significantly delayed (potentially by many minutes) by duplicate jobs consuming resources.

For the interested observer: if you want to repeat this test, I used this diff:

diff --git a/securedrop_client/api_jobs/downloads.py b/securedrop_client/api_jobs/downloads.py
index aab51d6..930491e 100644
--- a/securedrop_client/api_jobs/downloads.py
+++ b/securedrop_client/api_jobs/downloads.py
@@ -123,9 +123,11 @@ class DownloadJob(ApiJob):
         db_object = self.get_db_object(session)

         if db_object.is_decrypted:
+            logger.debug('OBJECT {} IS DECRYPTED, SKIPPING'.format(db_object.uuid))
             return db_object.uuid

         if db_object.is_downloaded:
+            logger.debug('OBJECT {} IS DOWNLOADED, SKIPPING'.format(db_object.uuid))
             self._decrypt(db_object.location(self.data_dir), db_object, session)
             return db_object.uuid

@redshiftzero force-pushed the prevent-duplicate-jobs branch 2 times, most recently from a60bf89 to 8e65cec on April 2, 2020 17:17
@redshiftzero removed this from the 0.2.0 milestone Apr 21, 2020
@eloquence commented Apr 22, 2020

Given the potential for negative side effects of duplicate jobs on instances with large source counts, we're still treating this as a stretch goal for the 0.2.0 release during the 4/22-5/6 sprint, with @redshiftzero continuing to lead improvement of this PR.

@redshiftzero force-pushed the prevent-duplicate-jobs branch 2 times, most recently from 8a3f0d5 to c4ded3b on May 19, 2020 19:52
@redshiftzero changed the title from "[wip] app: generic approach to prevent addition of duplicated api jobs" to "app: generic approach to prevent addition of duplicated api jobs" May 19, 2020
@redshiftzero marked this pull request as ready for review May 19, 2020 20:49
@redshiftzero commented:

(This is ready for review, but since it's a larger change, we should release 0.2.0 first)

@eloquence commented:

^ Per the above, priority is on the release (e.g. fixes for any issues discovered in QA); otherwise, we've committed to a time-boxed (4-hour) review of this PR during the 5/20-6/3 sprint, which may or may not result in it getting merged.

@sssoleileraaa left a comment:

did a first pass. plan to test and review more later after initial discussions begin.

    UUID of the item (source, reply, submission, etc.) that this item
    corresponds to. We track this to prevent the addition of duplicate jobs.
    '''
    self.uuid = uuid
@sssoleileraaa commented:

i see: since we have an api job that performs a task on many objects (i.e. MetadataSyncJob, which gets all remote source data), we can't just extend ApiJob to have a uuid. the rest of our jobs are "single object" jobs, so they'll each have a uuid associated with them. this does get me thinking about batch jobs and checking for duplicate lists of uuids to prevent duplicate batch jobs, but we don't have to worry about that yet.

also, since this change prevents any SingleObjectApiJob from entering the queue if we're already performing an operation on that object (i.e. if a job with that object's uuid is already in the queue), we have to make sure we only ever derive from this class when we are sure no other type of operation could happen on the same object simultaneously. a sketch of the resulting hierarchy follows below.
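
For illustration, a hypothetical slice of that hierarchy (MetadataSyncJob and MessageDownloadJob are named in this PR, but these definitions are sketches, not the real code):

class MetadataSyncJob(ApiJob):
    '''Operates on many objects (all remote source data), so there is no
    single uuid to deduplicate on; it derives directly from ApiJob and is
    never skipped by the uuid check.'''


class MessageDownloadJob(SingleObjectApiJob):
    '''Operates on exactly one message, so it derives from
    SingleObjectApiJob and is deduplicated by the message UUID.'''

    def __init__(self, message_uuid: str) -> None:
        super().__init__(message_uuid)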

securedrop_client/api_jobs/updatestar.py (resolved review thread)
@sssoleileraaa left a comment:

Could you add log lines where we make sure we don't make an API call for jobs whose objects have already been downloaded and decrypted:

if db_object.is_decrypted:
    return db_object.uuid

if db_object.is_downloaded:
    self._decrypt(db_object.location(self.data_dir), db_object, session)
    return db_object.uuid

This will be useful for debugging #1095 later.

@sssoleileraaa previously approved these changes May 27, 2020
@sssoleileraaa left a comment:

lgtm

@sssoleileraaa dismissed their stale review May 27, 2020 19:20: "we need to allow multiple star jobs"

ensures that duplicate jobs of the same type with the same uuid
attribute cannot be added to the queue.

this commit also updates mypy such that we can use ignore[override]

Thread behavior on master -

Sync thread A:
A1: Sync
A2: Signal thread B to run B1
[…A1 runs on schedule every 15 seconds…]

GUI thread B:
B1: Add jobs for any items that are not downloaded by calling `ApiJobQueue.enqueue` which calls `RunnableQueue.add_job`

queue thread C:
C1: Wait until we have at least one job in the queue, and then grab the highest priority job in the queue (`RunnableQueue.queue.get`)
C2: If timeouts keep happening, add job again (`RunnableQueue._re_add_job` which calls `RunnableQueue.queue.put_nowait`)

Explanation -

While we are accessing the queue state from multiple threads (either B or C above), that’s OK because we’re using Python’s synchronized queue class. If we want to prevent duplicate jobs however, we need to ensure that two threads can’t add the same job, even if both threads try to add the same job at the same time.

In this diff, that is done by holding a mutex when we read or mutate the queue state. Here the queue state means:
*  `RunnableQueue.queue.queue` ("yo dawg...") which has a list of `(priority, job)` tuples containing the jobs that are _not_ in progress,
*  the job that’s in progress which is kept on `RunnableQueue.current_job`.

We hold this mutex when we call `RunnableQueue.add_job` / `RunnableQueue._re_add_job` and when we update `RunnableQueue.current_job`.
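
A minimal sketch of that locking scheme, assuming a plain `threading.Lock` and a tiebreaker counter inside the priority tuples (both assumptions; the real primitive and tuple layout may differ):

import itertools
import logging
import queue
import threading
from typing import Optional

logger = logging.getLogger(__name__)


class RunnableQueue:
    def __init__(self) -> None:
        # Entries are (priority, tiebreaker, job); the monotonically
        # increasing tiebreaker keeps the heap from ever comparing two
        # jobs directly when their priorities are equal.
        self.queue: queue.PriorityQueue = queue.PriorityQueue()
        self._counter = itertools.count()
        self.current_job: Optional[ApiJob] = None
        # Guards both the queued jobs and current_job, so threads B and
        # C above cannot race to add the same job twice.
        self.mutex = threading.Lock()

    def add_job(self, priority: int, job: ApiJob) -> None:
        with self.mutex:
            duplicate = job == self.current_job or any(
                job == queued for _, _, queued in self.queue.queue
            )
            if duplicate:
                logger.debug('Duplicate job {}, skipping'.format(job))
                return
            self.queue.put_nowait((priority, next(self._counter), job))

`_re_add_job` would take the same guarded path, and the worker thread would hold the mutex whenever it moves a job into or out of `current_job`, so no duplicate can slip in between a read of the queue state and the corresponding write.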
@redshiftzero force-pushed the prevent-duplicate-jobs branch from 5e60c0d to f9e85f0 on May 28, 2020 21:30
@sssoleileraaa left a comment:

lgtm

@sssoleileraaa merged commit e88da72 into master May 28, 2020
@sssoleileraaa deleted the prevent-duplicate-jobs branch May 28, 2020 21:51
@eloquence mentioned this pull request Jul 8, 2020
@eloquence mentioned this pull request Jul 16, 2020