app: generic approach to prevent addition of duplicated api jobs #975
Conversation
We also need this for the case where we have a very large number of either sources or submissions: with the current logic, any jobs remaining in the queue when the next sync kicks off get added again, resulting in unnecessary repeated redownloads of items, potentially several times (with a 15 second sync time and 1000 sources with 4 items per source conversation, this happened multiple times).
Force-pushed from 3ae6d99 to e214328.
@creviera made an important point in standup the other day (h/t to @rmol for pointing me to the relevant bit of code after standup): for download jobs, the actual downloads won't happen because of this logic, although the download jobs do get added (which is what I was seeing in my comment above). As such, I was considering whether to close or defer this PR, even though imho having duplicate jobs like this in the network queue is not great design. Before doing so, I thought I'd do a quick investigation to determine just how many unnecessary jobs get added and how long it takes for them to get cleared out after the initial sync in a realistic scenario. For 1000 sources with 2 submissions and 2 replies each, here's what happens (note this is not over Tor and this is using the 0.1.6 tag):
So the queue was blocked for message/reply downloads for ~5 minutes (in my test, 21220 unnecessary jobs were processed) while all these duplicated jobs were cleared out. The queue will be blocked for longer if there are many more messages/replies, or if each object takes longer to download (as it would over Tor), since fewer jobs are processed per sync period and more duplicate jobs get added. As such, we should still enforce the constraint of disallowing duplicate jobs, otherwise legitimate message/reply downloads can be significantly delayed (potentially by many minutes) by duplicate jobs consuming resources.

For the interested observer, if you want to repeat this test I used this diff:

```diff
diff --git a/securedrop_client/api_jobs/downloads.py b/securedrop_client/api_jobs/downloads.py
index aab51d6..930491e 100644
--- a/securedrop_client/api_jobs/downloads.py
+++ b/securedrop_client/api_jobs/downloads.py
@@ -123,9 +123,11 @@ class DownloadJob(ApiJob):
         db_object = self.get_db_object(session)
 
         if db_object.is_decrypted:
+            logger.debug('OBJECT {} IS DECRYPTED, SKIPPING'.format(db_object.uuid))
             return db_object.uuid
 
         if db_object.is_downloaded:
+            logger.debug('OBJECT {} IS DOWNLOADED, SKIPPING'.format(db_object.uuid))
             self._decrypt(db_object.location(self.data_dir), db_object, session)
             return db_object.uuid
```
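If anyone wants to tally the skipped jobs the same way, here is a rough sketch of how the debug output could be counted; the log path below is just a placeholder (not the client's actual log location), so adjust it to wherever your dev client writes its DEBUG log:

```python
# Hypothetical helper, not part of the client: count how many download jobs
# were skipped because the object was already downloaded or decrypted.
from collections import Counter

LOG_PATH = "client.log"  # placeholder: point this at your dev client's DEBUG log

counts: Counter = Counter()
with open(LOG_PATH) as log_file:
    for line in log_file:
        if "IS DECRYPTED, SKIPPING" in line:
            counts["already decrypted"] += 1
        elif "IS DOWNLOADED, SKIPPING" in line:
            counts["already downloaded"] += 1

print(counts["already decrypted"], "skipped as decrypted")
print(counts["already downloaded"], "skipped as downloaded")
print(sum(counts.values()), "unnecessary jobs in total")
```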
Force-pushed from a60bf89 to 8e65cec.
Given the potential for negative side effects of duplicate jobs on instances with large source counts, we're still treating this as a stretch goal for the 0.2.0 release during the 4/22-5/6 sprint, with @redshiftzero continuing to lead improvement of this PR.
Force-pushed from 8a3f0d5 to c4ded3b.
(This is ready for review, but since it's a larger change, we should release 0.2.0 first)
^ Per the above, priority is on release (e.g. fixes for any issues discovered in QA); otherwise we've committed to a time-boxed (4 hour) review of this PR during the 5/20-6/3 sprint, which may or may not result in it getting merged.
did a first pass. plan to test and review more later after initial discussions begin.
```python
UUID of the item (source, reply, submission, etc.) that this item
corresponds to. We track this to prevent the addition of duplicate jobs.
'''
self.uuid = uuid
```
i see, since we have an api job that performs a task on many objects, i.e. `MetadataSyncJob`, which gets all remote source data, we can't just extend `ApiJob` to have a uuid. the rest of our jobs are "single object" jobs, so they'll each have a uuid associated with them. this does get me thinking about batch jobs and checking for duplicate lists of uuids to prevent duplicate batch jobs, but we don't have to worry about this yet.
also, since this change prevents any `SingleObjectApiJob` from entering the queue if we're already performing an operation on that object, i.e. if a job with that object's uuid is already in the queue, we have to make sure we only ever derive from this class when we are sure there are no other types of operations that could happen on the same object simultaneously.
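For readers following along, here is a minimal sketch of the shape being discussed; the `ApiJob` stub and the equality check are simplifications for illustration, not the PR's exact implementation:

```python
class ApiJob:
    '''Stand-in for the client's real ApiJob base class (assumption for this sketch).'''


class SingleObjectApiJob(ApiJob):
    def __init__(self, uuid: str) -> None:
        super().__init__()
        # UUID of the item (source, reply, submission, etc.) that this job
        # operates on; tracked so that duplicate jobs can be detected.
        self.uuid = uuid

    def __eq__(self, other: object) -> bool:
        # Treat two jobs as duplicates when they are the same type and
        # target the same UUID.
        return type(other) is type(self) and getattr(other, "uuid", None) == self.uuid
```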
Could you add log lines where we make sure we don't make an api call for jobs that have already been downloaded and decrypted:

securedrop-client/securedrop_client/api_jobs/downloads.py, lines 125 to 130 in 409db11:

```python
if db_object.is_decrypted:
    return db_object.uuid

if db_object.is_downloaded:
    self._decrypt(db_object.location(self.data_dir), db_object, session)
    return db_object.uuid
```
This will be useful for debugging #1095 later.
lgtm
ensures that duplicate jobs of the same type with the same uuid attribute cannot be added to the queue. this commit also updates mypy such that we can use ignore[override]

Thread behavior on master:

Sync thread A:
* A1: Sync
* A2: Signal thread B to run B1

[…A1 runs on schedule every 15 seconds…]

GUI thread B:
* B1: Add jobs for any items that are not downloaded by calling `ApiJobQueue.enqueue`, which calls `RunnableQueue.add_job`

Queue thread C:
* C1: Wait until we have at least one job in the queue, and then grab the highest priority job in the queue (`RunnableQueue.queue.get`)
* C2: If timeouts keep happening, add the job again (`RunnableQueue._re_add_job`, which calls `RunnableQueue.queue.put_nowait`)

Explanation:

While we are accessing the queue state from multiple threads (either B or C above), that's OK because we're using Python's synchronized queue class. If we want to prevent duplicate jobs, however, we need to ensure that two threads can't add the same job, even if both threads try to add the same job at the same time. In this diff, that is done by holding a mutex when we read or mutate the queue state. Here the queue state means:

* `RunnableQueue.queue.queue` ("yo dawg...") which has a list of `(priority, job)` tuples containing the jobs that are _not_ in progress,
* the job that's in progress, which is kept on `RunnableQueue.current_job`.

We hold this mutex when we call `RunnableQueue.add_job` / `RunnableQueue._re_add_job` and when we update `RunnableQueue.current_job`.
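A minimal, self-contained sketch of that locking scheme (illustrative names and simplified logic, not the client's actual `RunnableQueue`):

```python
import itertools
import queue
import threading
from typing import Any, Optional


class DedupQueueSketch:
    '''Illustration of the mutex-guarded duplicate check described above.'''

    def __init__(self) -> None:
        self.queue: queue.PriorityQueue = queue.PriorityQueue()
        self.current_job: Optional[Any] = None  # the job being processed, if any
        # One mutex guards both the pending jobs and current_job, so two
        # threads cannot both conclude a job is "new" and add it twice.
        self.mutex = threading.Lock()
        # Tie-breaker so jobs with equal priority are never compared directly.
        self._counter = itertools.count()

    def add_job(self, priority: int, job: Any) -> None:
        with self.mutex:
            pending = [item[-1] for item in self.queue.queue]
            in_progress = [self.current_job] if self.current_job is not None else []
            if any(job == existing for existing in pending + in_progress):
                print(f"Duplicate job {job}, skipping")  # the client logs this instead
                return
            self.queue.put_nowait((priority, next(self._counter), job))

    def mark_in_progress(self, job: Any) -> None:
        # Called by the worker thread once it pulls a job off the queue, under
        # the same mutex, so add_job always sees the job that is running.
        with self.mutex:
            self.current_job = job
```

In this sketch both the GUI thread's enqueue path and the worker's re-add path would go through `add_job`, which is what makes the duplicate check race-free.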
Force-pushed from 5e60c0d to f9e85f0.
lgtm
Description
This is a generic solution for preventing the addition of duplicate jobs, which we need in order to increase the sync speed later, ref #738.
The basic idea is a `SingleObjectApiJob`, which has an attribute `uuid`: the UUID that should be unique for that job. For source-related jobs like the source deletion job it's the source UUID, for the reply download job it's the reply UUID, and so on.

Test Plan
1. `NUM_SOURCES=200 make dev` (this is basically a number of sources that is large enough such that it will take more than 15 seconds to download all messages/replies, and duplicate jobs will be attempted to be added in the next 15 second sync period)
2. Run the client with debug logging (`LOGLEVEL=DEBUG ./run.sh`).

On `master`:

3. Monitor the logs: you should see `Added MessageDownloadJob('31ba1bc9-6687-4111-9826-776be899f45a', 5) to queue`. During the next sync period, a bunch more `Added MessageDownloadJob` lines will occur for UUIDs that have already been added.

On this branch:

3. Monitor the logs: you should not see duplicate jobs get added. You'll see `Added MessageDownloadJob('31ba1bc9-6687-4111-9826-776be899f45a', 5) to queue` during the first sync, and then the next time that UUID appears, it should either be the download processing OR `Duplicate job MessageDownloadJob('31ba1bc9-6687-4111-9826-776be899f45a', 5), skipping`.

Checklist
If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:
If these changes add or remove files other than client code, packaging logic (e.g., the AppArmor profile) may need to be updated. Please check as applicable: