Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google drive permission sync cleanup #2749

Merged
merged 1 commit into from
Oct 9, 2024
Merged

Conversation

hagen-danswer
Copy link
Contributor

Description

Removed the trashed check since there is no trash check at the connector level either

Copy link

vercel bot commented Oct 9, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 9, 2024 9:02pm

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This pull request removes the trashed check from the Google Drive permission sync process in the doc_sync.py file, potentially affecting how document permissions are handled for deleted or inaccessible files.

  • Removed trashed check in _fetch_permissions_paginated function, which may lead to fetching permissions for inaccessible documents
  • Adjusted error handling to log warnings for 404 (not found) and 403 (access denied) HTTP errors
  • Renamed file_id to drive_file_id for consistency throughout the file
  • Refactored permission fetching logic for improved readability and maintainability

1 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings

Comment on lines 117 to 119
google_drive_creds, _ = get_google_drive_creds(
raw_credentials_json, scopes=FETCH_PERMISSIONS_SCOPES
credentials_json,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The 'scopes' parameter has been removed from the get_google_drive_creds call. Verify that the necessary scopes are still being applied.

@@ -72,25 +71,6 @@ def _fetch_permissions_paginated(
) -> Iterator[dict[str, Any]]:
next_token = None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this check because if the connector is indexing trashed documents, then permissions should be applied to those documents.

If we want to filter out trashed documents, it should be done at the connector level

@@ -99,18 +79,25 @@ def _fetch_permissions_paginated(
drive_service.permissions()
.list(
fileId=drive_file_id,
fields="permissions(id, emailAddress, role, type, domain)",
fields="permissions(emailAddress, type, domain)",
supportsAllDrives=True,
pageToken=next_token,
)
.execute()
)
)()
except HttpError as e:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether or not to raise or break for these errored responses is up for debate

@@ -123,12 +110,12 @@ def _fetch_permissions_paginated(
def _fetch_google_permissions_for_document_id(
db_session: Session,
drive_file_id: str,
raw_credentials_json: dict[str, str],
credentials_json: dict[str, str],
company_google_domains: list[str],
) -> ExternalAccess:
# Authenticate and construct service
google_drive_creds, _ = get_google_drive_creds(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we let the get_google_drive_creds function determine the scopes (it already supports this)

@@ -123,12 +110,12 @@ def _fetch_permissions_paginated(
def _fetch_google_permissions_for_document_id(
db_session: Session,
drive_file_id: str,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearer variable name

Copy link
Contributor

@yuhongsun96 yuhongsun96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@hagen-danswer hagen-danswer added this pull request to the merge queue Oct 9, 2024
Merged via the queue into main with commit 804de32 Oct 9, 2024
7 checks passed
@hagen-danswer hagen-danswer deleted the google-drive-perm-sync-fix branch October 17, 2024 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants