Deid 1346 pre commit hook should only detect pii and not entities (#7) (#10)

* Added a sensible default for the entity list

* removed the mention of endpoint and demo repo stuff

* removing unnecessary function

* Adding default in args

* Added back the demo repo related instructions, will address in a separate ticket

* keeping get_payload function, useful for blocked list and any other future configs

* updated formatting
ketakipai authored Jan 10, 2023
1 parent e2dd7ec commit ad9a3e4
Showing 2 changed files with 23 additions and 23 deletions.
9 changes: 4 additions & 5 deletions README.md
@@ -2,7 +2,7 @@

A pre-commit hook to check for PII in your code. The hook is configured in each repository you would like to scan for PII and runs automatically every time you commit to your repo. It will check for PII in all staged files.

-Note that PII detection isn't done locally, instead any files part of the commit are sent via a POST request, either to a self-hosted instance of Private AI's PII detection container, or Private AI's cloud endpoint.
+Note that PII detection isn't done locally, instead any files part of the commit are sent via a POST request, to a self-hosted instance of Private AI's PII detection container.

This integration only works with the 3.0 version of Private AI's container.

@@ -20,7 +20,7 @@ This integration requires an endpoint to make requests against. For instructions
```
 repos:
 - repo: https://github.com/privateai/pai-pre-commit-hook.git
-  rev: v1.0-beta
+  rev: v1.2-beta
   hooks:
   - id: pii-check
     args:
@@ -38,7 +38,7 @@ repos:
```
4. Run 'pre-commit install' from inside the git repo where you want to use this hook.
5. Replace 'URL' with the url of where your container is hosted.\
-eg. http://localhost:8080/v3/process_text for a container running locally or https://api.private-ai.com/deid/v3/process_text for Private AI's cloud endpoint.
+eg. http://localhost:8080/v3/process_text for a container running locally.
6. Create a .env file and add your API_KEY like so:\
API_KEY=`<put your API KEY here>`
7. Replace 'ENV_FILE_PATH' with the path to your .env file.
@@ -65,7 +65,6 @@ After the above steps your project structure should look like this:
1 directory, 4 files
```


## Usage

The below steps describe how to use the hook on a sample repo provided by Private AI.
@@ -81,7 +80,7 @@ The below steps describe how to use the hook on a sample repo provided by Privat
`PII found - type: NAME_GIVEN, line number: 2, file: sample_code.py, start index: 28, end index: 34` \
`PII found - type: AGE, line number: 4, file: sample_code.py, start index: 9, end index: 11` \
`PII found - type: AGE, line number: 4, file: sample_code.py, start index: 13, end index: 15` \
-`PII found - type: AGE, line number: 4, file: sample_code.py, start index: 17, end index: 19`
+`PII found - type: AGE, line number: 4, file: sample_code.py, start index: 17, end index: 19`
6. Now let's add `PII_CHECK:OFF` and `PII_CHECK:ON` markers around both PII instances. You can add these as comments.
7. Run the git commit command again
8. The commit should now complete successfully
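The marker behaviour described in steps 6 to 8 can be sketched roughly as follows. This is a minimal illustration that mirrors the `PII_CHECK:OFF` / `PII_CHECK:ON` matching visible in the `get_flagged_lines` hunk of this commit; the function name `find_suppressed_ranges`, the returned tuple format, and the sample lines are hypothetical, and what the real hook does after it sees the `ON` marker is not shown in the diff.

```python
def find_suppressed_ranges(lines):
    """Return (start, end) line-number pairs enclosed by OFF/ON markers.

    Hypothetical helper: the real hook's return format is not shown in the diff.
    """
    ranges = []
    start = None  # line number of the currently open PII_CHECK:OFF marker, if any
    for number, line in enumerate(lines, 1):
        # Spaces are removed first, so "PII_CHECK: OFF" also matches.
        compact = line.replace(" ", "").strip()
        if "PII_CHECK:OFF" in compact and start is None:
            start = number
        if "PII_CHECK:ON" in compact and start is not None:
            ranges.append((start, number))
            start = None
    return ranges


# Made-up file contents with one suppressed region.
sample = [
    "# PII_CHECK:OFF\n",
    'name = "Adam"\n',
    "# PII_CHECK:ON\n",
    "ages = [10, 20, 30]\n",
]
print(find_suppressed_ranges(sample))
```

In this sketch an `OFF` marker without a matching `ON` produces no range, so an unclosed region would not suppress anything; whether the hook itself behaves that way is an assumption.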
37 changes: 19 additions & 18 deletions pii_check/pii_check_hook.py
@@ -16,7 +16,7 @@ def get_payload(content, enabled_entity_list, blocked_list):
             "accuracy": "high",
         },
     }

     if enabled_entity_list:
         payload["entity_detection"]["entity_types"] = [{"type": "ENABLE", "value": enabled_entity_list}]
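To make the effect of the `enabled_entity_list` branch concrete, here is a minimal sketch of how such a payload might be assembled. Only the `"accuracy": "high"` setting and the `entity_types` assignment come from the hunk above; the helper name `build_payload`, the `"text"` field, and the nesting of `accuracy` under `entity_detection` are assumptions, and blocked-list handling is left out because the diff does not show it.

```python
import json


def build_payload(content, enabled_entity_list):
    """Hypothetical stand-in for get_payload, without blocked-list support."""
    payload = {
        "text": [content],  # assumed field name, not visible in the diff
        "entity_detection": {
            "accuracy": "high",
        },
    }
    # Only restrict detection when a non-empty list is supplied.
    if enabled_entity_list:
        payload["entity_detection"]["entity_types"] = [
            {"type": "ENABLE", "value": enabled_entity_list}
        ]
    return payload


print(json.dumps(build_payload("card 4111 1111 1111 1111", ["CREDIT_CARD", "CVV"]), indent=2))
```

With an empty list the `entity_types` key is simply absent, which is why giving `--enabled-entities` a default in this commit changes what is sent when the flag is omitted.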

@@ -38,10 +38,7 @@ def get_flagged_lines(files):
             lines = fp.readlines()
             start_flag = False
             for number, line in enumerate(lines, 1):
-                if (
-                    "PII_CHECK:OFF" in line.replace(" ", "").strip()
-                    and not start_flag
-                ):
+                if "PII_CHECK:OFF" in line.replace(" ", "").strip() and not start_flag:
                     start = number
                     start_flag = True
                 if "PII_CHECK:ON" in line.replace(" ", "").strip() and start_flag:
@@ -136,7 +133,19 @@ def main():
     parser = argparse.ArgumentParser(description="pre-commit hook to check for PII")
     parser.add_argument("--url", type=str, required=True)
     parser.add_argument("--env-file-path", type=str, required=True)
-    parser.add_argument("--enabled-entities", type=str, nargs="+")
+    parser.add_argument(
+        "--enabled-entities",
+        type=str,
+        nargs="+",
+        default=[
+            "PASSWORD",
+            "BANK_ACCOUNT",
+            "CREDIT_CARD",
+            "CREDIT_CARD_EXPIRATION",
+            "CVV",
+            "ROUTING_NUMBER",
+        ],
+    )
     parser.add_argument("--blocked-list", type=str, nargs="+")
     args = parser.parse_args()
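The new `default=` argument means `args.enabled_entities` is always a list, which is what lets the later `if args.enabled_entities else []` guard be dropped. A small self-contained sketch of that `argparse` behaviour follows; the constant `DEFAULT_ENTITIES` and the helper `build_parser` are illustrative names, not part of the hook, and the other arguments are omitted.

```python
import argparse

# Same values as the default list added in this commit.
DEFAULT_ENTITIES = [
    "PASSWORD",
    "BANK_ACCOUNT",
    "CREDIT_CARD",
    "CREDIT_CARD_EXPIRATION",
    "CVV",
    "ROUTING_NUMBER",
]


def build_parser():
    """Build a parser with only the --enabled-entities option, for illustration."""
    parser = argparse.ArgumentParser(description="pre-commit hook to check for PII")
    parser.add_argument("--enabled-entities", type=str, nargs="+", default=DEFAULT_ENTITIES)
    return parser


parser = build_parser()

# Flag omitted: argparse falls back to the default list.
no_flag = parser.parse_args([])
print(no_flag.enabled_entities)

# Flag given: the supplied values replace the default entirely.
with_flag = parser.parse_args(["--enabled-entities", "name", "age"])
print([item.upper() for item in with_flag.enabled_entities])
```

Note that values passed on the command line replace the default list rather than extend it.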

@@ -146,19 +155,11 @@ def main():
     if "API_KEY" in os.environ:
         API_KEY = os.environ["API_KEY"]
     else:
-        sys.exit("Your .env file is missing or does not contain API_KEY")
+        sys.exit("Your .env file is missing from the provided path or does not contain API_KEY")

-    enabled_entity_list = (
-        [item.upper() for item in args.enabled_entities]
-        if args.enabled_entities
-        else []
-    )
-
-    blocked_list = (
-        [blocked for blocked in args.blocked_list]
-        if args.blocked_list
-        else []
-    )
+    enabled_entity_list = [item.upper() for item in args.enabled_entities]
+
+    blocked_list = [blocked for blocked in args.blocked_list] if args.blocked_list else []

     check_for_pii(args.url, API_KEY, enabled_entity_list, blocked_list)
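The `API_KEY` check in this hunk depends on the `.env` file having been loaded before `os.environ` is inspected. How the hook loads that file is not visible in the diff, so the following is only a standard-library sketch of the same idea: the function `load_api_key` and its simple `KEY=VALUE` parser are assumptions standing in for whatever loader the hook actually uses.

```python
import os
import sys
import tempfile


def load_api_key(env_file_path):
    """Return API_KEY from a .env-style file, or exit with the hook's error message.

    Hypothetical helper; the parsing rules here are a simplification.
    """
    values = {}
    if os.path.isfile(env_file_path):
        with open(env_file_path, "r") as fp:
            for raw in fp:
                line = raw.strip()
                # Skip blanks, comments, and lines without an assignment.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    if "API_KEY" not in values:
        sys.exit("Your .env file is missing from the provided path or does not contain API_KEY")
    return values["API_KEY"]


# Usage with a throwaway .env file containing a placeholder key.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".env")
    with open(path, "w") as fp:
        fp.write("API_KEY=example-key\n")
    print(load_api_key(path))
```

A missing file and a file without `API_KEY` both end in the same `sys.exit` message, which matches the reworded error text introduced by this commit.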
