Tooling for automated detection of malware (pypi#7377)
* Add new models for malware detection. (pypi#7118)

* Add new models for malware detection.

Fixes pypi#7090 and pypi#7092.

* Code review changes.

- FK on release_file.id field instead of md5
- Change message type from String to Text
- Change Enum class in model to singular form

* Add admin interface to view and enable checks (pypi#7134)

* Add admin interface to view and enable checks

- Implement list, detail and change_state views (pypi#7133)
- Add unit tests for check admin view

* Add comprehensive test coverage for check admin

* Add initial hook-based check execution mechanism (pypi#7160)

* Add initial hook-based check execution mechanism

* scratch/poc

* Add initial hook-based check execution mechanism

* Use sqlalchemy event hooks for malware checks

* Fix unit tests

* Add enum for MalwareCheckObjectType

* Add unit tests for init.

* Add tests for tasks, services, and utils.

Also, some small bugfixes in MalwareCheckFactory and the
get_enabled_checks method.

* Fix spurious task test.

* Add missing drop enum to downgrade function.

* Added TODO to dev/environment

* Be more explicit in check lookup

Co-authored-by: Ernest W. Durbin III <ewdurbin@gmail.com>

* Add malware check syncing mechanism (pypi#7190)

* Add malware check syncing mechanism

* Code review changes.

* Refactor MalwareCheckBase. Fixes pypi#7091. (pypi#7196)

* Refactor MalwareCheckBase. Fixes pypi#7091.

Add Foreign Keys in MalwareVerdicts for other types of objects
(Releases, Projects).

* Change verdict dict to kwargs.

* Add wipe-out functionality (pypi#7202)

* Add wipe-out functionality

Related: pypi#7133

* Call list explicitly

* Add rudimentary verdicts view. Progress on pypi#6062. (pypi#7207)

* Add rudimentary verdicts view. Progress on pypi#6062.

Also, add some better testing logic for wiped_out condition.

* Code review changes.

- Conditionally show fields that are populated
- JSON pretty formatting

* Fix unit test bug.

- Use `get` instead of `filter` to look up verdict by pkey.

* simplify unit tests for verdicts view

* introduce malware queue (pypi#7227)

* introduce malware queue

* correct syntax; apparently the documented list of tuples doesn't work.

* Add backfill functionality to check admin pypi#7094 (pypi#7232)

* Add backfill functionality to check admin pypi#7094

- Add backfill task
- Change lookup of checks to check_name instead of id
- Load checks that are also in "evaluation" state

* Add unit tests for backfill.

- Log number of runs executed by backfill
- Perform basic validation on sample_rate input
- Clean up other testing logic.

* Remove superfluous 'all()'

* Code review changes.

- Set backfill size to a fixed number, not configurable via web UI.
- Backfill task enqueues run_check tasks
- Only retry if `check.run` fails, not if loading the check fails.
- Use exponential backoff for retries.
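
A minimal sketch of the retry policy described in the review notes above: loading the check is not retried, only the run step is, and the wait doubles on each attempt. The names `backoff_delay` and `run_with_retries`, the base of 2 seconds, and the 600-second cap are assumptions for illustration and are not taken from the warehouse code.

```python
import time
from typing import Any, Callable


def backoff_delay(retries: int, base: float = 2.0, cap: float = 600.0) -> float:
    """Seconds to wait before the next attempt; doubles each retry, up to `cap`."""
    return min(cap, base ** retries)


def run_with_retries(
    load: Callable[[], Any],
    run: Callable[[Any], Any],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    # A failure while loading the check propagates immediately (no retry).
    check = load()
    for attempt in range(max_retries + 1):
        try:
            return run(check)
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted
            sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting; a task queue would normally schedule the retry itself instead of sleeping in-process.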

* Update warehouse/admin/templates/admin/malware/checks/detail.html

Co-Authored-By: Ernest W. Durbin III <ewdurbin@gmail.com>

Co-authored-by: Ernest W. Durbin III <ewdurbin@gmail.com>

* Refactor testing logic pypi#7098 (pypi#7257)

- Add `schedule` field to MalwareCheck model pypi#7096
- Move ExampleCheck into tests/common/ to remove test dependency from
prod code
- Rename functions and classes to differentiate between "hooked" and
"scheduled" checks

* Event-based Malware check (pypi#7249)

* requirements: Introduce yara

* [WIP] malware/check: SetupPatternCheck

In progress.

Introduces SetupPatternCheck, an implementation of an event-based
check that scans the `setup.py`s of release files for suspicious
patterns.
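
As a rough illustration of the extraction step such a check needs, the sketch below pulls the top-level `setup.py` out of a gzipped sdist tarball held in memory. The function name `extract_setup_py` and the assumption that the archive is a `.tar.gz` with a single top-level directory are hypothetical; the actual SetupPatternCheck may handle more archive formats and then applies YARA rules to the extracted text.

```python
import io
import tarfile
from typing import Optional


def extract_setup_py(sdist_bytes: bytes) -> Optional[str]:
    """Return the text of `<top-level-dir>/setup.py`, or None if absent."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Only consider a regular file exactly one directory deep.
            if member.isfile() and len(parts) == 2 and parts[1] == "setup.py":
                extracted = tar.extractfile(member)
                if extracted is not None:
                    return extracted.read().decode("utf-8", errors="replace")
    return None
```

Reading the member through `extractfile` avoids writing untrusted archive contents to disk.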

* malware/checks: Give MalwareCheckBase.run/scan args, kwargs

* malware: Add check preparation

Fiddle with the check/run signature a bit more.

* malware/checks: Unpack file path correctly

* docker-compose: Override FILES_BACKEND for worker

The worker needs to be able to see the "files" virtual host
during development so that malware checks can fetch their underlying
release files.

* [WIP] malware/checks: setup.py extraction

* malware/checks: setup_patterns: Fix enum, seek

* malware/checks: setup_patterns: Apply YARA rules

Each rule match becomes a verdict.

* malware/checks: setup_patterns: Prefer get over filter

* warehouse/{admin,malware}: Consistent enum names

Also enforce uniqueness for enum values.

* warehouse/{admin,malware}: More enum changes

* tests: Update admin, malware tests

* tests: Fix enum, more test fixes

* tests: Add prepare tests

* malware/changes: base: Unpack id correctly

* tests: Begin adding SetupPatternCheck tests

* malware/checks: setup_patterns: Fix enum

* tests: More SetupPatternCheck tests

* warehouse/malware: setup_patterns: Fix enums

* tests: More SetupPatternCheck tests

* tests: Add license header

* malware/checks: setup_patterns: Add TODO

* tests: More SetupPatternCheck tests

* tests: More SetupPatternCheck tests

* tests: Complete extraction tests for SetupPatternCheck

* tests: Fix test

* malware/checks: Add docstring for prepare

* malware/checks: blacken

* malware/checks: Document, expand YARA rules

* tests, warehouse: Restructure utilities

* malware: Order some enums, reduce SetupPatternCheck verdicts

* malware/models: Add missing __lt__
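
For context, a minimal sketch of what ordering an enum via `__lt__` can look like. The `Confidence` enum and its members here are hypothetical stand-ins, not the models from warehouse/malware/models.py; the ordering is assumed to follow declaration order.

```python
import enum


class Confidence(enum.Enum):
    # Declared from lowest to highest; declaration order defines the ordering.
    Low = "low"
    Medium = "medium"
    High = "high"

    def __lt__(self, other):
        if self.__class__ is not other.__class__:
            return NotImplemented
        members = list(self.__class__)
        return members.index(self) < members.index(other)


ranked = sorted([Confidence.High, Confidence.Low, Confidence.Medium])
```

Defining `__lt__` is enough for `sorted()`, which is what an ordered listing of verdicts or checks would rely on.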

* malware/checks: Always embed the model object in the prepared arguments

Use it instead of performing a DB request in the check itself.

* malware/checks: Avoid raw bytes

* malware/changes: Remove unused import

* tests: Fixup malware tests

* warehouse/malware: blacken

* tests: Fill in malware coverage

* tests, warehouse: Add a benign verdict for SetupPatternCheck

* tests: blacken

* Implement scheduled checks pypi#7093 (pypi#7271)

* Implement scheduled checks pypi#7093

- Rename `run_backfill` to `run_evaluation` in admin malware view
- Modify `run` and `scan` method signatures to accept `**kwargs`
- Extend `run_check` to accommodate scheduled check functionality

* Reduce unit test flakiness

* Code review changes.

Also replace `check.hooked_object` with `check.hooked_object.value` in
check detail template.

* tests, warehouse: enum fixes

* Fix lint error

Co-authored-by: William Woodruff <william@yossarian.net>

* Add verdicts view filtering capabilities pypi#6062. (pypi#7322)

* Add verdicts view filtering capabilities pypi#6062.

* Code review changes.

- Refactor tests to be parametrized.
- Pass `_query` to `route_path` in template.
- Remove `is None` from filter query, it adds nothing.

* Add verdict administrator review. Fixes pypi#6062. (pypi#7339)

* Add verdict administrator review. Fixes pypi#6062.

- Add new `admin.verdicts.review` endpoint
- Change layout of verdict list and detail view and add forms
- Change sort order of the MalwareChecks, and update the tests

* Code review changes.

- Rename MalwareVerdict field `administrator_verdict` to `reviewer_verdict`.
- Change verdict review permission from `admin` to `moderator`.

* Misc cleanup and TODOs on malware checks. (pypi#7355)

* Misc cleanup and TODOs on malware checks.

- Change backfill function to invoke `IMalwareCheckService` interface
- Add support for `kwargs` to `IMalwareCheckService` interface
- Rename variable from reserved word `file` to `release_file`
- Add `FatalCheckException` for non-retryable exceptions
- Replace `MALWARE_CHECK_BACKEND` in dev/environment

* Make `IMalwareService` the entrypoint for `run_check`

- Add `run_scheduled_check` task that invokes this interface.
- Remove useless utility method
- Move `FatalCheckException` into warehouse/malware/errors.py.

* malware/checks: PackageTurnover skeleton (pypi#7321)

* malware/checks: PackageTurnover skeleton

* malware/checks: PackageTurnover: Add NOTE

* malware/checks: PackageTurnoverCheck: more work

* tests: blacken

* malware/checks: More PackageTurnoverCheck work

* malware/checks: Blacken

* malware/checks: Blacken

* package_turnover: Promote from indeterminate to threat

* tests: Begin adding package_turnover tests

* tests: Add remaining package_turnover tests

* tests: Drop unused imports

* warehouse: Drop (ww) from NOTE

* checks/package_turnover: Drop NOTE

Co-authored-by: Cristina <hi@xmunoz.com>
Co-authored-by: William Woodruff <william@yossarian.net>
3 people authored Feb 18, 2020
1 parent 3f0d4e0 commit 557ca0e
Showing 59 changed files with 4,174 additions and 4 deletions.
1 change: 1 addition & 0 deletions Procfile
@@ -2,4 +2,5 @@ release: bin/release
web: bin/start-web python -m gunicorn.app.wsgiapp -c gunicorn.conf.py warehouse.wsgi:application
web-uploads: bin/start-web python -m gunicorn.app.wsgiapp -c gunicorn-uploads.conf.py warehouse.wsgi:application
worker: bin/start-worker celery -A warehouse worker -Q default -l info --max-tasks-per-child 32
worker-malware: bin/start-worker celery -A warehouse worker -Q malware -l info --max-tasks-per-child 32
worker-beat: bin/start-worker celery -A warehouse beat -S redbeat.RedBeatScheduler -l info
3 changes: 3 additions & 0 deletions bin/release
@@ -5,3 +5,6 @@ set -eo pipefail

# Migrate our database to the latest revision.
python -m warehouse db upgrade head

# Insert/upgrade malware checks.
python -m warehouse malware sync-checks
2 changes: 2 additions & 0 deletions dev/environment
@@ -29,6 +29,8 @@ MAIL_BACKEND=warehouse.email.services.SMTPEmailSender host=smtp port=2525 ssl=fa

BREACHED_PASSWORDS=warehouse.accounts.NullPasswordBreachedService

MALWARE_CHECK_BACKEND=warehouse.malware.services.PrinterMalwareCheckService

METRICS_BACKEND=warehouse.metrics.DataDogMetrics host=notdatadog

STATUSPAGE_URL=https://2p66nmmycsj3.statuspage.io
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -93,6 +93,7 @@ services:
    env_file: dev/environment
    environment:
      C_FORCE_ROOT: "1"
      FILES_BACKEND: "warehouse.packaging.services.LocalFileStorage path=/var/opt/warehouse/packages/ url=http://files:9001/packages/{path}"
    links:
      - db
      - redis
1 change: 1 addition & 0 deletions requirements/main.in
@@ -55,5 +55,6 @@ typeguard
webauthn
whitenoise
WTForms>=2.0.0
yara-python
zope.sqlalchemy
zxcvbn
14 changes: 14 additions & 0 deletions requirements/main.txt
@@ -594,6 +594,20 @@ wired==0.2.1 \
wtforms==2.2.1 \
--hash=sha256:0cdbac3e7f6878086c334aa25dc5a33869a3954e9d1e015130d65a69309b3b61 \
--hash=sha256:e3ee092c827582c50877cdbd49e9ce6d2c5c1f6561f849b3b068c1b8029626f1
yara-python==3.11.0 \
--hash=sha256:105d851e050b32951ee577148c7f1b18c0a7c64432fef8159069191d522fba86 \
--hash=sha256:1d35c7f606465015de02143dfa4e1ad2f4ee85fdb5d5af756b51b2bac62ac7bc \
--hash=sha256:24cd492d6bf8ecedb128f5b02886770be9df03bd1b84ab06a978d45bb1a8ff92 \
--hash=sha256:58cfc837e7769811afbfb19b1db952ec01e50cdbf9df576fb587e1e343694526 \
--hash=sha256:5b8d708751a66d1507d819218d06baccdf5527c147c2bd3062f087e2f367a17d \
--hash=sha256:6f90bb264470235549e1bb4e355fa82895409cd46f27aceecaddfbf55e66ed71 \
--hash=sha256:70d39c2238c5854e7cd8f11595317dc4d89417e88035d8acca24bcc58a93150f \
--hash=sha256:8d255349d69d833bca604b4215bdf499c87357172512273feb934f6442b8e6b2 \
--hash=sha256:8e44f9600607cb1d74a0f26df5d0a1c06ea54f4601206124f47f1bbb58e6a374 \
--hash=sha256:9e4fafc327e3a343c545dcf5f173fa8bc712aebffe5f034d205c0bac1f1c5df6 \
--hash=sha256:c919ee656139ed46a0056e8a3de179bbc98d42a2be6fb85c95b1e2ec65396b34 \
--hash=sha256:e4124414d3cff9a10669569a89f585f81c8114b283ab48b2e756e0347a89de0a \
--hash=sha256:f104f0bb21a0867f22e750bb4e05de629ec9f37facc84daf963385a86371b0d9
zipp==2.1.0 \
--hash=sha256:ccc94ed0909b58ffe34430ea5451f07bc0c76467d7081619a454bf5c98b89e28 \
--hash=sha256:feae2f18633c32fc71f2de629bfb3bd3c9325cd4419642b1f1da42ee488d9b98
14 changes: 14 additions & 0 deletions tests/common/checks/__init__.py
@@ -0,0 +1,14 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .hooked import ExampleHookedCheck # noqa
from .scheduled import ExampleScheduledCheck # noqa
40 changes: 40 additions & 0 deletions tests/common/checks/hooked.py
@@ -0,0 +1,40 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from warehouse.malware.checks.base import MalwareCheckBase
from warehouse.malware.errors import FatalCheckException
from warehouse.malware.models import VerdictClassification, VerdictConfidence


class ExampleHookedCheck(MalwareCheckBase):

    version = 1
    short_description = "An example hook-based check"
    long_description = "The purpose of this check is to test the \
implementation of a hook-based check. This check will generate verdicts if enabled."
    check_type = "event_hook"
    hooked_object = "File"

    def __init__(self, db):
        super().__init__(db)

    def scan(self, **kwargs):
        file_id = kwargs.get("obj_id")
        if file_id is None:
            raise FatalCheckException("Missing required kwarg `obj_id`")

        self.add_verdict(
            file_id=file_id,
            classification=VerdictClassification.Benign,
            confidence=VerdictConfidence.High,
            message="Nothing to see here!",
        )
37 changes: 37 additions & 0 deletions tests/common/checks/scheduled.py
@@ -0,0 +1,37 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from warehouse.malware.checks.base import MalwareCheckBase
from warehouse.malware.models import VerdictClassification, VerdictConfidence
from warehouse.packaging.models import Project


class ExampleScheduledCheck(MalwareCheckBase):

    version = 1
    short_description = "An example scheduled check"
    long_description = "The purpose of this check is to test the \
implementation of a scheduled check. This check will generate verdicts if enabled."
    check_type = "scheduled"
    schedule = {"minute": "0", "hour": "*/8"}

    def __init__(self, db):
        super().__init__(db)

    def scan(self, **kwargs):
        project = self.db.query(Project).first()
        self.add_verdict(
            project_id=project.id,
            classification=VerdictClassification.Benign,
            confidence=VerdictConfidence.High,
            message="Nothing to see here!",
        )
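
To make the `schedule` value above concrete, here is a simplified sketch that expands crontab-style fields into the times of day at which they would fire. It assumes the dict keys follow crontab conventions (unspecified fields default to `*`) and only handles `*`, `*/N`, and single integers; `expand_field` and `daily_fire_times` are hypothetical helpers, not part of warehouse or Celery.

```python
from typing import Dict, List, Tuple


def expand_field(expr: str, upper: int) -> List[int]:
    """Expand "*", "*/N", or a single integer into the matching values."""
    if expr == "*":
        return list(range(upper))
    if expr.startswith("*/"):
        return list(range(0, upper, int(expr[2:])))
    return [int(expr)]


def daily_fire_times(schedule: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return every (hour, minute) pair matched by the schedule in one day."""
    hours = expand_field(schedule.get("hour", "*"), 24)
    minutes = expand_field(schedule.get("minute", "*"), 60)
    return [(hour, minute) for hour in hours for minute in minutes]


print(daily_fire_times({"minute": "0", "hour": "*/8"}))
# → [(0, 0), (8, 0), (16, 0)]
```

Under these assumptions the example check would run three times a day, at 00:00, 08:00, and 16:00.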
63 changes: 63 additions & 0 deletions tests/common/db/malware.py
@@ -0,0 +1,63 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import datetime

import factory
import factory.fuzzy

from warehouse.malware.models import (
    MalwareCheck,
    MalwareCheckObjectType,
    MalwareCheckState,
    MalwareCheckType,
    MalwareVerdict,
    VerdictClassification,
    VerdictConfidence,
)

from .base import WarehouseFactory
from .packaging import FileFactory


class MalwareCheckFactory(WarehouseFactory):
    class Meta:
        model = MalwareCheck

    name = factory.fuzzy.FuzzyText(length=12)
    version = 1
    short_description = factory.fuzzy.FuzzyText(length=80)
    long_description = factory.fuzzy.FuzzyText(length=300)
    check_type = factory.fuzzy.FuzzyChoice(list(MalwareCheckType))
    hooked_object = factory.fuzzy.FuzzyChoice(list(MalwareCheckObjectType))
    schedule = {"minute": "*/10"}
    state = factory.fuzzy.FuzzyChoice(list(MalwareCheckState))
    created = factory.fuzzy.FuzzyNaiveDateTime(
        datetime.datetime.utcnow() - datetime.timedelta(days=7)
    )


class MalwareVerdictFactory(WarehouseFactory):
    class Meta:
        model = MalwareVerdict

    check = factory.SubFactory(MalwareCheckFactory)
    release_file = factory.SubFactory(FileFactory)
    release = None
    project = None
    manually_reviewed = True
    reviewer_verdict = factory.fuzzy.FuzzyChoice(list(VerdictClassification))
    classification = factory.fuzzy.FuzzyChoice(list(VerdictClassification))
    confidence = factory.fuzzy.FuzzyChoice(list(VerdictConfidence))
    message = factory.fuzzy.FuzzyText(length=80)
    full_report_link = None
    details = None
1 change: 1 addition & 0 deletions tests/common/db/packaging.py
@@ -83,6 +83,7 @@ class Meta:

    release = factory.SubFactory(ReleaseFactory)
    python_version = "source"
    filename = factory.fuzzy.FuzzyText(length=12)
    md5_digest = factory.LazyAttribute(
        lambda o: hashlib.md5(o.filename.encode("utf8")).hexdigest()
    )
3 changes: 3 additions & 0 deletions tests/conftest.py
@@ -174,6 +174,9 @@ def app_config(database):
        "files.backend": "warehouse.packaging.services.LocalFileStorage",
        "docs.backend": "warehouse.packaging.services.LocalFileStorage",
        "mail.backend": "warehouse.email.services.SMTPEmailSender",
        "malware_check.backend": (
            "warehouse.malware.services.PrinterMalwareCheckService"
        ),
        "files.url": "http://localhost:7000/",
        "sessions.secret": "123456",
        "sessions.url": "redis://localhost:0/",
23 changes: 23 additions & 0 deletions tests/unit/admin/test_routes.py
@@ -123,4 +123,27 @@ def test_includeme():
        pretend.call("admin.flags.edit", "/admin/flags/edit/", domain=warehouse),
        pretend.call("admin.squats", "/admin/squats/", domain=warehouse),
        pretend.call("admin.squats.review", "/admin/squats/review/", domain=warehouse),
        pretend.call("admin.checks.list", "/admin/checks/", domain=warehouse),
        pretend.call(
            "admin.checks.detail", "/admin/checks/{check_name}", domain=warehouse
        ),
        pretend.call(
            "admin.checks.change_state",
            "/admin/checks/{check_name}/change_state",
            domain=warehouse,
        ),
        pretend.call(
            "admin.checks.run_evaluation",
            "/admin/checks/{check_name}/run_evaluation",
            domain=warehouse,
        ),
        pretend.call("admin.verdicts.list", "/admin/verdicts/", domain=warehouse),
        pretend.call(
            "admin.verdicts.detail", "/admin/verdicts/{verdict_id}", domain=warehouse
        ),
        pretend.call(
            "admin.verdicts.review",
            "/admin/verdicts/{verdict_id}/review",
            domain=warehouse,
        ),
    ]