
Add MVP of AWS AutoDiscovery #1936

Merged
merged 37 commits into from
Oct 20, 2023

Conversation

o1oo11oo
Contributor

@o1oo11oo o1oo11oo commented Sep 7, 2023

Description

This is the MVP of the AWS AutoDiscovery, which implements a starting point for automatically monitoring AWS resources. The MVP only tracks containers running on ECS; everything else will come at a later point. See #1894 for more details.

As mentioned in the linked issue, excluded from the scope of the MVP are:

  • AWS state synchronization and initial sync
  • Local message buffering and reordering (EventBridge does not guarantee order)
  • K8s health check endpoint
  • Scans in a different namespace
  • Proper retry and requeuing for requests that resulted in k8s errors

Closes #1894

Checklist

  • Test your changes as thoroughly as possible before you commit them. Preferably, automate your tests as unit/integration tests.
  • Make sure that all your commits are signed-off and that you are added to the Contributors file.
  • Make sure that all CI checks finish successfully.
  • Optional (but appreciated): Make sure that all commits are Verified.

@o1oo11oo o1oo11oo added the enhancement (New feature or request), auto-discovery, and go (Pull requests that update Go code) labels Sep 7, 2023
@o1oo11oo o1oo11oo self-assigned this Sep 7, 2023
@netlify

netlify bot commented Sep 7, 2023

Deploy Preview for docs-securecodebox canceled.

🔨 Latest commit b6ba7cd
🔍 Latest deploy log https://app.netlify.com/sites/docs-securecodebox/deploys/65325f12088f41000895a52e

@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 3 times, most recently from c3744bf to a45cce0 Compare September 22, 2023 14:11
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from a45cce0 to ebc7491 Compare September 26, 2023 17:36
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 2 times, most recently from 96d190d to db64253 Compare October 12, 2023 17:54
@o1oo11oo o1oo11oo marked this pull request as ready for review October 13, 2023 07:36
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 3 times, most recently from b7807ad to f66452f Compare October 13, 2023 17:19
Review threads (outdated, resolved):
  • auto-discovery/cloud-aws/README.md
  • auto-discovery/cloud-aws/main.go
  • auto-discovery/cloud-aws/templates/service/service.yaml
  • auto-discovery/cloud-aws/suite_test.go
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch 2 times, most recently from fc06ada to 7635a9d Compare October 19, 2023 16:56
@o1oo11oo o1oo11oo changed the title Add initial WIP version of AWS autodiscovery Add MVP of AWS AutoDiscovery Oct 19, 2023
@o1oo11oo o1oo11oo requested a review from J12934 October 19, 2023 17:05
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from 7635a9d to ca73a55 Compare October 19, 2023 17:18
Lukas Fischer added 9 commits October 20, 2023 12:57
This is a WIP implementation of the AWS autodiscovery, showing basic
functionality for getting changes from AWS and updating kubernetes
resources. At this point only Scans are created, not ScheduledScans, and
they are never deleted either.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously this package used client-go, but all other parts of the
secureCodeBox use the client from controller-runtime. Using the same
client everywhere simplifies maintenance.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
"Register" applies more to CustomResourceDefinitions rather than the
actual CustomResources themselves.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Decouple the AWS and Kubernetes implementations by using an interface
for the Kubernetes Reconciler. This should simplify testing later.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the AWS monitor would only react to RUNNING state changes; now it also detects STOPPED events and forwards those to the kubernetes Reconciler. This still operates at the task level, which works, but is somewhat coarse: there might be ScheduledScans that are no longer needed because the containers have already stopped.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Ignore errors that come from duplicate scan creation/deletion, as those
might result from the same container being used multiple times. Of
course this means there is still no proper handling of multiple
containers, which might lead to scans getting deleted too early.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Remember how many containers using one image are running and only delete
the scan when the last one has been deleted. This prevents the scan from
getting deleted after the first instance of an image stops and instead
defers it till the last one stops.

Kind of moot at this point because this MVP still uses scans instead of
scheduled scans, but needed for heading in the right direction.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
The point of the AutoDiscovery is to continuously monitor the resources
it tracks. This is done by creating ScheduledScans when they appear and
deleting them when they disappear.

For SBOMs this is not particularly useful, in most cases (excluding
upgrades to trivy-sbom), generating an SBOM for the same image and
digest should result in the same SBOM. Generating only Scans for SBOM
targets instead of ScheduledScans would make everything a lot more
complicated though, so they are also covered by ScheduledScans.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the AutoDiscovery would track the state of containers by
comparing the lastStatus of the whole task. This worked somewhat, but
quickly ran into problems.

It created too many scheduled scans: if some of a task's containers
stopped before the others, the scheduled scans would be kept until the
whole task was finished.

"RUNNING" events could come in multiple times, any time after the
containers had started and the task changed without them stopping. There
was no way to differentiate between a new state change and one seen
before. This led to containers getting counted too often and the scans
never getting deleted.

In this version the state of the individual containers is tracked, which
allows more fine-grained scan creation, but also brings other problems:

- very short-lived containers never have the status RUNNING; therefore
  detect that they ran by storing which containers were pending
- we don't want scheduled scans around for containers that already sent
  a STOPPED, so those get one-off scans, but those should also get
  cleaned up at some point
- although unlikely, containers might go backwards in the lifecycle,
  either because something is acting weird or because we were not
  running or missed events for other reasons, so these also need to be
  handled somewhat sensibly
- one case not really handled right now is a STOPPED for a container
  never seen before; because containers are not stored forever, it is
  most likely just a container we already saw a STOPPED for

In general this version works decently well with most setups, though
there are some rough edges with the one-off scans and image reference
handling. Since containers can be specified with or without a tag, such
references are currently handled as different images, even if both use
latest.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Lukas Fischer added 23 commits October 20, 2023 12:57
Instead of manual print statements, use proper logging. The other
autodiscovery also uses the controller-runtime integrated zap logger, so
to keep things simple it is used here as well.

The loglevels are a bit different though; it is not clear how they work
when used with kubebuilder, but the kubernetes AutoDiscovery uses
loglevels a lot higher than what gets printed using normal options.
Therefore this only uses Info and V(1) aka Debug.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Use envtest to test the kubernetes part of the service against an actual
kubernetes control plane. This is the same way the kubernetes
autodiscovery and the operator are tested, but here it is used without
controller-gen or kustomize, because this is not a kubebuilder project
and has no custom resources.

The current tests are only a small demonstration of connecting to the
envtest control plane. It is not clear yet where the envtest connection
should be made: in the tests for the kubernetes module, or in the tests
for main. To me it seems like the tests for the kubernetes module should
test only that, while the tests for main should test the combination of
the aws and kubernetes parts; then both test suites need to access a
kubernetes control plane. If envtest supports multiple different ones
that could work, otherwise it might need to be shared.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
The initial iteration of the tests already used envtest to test the
behaviour of the implementation against an actual kubernetes API server,
but tested only the kubernetes side of the service.

This restructures the tests to cover the whole service:
- main_test contains the integration tests for the service: a running
  AWSMonitor gets fed fake messages through a mocked sqs connection and
  sends requests to the reconciler, which then sets up ScheduledScans
  in the envtest k8s API server, where the tests check the results
- aws_test contains unit tests for aws-only functions
- kubernetes_test contains unit tests for kubernetes-only functions

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Some functions were only exported to be accessible from the tests, but
it makes more sense to keep them private and use whitebox tests instead.
For that the unit tests now use the non _test package.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Instead of redefining all the types for the contents of the messages
from AWS, use the existing types from the SDK to unmarshal messages.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the ScheduledScans the autodiscovery would create were
hardcoded to trivy-sbom-image. Even for an MVP this is not that helpful.
This makes the scans and some other options configurable in much the
same way as for the kubernetes autodiscovery: by reading from a yaml
config file supplied via a commandline flag.

The kubernetes autodiscovery with its kubebuilder structure uses
[ComponentConfig][1] for this, which is deprecated and creates options to
be consumed by the manager, which this project does not use. To keep the
whole experience similar, the [kubernetes yaml package][2] is used to
provide the same parsing behaviour. It converts the file to json first,
which means structs do not need yaml tags in addition to json tags.

To keep code duplication minimal, the cloud autodiscovery reuses parts
of the kubernetes autodiscovery for the config. The ScanConfigs are part
of configv1 and the whole templating code is imported to provide the
same convenience of templated scan definitions. Some functions that are
privately defined in the controllers are copied over, in the future
these might get relocated to a shared util package.

The auto-discovery-cloud-aws-config.yaml file contains an example config
that can be used for local development. Unlike for the kubernetes
autodiscovery, just using this file is not enough, because the settings
for connecting to the SQS queue on AWS are specific to each user. Those
can also be specified through environment variables, which override the
values specified in the config file.

[1]: https://kubebuilder.io/component-config-tutorial/tutorial
[2]: https://pkg.go.dev/sigs.k8s.io/yaml

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
To properly react to OS signals, use the signal handler provided by
controller-runtime and the WithContext functions from the AWS SDK.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Make sure the scans we create can actually be created, even though the
cloud autodiscovery does not care about this at the moment.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Add the necessary structure for making this a proper auto discovery
service: Makefile, Dockerfile, Helm Chart. The files are adapted from
the Kubernetes AutoDiscovery.

Since the Cloud-AWS AutoDiscovery does not need to monitor resources in
kubernetes and only creates scans, it does not need a ClusterRole; a
Role and RoleBinding for the namespace it creates the scans in are
enough. The current set of permissions is probably too large; it can be
restricted further later.

For connecting to AWS there are no good defaults, so unlike for the
Kubernetes AutoDiscovery, there cannot be a completely ready dev-config.
To ease that, the Cloud-AWS AutoDiscovery allows specifying AWS
connection options in multiple ways, either through the config or as
environment variables. When the helm-deploy Makefile target is used,
values for SQS_QUEUE_URL and AWS_REGION get picked up from the shell's
environment and set as Helm values. AWS credentials need to be provided
as a Kubernetes secret, which can be configured through values under
awsAuthentication. Credential values are optional, the service can also
connect to AWS using only part of the credentials or through IAM roles
if it is running on AWS itself.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Write basic information about the Cloud-AWS AutoDiscovery. This
especially includes how to set up the necessary resources in AWS so that
the AutoDiscovery can be used.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
It feels like there should be integration tests as well, but the
Kubernetes AutoDiscovery only runs them together with the
PullSecretsExtractor. At the moment there are none, so this will have
to do.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Straight copy-paste from the Kubernetes AutoDiscovery with updated
snapshot.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
This simplifies debugging the AutoDiscovery service when running it
independently. Unfortunately it seems to be impossible to get
setup-envtest to install the binaries to a predictable path; the only
argument it accepts is --bin-dir, but it still appends /k8s/<os things>.
Moving the binaries manually is a bit of a pain. VS Code cannot set env
variables from script outputs though, so just running setup-envtest also
does not work.

To solve this, a preLaunchTask is used, which invokes a Makefile target
to create an environment file containing the path setup-envtest comes
up with, which can then be read by VS Code. Kind of unnecessary, but it
works.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Sometimes when containers on ECS exit very quickly, they never enter the
RUNNING status and instead go directly from PENDING to STOPPED.
Previously the AutoDiscovery would track all the containers that were
PENDING and create individual scans for those that transitioned to
STOPPED. This was a bit of a hack though, because then those scans would
never get cleaned up and it would require additional configuration.

This instead removes the detection of short lived containers for the
MVP, it might get added back at a later point.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Expand the envtest integration tests so that they send multiple messages
to simulate the whole lifecycle of an ECS task to verify that every
event gets handled correctly.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
_service_ is already a defined term in kubernetes, and the file
previously named service.yaml contains the deployment. The name was a
straight copy from the kubernetes AutoDiscovery, where it is named
manager.yaml, following the manager -> service rename that this
implementation uses.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
The docs did not make clear enough that the AWS Cloud AutoDiscovery is
in a prerelease state and that, while it monitors AWS/ECS, it runs on
Kubernetes, in or out of AWS. This is added using the [GitHub
NOTE/WARNING syntax][1], since GitHub is the main place where this info
is displayed; it is not used for the docs on the website. On DockerHub
and ArtifactHub it should be ok.

The order/structure of the README was not the best: it started with the
(default and autogenerated) helm install command, which would fail
anyway because the whole AWS setup is missing. The AWS setup was only
described after that. This reverses the order, moves all that setup
into a Prerequisites section, and adds another helm command to the
Deployment section that shows how to actually send the necessary values
to the deployment. Unfortunately it is not possible to get rid of the
default deployment command without altering all the templates.

This also clarifies some details, such as whether all values are
necessary and what the AWS cost calculations are based on.

[1]: https://github.com/orgs/community/discussions/16925

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Technically, the namespace the scans get created in can be configured
by setting config.kubernetes.namespace, but outside of development it
is of low practical value, because the Helm chart only defines a Role
and a RoleBinding for the namespace the AutoDiscovery gets installed
in. To allow it to create scans in a different namespace, more
Roles/RoleBindings or a ClusterRole and a ClusterRoleBinding are
needed. These could be created manually, but that is not the nicest way
to do this, so officially configuring the namespace is not supported
for now.

To achieve this, the namespace is removed from values.yaml, so that it
is hidden in the documentation, but it can still be set from the config
passed with --config, which is needed for local development.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Standard or good golang project layout seems to be quite controversial:
there is the [Standard Go Project Layout][1], which comes with an
[extensive discussion][2] about whether it is encouraged or not. There
are also [blogposts][3] by go team members and the official [Organizing
a Go module][4] docs.

The takeaway I got was that a cmd dir is good for the executables and a
pkg dir is fine if it makes the repository structure better, and with
all the helm and docker files this is certainly the case.

[1]: https://github.com/golang-standards/project-layout
[2]: golang-standards/project-layout#117
[3]: https://eli.thegreenplace.net/2019/simple-go-project-layout-with-modules/
[4]: https://go.dev/doc/modules/layout

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Move docker reference normalization to the kubernetes side, where it
fits better. Several places in the code need different parts of the
docker reference of the running containers. This simplifies the access
to those properties and uses them in a more consistent way. The scans
now only use the actual image name for their name, so that there is
space for information other than docker-io-. The hashmap of running
containers now uses a normalized reference string instead of the
ImageInfo objects, which makes it less error-prone with the included
reference information.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Technically they are independent from the cloud provider, so they should
have generic names.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Give names to the loggers the aws and kubernetes sides receive, to more
clearly separate the messages, and log more information from the
received aws messages. Unfortunately go does not allow simple
iterator/map statements to access the properties that should be logged;
after all, for loops are all everyone could ever need.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
Previously the code was a bit sparsely commented; this adds some more
comments.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from ca73a55 to e63bfaa Compare October 20, 2023 10:57
The initial set of kubernetes permissions for the role in the helm
template was adapted from the kubernetes AutoDiscovery, which is based
on kubebuilder. The AWS Cloud AutoDiscovery manually connects to
kubernetes and uses a lot fewer API permissions than the kubernetes
AutoDiscovery. If future features need more permissions they can be
added back again.

Signed-off-by: Lukas Fischer <lukas.fischer@iteratec.com>
@o1oo11oo o1oo11oo force-pushed the feat/cloud-autodiscovery-aws branch from e63bfaa to b6ba7cd Compare October 20, 2023 11:05
@o1oo11oo o1oo11oo enabled auto-merge (rebase) October 20, 2023 11:44
@o1oo11oo o1oo11oo merged commit 0489d6e into secureCodeBox:main Oct 20, 2023
@o1oo11oo o1oo11oo deleted the feat/cloud-autodiscovery-aws branch October 20, 2023 12:06