-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add AWX role #8226
add AWX role #8226
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: cben Assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
roles/awx/tasks/install.yml
Outdated
- name: Deploy and Activate Postgres | ||
shell: >- | ||
{{ openshift_client_binary }} project {{ awx_project }}; | ||
{{ openshift_client_binary }} process -n openshift postgresql-persistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could oc_process
module be used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried initially and to be honest I don't fully remember why I dropped it :-)
Hmm, from oc_process.py:
in conjunction with 'create: True' the process will be piped to | oc create -f
That doesn't sound idempotent. I guess I need to wrap it in a condition to create unless exists?
Then what if you upgrade openshift and the builtin postgresql-persistent
template changes but you won't pick up changes. So I think | oc apply
is better?
Or maybe I'm looking at this wrong, and upgrading a postgres in place is actually undesired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be combined with @elad661's suggestion - create a new template and then use oc_apply
to create new objects from the template. There should already be a postgresql template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the oc_apply role instead of oc apply command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, oc_apply
role would make it cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched from j2 to an openshift template.
- Used one
oc_configmap
because the main payload is a fixed python file (getting param by env var instead of textual templating) and I felt it's much cleaner to just have it as a file.
The two existing oc_apply
wrappers I found (roles/openshift_metrics/tasks/oc_apply.yaml
, roles/openshift_provisioners/tasks/oc_apply.yaml
) added value is reporting "changed" correctly. They seem to expect a single object, as they check resourceVersion before/after. I'd have to complicate that approach to iterate the multiple objects the template creates.
=> Instead, I'm running oc apply
directly, formulated a changed_when
that looks at output messages. Tested with/without change and with several errors (both in template and apply stage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor comments. Open questions remain open...
roles/awx/tasks/install.yml
Outdated
|
||
- name: Template Openshift AWX Deployment | ||
template: | ||
src: deployment.yml.j2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current pattern in openshift-ansible is to move away from jinja templates and into OpenShift Templates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The need for templating here is actually very little; most objects are fixed, and a few env vars could be converted to secretKeyRef
.
I've also considered replacing the template with a series of oc_service, oc_configmap, ..., oc_obj modules.
I'm happy to go either way, just need someone to tell me...
labels: | ||
name: awx-web-svc | ||
spec: | ||
type: "NodePort" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any specific reason for using NodePort
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no idea :) copied from https://github.com/ansible/awx/blob/1.0.3/installer/openshift/templates/deployment.yml.j2#L76
Thanks for noticing, it does sound undesired, I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed, works with a regular ClusterIP service just as well.
targetPort: http | ||
tls: | ||
insecureEdgeTerminationPolicy: Allow | ||
termination: edge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a lot of components use reencrypt and expect the pod itself to server https with an automatically generated certificate
Haven't yet had a chance to look very closely at this yet, but one thing that stood out immediately is that you should probably be using serialized OpenShift Templates for all your assets rather than Jinja templates. Recently discussed this in the autoheal role PR. New component roles should be very loosely coupled to Ansible. |
2fed963
to
e2b73c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Templating is looking good, added some comments re: security of secrets
- mountPath: /etc/tower | ||
name: awx-application-config | ||
env: | ||
- name: DATABASE_PASSWORD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environment is not a secure place for secrets... can awx consume the password from a file on disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably can — I can mount the secret as volume, and modify settings.py to read the files.
But can you explain the threat model?
k8s API returns such env vars with valueFrom: secretKeyRef:
, it does not expose the effective value.
Are you thinking of with oc exec
access into the pod — I think then it's game over anyway?
Does the kubelet leak them in some way?
Many of the builtin templates use such secretKeyRef
env vars for keys, passwords, etc.:
https://github.com/openshift/openshift-ansible/search?q=secretKeyRef
including the postgresql-persistent template this role uses:
Lines 123 to 127 in 5c1207c
"name": "POSTGRESQL_PASSWORD", | |
"valueFrom": { | |
"secretKeyRef": { | |
"key": "database-password", | |
"name": "${DATABASE_SERVICE_NAME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the issues with secrets in env were covered way back in the original design discussion, but I wouldn't be surprised if various enhancements have been made since then that I'm just unaware of to reduce the risks.
In this case I probably wouldn't go too far out of my way to introduce file support where none exists, but I would default to avoiding environment when possible (as it's generally an additional risk with no clear added value outside shim use cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No can do, not with current awx images.
Turns out they look at POSTGRESQL_PASSWORD, AWX_ADMIN_USER, AWX_ADMIN_PASSWORD in shell scripts (that are baked into the image) before settings.py (that we control) comes into play.
Can't even attempt equivalent logic in settings.py later, because if we omit AWX_ADMIN_PASSWORD env var, we might later create a superuser with password specified via a file, but an insecure admin
/password
superuser will be created as well.
I'll start a discussion upstream in awx about various ways (including this) learnt from here to improve their k8s/openshift deployment security, and see if they want PRs for any of those.
value: postgresql | ||
- name: DATABASE_PORT | ||
value: "5432" | ||
- name: DATABASE_PASSWORD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environment is not a secure place for secrets... can pgsql consume the password from a file on disk?
roles/awx/files/awx-deployment.yml
Outdated
secretKeyRef: | ||
name: admin-credentials | ||
key: username | ||
- name: AWX_ADMIN_PASSWORD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above re: environment/secrets
value: rabbitmq | ||
- name: RABBITMQ_DEFAULT_USER | ||
value: awx | ||
- name: RABBITMQ_DEFAULT_PASS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above re: environment/secrets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for RABBITMQ_ERLANG_COOKIE & RABBITMQ_DEFAULT_PASS it's even worse — I left them hard-wired to fixed known values! (as do upstream awx playbooks)
AFAICT rabbitmq container has no ports exposed outside the pod — does that sound sane or should I randomize those just in case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if you want to use fixed defaults, I'd make the passwords required parameters and move the defaulting logic out into the installer bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Randomized these 2 as well, stored in secrets. It seems in later AWX version rabbitmq is exposed and these will become important (?)
RABBITMQ_ERLANG_COOKIE could be mounted, not sure about RABBITMQ_DEFAULT_PASS (indirectly by putting it in rabbitmq.conf
?); not gonna bother with this until I get guidance from awx where to invest effort.
roles/awx/tasks/install.yml
Outdated
- name: Deploy and Activate Postgres | ||
# Note: NAMESPACE param is where to look for postgresql ImageStream. | ||
shell: >- | ||
{{ openshift_client_binary }} project {{ awx_project }}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd breakout namespace creation into a discrete task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This did not create the project, only switched to it to control where oc apply
instantiates (this template omits namespace:
).
=> Discovered I can use the cleaner oc apply -n ...
, switched to that.
2c52d7c
to
fd2b621
Compare
/retest |
@cben: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Example output when creating: "stderr": "", "stdout": "deployment.extensions \"awx\" created\nservice \"awx-web-svc\" created\nroute.route.openshift.io \"awx-web-svc\" created", Example output when re-applying without change: "stderr": "", "stdout": "deployment.extensions \"awx\" unchanged\nservice \"awx-web-svc\" unchanged\nroute.route.openshift.io \"awx-web-svc\" unchanged", Example output when re-applying with one change: "stderr": "", "stdout": "deployment.extensions \"awx\" configured\nservice \"awx-web-svc\" unchanged\nroute.route.openshift.io \"awx-web-svc\" unchanged", failed_when doesn't seem necessary, already works correctly. Tested: - invalid yaml syntax (which breaks `oc template`) - invalid content - missing name - trying to create in non-existent namespace (which breaks `oc apply`) and they all result in "rc": 1 and error in stderr.
This file is not part of openshift-ansible code, it's mounted into awx and just happens to be written in Python.
This file is not part of openshift-ansible code, it's mounted into awx and just happens to be written in Python.
- Renamed secret from 'awx' to 'admin-credentials' for clarity - Added 'username' key - Renamed 'admin_password' key to 'password'
Friendlier to running awx in same namespace with autoheal.
- RABBITMQ_ERLANG_COOKIE could be mounted at /var/lib/rabbitmq/.erlang.cookie instead of an env var: https://github.com/docker-library/rabbitmq/blob/d7096266dfb047fce1eff89bd759e3ab55d779f3/3.7/docker-entrypoint.sh#L164-L176 However https://hub.docker.com/_/rabbitmq/ doc only mentions RABBITMQ_ERLANG_COOKIE and RABBITMQ_DEFAULT_PASS env var.
@ironcladlou this got stuck, how do we move forward? I believe I addressed your comments except for env vars — they can't be passed via secrets with existing awx image. We still want to slowly push autoheal forward. What's needed from your perspective to merge this (and then #8549)? |
I think we need to establish who owns autoheal generally going forward (it's not me). @derekwaynecarr, does monitoring now own this? Some other new team? |
@oourfali Will we or anyone else proceed with autoheal? |
Goal: The recently added autoheal role (#7753) needs an AWX (or Ansible Tower) to execute playbooks in response to problems.
Currently you need to bring your own and set many config vars to point to it.
Ideally we'll want a way to launch both together, with less config (but unclear if that should will be the default).
This PR is a first step — adds a role to launch an AWX but does not config autoheal to use it.
Based on upstream AWX playbooks,
those keep changing so I'm trying to not change them too much so I can keep them in sync in future.
open questions:
should everything here be called awx / openshift_awx / openshift_autoheal_awx ?
I picked awx as for now it's an optional thing, not part of normal install, but not sure that makes sense.
UPDATE: the way it's likely to get used will be deployed together with autoheal in [WIP] Add openshift_autoheal_deploy_awx var to deploy an awx for autoheal #8549, which overrides to deploy into
openshift-autoheal
project.which AWX version do we want? I used 1.0.3 as starting point, but can move up/down.
1.0.4 & 1.0.5 will also need rabbitmq and etcd; latest 1.0.6 needs rabbitmq 3.7 but no etcd.
origin vs enterprise AWX images? no idea, I didn't find any tower nor awx on RH container catalog?
(also downstream rabbitmq image? again, no downstream image, except rhosp12/openstack-rabbitmq)
should I set cpu & mem requests/limits? can steal from later awx templates (they set quite high, 3 cpu total).
security!
admin
, password:randomized, stored into a secret.password
RABBITMQ_ERLANG_COOKIE
,RABBITMQ_DEFAULT_PASS
— randomized, stored into a secret. IIUC, rabbitmq is not exposed outside the pod in 1.0.3 but became exposed later.Turns out awx can't yet serve https itself (Ingress not working for AWX Kubernetes installation ansible/awx#1781 (comment), Add SSL Termination to standalone docker deployment ansible/awx#1549 and others), current practice is put a separate nginx or other proxy in front (although awx_web container already includes an nginx!).
=> Asking upstream & deferring this for now. I don't know if we need to add auth-proxy anyway (?), if we do that could cover TLS as well.
Which of these are blockers for merge? Can I iterate in later PRs?
@jhernand @zgalor @elad661 @ironcladlou please review.