
startup.sh issues of WebWolf - cannot connect to the WebGoat DB #1079

Closed
fravoss opened this issue Sep 29, 2021 · 13 comments

fravoss commented Sep 29, 2021

I am running the goatandwolf:8.2.2 container image in my Kubernetes cluster and have noticed two issues that prevent WebWolf from starting up:

(1) There is a race condition in start.sh between WebWolf starting up in the second JVM and trying to connect to the WebGoat database, and WebGoat starting up in the first JVM and bringing that database online. The startup.sh in webgoat/goatandwolf:8.2.2 has a sleep 10 in between the two java calls, but my webgoat.log shows the DB coming online only after ~15 seconds. Could the timeout be longer (or configurable via an env var), or should there be a check for the successful startup of the first JVM (grep for "Started StartWebGoat" in webgoat.log) before start.sh starts the WebWolf JVM?
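
For illustration, a minimal sketch of such a check (the log location and the surrounding java invocations are assumptions, not the actual start.sh contents):

# sketch only: wait for the WebGoat JVM (started earlier in the script) to finish
# booting before launching WebWolf, instead of a fixed "sleep 10"
timeout=60
until grep -q "Started StartWebGoat" webgoat.log 2>/dev/null; do
  timeout=$((timeout - 1))
  if [ "$timeout" -le 0 ]; then
    echo "WebGoat did not start within 60s, not starting WebWolf" >&2
    exit 1
  fi
  sleep 1
done
# ... only now start the WebWolf JVM ...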

(2) Unlike the two-image build of webgoat:8.1, where I could pass --spring.datasource.url in the pod startup config, I am unable to set the database connection string correctly. I found in the all-in-one code that the connection string is built from the WEBGOAT_SERVER environment variable. However, that creates a conflict for a container: I have to set WEBGOAT_SERVER to the FQDN by which my cluster can be reached from the browser, but I do not want to expose the database port outside the container.

I hacked my way through the startup by exec'ing into the pod after the WebGoat JVM had started up and the WebWolf JVM had terminated because of the race condition. I then set WEBGOAT_SERVER to localhost (that's where my DB runs in the pod) and ran the WebWolf java startup call manually. I am considering putting this logic into a custom start.sh, but could there be a separate env var for the WebGoat DB host to avoid such hacks?
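
For example (purely hypothetical - a WEBGOAT_DB_HOST variable does not exist today, and the jar path is an assumption), start.sh could keep the two concerns apart like this:

# hypothetical: browser-facing hostname and DB host as separate variables
WEBGOAT_SERVER=webgoat.mycluster.example.com        # FQDN reachable from the browser (example value)
WEBGOAT_DB_HOST="${WEBGOAT_DB_HOST:-localhost}"     # DB stays inside the pod by default
java -jar webwolf.jar \
  --spring.datasource.url="jdbc:hsqldb:hsql://${WEBGOAT_DB_HOST}:9001/webgoat"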

@github-actions

Thanks for submitting your first issue; we will have a look as quickly as possible.

aolle self-assigned this Sep 29, 2021
aolle (Collaborator) commented Sep 29, 2021

Hello,

(1) The issue is already fixed in development and the fix should be included in the next release.

(2) I have not found any reference to a WEBGOAT_SERVER env var, but I guess you mean server.address (SERVER_ADDRESS).

We will check that possibility: a separate env var for the WebGoat DB host, to prevent external access to the pod database.
It could help in the following scenarios (a rough Ingress sketch follows the list):

  • Use a k8s Ingress for external browser access, with the database bound to localhost and used only by WebGoat/WebWolf (all-in-one deployment).
  • Use a k8s Ingress for external browser access, with the database behind a k8s Service that is reachable only by WebGoat/WebWolf internally (independent deployment).
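
For example, the Ingress part could look roughly like this (hostname, Service name and port are only placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webgoat
spec:
  rules:
  - host: webgoat.example.com          # placeholder external hostname
    http:
      paths:
      - path: /WebGoat
        pathType: Prefix
        backend:
          service:
            name: webgoat              # placeholder Service in front of the WebGoat pod
            port:
              number: 8080             # assumed WebGoat HTTP port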

fravoss (Author) commented Sep 30, 2021

Thanks!! On (2), sorry, my bad - I rechecked the code; it is WEBGOAT_HOST, not WEBGOAT_SERVER. I took the hint from https://github.com/WebGoat/WebGoat/blob/develop/webgoat-container/src/main/resources/application-webgoat.properties, where server.address gets populated from $WEBGOAT_HOST, and spring.datasource.url a few lines later from $server.address.
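
Illustrated, the chain looks roughly like this (just the shape as I read it, not a verbatim copy of the properties file):

server.address=${WEBGOAT_HOST:127.0.0.1}
spring.datasource.url=jdbc:hsqldb:hsql://${server.address}:9001/webgoat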

On your scenarios:
Yes, that's exactly my motivation. In the independent deployment (with 8.1), I had the WebWolf pod start up against the WebGoat database by passing in the startup parameters and creating a ClusterIP Service "webgoat-db" in the same namespace for the database:

spec:
  containers:
  - args:
    - --spring.datasource.url=jdbc:hsqldb:hsql://webgoat-db:9001/webgoat
    - --server.address=0.0.0.0
    - --server.port=9090
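
and the ClusterIP Service for the database looked roughly like this (the selector label is an assumption from memory):

apiVersion: v1
kind: Service
metadata:
  name: webgoat-db
spec:
  type: ClusterIP
  selector:
    app: webgoat          # assumed label on the pod that runs the hsqldb
  ports:
  - port: 9001
    targetPort: 9001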

However, that does not seem to work with the start.sh of the webgoat/goatandwolf:v8.2.2 image.

I am running multiple instances of WebGoat in a K8s cluster (separated by namespaces) and currently use NodePort services on different ports for external access. I wanted to route through an ingress controller instead, however:

  • I don't currently have a wildcard DNS entry available for my cluster ingress address; otherwise, I could have used http://<instance>.mycluster.example.com/WebGoat for addressing
  • and I did not get WebGoat to work on a separate web context path; otherwise, I would have used http://cluster.example.com/<wg-instance>/WebGoat

nbaars (Collaborator) commented Oct 3, 2021

@fravoss I guess it would be a bit easier if we provided separate Docker images. We could build them again; the main reason for goatandwolf is to have an all-in-one image with nginx etc. and start it with Docker. From a Kubernetes perspective this is not necessary.
Passing the arguments should probably be done without the -- in front.

fravoss (Author) commented Oct 5, 2021

Yes, I only moved from the separate Docker images to the combined goatandwolf image because I wanted to switch to 8.2.2, and that version is not available on Docker Hub as separate images. If I had 8.2.2 in separate images, I'd certainly go back to my two-pod Helm chart installation. The startup-dependency handling isn't perfect there either - WebWolf sleeps for 8 seconds before the java call, i.e. there is still a risk that WebWolf won't come up if WebGoat's DB isn't ready yet - but in that case the WebWolf container will terminate and Kubernetes will start a new one right away.

On the -- in the startup parameters: those are required (or at least, they work...) for the distributed containers. Their startup script (e.g. start-webwolf.sh) passes them on to the java call, and this is how the setting from the Helm deployment makes it into the pod spec and finally into the Spring property where it takes effect.

Here's an example ps output of WebWolf (distributed) 8.1 on the host:

/bin/bash /home/webwolf/start-webwolf.sh --spring.datasource.url=jdbc:hsqldb:hsql://webgoat-db:9001/webgoat --server.address=0.0.0.0 --server.port=9090
java -Djava.security.egd=file:/dev/./urandom -jar /home/webwolf/webwolf.jar --spring.datasource.url=jdbc:hsqldb:hsql://webgoat-db:9001/webgoat --server.address=0.0.0.0 --server.port=9090

nbaars (Collaborator) commented Oct 24, 2021

Just to put things in one place:

  • WebGoat either starts hsqldb itself or uses an external DB (with webgoat.start.hsqldb set to false) - see the sketch after this list
  • WebWolf pulls the database info from WebGoat through the actuator
  • Having one image with nginx, WebGoat and WebWolf works fine if you only use Docker, but it does not work in a Kubernetes environment where you probably also want an ingress etc.
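
As a sketch of the external-DB variant (jar name and datasource URL are only placeholders):

# run WebGoat against an external database instead of the embedded hsqldb
java -jar webgoat.jar \
  --webgoat.start.hsqldb=false \
  --spring.datasource.url=jdbc:hsqldb:hsql://webgoat-db:9001/webgoat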

Proposed changes:

  • we publish WebGoat / WebWolf as separate images, just to make them easier to use
  • update the documentation to include the commands to start them as two different images
  • have at least a basic Kubernetes setup with the two containers, a simple ingress, and a ConfigMap to configure the WebGoat/WebWolf properties (rough sketch below)
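
As a sketch, the ConfigMap could be as simple as this (key names and values are only examples; SPRING_DATASOURCE_URL relies on Spring Boot's relaxed binding of spring.datasource.url, and the map would be wired into the containers via envFrom):

apiVersion: v1
kind: ConfigMap
metadata:
  name: webgoat-config
data:
  WEBGOAT_HOST: webgoat.example.com                                   # example external hostname
  SPRING_DATASOURCE_URL: jdbc:hsqldb:hsql://webgoat-db:9001/webgoat   # example DB service URL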

@zubcevic / @aolle what do you think?

nbaars (Collaborator) commented Nov 16, 2021

see #1152

aolle (Collaborator) commented Nov 16, 2021

IMO, I like the current format of having an all-in-one image.
Maybe we can make some modifications to provide more versatility on the configuration side and provide a ConfigMap for some use cases.
WDYT?

nbaars (Collaborator) commented Nov 17, 2021

@aolle the all-in-one image would work as well; the only thing we don't want to start is nginx, since you would probably configure that through an ingress. WebGoat/WebWolf can still both run, so there is no point in having 2 separate Docker images for this.

fravoss (Author) commented Nov 22, 2021

@nbaars , @aolle - I am not clear on what advantages you get from the all-in-one image vs. the two-container deployment, and the choice is yours, of course.

As some food for your considerations: from my (naive?) external perspective, I see only disadvantages, like the loss of native container logging (and your work-around of piping at least one of the two applications' logfiles to stdout), or losing the chance to address startup race conditions at the level of the container orchestration. Both of my reported issues are resolvable in the one-container model as well, although it feels more cumbersome and limiting than the two-container approach.

nbaars (Collaborator) commented Nov 22, 2021

@fravoss the all-in-one image is very helpful in, for example, a training context: you have one image with everything up and running, with no need to figure out all the details - everything is present, even a landing page if someone navigates to just localhost. Instead of downloading Java and starting two different jar files with all the associated problems, etc. - that's the reason we have the all-in-one image.

I do see your point that it does not work that well in a Kubernetes context, where having two containers fits better. Can you share your Helm charts for version 8.1?

nbaars (Collaborator) commented Jan 13, 2022

For a number of reasons we have completely simplified the setup. It is now just one application with one jar file. This also settles the discussion above.

nbaars added the waiting for release label (issue is fixed, waiting on new release) and removed the waiting for input label Jan 13, 2022
nbaars added this to the 8.2.3 milestone Jan 13, 2022
nbaars (Collaborator) commented Jan 6, 2023

Closing as we released 2023.3

nbaars closed this as completed Jan 6, 2023