post-deployment hook not finishing #11069
Comments
cc/ @smarterclayton |
I am not able to reproduce.
My server and deployer images are both v1.3.0
@mfojtik can you have a look as well?
|
Could it be that you need to update your deployer image? I am going to test with v1.2.0 shortly. |
@csrwng is oc cluster up enforcing the use of the same version as the server for all components? |
Ok, tricked oc cluster up and used a v1.2.0 deployer image. Still I cannot reproduce this issue.
|
@Kargakis :-( I have deployed this some hours back, and if I tail the logs, like you did, I see:
and I can see the pod still running, and I can even exec into it, but I don't know what or how to troubleshoot. I'm using native Docker, not docker-machine, and I have config and data mounted on the host. Other than that, I don't know what else to say. |
|
@Kargakis by default it'll use the same version for everything, but if you specify --version=latest then you could get mismatched images |
@Kargakis I'll summarise what I've found (with your help through irc). I have diagnosed that I could curl from within the hook pod by changing the command to sleep and then rshing into the container and doing the curl. It worked.
I changed it back to the curl -s http://nationalparks:8080/data/ws/load and it failed.
And after pressing Ctrl+C one minute later: But I see the pods:
If I change the hook from "curl -s" to "curl -v":
I need to cancel the previous deployment, as it didn't finish:
And now I cannot see the hook pod with this command. How can I raise the loglevel of the deployment config? |
@jorgemoralespou You can raise the loglevel of your deployment by specifying a command in the strategy.customParams (your strategy doesn't need to be of type custom). Here's a snippet from a test strategy:
|
Very easy and intuitive. Is this documented anywhere?
|
I didn't see it documented anywhere. @Kargakis ? |
I don't think we have docs for @smarterclayton's latest changes in custom deployments. Can you open an issue in openshift-docs? I may tackle it as part of openshift/openshift-docs#2890. Regarding the behavior you see
This is very strange. Can you reproduce it and post the logs from both the deployer and the post hook? |
@Kargakis How can I get the logs for the post hook? Those that get streamed into
|
Try |
Hi Michail, I can reproduce this, almost always (funny though, sometimes worked). In any case, here are the logs:
And:
Full log in a gist: https://gist.github.com/jorgemoralespou/84cce7aac15f02e2d2f015e48e7cf13f |
The one thing I noticed is the time in the response. Current time is: This in no way matches the date I get in the responses from the server. @csrwng Can this be related to "oc cluster up", and having the container paused during the night? |
It might be related to the time problem.
|
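A minimal sketch of how the skew discussed above could be measured, assuming the server's responses carry a standard HTTP `Date` header (as shown by `curl -v`). `clock_skew_seconds` is a hypothetical helper written for illustration; it is not part of `oc` or OpenShift.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus server time, in seconds.

    date_header: value of an HTTP Date header, e.g. copied from `curl -v`.
    local_now: timezone-aware timestamp taken on the client side.
    """
    # parsedate_to_datetime yields an aware datetime for headers ending in "GMT".
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()
```

Calling it with `datetime.now(timezone.utc)` as `local_now` gives the current skew. A value far from zero would point at a paused or drifting clock inside the container rather than at the hook itself.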
@jorgemoralespou ok to close this? |
@mfojtik Not sure. My problem is due to clock sync, but I guess there's an underlying problem: deployments don't finish within 10 minutes if the post-deployment hook doesn't work. I would rather look into why this happens, although I no longer have a problem since I can work around it. |
If the post hook never exits then the deployer pod can never exit because it needs to know the outcome of the hook and act accordingly (based on the failure policy of the hook). |
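As a rough illustration of the point above, here is a simplified model of how a deployer might act on a hook's outcome. It assumes the three hook failure policies (Abort, Retry, Ignore); the function name and the returned action strings are hypothetical and do not mirror the actual deployer code.

```python
from enum import Enum
from typing import Optional


class FailurePolicy(Enum):
    ABORT = "Abort"
    RETRY = "Retry"
    IGNORE = "Ignore"


def next_action(hook_exit_code: Optional[int], policy: FailurePolicy) -> str:
    """Decide what to do next; hook_exit_code is None while the hook runs."""
    if hook_exit_code is None:
        # No outcome yet, so no failure policy can be applied.
        return "wait"
    if hook_exit_code == 0:
        return "complete"
    if policy is FailurePolicy.IGNORE:
        # The hook failed, but the deployment proceeds anyway.
        return "complete"
    if policy is FailurePolicy.RETRY:
        return "retry"
    return "fail"
```

In this model a policy of Ignore only takes effect once the hook has exited, which is consistent with a hook that hangs keeping the deployment in Running.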
@mfojtik @Kargakis well, for one, I think it should be possible to set a timeout on hooks (and maybe that's another issue that needs to be opened), but the second question is why a curl does not finish. I don't really think it's a problem with the hook itself, but with the mechanisms behind the hook. |
@jorgemoralespou I think this is related to this upstream issue kubernetes/kubernetes#26895 Not documented, but all the timeouts you set for your hooks are being ignored. |
No, that issue is about container hooks, this issue is about deployment hooks. @jorgemoralespou the log query being executed by curl is definitely strange. This query is executed by the deployer pod to get the logs from the hook and inline them into the deployer logs but it uses an openshift client and not curl. It seems that time skew affects this but I am not sure how or why. Regarding a hook timeout, we had discussed this previously with @mfojtik. My reaction back then was negative but eventually we may want to consider setting one as ActiveDeadlineSeconds for the hook. |
The overall deadline for deployment should bound hooks. They have to fit |
That was my initial thought on this. I don't feel strongly about either: a
|
Agree with @Kargakis, timeouts are bad and might get complex (you have a timeout for hooks, a deadline for the deployment, etc.). I would rather stick to one global timeout, which is currently the deadline seconds. |
While I'm not going to discuss whether this is difficult or complex implementation-wise, I think that fitting hook execution within the global deployment deadline is good, but being able to set fine-grained timeouts for hooks can also be good. I would apply a hook-specific timeout if one is specified and otherwise just use the deployment deadline to bound hook execution. In any case, whether or not a specific timeout is set, a deployment with hooks should never exceed the deadline. |
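The rule proposed above can be sketched as follows. This is only an illustration of the proposal, not existing behavior: `effective_hook_timeout` and its parameters are hypothetical names, and the elapsed time is assumed to be measured from the start of the deployer process.

```python
from typing import Optional


def effective_hook_timeout(deadline_seconds: float,
                           elapsed_seconds: float,
                           hook_timeout_seconds: Optional[float] = None) -> float:
    """Return how long a hook may run without exceeding the deployment deadline."""
    # Time left in the overall deployment budget, never negative.
    remaining = max(deadline_seconds - elapsed_seconds, 0.0)
    if hook_timeout_seconds is None:
        # No hook-specific timeout: the hook gets whatever is left.
        return remaining
    # A hook-specific timeout applies, but is capped by the remaining budget.
    return min(hook_timeout_seconds, remaining)
```

With the 10-minute deadline mentioned in this issue (600 seconds) and 120 seconds already spent, a hook without its own timeout would get 480 seconds, while one with a 60-second timeout would get 60.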
It's not hard implementation-wise, it will just make DeploymentConfigs a
|
@Kargakis I'm happy to see how things go upstream, but I would make sure this is at least discussed there. |
I'm not sure we need two timeouts. I'm just saying, when we start a hook
|
It seems that we are just setting the default activeDeadline currently:
We need the creation time of the deployer pod inside it if we want to set
|
Opened #11352 |
Fixed. Also linking the upstream hooks rfe (kubernetes/kubernetes#14512) and the initial proposal (kubernetes/kubernetes#33545). Closing this. |
Start a timer at the beginning of the deployer process and calculate from |
I'm using origin 1.3.0 GA, and I have an application deployment that has a post-deployment hook. It used to finish immediately, but now, with this release, it's stuck and seems to never finish. I would have thought that the hook is failing and retrying, but it's configured to Ignore, and I don't even see traces of retries or errors.
It's been there for a very long time (I would say even more than the configured 10 minutes).
I'm using "oc cluster up", and the post-deployment hook just does a curl to an endpoint in the main service. I see traces of the endpoint being called in the main application, but for the trigger I don't see anything.
How can I raise the log level? I have added LOGLEVEL=10 to the main deployment as well as the hook.
Version
Server https://127.0.0.1:8443
openshift v1.3.0
kubernetes v1.3.0+52492b4
Steps To Reproduce
oc cluster up --version=v1.3.0
oc new-project roadshow
oc policy add-role-to-user view system:serviceaccount:roadshow:default
oc create -f https://raw.githubusercontent.com/openshift-roadshow/nationalparks/master/ose3/application-template.json
oc new-app nationalparks
Wait for the build and the deployment to finish. Once the last stack traces are shown in the console:
the post deployment hook should have been executed.
Current Result
Deployment never finishes, and status for the deployment is Running instead of Active.
Expected Result
Deployment finishes, and status is Active.