-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Builder] Introduce a typed command system and 2 phase parse/dispatch build #33492
[Builder] Introduce a typed command system and 2 phase parse/dispatch build #33492
Conversation
builder/dockerfile/builder.go
Outdated
if err != nil { | ||
if strings.Index(err.Error(), "unknown instruction:") == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we handle this with an error type instead of string comparison? This is likely to break
builder/dockerfile/builder.go
Outdated
totalCommands += len(stage.Commands) | ||
} | ||
for _, meta := range metaArgs { | ||
shlex := NewShellLex(escapeToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done before the for loop
builder/dockerfile/builder.go
Outdated
fmt.Fprintf(b.Stdout, "%v / %v %v", currentCommandIndex, totalCommands, &meta) | ||
currentCommandIndex++ | ||
fmt.Fprintln(b.Stdout) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting out a function for "dispatching" metaArgs would be great
return err | ||
} | ||
c.Dest = dst | ||
from, err := expander(c.From) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is supported atm. We have to be careful about this as it may prevent us from determining dependencies between stages before the image config has been pulled. We may want to do this in the context of global args like the strings in FROM
command but that's a discussion for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, flags where not expanded then ? i'll change that!
switch c := cmd.(type) { | ||
case *FromCommand: | ||
if c.StageName == "" { | ||
c.StageName = strconv.Itoa(len(stages)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look correct. --from=n
syntax is for accessing all stages, not the ones that didn't have a name.
if c.StageName == "" { | ||
c.StageName = strconv.Itoa(len(stages)) | ||
} | ||
stage := BuildableStage{Name: c.StageName, Commands: []interface{}{c}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like to remove a from command inside a stage and just make FROM==buildstage
. If complicates things atm it could be a follow-up with a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It complicates things a bit for the BuildFromConfig stuff: in one case we have a from command that will initialize the dispatchState with a named stage or image lookup, in the other case, the resume build command just copy an existing runConfig. Also, I like the idea of having every command dispatch logic mapped to a given typed command and for consistency sake, I like the idea to treat "FROM" command as any other dispatchable command (even if on parsing it has a slightly different behavior as it is a marker for a stage beginning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FROM
is not allowed in BuildFromConfig
. Only commands that change image config are allowed there. Basically, it is not building anything, it is just converting parameters in dockerfile syntax to a json config syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moby/builder/dockerfile/builder.go
Lines 25 to 36 in 274cc09
var validCommitCommands = map[string]bool{ | |
"cmd": true, | |
"entrypoint": true, | |
"healthcheck": true, | |
"env": true, | |
"expose": true, | |
"label": true, | |
"onbuild": true, | |
"user": true, | |
"volume": true, | |
"workdir": true, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand, my point is that a stageBuilder can be initialized by either a FROM
command or a ResumeBuild
command created when doing a BuildFromConfig
. I like to leverage the typed-command style dispatching to change this alternative behavior.
Some other things to consider:
- From is not only about initializing the stage: At dispatch time, it might trigger
onbuild
expression dispatching - From supports buildargs expansion and cannot be entirely resolved at parse time
- From can reference another stage, in this case it is not resolvable at parse time either, as it needs to get the runconfig produced by the referenced stage dispatching
- I'd like to keep the degree of consistency I have now: 1 dockerfile instruction == 1 typed command (even meta args are typed command, they just don't belong to a specific stage)
- In term of consistency, I really like the way it is, as the builder don't need to know anything about stage initialization, it only execute one list of commands per stage. The 1st command accidently initialize the stage runConfig, but the builder itself does not even care.
cc @AkihiroSuda as well as this is related to #32550 (comment) |
builder/dockerfile/builder.go
Outdated
select { | ||
case <-b.clientCtx.Done(): | ||
logrus.Debug("Builder: build cancelled!") | ||
fmt.Fprint(b.Stdout, "Build cancelled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\n
missing?
|
||
type BuildableStage struct { | ||
Name string | ||
Commands []interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interface{}
-> BaseCommand
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one specific command that does not embed CommandBase as it is not parsed from source code (ResumeBuild, used in BuildFromConfig). This also opens up some cool features like in the future, being able to split typed commands into more fine grained subcommands to increase parallelism opportunities. (for exemple, having the AddCommand split into many fetch
subcommands -1 per source- and join them in a mergeInto
command)
|
||
labels := KeyValuePairs{} | ||
|
||
for j := 0; j < len(req.args); j++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we fold two j++
into a single j+=2
? (as in parseEnv
)
builder/dockerfile/builder.go
Outdated
if n.Value == command.From && state.isCurrentStage(b.options.Target) { | ||
break | ||
} | ||
fmt.Fprintf(b.Stdout, "%v / %v %v", currentCommandIndex, totalCommands, cmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Step %d/%d : %v"
for compatibility?
I think there are user applications that depend on the current logging format.
(should be converted to machine-readable format in the new RPC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually a failing test for that that I am going to fix
Fixed in last commit:
|
2f7c5d5
to
7349187
Compare
Last push: |
All unit tests are now back (and due to typedCommands, it is easier to isolate parsing and dispatching tests). |
cca3a45
to
3077710
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good.
I think most of my comments are just about removing some duplication in the new instruction types.
If we really want to keep unique types for each. I think we could still do that with type foo bar
aliases, instead of repeating the structs and the methods.
|
||
for _, command := range commands { | ||
ast, err := parser.Parse(strings.NewReader(command)) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.NoError(t, err)
builder/dockerfile/builder.go
Outdated
return err | ||
} | ||
b.buildArgs.AddArg(meta.Name, meta.Value) | ||
b.buildArgs.AddMetaArg(meta.Name, meta.Value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove buildArgs
from Builder
and make it a field on the stage?
I think that would allow us to move this method to the build stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I like the idea (and it is required for build parallelisation). We still need a common buildArgs instance for metaArgs construction, but once it is done, we should simply clone it once per stage, and have individual stage have their own instance.
builder/dockerfile/builder.go
Outdated
} | ||
// dispatchState := newDispatchState() | ||
// dispatchState.runConfig = config | ||
// return dispatchFromDockerfile(b, dockerfile, dispatchState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented out code (please remove)
builder/dockerfile/builder.go
Outdated
if err != nil { | ||
if instructions.IsUnknownInstruction(err) { | ||
buildsFailed.WithValues(metricsUnknownInstructionError).Inc() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it would be easier to export buildsFailed
. That way we could update the metrics from the source instead of having to check a bunch of error conditions. I think this is the forth check so far (3 for unknown instruction, another for target not reachable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I do not agree (and that is actually the opposite of @tonistiigi feedback on the same subject before). I'd like to keep the node -> typed command parsing logic the least dependent on anything else as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might have been a mistake to log metrics for BuildFromConfig
as it is just wrongly named function that doesn't have anything to do with docker build
. There should be only one place then. Also, maybe there is no need to check for the error type at all as the point should be to log metrics for all errors. The selection of errors that are logged separately and what are not is completely arbitrary anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the metrics name is explitily "unknown_instruction_error" should we add a "general_parsing_error" metric then for other parsing errors?
builder/dockerfile/dispatchers.go
Outdated
} | ||
|
||
runConfig := req.state.runConfig | ||
func (d *stageBuilder) dispatchEnv(c *instructions.EnvCommand) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these should be methods on a struct. They don't really make sense grouped together, they are unrelated functions that happen to take similar dependencies.
I think this worked better as functions which accept some request or options type struct.
We recently made a similar change to the Cli (moving away from a massive struct with methods, to individual functions which accepted a common type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I come from a very object oriented background where we tend to group state and methods together in a coherent stuff and avoid free functions, so I am not sure I 100% agree, but for consistency sake, I'll do that.
BaseCommand | ||
Cmd strslice.StrSlice | ||
PrependShell bool | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cmd/RunEntrypoint are all the same struct , same idea here.
}, nil | ||
|
||
//TODO | ||
//req.buildArgs.ResetAllowed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented out code to remove
|
||
} | ||
|
||
func parseCmd(req parseRequest) (*CmdCommand, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate of parseRun()
which would go away with the suggested struct changes for instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep multiple functions (for maintainability sake), but I will try to share some internal code here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does more code make it easier to maintain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is if we add flags to cmd
in the future that are not in run
, it would be easier if the code is already split (even if in its implementation it calls a single shared function with run
and entrypoint
). It is also more explicit: if I'm looking to change run
parsing behavior, I know that I can modify parseRun without any side effect on other command parsing logic
var value *string | ||
if hasDefault { | ||
value = &newValue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this if is unnecessary. Why not just set it in the branch above?
} | ||
|
||
// IsUnknownInstruction checks if the error is an UnknownInstruction or a parseError containing an UnknownInstruction | ||
func IsUnknownInstruction(err error) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I know it was my suggestion to make this a type. I should have looked at the code some more. I think exporting the metrics function and calling it directly where the error is raised would be cleaner.
c9300ca
to
aee5129
Compare
builder/dockerfile/dispatchers.go
Outdated
return nil | ||
state := d.state | ||
state.beginStage(cmd.StageName, image) | ||
if state.runConfig.OnBuild != nil && len(state.runConfig.OnBuild) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This first nil check is unnecessary, you can just check length (https://play.golang.org/p/E0QPccyYAU)
builder/dockerfile/dispatchers.go
Outdated
err = dispatchTriggeredOnBuild(d, triggers) | ||
if err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This err
condition is unnecessary. You can just
return dispatchTriggeredOnBuild(d, triggers)
builder/dockerfile/evaluator_test.go
Outdated
Stdout: ioutil.Discard, | ||
buildArgs: newBuildArgs(options.BuildArgs), | ||
if err != nil { | ||
testutil.ErrorContains(t, err, testCase.expectedError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, instead of the comments in the test cases above, it would be good to split this into two test cases. One for parse errors, and another for dispatch errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aggreed. I should move parsing error related tests to the instructions package as well, and change evaluator_tests to not rely on parsing, but on already constructed typed errors.
type KeyValuePairs []KeyValuePair | ||
|
||
// WithNameAndCode is the base class for Dockerfile (String() returns its source code) | ||
type WithNameAndCode struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this doesn't need to be exported.
type LabelCommand struct { | ||
BaseCommand | ||
Labels KeyValuePairs // kvp slice instead of map to preserve ordering | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree: the whole point of these PR is to have typed commands.
We can still have strict types without the code duplication. As discussed in slack I think we can do this with embedded structs. The field name can just be Value
and it doesn't weaken the type guarantees at all.
what if we want to add a flag on ENV for exemple that has no meaning at all on Label at some point in the future ?
Then at that time we can create a new struct. There's no reason to maintain extra code in the mean time for a case that may never occur.
|
||
} | ||
|
||
func parseCmd(req parseRequest) (*CmdCommand, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does more code make it easier to maintain?
builder/dockerfile/evaluator.go
Outdated
shlex *ShellLex | ||
builder *Builder | ||
source builder.Source | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be renamed back to dispatchRequest
(and with the variable name in the dispatch functions back to req
). I think that will help focus the diff on the changes related to the new typed instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I found the "stageBuild" name more explicit (it represents the whole state and dependencies required to build a stage), but I agree it would simplify the diff. Might be worth adding a Naming PR after that though
builder/dockerfile/evaluator.go
Outdated
} | ||
|
||
func newStageBuild(builder *Builder, escapeToken rune, source builder.Source, buildArgs *buildArgs) *stageBuild { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit. The definition for dispatchState
and StageBuild
are a bit tangled here. Could you move this above newDispatchState()
, so their "constructors" are together with the struct.
3ded678
to
3689774
Compare
@tonistiigi, @dnephin have you any more feedback or should we make this PR advance one more step toward merging ? |
3689774
to
9dbf14f
Compare
Yup, would be nice to see this one merged |
@tonistiigi Is this ready to merge? |
9dbf14f
to
3490d7f
Compare
7c7cdde
to
f777173
Compare
f777173
to
5cb972d
Compare
Yet another rebase for the chown flag on add/copy. Let us see how it behaves in CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Follow ups could be in areas of moving more argument expansion to the instructions
pkg and generalizing the build args. These seem areas where a different dispatcher still needs to duplicate some logic.
ping @dnephin
builder/dockerfile/evaluator.go
Outdated
"runtime" | ||
"strings" | ||
|
||
"reflect" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra whitespace
|
||
// Sources list the source paths | ||
func (s SourcesAndDest) Sources() []string { | ||
return s[:len(s)-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should make a copy to avoid caller doing an append leaking to destination variable.
6f723db
to
e260cfc
Compare
@simonferquel Can you squash the PR commits? The lint/rebase fixes might be too verbose to show up in the git history. |
52588c9
to
c2bea33
Compare
06b7f3d
to
861c696
Compare
This is a work base to introduce more features like build time dockerfile optimisations, dependency analysis and parallel build, as well as a first step to go from a dispatch-inline process to a frontend+backend process. Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
861c696
to
669c067
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a work base to introduce more features like build time
dockerfile optimisations, dependency analysis and parallel build, as
well as a first step to go from a dispatch-inline process to a
frontend+backend process.
- What I did
I moved the whole parsing of Dockerfiles into a separate package generating a typed command graph.
Then I changed the current dispatch logic so that it processes typed commands instead of AST subtrees.
This allows to have a global view of the build process before actually running it, allowing command tree optimisations, parallel execution of build stages, and even distributed execution in the future.
It has also the benefit to make the Builder code much more maintainable with a clear separation of concerns between, parsing, env variables / build args expansion and command dispatching.
- How I did it
I refactored the code to introduce a collection of typed commands corresponding to each operation in the dispatchers.go file.
- How to verify it
Builder keeps the same public interface. The current test suite is OK.
I still have to re-implement existing in-package unit tests though<- unit tests are back- Description for the changelog
Refactoring of dockerfile builder to enable future improvements
cc @tonistiigi and @dnephin
(Note: I squashed all my commits due to rebase complexity. Unsquashed - and outdated - branch is here: https://github.com/simonferquel/docker/tree/typed-builder-commands)