Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Builder] Introduce a typed command system and 2 phase parse/dispatch build #33492

Merged

Conversation

simonferquel
Copy link
Contributor

@simonferquel simonferquel commented Jun 2, 2017

This is a work base to introduce more features like build time
dockerfile optimisations, dependency analysis and parallel build, as
well as a first step to go from a dispatch-inline process to a
frontend+backend process.

- What I did
I moved the whole parsing of Dockerfiles into a separate package generating a typed command graph.
Then I changed the current dispatch logic so that it processes typed commands instead of AST subtrees.

This allows to have a global view of the build process before actually running it, allowing command tree optimisations, parallel execution of build stages, and even distributed execution in the future.

It has also the benefit to make the Builder code much more maintainable with a clear separation of concerns between, parsing, env variables / build args expansion and command dispatching.

- How I did it

I refactored the code to introduce a collection of typed commands corresponding to each operation in the dispatchers.go file.

- How to verify it

Builder keeps the same public interface. The current test suite is OK.
I still have to re-implement existing in-package unit tests though <- unit tests are back

- Description for the changelog

Refactoring of dockerfile builder to enable future improvements

cc @tonistiigi and @dnephin

(Note: I squashed all my commits due to rebase complexity. Unsquashed - and outdated - branch is here: https://github.com/simonferquel/docker/tree/typed-builder-commands)

@tonistiigi tonistiigi added the status/failing-ci Indicates that the PR in its current state fails the test suite label Jun 2, 2017
@vdemeester vdemeester requested review from dnephin and tonistiigi June 2, 2017 20:08
if err != nil {
if strings.Index(err.Error(), "unknown instruction:") == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we handle this with an error type instead of string comparison? This is likely to break

totalCommands += len(stage.Commands)
}
for _, meta := range metaArgs {
shlex := NewShellLex(escapeToken)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done before the for loop

fmt.Fprintf(b.Stdout, "%v / %v %v", currentCommandIndex, totalCommands, &meta)
currentCommandIndex++
fmt.Fprintln(b.Stdout)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting out a function for "dispatching" metaArgs would be great

return err
}
c.Dest = dst
from, err := expander(c.From)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is supported atm. We have to be careful about this as it may prevent us from determining dependencies between stages before the image config has been pulled. We may want to do this in the context of global args like the strings in FROM command but that's a discussion for later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, flags where not expanded then ? i'll change that!

switch c := cmd.(type) {
case *FromCommand:
if c.StageName == "" {
c.StageName = strconv.Itoa(len(stages))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look correct. --from=n syntax is for accessing all stages, not the ones that didn't have a name.

if c.StageName == "" {
c.StageName = strconv.Itoa(len(stages))
}
stage := BuildableStage{Name: c.StageName, Commands: []interface{}{c}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to remove a from command inside a stage and just make FROM==buildstage. If complicates things atm it could be a follow-up with a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It complicates things a bit for the BuildFromConfig stuff: in one case we have a from command that will initialize the dispatchState with a named stage or image lookup, in the other case, the resume build command just copy an existing runConfig. Also, I like the idea of having every command dispatch logic mapped to a given typed command and for consistency sake, I like the idea to treat "FROM" command as any other dispatchable command (even if on parsing it has a slightly different behavior as it is a marker for a stage beginning)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FROM is not allowed in BuildFromConfig. Only commands that change image config are allowed there. Basically, it is not building anything, it is just converting parameters in dockerfile syntax to a json config syntax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var validCommitCommands = map[string]bool{
"cmd": true,
"entrypoint": true,
"healthcheck": true,
"env": true,
"expose": true,
"label": true,
"onbuild": true,
"user": true,
"volume": true,
"workdir": true,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand, my point is that a stageBuilder can be initialized by either a FROM command or a ResumeBuild command created when doing a BuildFromConfig. I like to leverage the typed-command style dispatching to change this alternative behavior.
Some other things to consider:

  • From is not only about initializing the stage: At dispatch time, it might trigger onbuild expression dispatching
  • From supports buildargs expansion and cannot be entirely resolved at parse time
  • From can reference another stage, in this case it is not resolvable at parse time either, as it needs to get the runconfig produced by the referenced stage dispatching
  • I'd like to keep the degree of consistency I have now: 1 dockerfile instruction == 1 typed command (even meta args are typed command, they just don't belong to a specific stage)
  • In term of consistency, I really like the way it is, as the builder don't need to know anything about stage initialization, it only execute one list of commands per stage. The 1st command accidently initialize the stage runConfig, but the builder itself does not even care.

@tonistiigi
Copy link
Member

cc @AkihiroSuda as well as this is related to #32550 (comment)

select {
case <-b.clientCtx.Done():
logrus.Debug("Builder: build cancelled!")
fmt.Fprint(b.Stdout, "Build cancelled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n missing?


type BuildableStage struct {
Name string
Commands []interface{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interface{} -> BaseCommand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one specific command that does not embed CommandBase as it is not parsed from source code (ResumeBuild, used in BuildFromConfig). This also opens up some cool features like in the future, being able to split typed commands into more fine grained subcommands to increase parallelism opportunities. (for exemple, having the AddCommand split into many fetch subcommands -1 per source- and join them in a mergeInto command)


labels := KeyValuePairs{}

for j := 0; j < len(req.args); j++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fold two j++ into a single j+=2? (as in parseEnv)

if n.Value == command.From && state.isCurrentStage(b.options.Target) {
break
}
fmt.Fprintf(b.Stdout, "%v / %v %v", currentCommandIndex, totalCommands, cmd)
Copy link
Member

@AkihiroSuda AkihiroSuda Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Step %d/%d : %v" for compatibility?

I think there are user applications that depend on the current logging format.

(should be converted to machine-readable format in the new RPC)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is actually a failing test for that that I am going to fix

@simonferquel
Copy link
Contributor Author

Fixed in last commit:

  • Unknown instruction is a typed error
  • MetaArgs processing in a function
  • Shlex created once out of the meta-args loop
  • \n in Build canceled message
  • --from in COPY command is not expanded
  • reverted to the old logic of build stage indexing
  • parseLabel has a closer logic to parseEnv

@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch 2 times, most recently from 2f7c5d5 to 7349187 Compare June 8, 2017 12:03
@simonferquel
Copy link
Contributor Author

Last push:
Various integration tests fixes + golint

@AkihiroSuda AkihiroSuda added rebuild/windowsRS1 and removed status/failing-ci Indicates that the PR in its current state fails the test suite labels Jun 9, 2017
@simonferquel
Copy link
Contributor Author

All unit tests are now back (and due to typedCommands, it is easier to isolate parsing and dispatching tests).
Note, we might want to write more instructions parsing tests, or implement more end to end parse/dispatch tests like the ones in evaluator_tests

@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch 3 times, most recently from cca3a45 to 3077710 Compare June 9, 2017 13:43
Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good.

I think most of my comments are just about removing some duplication in the new instruction types.

If we really want to keep unique types for each. I think we could still do that with type foo bar aliases, instead of repeating the structs and the methods.


for _, command := range commands {
ast, err := parser.Parse(strings.NewReader(command))
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require.NoError(t, err)

return err
}
b.buildArgs.AddArg(meta.Name, meta.Value)
b.buildArgs.AddMetaArg(meta.Name, meta.Value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove buildArgs from Builder and make it a field on the stage?

I think that would allow us to move this method to the build stage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I like the idea (and it is required for build parallelisation). We still need a common buildArgs instance for metaArgs construction, but once it is done, we should simply clone it once per stage, and have individual stage have their own instance.

}
// dispatchState := newDispatchState()
// dispatchState.runConfig = config
// return dispatchFromDockerfile(b, dockerfile, dispatchState)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out code (please remove)

if err != nil {
if instructions.IsUnknownInstruction(err) {
buildsFailed.WithValues(metricsUnknownInstructionError).Inc()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it would be easier to export buildsFailed. That way we could update the metrics from the source instead of having to check a bunch of error conditions. I think this is the forth check so far (3 for unknown instruction, another for target not reachable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I do not agree (and that is actually the opposite of @tonistiigi feedback on the same subject before). I'd like to keep the node -> typed command parsing logic the least dependent on anything else as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might have been a mistake to log metrics for BuildFromConfig as it is just wrongly named function that doesn't have anything to do with docker build. There should be only one place then. Also, maybe there is no need to check for the error type at all as the point should be to log metrics for all errors. The selection of errors that are logged separately and what are not is completely arbitrary anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the metrics name is explitily "unknown_instruction_error" should we add a "general_parsing_error" metric then for other parsing errors?

}

runConfig := req.state.runConfig
func (d *stageBuilder) dispatchEnv(c *instructions.EnvCommand) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these should be methods on a struct. They don't really make sense grouped together, they are unrelated functions that happen to take similar dependencies.

I think this worked better as functions which accept some request or options type struct.

We recently made a similar change to the Cli (moving away from a massive struct with methods, to individual functions which accepted a common type).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I come from a very object oriented background where we tend to group state and methods together in a coherent stuff and avoid free functions, so I am not sure I 100% agree, but for consistency sake, I'll do that.

BaseCommand
Cmd strslice.StrSlice
PrependShell bool
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cmd/RunEntrypoint are all the same struct , same idea here.

}, nil

//TODO
//req.buildArgs.ResetAllowed()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out code to remove


}

func parseCmd(req parseRequest) (*CmdCommand, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of parseRun() which would go away with the suggested struct changes for instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep multiple functions (for maintainability sake), but I will try to share some internal code here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does more code make it easier to maintain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is if we add flags to cmd in the future that are not in run, it would be easier if the code is already split (even if in its implementation it calls a single shared function with run and entrypoint). It is also more explicit: if I'm looking to change run parsing behavior, I know that I can modify parseRun without any side effect on other command parsing logic

var value *string
if hasDefault {
value = &newValue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this if is unnecessary. Why not just set it in the branch above?

}

// IsUnknownInstruction checks if the error is an UnknownInstruction or a parseError containing an UnknownInstruction
func IsUnknownInstruction(err error) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I know it was my suggestion to make this a type. I should have looked at the code some more. I think exporting the metrics function and calling it directly where the error is raised would be cleaner.

@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch from c9300ca to aee5129 Compare June 12, 2017 12:24
return nil
state := d.state
state.beginStage(cmd.StageName, image)
if state.runConfig.OnBuild != nil && len(state.runConfig.OnBuild) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This first nil check is unnecessary, you can just check length (https://play.golang.org/p/E0QPccyYAU)

err = dispatchTriggeredOnBuild(d, triggers)
if err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This err condition is unnecessary. You can just

return dispatchTriggeredOnBuild(d, triggers)

Stdout: ioutil.Discard,
buildArgs: newBuildArgs(options.BuildArgs),
if err != nil {
testutil.ErrorContains(t, err, testCase.expectedError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, instead of the comments in the test cases above, it would be good to split this into two test cases. One for parse errors, and another for dispatch errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aggreed. I should move parsing error related tests to the instructions package as well, and change evaluator_tests to not rely on parsing, but on already constructed typed errors.

type KeyValuePairs []KeyValuePair

// WithNameAndCode is the base class for Dockerfile (String() returns its source code)
type WithNameAndCode struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this doesn't need to be exported.

type LabelCommand struct {
BaseCommand
Labels KeyValuePairs // kvp slice instead of map to preserve ordering
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree: the whole point of these PR is to have typed commands.

We can still have strict types without the code duplication. As discussed in slack I think we can do this with embedded structs. The field name can just be Value and it doesn't weaken the type guarantees at all.

what if we want to add a flag on ENV for exemple that has no meaning at all on Label at some point in the future ?

Then at that time we can create a new struct. There's no reason to maintain extra code in the mean time for a case that may never occur.


}

func parseCmd(req parseRequest) (*CmdCommand, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does more code make it easier to maintain?

shlex *ShellLex
builder *Builder
source builder.Source
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be renamed back to dispatchRequest (and with the variable name in the dispatch functions back to req). I think that will help focus the diff on the changes related to the new typed instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I found the "stageBuild" name more explicit (it represents the whole state and dependencies required to build a stage), but I agree it would simplify the diff. Might be worth adding a Naming PR after that though

}

func newStageBuild(builder *Builder, escapeToken rune, source builder.Source, buildArgs *buildArgs) *stageBuild {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit. The definition for dispatchState and StageBuild are a bit tangled here. Could you move this above newDispatchState(), so their "constructors" are together with the struct.

@simonferquel
Copy link
Contributor Author

@tonistiigi, @dnephin have you any more feedback or should we make this PR advance one more step toward merging ?

@tonistiigi tonistiigi assigned tonistiigi and dnephin and unassigned runcom Jul 18, 2017
@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch from 3689774 to 9dbf14f Compare August 4, 2017 08:47
@shouze
Copy link
Contributor

shouze commented Aug 4, 2017

Yup, would be nice to see this one merged

@AkihiroSuda
Copy link
Member

@tonistiigi Is this ready to merge?

@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch from 9dbf14f to 3490d7f Compare August 28, 2017 12:11
@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch from 7c7cdde to f777173 Compare August 28, 2017 14:18
@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch from f777173 to 5cb972d Compare August 29, 2017 07:29
@simonferquel
Copy link
Contributor Author

Yet another rebase for the chown flag on add/copy. Let us see how it behaves in CI

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Follow ups could be in areas of moving more argument expansion to the instructions pkg and generalizing the build args. These seem areas where a different dispatcher still needs to duplicate some logic.

ping @dnephin

"runtime"
"strings"

"reflect"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra whitespace


// Sources list the source paths
func (s SourcesAndDest) Sources() []string {
return s[:len(s)-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should make a copy to avoid caller doing an append leaking to destination variable.

@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch from 6f723db to e260cfc Compare September 5, 2017 07:52
@yongtang
Copy link
Member

yongtang commented Sep 6, 2017

@simonferquel Can you squash the PR commits? The lint/rebase fixes might be too verbose to show up in the git history.

@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch 2 times, most recently from 52588c9 to c2bea33 Compare September 7, 2017 14:28
@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch 2 times, most recently from 06b7f3d to 861c696 Compare September 14, 2017 13:08
This is a work base to introduce more features like build time
dockerfile optimisations, dependency analysis and parallel build, as
well as a first step to go from a dispatch-inline process to a
frontend+backend process.

Signed-off-by: Simon Ferquel <simon.ferquel@docker.com>
@simonferquel simonferquel force-pushed the typed-builder-commands-squashed branch from 861c696 to 669c067 Compare September 18, 2017 07:49
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AkihiroSuda AkihiroSuda merged commit 22e1572 into moby:master Sep 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants