Proposal: DAG image builder (can execute multi-stage builds in parallel) #32550

Closed
@AkihiroSuda

Description

Proposal

Since 17.05, Docker has supported "multi-stage" Dockerfiles.

My proposal is to convert such a Dockerfile to a DAG internally and execute its stages in parallel.

image

No change to the file format or the UX.

POC is available: #32402
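
To illustrate the conversion proposed above, here is a minimal sketch of deriving stage-level edges from a multi-stage Dockerfile. It is not the code from the POC: the `Edge` type and the `StageEdges` function are hypothetical names, only `COPY --from=` references are considered, and line continuations and ARG substitution are ignored.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Edge records that stage Depender needs files from stage Dependee.
// (Hypothetical type for this sketch.)
type Edge struct{ Depender, Dependee int }

// copyFrom matches the --from=<stage> flag of a COPY instruction.
var copyFrom = regexp.MustCompile(`--from=(\S+)`)

// StageEdges scans a Dockerfile line by line and returns the number of
// stages plus the stage-level dependency edges implied by COPY --from.
// References that do not name an earlier stage are treated as external
// images and produce no edge.
func StageEdges(dockerfile string) (int, []Edge) {
	stages := map[string]int{} // stage name or index (as text) -> stage index
	seen := map[Edge]bool{}    // used to drop duplicate edges
	var edges []Edge
	cur := -1 // index of the stage currently being scanned

	for _, line := range strings.Split(dockerfile, "\n") {
		f := strings.Fields(line)
		if len(f) == 0 {
			continue
		}
		switch strings.ToUpper(f[0]) {
		case "FROM":
			// Every FROM starts a new stage, addressable by its index
			// and, if "AS <name>" is present, by its name.
			cur++
			stages[strconv.Itoa(cur)] = cur
			if len(f) >= 4 && strings.EqualFold(f[2], "AS") {
				stages[strings.ToLower(f[3])] = cur
			}
		case "COPY":
			m := copyFrom.FindStringSubmatch(line)
			if m == nil || cur < 0 {
				continue
			}
			dep, ok := stages[strings.ToLower(m[1])]
			e := Edge{Depender: cur, Dependee: dep}
			if ok && dep != cur && !seen[e] {
				seen[e] = true
				edges = append(edges, e)
			}
		}
	}
	return cur + 1, edges
}

func main() {
	n, edges := StageEdges(`FROM golang AS build-a
RUN make a
FROM golang AS build-b
RUN make b
FROM alpine
COPY --from=build-a /a /a
COPY --from=1 /b /b
`)
	fmt.Println(n, edges) // prints "3 [{2 0} {2 1}]"
}
```

In this example stages 0 and 1 have no edge between them, so they are the ones that could be built in parallel before stage 2 starts.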

Tasks

Implement generic DAG utility

I made a small generic DAG package: https://github.com/AkihiroSuda/go-dag

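import (
	"github.com/AkihiroSuda/go-dag"
	"github.com/AkihiroSuda/go-dag/scheduler"
)

// Three nodes, one per build stage.
g := &dag.Graph{
	Nodes: []dag.Node{0, 1, 2},
	Edges: []dag.Edge{
		// Node 2 depends on nodes 0 and 1; nodes 0 and 1 are independent.
		{Depender: 2, Dependee: 0},
		{Depender: 2, Dependee: 1},
	},
}
concurrency := 0
// buildStage stands for whatever builds a single stage.
scheduler.Execute(g, concurrency, func(n dag.Node) { buildStage(n) })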

If this design is correct, I think we can vendor this package (or just copy it to github.com/docker/docker/pkg/dag).
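
For readers who want to see what such a utility has to do, here is a minimal, standard-library-only sketch of dependency-ordered parallel execution. It is not the go-dag implementation: `Edge`, `Execute`, and the meaning of `limit` (a value of 0 or less means "no limit") are assumptions of this sketch, and the graph is assumed to be acyclic.

```go
package main

import (
	"fmt"
	"sync"
)

// Edge records that node Depender must wait for node Dependee.
// (Hypothetical type for this sketch, not the go-dag type.)
type Edge struct{ Depender, Dependee int }

// Execute calls fn once for every node in [0, n). A node starts only after
// all of its dependees have finished; independent nodes run concurrently.
// At most limit calls to fn run at the same time; limit <= 0 means no limit.
// The graph must be acyclic, otherwise Execute blocks forever.
func Execute(n int, edges []Edge, limit int, fn func(node int)) {
	// done[i] is closed when node i has finished.
	done := make([]chan struct{}, n)
	for i := range done {
		done[i] = make(chan struct{})
	}
	deps := make([][]int, n)
	for _, e := range edges {
		deps[e.Depender] = append(deps[e.Depender], e.Dependee)
	}

	// Optional counting semaphore that bounds concurrency.
	var sem chan struct{}
	if limit > 0 {
		sem = make(chan struct{}, limit)
	}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Wait for dependencies before taking a semaphore slot, so
			// blocked nodes never hold a slot that a dependee needs.
			for _, d := range deps[i] {
				<-done[d]
			}
			if sem != nil {
				sem <- struct{}{}
			}
			fn(i)
			if sem != nil {
				<-sem
			}
			close(done[i])
		}(i)
	}
	wg.Wait()
}

func main() {
	// Same shape as the graph above: node 2 depends on nodes 0 and 1.
	edges := []Edge{{Depender: 2, Dependee: 0}, {Depender: 2, Dependee: 1}}
	Execute(3, edges, 0, func(node int) {
		fmt.Println("building stage", node)
	})
}
```

Stages 0 and 1 may print in either order, but stage 2 always prints last. One goroutine per node keeps the sketch short; a real builder would also need error propagation and cancellation.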

Determine Dockerfile DAG granularity

Alternatively, we could use a fine-grained DAG like this, but I'm -1 at the moment because it is likely to cause implementation issues.

image

Refactor builder pkg to create DAG

My previous POC (#32402) was implemented in a weird way:

  • builder/dockerfile/parser (unchanged): parses the Dockerfile text and returns a parsed tree structure
  • builder/dockerfile/parallel: re-parses the output of builder/dockerfile/parser and creates a DAG

Re-parsing the output of builder/dockerfile/parser is likely to cause implementation issues; in fact, I forgot to implement support for ARG.

Rather, we should refactor builder/dockerfile/parser to emit a DAG directly.

Investigate other use cases of DAG

#32402 (comment)

The DAG-based execution engine should be the new core of the builder, providing not only concurrent stage execution but also concurrent build jobs, more efficient processing and cache reuse, cache export/import for DAG branches, etc. At the same time, this core should be separated from the frontend (Dockerfile) logic to provide more options for declaring the build definition and more extension points for others to reuse this solver.
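
One way to picture cache reuse per DAG branch is to give each node a key derived from its own definition and from the keys of its dependencies, so a change invalidates only the nodes downstream of it. The following is a sketch under that assumption, not a description of any existing builder code; `Node` and `CacheKeys` are hypothetical names, and nodes are assumed to be listed in topological order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Node is a hypothetical build step: its definition (for example the text of
// a stage) and the indices of the nodes it depends on. Every index in Deps
// must be smaller than the node's own index (topological order).
type Node struct {
	Def  string
	Deps []int
}

// CacheKeys returns one key per node. A node's key is the SHA-256 of its
// definition followed by the keys of its dependencies, so the key changes
// if and only if the node or something upstream of it changes (ignoring
// hash collisions).
func CacheKeys(nodes []Node) []string {
	keys := make([]string, len(nodes))
	for i, n := range nodes {
		h := sha256.New()
		h.Write([]byte(n.Def))
		for _, d := range n.Deps {
			h.Write([]byte{0}) // separator between fields
			h.Write([]byte(keys[d]))
		}
		keys[i] = hex.EncodeToString(h.Sum(nil))
	}
	return keys
}

func main() {
	before := CacheKeys([]Node{
		{Def: "FROM golang\nRUN make a"},
		{Def: "FROM golang\nRUN make b"},
		{Def: "FROM alpine", Deps: []int{0, 1}},
	})
	// Only the second branch is edited.
	after := CacheKeys([]Node{
		{Def: "FROM golang\nRUN make a"},
		{Def: "FROM golang\nRUN make b2"},
		{Def: "FROM alpine", Deps: []int{0, 1}},
	})
	for i := range before {
		fmt.Printf("node %d reusable: %v\n", i, before[i] == after[i])
	}
}
```

Here node 0 keeps its key and could be served from cache, while node 1 and the final node 2 would be rebuilt.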

Investigate why docker build is slow

The storage driver seems to be the bottleneck, but I haven't looked into this more deeply.

#9656
#32402 (comment)
#32402 (comment)

cc @tonistiigi @dnephin @aaronlehmann @vdemeester @cpuguy83 @simonferquel @thaJeztah

Labels: area/builder, kind/enhancement, roadmap
