cmd/compile: unnecessary temporary variable on stack #65495
Open
Description
Go version
go version go1.21.6 linux/amd64
Output of go env
in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://goproxy.cn'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='/usr/local/bin/gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build734905149=/tmp/go-build -gno-record-gcc-switches'
What did you do?
Use GOSSAFUNC=enc go build qmac.go
or go tool compile -S qmac.go
for its assembly results or see https://godbolt.org/z/9GxzbY4eP.
//qmac.go
package main
func main() {}
type desc struct {
t int
x int
y int
z int
p int
}
func enc(d []byte) []byte {
var buf = d
t, a, b, c, pr := 0, -1, 3, 1, 113
for i := 0; i < len(buf); i++ {
res := process(t, a, b, c, pr)
buf[i] = byte(res.z)
}
return buf
}
//go:noinline
func process(t, a, b, c, pr int) desc {
d := desc{
t: t,
x : a,
y: b,
z: c,
p: pr,
}
//...
return d
}
What did you see happen?
For every loop Go compiler stores the call result into onstack variable main.res
, which is meaningless, as main.res
is only an intermediate result for setting buf[i]
.
I also haven't figured out what autotmp_11
is used for here, why not doing a direct store MOVQ AX, main.res-xx(SP)
instead of MOVQ AX, main..autotmp_11-40(SP)
, MOVQ main..autotmp_11-40(SP), DX
, MOVQ DX, main.res-80(SP)
What did you expect to see?
Compiler can optimize the redundant store operations like following case.
(reduce process
's parameters from 5 to 4)
//qmac2.go
package main
func main() {}
type desc struct {
t int
x int
y int
z int
}
func enc(d []byte) []byte {
var buf = d
t, a, b, c := 0, -1, 3, 1
for i := 0; i < len(buf); i++ {
res := process(t, a, b, c)
buf[i] = byte(res.z)
}
return buf
}
//go:noinline
func process(t, a, b, c int) desc {
d := desc{
t: t,
x : a,
y: b,
z: c,
}
//...
return d
}
Metadata
Assignees
Labels
Type
Projects
Status
Todo
Activity
Jorropo commentedon Feb 5, 2024
This happens because the compiler has a hardcoded limit, it will place on the stack all the structs with more than 4 fields or the structs with a width bigger than 4 pointers.
go/src/cmd/compile/internal/ssa/value.go
Line 592 in b8ac61e
go/src/cmd/compile/internal/ssa/value.go
Line 608 in b8ac61e
I'm testing with patches that allow to treat any struct as SSA values, will report later.
gopherbot commentedon Feb 5, 2024
Change https://go.dev/cl/561695 mentions this issue:
cmd/compile: allow arbitrary sized structs to be SSAified
Jorropo commentedon Feb 5, 2024
Here is the results with the CL I sent:
vs:
For
enc
:randall77 commentedon Feb 6, 2024
The problem with just SSAing everything is that the resulting code can be exponential in size relative to the program. For instance,
S1
is potentially very big, requiring 2^N SSA values to represent it. Currently it would be represented by just 1 autotmp of size 2^N. So doing things like copyingvar p1 *S1 = ...; var p2 *S2 = ...; *p1 = *p2
uses 2^N individual loads/stores instead of just one call tomemmove
.Jorropo commentedon Feb 6, 2024
How would
*p1 = *p2
compiles when one of them is*S1
and the other one is*S2
?I don't exactly see why neither is gonna be SSAified. Given they both have their address taken it should fail to ssa them in genssa.
Maybe Load and Store are destructured too aggressively and we should only do that based on finality.
I think I was able to repro the issue you are talking about this way:
However the compiler also have quadratic behavior before my patch. That already a problem.
Jorropo commentedon Feb 6, 2024
Alternatively I did this patch as a shot in the dark and it's been doing ok to great on real world example of codes I threw at it.
If we have support for variable based ssaification of structs I think it would make sense to vary the limit based on available registers.
We should allow ourself to be more aggressive on RISCV64 with ~<32 gp registers compared to i386 with ~<8.
[2]T
does not alias withstruct{a, b T}
which might be allowed by theunsafe
package #65535thepudds commentedon Feb 6, 2024
@Jorropo, probably worth a look at #24416.
25 remaining items