Skip to content

cmd/compile: unnecessary temporary variable on stack #65495

Open
@katsusan

Description

Go version

go version go1.21.6 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://goproxy.cn'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='/usr/local/bin/gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build734905149=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Use GOSSAFUNC=enc go build qmac.go or go tool compile -S qmac.go for its assembly results or see https://godbolt.org/z/9GxzbY4eP.

//qmac.go
package main

func main() {}

type desc struct {
        t int
        x int
        y int
        z int
        p int
}

func enc(d []byte) []byte {
        var buf = d
        t, a, b, c, pr := 0,  -1, 3, 1, 113
        for i := 0; i < len(buf); i++ {
                res := process(t, a, b, c, pr)
                buf[i] = byte(res.z)
        }

        return buf
}

//go:noinline
func process(t, a, b, c, pr int) desc {
        d :=  desc{
                t: t,
                x : a,
                y: b,
                z: c,
                p: pr,
        }
        //...
        return d
}

What did you see happen?

For every loop Go compiler stores the call result into onstack variable main.res, which is meaningless, as main.res is only an intermediate result for setting buf[i].

I also haven't figured out what autotmp_11 is used for here, why not doing a direct store MOVQ AX, main.res-xx(SP) instead of MOVQ AX, main..autotmp_11-40(SP), MOVQ main..autotmp_11-40(SP), DX, MOVQ DX, main.res-80(SP)

image

What did you expect to see?

Compiler can optimize the redundant store operations like following case.
(reduce process's parameters from 5 to 4)

image
//qmac2.go
package main

func main() {}

type desc struct {
        t int
        x int
        y int
        z int
}

func enc(d []byte) []byte {
        var buf = d
        t, a, b, c := 0,  -1, 3, 1
        for i := 0; i < len(buf); i++ {
                res := process(t, a, b, c)
                buf[i] = byte(res.z)
        }

        return buf
}

//go:noinline
func process(t, a, b, c int) desc {
        d :=  desc{
                t: t,
                x : a,
                y: b,
                z: c,
        }
        //...
        return d
}

Activity

self-assigned this
on Feb 5, 2024
added
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Feb 5, 2024
added this to the Backlog milestone on Feb 5, 2024
Jorropo

Jorropo commented on Feb 5, 2024

@Jorropo
Member

This happens because the compiler has a hardcoded limit, it will place on the stack all the structs with more than 4 fields or the structs with a width bigger than 4 pointers.

if t.Size() > int64(4*types.PtrSize) {

if t.NumFields() > MaxStruct {

I'm testing with patches that allow to treat any struct as SSA values, will report later.

added
FixPendingIssues that have a fix which has not yet been reviewed or submitted.
on Feb 5, 2024
gopherbot

gopherbot commented on Feb 5, 2024

@gopherbot
Contributor

Change https://go.dev/cl/561695 mentions this issue: cmd/compile: allow arbitrary sized structs to be SSAified

removed their assignment
on Feb 5, 2024
Jorropo

Jorropo commented on Feb 5, 2024

@Jorropo
Member

Here is the results with the CL I sent:

  TEXT command-line-arguments.enc(SB), ABIInternal
  CALL command-line-arguments.process(SB)
  MOVQ DI, AX
  RET

vs:

command-line-arguments_enc_pc0:
  TEXT command-line-arguments.enc(SB), ABIInternal, $88-0
  CMPQ SP, 16(R14)
  PCDATA $0, $-2
  JLS command-line-arguments_enc_pc83
  PCDATA $0, $-1
  PUSHQ BP
  MOVQ SP, BP
  SUBQ $80, SP
  FUNCDATA $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
  FUNCDATA $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
  PCDATA $1, $0
  CALL command-line-arguments.process(SB)
  MOVQ AX, command-line-arguments..autotmp_2(SP)
  MOVQ BX, command-line-arguments..autotmp_2+8(SP)
  MOVQ CX, command-line-arguments..autotmp_2+16(SP)
  MOVQ DI, command-line-arguments..autotmp_2+24(SP)
  MOVQ SI, command-line-arguments..autotmp_2+32(SP)
  MOVQ command-line-arguments..autotmp_2(SP), AX
  MOVQ AX, command-line-arguments..autotmp_1+40(SP)
  MOVUPS command-line-arguments..autotmp_2+8(SP), X0
  MOVUPS X0, command-line-arguments..autotmp_1+48(SP)
  MOVUPS command-line-arguments..autotmp_2+24(SP), X0
  MOVUPS X0, command-line-arguments..autotmp_1+64(SP)
  MOVQ command-line-arguments..autotmp_1+64(SP), AX
  ADDQ $80, SP
  POPQ BP
  RET
command-line-arguments_enc_pc83:
  NOP
  PCDATA $1, $-1
  PCDATA $0, $-2
  CALL runtime.morestack_noctxt(SB)
  PCDATA $0, $-1
  JMP command-line-arguments_enc_pc0

For enc:

package a

type desc struct{ t, x, y, z, p int }

func enc() byte {
	// amd64:-"MOV.+autotmp"
	return byte(process().z)
}

//go:noinline
func process() desc {
	return desc{}
}
randall77

randall77 commented on Feb 6, 2024

@randall77
Contributor

The problem with just SSAing everything is that the resulting code can be exponential in size relative to the program. For instance,

type S1 { a, b S2 }
type S2 { a, b S3 }
...
type SN { a, b int }

S1 is potentially very big, requiring 2^N SSA values to represent it. Currently it would be represented by just 1 autotmp of size 2^N. So doing things like copying var p1 *S1 = ...; var p2 *S2 = ...; *p1 = *p2 uses 2^N individual loads/stores instead of just one call to memmove.

Jorropo

Jorropo commented on Feb 6, 2024

@Jorropo
Member

How would *p1 = *p2 compiles when one of them is *S1 and the other one is *S2 ?

I don't exactly see why neither is gonna be SSAified. Given they both have their address taken it should fail to ssa them in genssa.
Maybe Load and Store are destructured too aggressively and we should only do that based on finality.


I think I was able to repro the issue you are talking about this way:

package a

type S0 struct{ a, b S1 }
type S1 struct{ a, b S2 }
type S2 struct{ a, b S3 }
type S3 struct{ a, b S4 }
type S4 struct{ a, b S5 }
type S5 struct{ a, b S6 }
type S6 struct{ a, b S7 }
type S7 struct{ a, b S8 }
type S8 struct{ a, b S9 }
type S9 struct{ a, b S10 }
type S10 struct{ a, b S11 }
type S11 struct{ a, b S12 }
type S12 struct{ a, b S13 }
type S13 struct{ a, b S14 }
type S14 struct{ a, b S15 }
type S15 struct{ a, b S16 }
type S16 struct{ a, b S17 }
type S17 struct{ a, b S18 }
type S18 struct{ a, b S19 }
type S19 struct{ a, b S20 }
type S20 struct{ a, b S21 }
type S21 struct{ a, b S22 }
type S22 struct{ a, b S23 }
type S23 struct{ a, b S24 }
type S24 struct{ a, b S25 }
type S25 struct{ a, b S26 }
type S26 struct{ a, b S27 }
type S27 struct{ a, b S28 }
type S28 struct{ a, b S29 }
type S29 struct{ a, b S30 }
type S30 struct{ a, b S31 }
type S31 struct{ a, b int }

func Copy(a, b *S0) {
	*a = *b
}

However the compiler also have quadratic behavior before my patch. That already a problem.

Jorropo

Jorropo commented on Feb 6, 2024

@Jorropo
Member

Alternatively I did this patch as a shot in the dark and it's been doing ok to great on real world example of codes I threw at it.
If we have support for variable based ssaification of structs I think it would make sense to vary the limit based on available registers.

We should allow ourself to be more aggressive on RISCV64 with ~<32 gp registers compared to i386 with ~<8.

thepudds

thepudds commented on Feb 6, 2024

@thepudds
Contributor

@Jorropo, probably worth a look at #24416.

25 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    FixPendingIssues that have a fix which has not yet been reviewed or submitted.NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.compiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      cmd/compile: unnecessary temporary variable on stack · Issue #65495 · golang/go