Performance: Restrictions on arguments in registers in SSA implementation


Arseny Samoylov

Dec 12, 2024, 4:53:50 AM
to golang-nuts
Hi everybody!

Recently, I noticed that there are some restrictions on the arguments passed to functions in registers.

For example, if `a` is a struct, it must have at most 4 fields (`MaxStruct`), and its size must not exceed `4 * types.PtrSize` (32 bytes on 64-bit targets). You can find these restrictions in `cmd/compile/internal/ssa/value.go` at line 590, in the `CanSSA` function:

```
// CanSSA reports whether values of type t can be represented as a Value.
func CanSSA(t *types.Type) bool {
	types.CalcSize(t)
	if t.Size() > int64(4*types.PtrSize) {
		// 4*Widthptr is an arbitrary constant. We want it
		// to be at least 3*Widthptr so slices can be registerized.
		// Too big and we'll introduce too much register pressure.
		return false
	}
	switch t.Kind() {
	...
	case types.TSTRUCT:
		if t.NumFields() > MaxStruct { // MaxStruct = 4
			return false
		}
	}
}
```

Consider the following example:

```
type A struct {
	s1, s2 string
	i1     int64
}

func (a A) GetInt() int64 {
	return a.i1
}
```

On arm64, this compiles to:

```
f90007e0                MOVD R0, 8(RSP)
f9000be1                MOVD R1, 16(RSP)
f9000fe2                MOVD R2, 24(RSP)
f90013e3                MOVD R3, 32(RSP)
f90017e4                MOVD R4, 40(RSP)
aa0403e0                MOVD R4, R0
d65f03c0                RET
```

The recently merged changes [CL 611075](https://go-review.googlesource.com/c/go/+/611075/4) and [CL 611076](https://go-review.googlesource.com/c/go/+/611076/6) made structs with any number of fields SSA-able. Building on these changes, I was able to remove the size restriction for SSA-able structs as well.

Without these restrictions, the above example compiles to:

```
f90007e0                MOVD R0, 8(RSP)
f9000fe2                MOVD R2, 24(RSP)
aa0403e0                MOVD R4, R0
d65f03c0                RET
```

So, I am wondering: why does the size restriction exist in the first place? It seems wasteful to receive the argument in registers only to spill it to the stack right away. The comment says the limit reduces register pressure, but couldn't the register allocator spill the argument itself when necessary? And if we spill the structure preemptively anyway, why not pass it on the stack from the beginning?

Thank you for your time and attention,  
Arseny.

Arseny Samoylov

Dec 12, 2024, 9:13:23 AM
to golang-nuts
If we're concerned about register pressure, perhaps we should look at the total number of registers taken by all arguments rather than at the size of each individual argument. Consider the following example:

```
type MegaInt struct {
	i1, i2, i3, i4, i5 int64
}

func foo(i1, i2, i3, i4, i5 int64) int64 {
	return i1 + i2 + i3 + i4 + i5
}

func bar(i MegaInt) int64 {
	return i.i1 + i.i2 + i.i3 + i.i4 + i.i5
}
```

This compiles to:
```
TEXT command-line-arguments.foo(SB)
8b000021                ADD R0, R1, R1
8b010041                ADD R1, R2, R1
8b010061                ADD R1, R3, R1
8b010080                ADD R1, R4, R0
d65f03c0                RET

TEXT command-line-arguments.bar(SB)
f90007e0                MOVD R0, 8(RSP)
f9000be1                MOVD R1, 16(RSP)
f9000fe2                MOVD R2, 24(RSP)
f90013e3                MOVD R3, 32(RSP)
f90017e4                MOVD R4, 40(RSP)
f94007e5                MOVD 8(RSP), R5
8b0100a1                ADD R1, R5, R1
8b010041                ADD R1, R2, R1
8b010061                ADD R1, R3, R1
8b010080                ADD R1, R4, R0
d65f03c0                RET
```

Keith Randall

Dec 17, 2024, 2:49:36 PM
to golang-nuts
I think most of what you are seeing is a mismatch between how a big struct is passed in the calling convention and how it is processed within a function by ssa.

The calling convention lets larger structs be broken up and put in registers, if there are enough argument registers for it (which is an arch-dependent thing).
The total set of registers used is fixed, and those registers really can't be used for anything else at the call point, so there's no danger in overusing them.

Inside a function, we can have many more such structs and there's no obvious way to pick which ones get registers and which don't.

```
type T struct{ a, b, c, d, e int }

func f(x, y, z, p, q T) {}
```

Here it's obvious how to allocate registers: some prefix of the argument list gets registers, and the rest don't.
There's a fixed set of spill instructions needed to handle the rest.
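Keith's example can be replayed with a small simulation of the calling convention's assignment step. This is a simplified sketch of the rule as described in Go's internal ABI documentation (ABIInternal), restricted to integer registers: an argument either gets all of its words in registers or goes entirely to the stack. The register counts (9 integer argument registers on amd64, 16 on arm64) are taken from that document, and `assign` is a hypothetical helper.

```go
package main

import "fmt"

// assign is a simplified sketch of ABIInternal's integer-register
// assignment: each argument takes all of its words in registers if enough
// remain; otherwise the whole argument goes on the stack, and later
// arguments may still use the remaining registers.
func assign(argWords []int, intRegs int) []string {
	out := make([]string, len(argWords))
	used := 0
	for i, w := range argWords {
		switch {
		case w == 0:
			out[i] = "none (zero-sized)"
		case used+w <= intRegs:
			out[i] = fmt.Sprintf("reg%d-reg%d", used, used+w-1)
			used += w
		default:
			out[i] = "stack"
		}
	}
	return out
}

func main() {
	// Five arguments of type T struct{ a, b, c, d, e int }: five words each
	// on a 64-bit target.
	args := []int{5, 5, 5, 5, 5}
	fmt.Println("amd64:", assign(args, 9))
	fmt.Println("arm64:", assign(args, 16))
}
```

With 9 registers only `x` fits; with 16, `x`, `y`, and `z` do, and `p` and `q` go to the stack. Because every argument here has the same size, the registerized arguments form a prefix of the list; in this sketch, a later, smaller argument could still claim leftover registers.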

Whereas if we had:
```
func f() {
	var x, y, z, p, q T
	...
}
```
How do we decide which (parts of) variables get registers? How does that compete with other, non-large-struct register demands?
Because we don't have great answers to these questions, we want to be significantly more conservative in how many registers we let a single variable consume.

All that said, I'm sure there are cases where we could do better. In your example, those spills are either dead or kind of silly.

Arseny Samoylov

Dec 25, 2024, 8:38:53 AM
to golang-nuts
Hello, thank you for your response.

I understand the concern about how many registers a single variable consumes. However, I don’t fully understand why this affects SSA, or why we preemptively decide to spill structures that are already laid out in registers. As far as I understand, this should be a concern for the register allocator, not earlier in the process.


> In your example, those spills are either dead or kind of silly.

Exactly! That's why I provided them =). The getter example is my main point, because it's a very common pattern.

As a reminder, here's the example from my first message:

```
type A struct {
	s1, s2 string
	i1     int64
}

func (a A) GetInt() int64 {
	return a.i1
}
```

This compiles to:

```
f90007e0                MOVD R0, 8(RSP)
f9000be1                MOVD R1, 16(RSP)
f9000fe2                MOVD R2, 24(RSP)
f90013e3                MOVD R3, 32(RSP)
f90017e4                MOVD R4, 40(RSP)
aa0403e0                MOVD R4, R0
d65f03c0                RET
```