
evaluator: parsing antler CUE configs can exhaust system memory #3452

@heistp

Description

What version of CUE are you using (cue version)?

$ cue version
cue version v0.5.0

go version go1.23.0
      -buildmode exe
       -compiler gc
  DefaultGODEBUG asynctimerchan=1,gotypesalias=0,httplaxcontentlength=1,httpmuxgo121=1,httpservecontentkeepheaders=1,netedns0=0,panicnil=1,tls10server=1,tls3des=1,tlskyber=0,tlsrsakex=1,tlsunsafeekm=1,winreadlinkvolume=0,winsymlink=0,x509keypairleaf=0,x509negativeserial=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1

Does this issue reproduce with the latest stable release?

Yes. It's the same or possibly worse in v0.10.0, with or without CUE_EXPERIMENT=evalv3.

What did you do?

I created a CUE package for Antler in this sce-tests repo. It is an Antler test config with 216 tests that uses both large lists (generated programmatically with Go templates) and CUE list comprehensions, which likely results in a large CUE graph.

The biggest culprit, it seems, is the Run list for my FCT tests. This is a list of 1200 elements (generated by a Go template), which is used with a list comprehension to generate StreamClients. When Antler unifies the schema with the config using the CUE API, the process memory reported by top rises very quickly and can completely exhaust system memory, depending on the hardware. If I comment out this list, parsing is still slow and uses more memory than I'd hope for, but it is at least much faster.
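To give a sense of the shape of the config, the pattern boils down to something like the sketch below. The field names Wait and Length are illustrative stand-ins (the real schema is in the sce-tests repo), but Run and StreamClients correspond to what I described above:

// Hypothetical sketch of the pattern, not the actual sce-tests schema.
// Run is a large, template-generated list (~1200 elements in the real config).
Run: [
	{Wait: 0.18, Length: 44200},
	{Wait: 0.05, Length: 9800},
	// ... roughly 1200 more entries emitted by a Go template
]

// A list comprehension then produces one StreamClient per Run entry,
// each of which is unified against the schema when Antler loads the config.
StreamClients: [for r in Run {
	Wait:   r.Wait
	Length: r.Length
}]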

To reproduce it, one can install Antler, pull the sce-tests repo, and run antler vet to parse the config. My hope is that this isn't necessary, and that based on the description alone you can identify which category of performance problem from the Performance umbrella issue this falls under, so I have a sense of if or when it may be improved.

Also, I might be able to work around this by avoiding large lists, but letting users provide their own statistical distributions of wait times and flow lengths is useful flexibility, and those lists can simply get long. On top of that, this project will eventually at least triple in size with more tests, so I'll have to solve this somehow, and am just looking for advice. Would this be any better in v0.11.0-alpha.1, or with any other config options?

What did you expect to see?

The config to parse reasonably quickly.

What did you see instead?

Excessive memory allocations.

A Linux laptop with 8G of RAM and 8G of swap runs out of memory entirely when parsing the config.

Another box with 16G of RAM and 8G of swap is able to parse the config without running out of memory, but just barely.
