
High Memory Allocation with snapteld v0.19 #1478

Open
daniellee opened this issue Jan 18, 2017 · 9 comments



daniellee commented Jan 18, 2017

OS version: Ubuntu Xenial
Snap version: 0.19
Environment details (virtual, physical, etc.): Kubernetes pod; our Docker container is based on the latest intelsdi/snap:xenial image.

Steps to reproduce

Using the following plugins:

Collectors:

  • load (version 3)
  • docker (version 4)
  • cpu (version 6)
  • meminfo (version 3)
  • iface (version 4)
  • iostat (version 6)

Publisher:
graphite (version 5)

One task running with the following metrics:

  • /intel/docker/*
  • /intel/iostat/*
  • /intel/procfs/cpu/*
  • /intel/procfs/iface/*
  • /intel/procfs/load/*
  • /intel/procfs/meminfo/*

The collection interval is 10 seconds.

The number of metrics being collected is between 5000 and 10000 (to put it another way: 5000 - 10000 series are saved to graphite every 10 seconds).

Actual results

On one Kubernetes cluster the Snap container is using between 1.3GB and 2.5GB according to the memory_stats/usage/usage metric from the docker collector. On the smaller cluster it is using around 600MB. I ran snapteld with pprof on the smaller cluster. The snapteld process was allocated ~300MB and the graphite publisher was allocated ~300MB.

Here are the pprof profiles:

In use memory for snapteld:

Allocated memory for snapteld:

Allocated objects for snapteld:
snapteld-alloc_objects.pdf

CPU profile for snapteld, 2 hours:
cpu.pprof.tar.gz

In use memory for the graphite publisher:
inuse.pdf

Allocated memory for the graphite publisher:
alloc.pdf

Result of running: ps aux --sort -rss | head -n 2

(screenshot of the ps output attached)

Expected results

I'm not sure exactly what to expect, but the RSS memory usage is higher than I expected.

candysmurf (Contributor) commented

@daniellee, thanks for the details. What's your expectation? Would you please elaborate?

candysmurf (Contributor) commented

@daniellee, those pprof profiles are really helpful. We can easily identify the memory-hogging methods. We'll look into it. Please feel free to update this issue with your insights anytime. Thanks.

kindermoumoute (Contributor) commented Jan 19, 2017

Here is a flame graph of the Snap daemon's memory heap allocation, generated from @daniellee's profiles. It brings to light three kinds of hotspots that consume a lot of memory.

Gob and gRPC encoding/decoding

How to highlight it?

  • search for decode (70%) and encode (4.5%).

How to fix it

Short term, the Snap workflow is designed so that every metric goes through Snap, so we have to allocate them. There might be a long-term solution if we change the Snap workflow to pass data directly from one plugin to another; that would save metrics from going back and forth between the plugins and Snap.

UpdateCache

How to highlight it?

  • search for UpdateCache (15.8%).

How to fix it

  • Namespace.GetSeparator is one of the main hotspots in UpdateCache; a way to fix it would be to save the separator in the Namespace structure (see the sketch after this list). This change might be straightforward inside the Snap daemon, but there might be more work to do if some GoRPC plugins use Namespace.String().
  • Investigate other potential hotspots in UpdateCache.
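
For illustration, a minimal sketch of that idea, assuming a simplified Namespace type rather than the actual snap/core API (all names below are illustrative): the separator is computed once when the namespace is built or extended, so String() no longer has to rescan the elements on every call.

```go
package namespace

import "strings"

// Candidate separators, tried in order; the first one that does not appear
// in any element is used. (Illustrative list, not Snap's actual set.)
var candidates = []string{"/", "|", "%", ":", "^"}

// Namespace caches its separator so String() does not recompute it.
type Namespace struct {
	elements  []string
	separator string
}

// New builds a namespace and picks the separator exactly once.
func New(elements ...string) Namespace {
	return Namespace{elements: elements, separator: pickSeparator(elements)}
}

// AddStaticElement appends an element and refreshes the cached separator,
// keeping later String() calls cheap.
func (n Namespace) AddStaticElement(e string) Namespace {
	elements := append(append([]string{}, n.elements...), e)
	return Namespace{elements: elements, separator: pickSeparator(elements)}
}

// String joins the elements with the cached separator.
func (n Namespace) String() string {
	return n.separator + strings.Join(n.elements, n.separator)
}

// pickSeparator returns the first candidate that occurs in no element.
func pickSeparator(elements []string) string {
	for _, sep := range candidates {
		clash := false
		for _, e := range elements {
			if strings.Contains(e, sep) {
				clash = true
				break
			}
		}
		if !clash {
			return sep
		}
	}
	return candidates[0]
}
```

The real core.Namespace carries more behavior than this, but the idea is the same: pay for the separator scan once per namespace instead of once per String() call.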

ToMetric

How to highlight it?

  • search for ToMetric (6.9%)

How to fix it

Unify the structures control/plugin/client.metric and control/plugin/rpc.Metric. More generally, we shouldn't have the same structure defined twice across the Snap daemon. We should have a package with all types defined inside (like ctypes), and each struct should have the JSON, protobuf, and YAML tags we need.
Bonus: we could do that in the Go lib too (here and here).
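
To make that concrete, here is a rough sketch of what a single shared metric type could look like. The field names and tags are illustrative, not the actual control/plugin types; the protobuf struct would still be generated, but converted in exactly one place.

```go
package metrictype

import "time"

// NamespaceElement is one segment of a metric namespace.
type NamespaceElement struct {
	Value       string `json:"value" yaml:"value"`
	Name        string `json:"name,omitempty" yaml:"name,omitempty"`
	Description string `json:"description,omitempty" yaml:"description,omitempty"`
}

// Metric is the one metric representation shared by the daemon, the clients,
// and the RPC layer, instead of near-identical structs defined in several
// packages.
type Metric struct {
	Namespace   []NamespaceElement     `json:"namespace" yaml:"namespace"`
	Version     int                    `json:"version" yaml:"version"`
	Config      map[string]interface{} `json:"config,omitempty" yaml:"config,omitempty"`
	Data        interface{}            `json:"data" yaml:"data"`
	Tags        map[string]string      `json:"tags,omitempty" yaml:"tags,omitempty"`
	Unit        string                 `json:"unit,omitempty" yaml:"unit,omitempty"`
	Description string                 `json:"description,omitempty" yaml:"description,omitempty"`
	Timestamp   time.Time              `json:"timestamp" yaml:"timestamp"`
}
```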

It's important to know that all those numbers scale as we scale Snap.

@bjray @mjbrender we may need an "area/performance" tag for that.

@kindermoumoute
Copy link
Contributor

@daniellee sent me a CPU profile of snapteld, so now we can quantify the actual performance cost.

Garbage collector

The memory heap hotspots we highlighted cause more garbage collection. If we search for runtime.gcBgMarkWorker, we can see that garbage collection uses 15.6% of the CPU resources.

GoRPC/gRPC encode and decode

The CPU usage of encode and decode without the garbage collection is 55.7% (6.9% + 48.8%).

Namespace.String()

In UpdateCache we previously noticed that Namespace.String was responsible for a lot of memory allocation. If we search for the CPU usage of that same function, it uses 18.3% (18.7% for UpdateCache).

ToMetric

ToMetric uses 2.4% of Snap's CPU footprint.

Short term we should definitely prioritize optimizing Namespace.String and client.ToMetric.

IRCody (Contributor) commented Jan 20, 2017

Namespace.GetSeparator is one of the main hotspots in UpdateCache; a way to fix it would be to save the separator in the Namespace structure. This change might be straightforward inside the Snap daemon, but there might be more work to do if some GoRPC plugins use Namespace.String().

There are two ways that come to mind that could address this:

  1. Modify all of the cache functions to take namespaces instead of strings. Then implement a key() function that takes the relevant info, including the namespace, and generates a key for all functions in the cache. Inside the key function we would concatenate the namespace elements using core.Separator instead of calling Namespace.String() (see the sketch after this list).
  2. Modify the Namespace type to store the separator (and maybe the current string?) as an unexported value. It could be incrementally updated on every call to AddDynamicElement and AddStaticElement. This would incur the separator work at namespace creation but not on subsequent calls to String(). It also addresses issues with GoRPC plugins potentially using it, since the change would not be API-breaking for them.
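
Here is a sketch of option 1, assuming simplified stand-ins for core.Namespace, core.Separator, and the metric cache (the key() helper and the cache shape are illustrative, not existing Snap code):

```go
package cache

import (
	"bytes"
	"strconv"
	"time"
)

// namespaceElement stands in for core.NamespaceElement; only Value is needed
// to build a key.
type namespaceElement struct {
	Value string
}

// separator stands in for core.Separator.
const separator = "/"

// key concatenates the namespace elements and the metric version into a
// cache key without going through Namespace.String() (and therefore without
// GetSeparator).
func key(ns []namespaceElement, version int) string {
	var b bytes.Buffer
	for _, e := range ns {
		b.WriteString(separator)
		b.WriteString(e.Value)
	}
	b.WriteString(separator)
	b.WriteString(strconv.Itoa(version))
	return b.String()
}

// cache is a simplified stand-in for the metric cache; its methods take a
// namespace instead of a pre-built string.
type cache struct {
	table map[string]entry
}

type entry struct {
	value   interface{}
	updated time.Time
}

func (c *cache) get(ns []namespaceElement, version int) (interface{}, bool) {
	e, ok := c.table[key(ns, version)]
	return e.value, ok
}

func (c *cache) put(ns []namespaceElement, version int, v interface{}) {
	c.table[key(ns, version)] = entry{value: v, updated: time.Now()}
}
```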
The CPU usage of encode and decode without the garbage collection is 55.7% (6.9% + 48.8%).

I think it'd be interesting to also test with an all-GoRPC and an all-gRPC setup to see what differences (if any) there might be.

daniellee (Author) commented

To get around this problem in our production environment, I reduced the number of metrics we are saving from 5000-10000 every 10 seconds down to 1000 every 10 seconds. This reduced the memory used by Snap from 1-2GB down to 180-300MB per instance.

@candysmurf you asked about my expectations: 180MB is much closer to what I would expect a collector to be using. IMO a collector should have as small a memory and CPU footprint as possible.

candysmurf (Contributor) commented Jan 20, 2017

@daniellee, thanks for the details. It helps.

@IRCody & @kindermoumoute, looking at the profiling hotspots, tackling the accumulated metrics inside the cache is a good first step. Option 1 proposed by Cody seems easy and doable right away. Let's do that first and see how much improvement it brings. Then we can work on longer-term solutions for the other areas.

candysmurf (Contributor) commented

@IRCody, I agree on option 2 as well; that's what Oliver and I have talked about. We can default the separator to the forward slash. If a collector does not want to use that, it can specify its own separator and Snap should honor it. The change itself should not be significant, but it will require documentation and communication.

Looking at the in-use profiling graph, that method didn't show up as a hotspot. That's why I was more inclined to change the cache first. Thinking about it now, if we go with option 2, the caching issue is resolved automatically. Should we go with option 2 directly? It seems like the more complete fix.

candysmurf (Contributor) commented

Even if we allow a collector to specify its own separator, it should be constrained to a limited set of characters so that it doesn't cause further marshal/unmarshal/encode/decode issues.
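
For illustration, a minimal sketch of that kind of constraint, with a hypothetical validation helper and an example allowlist (neither exists in Snap today):

```go
package namespace

import (
	"fmt"
	"strings"
)

// allowedSeparators is an example allowlist of single characters that are
// unlikely to collide with namespace elements or with JSON/YAML/protobuf
// round-trips.
const allowedSeparators = "/|%^:-"

// ValidateSeparator is a hypothetical check a collector's separator choice
// would have to pass before Snap honors it; otherwise Snap would fall back
// to the default forward slash.
func ValidateSeparator(sep string) error {
	if len(sep) != 1 || !strings.Contains(allowedSeparators, sep) {
		return fmt.Errorf("invalid separator %q: must be one of %q", sep, allowedSeparators)
	}
	return nil
}
```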
