Slowness Observed in OpenBao When Using Raft Storage Backend in Amazon EKS Environment #573
Comments
@sspirate24 I'm not an AWS expert, but their docs mention that you can choose your IOPS: https://aws.amazon.com/ebs/general-purpose/ What are your IOPS limits configured to, and are they similar to Azure's?
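For reference, gp3 volumes come with a baseline of 3,000 IOPS and 125 MiB/s of throughput unless higher values are provisioned, so a default gp3 StorageClass may be noticeably slower than a well-provisioned disk elsewhere. Below is a minimal sketch of a StorageClass that requests more, assuming the AWS EBS CSI driver (`ebs.csi.aws.com`) is installed; the class name and numbers are illustrative, not a recommendation:

```shell
# Sketch only: a gp3 StorageClass with explicitly provisioned IOPS/throughput.
# Assumes the AWS EBS CSI driver; the name and values are examples.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-high-iops
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"       # gp3 baseline is 3000 unless raised
  throughput: "250"  # MiB/s; gp3 baseline is 125
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
```

Comparing the provisioned IOPS/throughput of the volume OpenBao actually uses against the AKS disk's limits would answer the question above.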
@cipherboy What details and logs should I collect from my end to troubleshoot this issue in EKS?
@sspirate24 A few things:

- To my knowledge, there is nothing different in the behavior on any host type, other than the …
- It'd be interesting to see if there are leadership elections taking place, i.e., whether OpenBao thinks the system/network/... is slow enough that it times out while holding leadership and wants to elect a new leader.

Some thoughts!
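One rough way to check for that kind of leadership churn, assuming a Kubernetes deployment and an authenticated `bao` CLI; the namespace and pod name below are placeholders:

```shell
# Sketch: look for election activity in the server logs (names are placeholders).
kubectl -n openbao logs openbao-0 | grep -iE "election|leader|heartbeat timeout"

# Confirm the Raft peer set and which node currently holds leadership
# (assumes the CLI is authenticated and pointed at the cluster).
bao operator raft list-peers
bao read sys/leader
```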
@cipherboy Point to note: we don't use OpenBao to store many secrets; our primary use case is leveraging the transit engine for encryption and decryption.

Log files: AKS vs EKS (attached).

Visualizations (attached for both the Azure and AWS metrics):
1. CPU Usage
2. Network Traffic
3. Memory Usage
4. Disk Throughput Over Time
5. Cumulative Disk Usage Over Time
6. CPU Usage Over Time
7. I/O Wait vs Disk Throughput
8. Transactions Per Second Over Time
@sspirate24 Sorry about the delay here. Honestly, the stats you shared didn't point to an obvious reason why it would be slower.
I'm surprised Transit is much slower, to be honest. Most keys should be cached (unless you've turned that off or have a large number of distinct keys), so it shouldn't be hitting disk. Are there other environmental factors, perhaps? Different CPU features (e.g., no AES-NI acceleration) or network speed? As an aside, I don't think Transit has metrics, but OpenBao in general does; I'd be curious whether you've hooked those up to a monitoring solution and what they say. We'd also probably take a PR if you wanted to add metrics to Transit!
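Two quick checks along those lines, as a sketch: whether the node CPU advertises AES instructions, and how long a single Transit round trip takes from the CLI. This assumes an authenticated `bao` CLI; the key name `my-key` is a placeholder:

```shell
# Sketch: environmental checks; "my-key" is a placeholder transit key.

# Does the node's CPU advertise AES acceleration?
grep -o -m1 '\baes\b' /proc/cpuinfo || echo "no aes flag found"

# Time a single Transit encrypt/decrypt round trip.
CT=$(bao write -field=ciphertext transit/encrypt/my-key \
  plaintext="$(echo -n 'hello' | base64)")
time bao write -field=plaintext transit/decrypt/my-key ciphertext="$CT"
```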
@cipherboy
@sspirate24 Raft should emit the following metrics: https://openbao.org/docs/internals/telemetry/metrics/raft/

You'd have to configure a telemetry provider in the config file: https://openbao.org/docs/configuration/telemetry/

The core request metrics might also be interesting, along with the …
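As a sketch of what that could look like: a `telemetry` stanza in the server config plus a one-off scrape of the metrics endpoint. The config path, retention value, and the Vault-compatible header/environment variable names are assumptions, so adjust them to your deployment:

```shell
# Sketch: enable Prometheus-format telemetry and pull the metrics once.
# The config path and values are examples, not recommendations.
cat >> /etc/openbao/config.hcl <<'EOF'
telemetry {
  prometheus_retention_time = "24h"
  disable_hostname          = true
}
EOF

# After restarting the server, scrape the metrics and filter for raft/barrier timings
# (assumes Vault-compatible token header and address variable).
curl -s -H "X-Vault-Token: $BAO_TOKEN" \
  "$BAO_ADDR/v1/sys/metrics?format=prometheus" | grep -E "raft|barrier"
```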
We are experiencing significant slowness in OpenBao when using Raft as the storage backend in the Amazon EKS environment, particularly with the `gp3` storage class. In contrast, a similar setup in Azure Kubernetes Service (AKS) operates normally, utilizing the `STANDARD_SSD` storage class, with consistent execution times for the same operations.

Steps to Reproduce the Behavior

Authenticate using the `ROLE_ID` and `SECRET_ID`, perform the decryption, and then revoke the token.

Expected Behavior
The authentication, decryption, and token revocation operations should execute quickly and consistently, similar to the performance observed in the AKS environment.
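For reference, a rough way to time each of these operations from the CLI; the AppRole mount path, transit key name, and variables below are placeholders, and the flags assume the Vault-compatible CLI behavior OpenBao retains:

```shell
# Sketch: time the auth -> decrypt -> revoke sequence; all names are placeholders.

# 1. Authenticate via AppRole and capture the client token (timing goes to stderr).
TOKEN=$(time bao write -field=token auth/approle/login \
  role_id="$ROLE_ID" secret_id="$SECRET_ID")

# 2. Decrypt a previously produced ciphertext with the Transit engine.
time BAO_TOKEN="$TOKEN" bao write -field=plaintext \
  transit/decrypt/my-key ciphertext="$CIPHERTEXT"

# 3. Revoke the token that was just issued.
time BAO_TOKEN="$TOKEN" bao token revoke -self
```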
Environment
OpenBao server configuration file(s):
Additional context
The only notable log entry observed continuously during testing: