-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] - cache duplicate creds during detection #2276
base: main
Are you sure you want to change the base?
Conversation
pkg/detectors/aws/aws.go
Outdated
} | ||
} | ||
|
||
// Cache the result. | ||
s.credsCache.Set(rawV2, cacheItem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a pointer?
https://github.com/patrickmn/go-cache#usage (last example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, using a pointer is the right approach. I'm considering implementing a constructor to ensure that we consistently return a pointer. While this approach doesn't offer substantial protection since it's within the same package, it sets the groundwork for potential future refactoring. This could involve moving it into a separate package, from which it can be imported by all other detectors
@@ -47,6 +51,11 @@ func New(opts ...func(*scanner)) *scanner { | |||
opt(scanner) | |||
} | |||
|
|||
scanner.credsCache = memory.New( | |||
memory.WithExpirationInterval(1*time.Hour), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are just placeholders for now, we could consider passing these in as args to New
, or use a default value that seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big could the cache conceivably get in a given time frame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. It would really depend on what was being scanned. We can run some tests to scan some repos and record some metrics to see what the size of this cache grows to be. Alternatively, we could use a LRU cache where we fix the size if needed.
if val, ok := s.credsCache.Get(rawV2); ok { | ||
item, ok := val.(cacheItem) | ||
if !ok { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the right behavior, i'd like to maybe log this as an error? Not sure if constructing our custom logcontext.AddLogger
make sense so we can do that?
e20e093
to
f286c33
Compare
…ufflehog into feat-cache-dupe-creds
Still relevant? |
Description:
This PR introduces caching for AWS credentials in the detector, aiming to minimize redundant API calls with identical credentials. The cache uses the rawV2 value as its key and a bespoke cacheItem for its value, enabling the reconstruction of complete metadata for each result, such as
VerificationErr,
verified
, andExtraData
.In practical testing with the trufflehog repository, this implementation reduced AWS API calls significantly, from 707 to 256, while maintaining the same data processing volume, as indicated by unchanged chunk numbers. This approach holds potential for broader application across our most frequently used detectors, enhancing efficiency and performance.
Checklist:
make test-community
)?make lint
this requires golangci-lint)?