High CPU usage with slightly customized ruleset + enabled network services #208
More details on the deployment: the falco containers are not exactly the ones we create. They are based on https://github.com/phusion/baseimage-docker, which swaps out debian:unstable for the phusion base image. That probably doesn't change the CPU usage, though.
I tried out the attached ruleset with the workloads we use internally for performance testing, which do include cassandra, and I can see a significant difference in CPU usage between this ruleset and the ruleset that comes with falco 0.5.0. I think the most likely culprits are the additional network rules. I'll do some more investigation to identify the specific rules.
It's actually an infinite loop in sinsp_filter_check_thread::compare_full_aname:
The process state is malformed, with a cycle between 4 processes: 21394 (cut) -> 21389 (mongostats2stat) -> 21421 (mongostats2stat) -> 21400 (sh) -> 21394 (cut, beginning of list). None of those processes actually exist any longer, which explains why a second falco instance doesn't show the same CPU usage. I suspect this is related to dropped events + stale thread state + pid recycling.
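To see why a walk up this parent chain never terminates, here is a minimal C++ sketch that models the reported state as a plain tid-to-parent-tid map. The tids come from the comment above; `make_reported_table` and `walk_parents` are hypothetical helpers, not sinsp code. The walk is capped at a fixed number of hops so that it returns at all; following the same links without a cap, as an unbounded ancestor search does, revisits 21394 every four hops forever.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model of the stale thread table from the report:
// each key is a tid, each value is the tid recorded as its parent.
std::unordered_map<int64_t, int64_t> make_reported_table() {
    return {
        {21394, 21389},  // cut -> mongostats2stat
        {21389, 21421},  // mongostats2stat -> mongostats2stat
        {21421, 21400},  // mongostats2stat -> sh
        {21400, 21394},  // sh -> cut, closing the cycle
    };
}

// Follow parent links from `start` for at most `max_hops` hops and return
// every tid visited, including `start`. The walk also stops early if a tid
// has no parent entry, which never happens on the cyclic table above.
std::vector<int64_t> walk_parents(const std::unordered_map<int64_t, int64_t>& parents,
                                  int64_t start, std::size_t max_hops) {
    std::vector<int64_t> path{start};
    int64_t cur = start;
    for (std::size_t hop = 0; hop < max_hops; ++hop) {
        auto it = parents.find(cur);
        if (it == parents.end()) break;  // reached a thread with no known parent
        cur = it->second;
        path.push_back(cur);
    }
    return path;
}
```

Calling `walk_parents(make_reported_table(), 21394, 8)` yields 21394, 21389, 21421, 21400, 21394, 21389, 21421, 21400, 21394: the chain never reaches a thread without a parent, so only the hop cap ends the walk.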
Detect loops in parent thread state by noting the starting point of the search and aborting whenever the current thread pointer equals that starting point. This prevents infinite loops like the one observed in falcosecurity/falco#208. It doesn't address the underlying cause of the thread state corruption; that's tracked by a separate issue, #752.
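The start-point check described in this commit can be sketched as follows, again over a hypothetical tid-to-parent map rather than sinsp's thread table; `walk_reaches_root` is an illustrative name. One property of the technique itself is worth keeping in mind: the check only fires when the cycle leads back through the thread where the search began. If a thread's ancestors loop among themselves without including it, this walk still never ends, a case that the two-pointer approach in the following commit also covers.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical toy thread table: tid -> parent tid. Not the sinsp API.
using ParentMap = std::unordered_map<int64_t, int64_t>;

// Walk from `start` toward the root of the process tree.
// Returns true if the walk reaches a thread with no known parent, and
// false if it arrives back at `start`, i.e. `start` sits on a loop.
// Caveat: a loop among the ancestors that excludes `start` is not caught,
// and this function would spin forever on it.
bool walk_reaches_root(const ParentMap& parents, int64_t start) {
    int64_t cur = start;
    while (true) {
        auto it = parents.find(cur);
        if (it == parents.end()) return true;  // top of the known chain
        cur = it->second;
        if (cur == start) return false;        // back where we began: loop
    }
}
```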
sinsp_threadinfo::detect_parent_state_loop() detects a loop in the parent state using two pointers that traverse it at different rates; if they ever point to the same thread, there is a loop. In all the places where filterchecks might traverse parent state in an unbounded way, first check for a loop and return NULL/false if one is detected. This prevents infinite loops like the one observed in falcosecurity/falco#208. It doesn't address the underlying cause of the thread state corruption; that's tracked by a separate issue, #752.
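Two pointers moving at different rates is Floyd's cycle-finding ("tortoise and hare") algorithm. Below is a minimal sketch of it over the same kind of hypothetical tid-to-parent map; `detect_parent_loop` and `parent_of` are illustrative names, and the real detect_parent_state_loop() works on sinsp_threadinfo objects rather than a map. Unlike a start-point check, it also catches a loop that sits entirely above the starting thread.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical toy thread table: tid -> parent tid. Not the sinsp API.
using ParentMap = std::unordered_map<int64_t, int64_t>;

// Parent tid of `tid`, or std::nullopt if the table has no entry for it.
std::optional<int64_t> parent_of(const ParentMap& parents, int64_t tid) {
    auto it = parents.find(tid);
    if (it == parents.end()) return std::nullopt;
    return it->second;
}

// Floyd's cycle detection on the parent chain starting at `start`:
// `slow` moves one parent per step, `fast` moves two. If `fast` runs off
// the end of the chain there is no loop; if the two ever land on the same
// tid, the chain reachable from `start` contains a cycle.
bool detect_parent_loop(const ParentMap& parents, int64_t start) {
    int64_t slow = start;
    int64_t fast = start;
    while (true) {
        auto f1 = parent_of(parents, fast);
        if (!f1) return false;
        auto f2 = parent_of(parents, *f1);
        if (!f2) return false;
        fast = *f2;
        // `fast` has already passed through every position `slow` can
        // reach next, so `slow` always has a parent here.
        slow = parent_of(parents, slow).value();
        if (slow == fast) return true;
    }
}
```

As the commit describes, a filtercheck would run such a check first and return a null/false result when it reports a loop, before starting its own unbounded walk.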
FYI, here's a trace file that can be used to reproduce the problem: It was created by changing the scap file writer to modify the parent process of a given process to one of its children.
Replace the ad-hoc parent thread state traversal that was in several filterchecks as well as in the mesos/coreos code with a central way to traverse parent thread state and detect potential loops at the same time. A new method traverse_parent_state traverses the parent state from the current thread and takes a function that is called for each thread while traversing. This prevents infinite loops like the one observed in falcosecurity/falco#208. It doesn't address the underlying cause of the thread state corruption; that's tracked by a separate issue, #752.

In the 4 filterchecks that used to traverse parent state (proc.sname, proc.loginshellid, proc.aname, proc.apid), replace the direct traversal with a call to traverse_parent_state plus an appropriate visitor function.

Update mesos's get_env_mesos_task_id, which used to combine recursion and get_parent_task_id to traverse parent state, to use a visitor and traverse_parent_state instead. It stops as soon as any of the environment variables for a thread are found. This version doesn't explicitly skip pid 1, but I don't think that was strictly necessary, as init wouldn't have those environment variables anyway. Also replace a similar process in coreos that finds rkt pods.
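The visitor-based design described in this commit can be sketched as below. `ToyThread`, `ThreadTable`, `traverse_parents`, and `ancestor_named` are hypothetical stand-ins for sinsp_threadinfo, the thread table, traverse_parent_state, and a proc.aname-style check; the real signatures may differ. The sketch folds loop detection into the walk: one cursor visits each ancestor while a second cursor moves two parents per step, and if they ever meet, the walk stops and reports a loop.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical, heavily simplified threadinfo.
struct ToyThread {
    int64_t tid;
    int64_t ptid;  // parent tid
    std::string comm;
};

using ThreadTable = std::unordered_map<int64_t, ToyThread>;

// Visitor: return true to keep walking up the tree, false to stop early.
using Visitor = std::function<bool(const ToyThread&)>;

// Call `visit` on each ancestor of `start` (not on `start` itself).
// Returns false if a loop in the parent chain was detected, true otherwise.
bool traverse_parents(const ThreadTable& table, int64_t start, const Visitor& visit) {
    auto parent = [&table](const ToyThread* t) -> const ToyThread* {
        if (t == nullptr) return nullptr;
        auto it = table.find(t->ptid);
        return it == table.end() ? nullptr : &it->second;
    };

    auto it = table.find(start);
    if (it == table.end()) return true;

    const ToyThread* slow = parent(&it->second);  // visits every ancestor in order
    const ToyThread* fast = parent(slow);         // starts one step ahead
    while (slow != nullptr) {
        if (!visit(*slow)) return true;           // visitor found what it needed
        slow = parent(slow);
        // Advance `fast` two steps, checking after each one whether it met `slow`.
        for (int step = 0; step < 2; ++step) {
            fast = parent(fast);
            if (slow != nullptr && slow == fast) return false;  // loop detected
        }
    }
    return true;
}

// An aname-style check: does any ancestor of `tid` have command name `name`?
// It stops at the first match, much like the early exit described above
// for the mesos environment-variable search.
bool ancestor_named(const ThreadTable& table, int64_t tid, const std::string& name) {
    bool found = false;
    traverse_parents(table, tid, [&](const ToyThread& t) {
        if (t.comm == name) {
            found = true;
            return false;  // stop walking
        }
        return true;       // keep walking
    });
    return found;
}
```

Keeping the loop flag separate from the visitor's own result lets a caller such as `ancestor_named` treat a corrupted chain as "no match" without spinning.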
* Whitespace diffs. Checked in separately from other changes.
* Combine parent state traversal w/ loop detection. Replace the ad-hoc parent thread state traversal that was in several filterchecks as well as in the mesos/coreos code with a central way to traverse parent thread state and detect potential loops at the same time. A new method traverse_parent_state traverses the parent state from the current thread and takes a function that is called for each thread while traversing. This prevents infinite loops like the one observed in falcosecurity/falco#208. It doesn't address the underlying cause of the thread state corruption; that's tracked by a separate issue, #752. In the 4 filterchecks that used to traverse parent state (proc.sname, proc.loginshellid, proc.aname, proc.apid), replace the direct traversal with a call to traverse_parent_state plus an appropriate visitor function. Update mesos's get_env_mesos_task_id, which used to combine recursion and get_parent_task_id to traverse parent state, to use a visitor and traverse_parent_state instead. It stops as soon as any of the environment variables for a thread are found. This version doesn't explicitly skip pid 1, but I don't think that was strictly necessary, as init wouldn't have those environment variables anyway. Also replace a similar process in coreos that finds rkt pods.
* Add regression tests for parent state loops. Add a new trace file parent_state_loop.scap to the traces zip that has a series of processes with malformed parent state containing a loop. Add 3 new sysdig command lines that test filterchecks/outputs that are known to traverse parent thread state. Although they should *not* cause an infinite loop, add a timeout to the sysdig command line to make sure it terminates reasonably quickly.
This was fixed in draios/sysdig#753.
@mstemm awesome! Great to hear. Sorry I've been MIA.
We don't have a new release yet, but that's coming soon. In the meantime, you can try one of the daily dev builds. |
* K8s fixes + mac & windows build (#666) * mac build (not tested) * linux build and run * Done - add blocking connect/init mode to k8s - sysdig connect and init are blocking now (faster startup) - move k8s http to 1.1 to utilize keepalive - fixed chunk purging bug - reuse state socket for watch (no disconnect after state fetch) Todo - improve handler receive error handling - test https - blocking resolve * watch redirection fix * fix watch transition; detect http 1.1 watch emission end and reconnect promptly; fix jq filter order bug * fix mac build * fix linux compile error; add docker flag to handler * windows build * fix race condition when no data on first attempt; make k8s default http 1.1 * fix blocking read * Add less to docker image * Added s390x support to sysdig source (#667) * Update ppm.h Added support for s390x * Update ppm.h re committing changes related to s390x * Revert "Added s390x support to sysdig source (#667)" This reverts commit bf7ae5a. * Added s390 support to sysdig source (#671) * Update ppm.h Added support for s390x * Update ppm.h re committing changes related to s390x * build: Fix openssl build when not using the bundled library. (#672) Otherwise, with cmake -DCMAKE_BUILD_TYPE=Debug -DUSE_BUNDLED_OPENSSL=OFF .. one gets ``` [ 96%] Linking CXX executable csysdig [ 97%] Linking CXX executable sysdig /usr/bin/ld: ../libsinsp/libsinsp.a(k8s_handler.cpp.o): undefined reference to symbol 'SSL_CTX_use_PrivateKey_file' /usr/lib/libssl.so.1.0.0: error adding symbols: DSO missing from command line collect2: error: ld returned 1 exit status make[2]: *** [userspace/sysdig/CMakeFiles/csysdig.dir/build.make:131: userspace/sysdig/csysdig] Error 1 make[1]: *** [CMakeFiles/Makefile2:275: userspace/sysdig/CMakeFiles/csysdig.dir/all] Error 2 make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: ../libsinsp/libsinsp.a(k8s_handler.cpp.o): undefined reference to symbol 'SSL_CTX_use_PrivateKey_file' /usr/lib/libssl.so.1.0.0: error adding symbols: DSO missing from command line collect2: error: ld returned 1 exit status make[2]: *** [userspace/sysdig/CMakeFiles/sysdig.dir/build.make:129: userspace/sysdig/sysdig] Error 1 make[1]: *** [CMakeFiles/Makefile2:323: userspace/sysdig/CMakeFiles/sysdig.dir/all] Error 2 make: *** [Makefile:139: all] Error 2 ``` This is because linking to libssl and libcrypto is not done (after find_package in CMakeLists.txt) when using the system libraries. Also, fix the curl ssl dependency. sysdig-CLA-1.0-signed-off-by: Raghavendra Prabhu <me@rdprabhu.com> * - keep single k8s socket opened throughout session - fix transition from non-chunked to chunked k8s handler - change active k8s handler filter from reference to pointer - remove unnecessary socket handler docker flag - early terminate k8s blocking request loop on JSON end detect - increase k8s blocking loop sleep to 10ms - fix some logs * missing deployments fix * fix, part II; Missing k8s metadata #251 * make iolen signed * fix code formatting * fix (un)signed warnings; lower k8s handler creation log severity to debug * replace http parser * Changes for s390 , removed unsupported syscalls (#676) added checks for unused macros on s390x sysdig-CLA-1.0-signed-off-by: Ketan Kunde ketan22584@gmail.com * Parse Conf from Docker * Detect/remove stale threadinfo in clone children When parsing clone exit events, specifically for the child half of a clone and when in a container, detect and potentially remove stale threadinfo state for the child thread. Generally the child half of a clone is responsible for creating the thread state for the new thread, as long as the parent is in a container. See the parent half of the "if(childtid == 0)" statement. We simply need to verify in the child half that the parent really was in a container.
You can find the parent thread id from the syscall return information, which is moved up from below. Look up the parent thread and see if its vtid/vpid differs from tid/pid. If so, any existing thread state must be stale, so remove it. Note that you can't use evt->m_tinfo->get_parent_thread() directly, as that comes from the existing potentially stale threadinfo. This fixes #664. * Remove cwd parsing from the driver because the function became sleepable in 4.8 (torvalds/linux@47be618). When forking a new process, inherit the cwd from the parent. * Use main_thread for set_cwd/get_cwd * Mesos token auth (#673) Support DC/OS token auth and HTTPS on Mesos * Remove spurious code * Probe builder with timeout (#683) * add timeout to urlopen operations * add timeout to download operations * retry download max 10 times * exclude 4.9 from ubuntu repos (#685) * add msg end handler * sysdig with https k8s-api failed #687 * windows compile errors * return an exception when a filter-only field is used for display * bugfix: evtin.span.*.tags filter fields were not working properly * evtin* fields can also be used as display fields now * minor cleanup * add stopwatch utility * Fix compilation issues with kernel 4.9 (#684) * Fix compilation issues with kernel 4.9 related commits: torvalds/linux@4c737b4 torvalds/linux@b9d989c * map io cgroup to blkio, fix for kernels >= 4.8 * Fix tracer code errors * Fix ipv4 mapped ipv6 when used on sendto and receiver endpoint is 0 * Use https for all downloads. Use https instead of http for all downloads within the install script. In cases where the links refer to artifacts in our s3 bucket, switch to https + s3.amazon.aws.com, which is already used by other urls in the script. This fixes falcosecurity/falco#152. * Fix format memory leak (#694) * Whitespace diffs. Committing separate from other changes.
* Fix leak when fmt string ends with non-filtercheck Make sure that any final rawstring_check added to the list of tokens is also added to m_chks_to_free, so it is properly freed. This fixes #693. * Clean up utils header file to be self-contained (#696) Currently, utils.h has a lot of implicit dependencies on other stl header files as well as assuming the std namespace is available. Clean it up so it can be included on its own (say, in falcosecurity/falco#162). * Fix typo in csysdig threads view * a bit of work on the flame chisel * support reading merged files * wrong return value * throttle k8s (#699) * throttle max bytes per socket/cycle to 512k, max msgs for critical k8s entities to 100 * ifdef k8s caching * adjust some commented (TBD) code * fix the message limit logic * osx build * windows build, remove some warnings * Revert "exclude 4.9 from ubuntu repos (#685)" This reverts commit c183a57. * Reset marathon group json together with marathon app one (#700) * Reset marathon group json together with marathon app one * Remove spurious app_it declaration * The previous commit on timeout completely broke the logic of this script: now a simple 404 (which is expected) is enough to skip the entire source, and as a result we were missing 80%+ of the sources * a bit more work on pushing the sinsp thread table to scap when saving files * the dumper class is now optionally able to recreate the output file's thread and file tables based on sinsp's state * expand the scap_dumper_t into a real structure that keeps additional state other than the file handle * fix scap_fds.c compile and link (#705) * scap-int.h: fix forward declaration of scap_fd_write_to_disk Now uses scap_dumper_t* instead of gzFile as last argument.
Signed-off-by: Steven Noonan <steven@uplinklabs.net> * scap_fds: fix references to scap_dump_write There wasn't a forward declaration of scap_dump_write, and the definition of scap_dump_write was declared with inline linkage, which would break when trying to link scap_fds.o's references to scap_dump_write. Signed-off-by: Steven Noonan <steven@uplinklabs.net> * TLSv1_2_client_method(void) deprecated in OpenSSL 1.1 #707 * Expose mesos token (#701) * Whitespace diffs. Committing separate from other changes. * Split mesos auth into standalone class Split mesos auth into a standalone class so you can get an authentication token without doing any of the other mesos-related activities. The auth work is now in mesos_auth.cpp and has the authenticate() and refresh_token() methods that used to be in mesos.cpp and a new method get_token() that returns the token. mesos.cpp had an unused bool m_token_authentication which was removed and not carried over. * Manage periodic refresh of tokens. mesos_auth's refresh_token() method is now responsible for deciding when to regenerate a token. It maintains its own time at which a token was last generated and also only generates one when dcos_enterprise credentials were provided. authenticate() now becomes private so it can only be called from the mesos_auth class itself. refresh_token() is the main interface to update the token. get_token() will always call refresh_token(), so callers only using get_token() can be sure that the token is refreshed automatically. They can also force a token refresh via refresh_token(). * Add ability to change auth hostname from localhost On the mesos slaves, you can't get an auth token from localhost. Instead, you need another hostname like master.mesos. So add the ability to override the hostname in the constructor. * Expose the list of marathon uri tokens. This is needed by the analyzer to update an app check's config with the right uri. 
* Improve comment * Ifdef __access_remote_vm since the function was made publicly accessible (#711) since kernel >=4.9.1. Fixes #710. sysdig-CLA-1.0-signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com> * trim newline from encoded credentials * Changes to get windows build working (#713) * Use std namespace for string. In the header file, fully specify the namespace, and in the .cpp file add a using namespace std. * Add header file for shared_ptr. When HAS_CAPTURE is false, you need to #include <memory> to pick up std::shared_ptr. * Only implement marathon_uris with HAS_CAPTURE Only define/implement marathon_uris() when HAS_CAPTURE is set. * Temporarily exclude 4.9 kernels * Consider a mesos container valid only if we find mesos_task_id * another attempt at removing stale threads from the thread table. This commit removes 833f790 and instead uses the time from the clone to determine if a thread table entry is stale. If the previous clone was done more than 2 seconds ago the entry is considered stale and is removed. * force an update of the container ID in the execve handler if the clone happened a long time ago * when a thread used to have a container ID but now it doesn't any more, make sure to clear m_container_id. This can happen in case of procexit drops, when a thread in the table that was in a container gets refreshed and is not in a container any more. * Locally catch exceptions from authenticate() Catch exceptions from authenticate() locally. Previously, these were caught much higher up (e.g. within callers of top level mesos calls like refresh()), but we want to catch them here instead now that there are additional users of mesos_auth. * apply the age-based thread table pruning logic only for child clones * Revert "Temporarily exclude 4.9 kernels" This reverts commit 058c996. * Update third party libs to address security vulnerabilities (#709) * Update openssl to 1.0.2j. This fixes a set of ~25 security vulnerabilities. * Update libcurl to 7.52.1.
This fixes ~10 security vulnerabilities. * Patch jq 1.5 with a fix for security vulns. After downloading jq 1.5, apply the changes in jqlang/jq@8eb1367 by downloading the commit as a patch and applying it. This fixes CVE-2015-8863. * Add a local dockerfile variant. Add a local dockerfile variant that allows creating an image from a local .deb package. * with merged capture files, make sure event numbers are monotonic and filter settings are remembered * Precompile probe module for Oracle Linux (#727) * Add Oracle-specific kernel crawler * Build OL probe modules from an OL container * Create a separate builder for OL6 * Add elfutils to pick up missing libdw dependency in OL6 * Don't build UEK2 since it's a 2.6.39 kernel * Update git pull avoidance for UEK builds * Fix distro build order, remove debugging code, and fix indentation * More docker cleanup, fix string comparisons * Don't do any git pulls from inside UEK module builder * make the code compile again when HAS_CAPTURE is disabled * fixes 114 sysdig-CLA-1.0-signed-off-by: Adam Baldwin baldwin@andyet.net (#720) * Add proc.pcmdline. (#721) Add proc.pcmdline, which returns the commandline of the parent process. This is useful for some cases like detecting ansible environments when you want to see the parent command line (in this case, ansible's use of python) to tell the difference between python and python-run-by-ansible.
* Fix Epel releases * Squashed commit of the following: commit 5407b1c Author: Luca Marturana <luca@sysdig.com> Date: Fri Jan 20 15:01:12 2017 +0100 Use another type for fake event commit a428209 Author: Luca Marturana <luca@sysdig.com> Date: Wed Jan 18 17:00:04 2017 +0100 Fix other mode related if commit df8abb2 Author: Luca Marturana <luca@sysdig.com> Date: Wed Jan 18 13:57:02 2017 +0100 Parse only sockets for nodriver mode commit 89298f0 Merge: c20c693 8f12a09 Author: Luca Marturana <luca@sysdig.com> Date: Wed Jan 18 12:37:54 2017 +0100 Merge branch 'dev' into driverless commit c20c693 Author: Luca Marturana <luca@sysdig.com> Date: Fri Jan 13 16:07:09 2017 +0100 Improve naming commit 0549064 Author: Luca Marturana <luca@sysdig.com> Date: Wed Jan 11 11:57:28 2017 +0100 Remove debug functions commit 7d3e737 Author: Luca Marturana <luca@sysdig.com> Date: Fri Jan 6 13:24:19 2017 +0100 Fix warning commit db38999 Merge: 38ecc61 82969c8 Author: Luca Marturana <luca@sysdig.com> Date: Fri Jan 6 11:54:25 2017 +0100 Merge branch 'dev' into driverless commit 38ecc61 Author: Luca Marturana <luca@sysdig.com> Date: Thu Jan 5 17:49:17 2017 +0100 Instead of no fd lookup, limit it for nodriver mode, useful to detect listening ports commit 31f076f Author: Luca Marturana <luca@sysdig.com> Date: Tue Jan 3 19:39:09 2017 +0100 Bug fix on vtid read from /proc/<pid>/status commit fefc098 Author: Luca Marturana <luca@sysdig.com> Date: Tue Jan 3 19:19:49 2017 +0100 Read clone_ts from /proc it's done by approximating it via stat.ctime of /proc/<pid> directory commit 5e2b68c Author: Luca Marturana <luca@sysdig.com> Date: Thu Dec 22 16:42:39 2016 +0100 Don't scan threads and fds in nodriver mode commit c5ccbbb Author: Luca Marturana <luca@sysdig.com> Date: Thu Dec 8 11:48:51 2016 +0100 Driverless experiments * Restore scap_savefile * Fix issue #734 * Fix vtid parsing * Remove assertion since callers already handle failures * compile errors on windows * Update kernel.org coding style link *
typo, edits in v_incoming_connections.lua (#703) "arguments" for "argyuments". No change to code, just data. Also edit container.name description to be more concise. * typo, edits in v_incoming_connections.lua (#703) "arguments" for "argyuments". No change to code, just data. Also edit container.name description to be more concise. * Revert "typo, edits in v_incoming_connections.lua (#703)" This reverts commit b465e0d. * Fix regression tests (#737) * Whitespace diffs. Committing separate from other changes. * Ensure PT_CHARBUF extracts set length. Ensure that all PT_CHARBUF filtercheck extracts set the length of the string they return. Although the strings they return are usually null-terminated, they may not always be null terminated, which causes problems in the chisel api when returning the strings to lua. This is half of the fix for #736. * Don't rely on PT_CHARBUFs being null terminated. Use lua_pushlstring to return extracted PT_CHARBUFs to lua. This way, even if a charbuf isn't null terminated it won't go past the end of the buffer. This is half of the fix for #736. * Properly initialize clone_ts. Initialize clone_ts when using a stack-residing scap_threadinfo to read from a proclist. Otherwise, when reading from an old trace file the initial clone_ts could be invalid, causing this difference in the proc_exec_time chisel: diff -r /tmp/sysdig.NzoD98EsyQ/results/proc_exec_time/wordpress.scap.output /tmp/sysdig.NzoD98EsyQ/baseline/proc_exec_time/wordpress.scap.output 3d2 < 1425066765.22 sleep 0.1 * Don't crash on short capture lengths. (#735) Handle trivially short capture lengths (<4). If the priority string length is 0, or if the message buffer is not long enough to hold the priority string plus surrounding <> characters, simply set the priority to -1 and return. This fixes #725. 
* Fix build using the CMake Ninja backend The Ninja build system needs to know about the outputs of the external projects that are built, otherwise it fails with this sort of error: ninja: error: 'ncurses-prefix/src/ncurses/lib/libncurses.a', needed by 'userspace/sysdig/csysdig', missing and no known rule to make it. The BUILD_BYPRODUCTS argument allows us to create the correct dependency rules. sysdig-CLA-1.0-signed-off-by: Sam Thursfield ssssam@gmail.com * Scan subdirectories only of /proc/<pid> and not /proc/<pid>/task/<tid> * Added some NULL checks before using strlen(), fixing issue #740 (#742) * Added some NULL checks before using strlen() * Undoing some unnecessary checks * Event scope escape (#733) * add scope utils and k8s/docker events scope checks * factor out scope check and assembly * event scope class * restrict key with regex, allow anything in value (e340987) * replace c++11 regex with posix * simplify regex check * fixes from @bertocci review * fix regex on windows * Add support for tagging falco rules. (#746) Add support for tagging falco rules with tags, enabling/disabling sets of rules based on the tags they have, and having the ability to run a subset of loaded rules against a given event: - in sinsp_evttype_filter::add(), you now provide a set of tags for each filter (can be empty). - internal to sinsp_evttype_filter, m_filter_by_tag maintains a mapping from tag to list of filters having that tag. This is used for sinsp_evttype_filter::enable_tags, which allows enabling/disabling all filters having a given tag. - In sinsp_evttype_filter, rules are grouped into rulesets. All rules have a ruleset of 0 which reflects their original loaded status. - In ::enable/::enable_tags(), you can provide an optional ruleset id (number, defaults to 0) that lets you select sets of rules that are enabled given that ruleset id. - The filter_wrapper boolean is now a vector indexed by ruleset id.
filter_wrapper is also now a class with a constructor that initializes the vector to one ruleset id (0) that is enabled. Having multiple rulesets lets you call enable/enable_tags multiple times with different rulesets. It's one set of rules, but the ruleset allows you to have different subsets of rules enabled/disabled. - ::run() also takes an optional ruleset argument. Once you find the matching set of filters given the event type, check the filter's enabled vector to see if the filter is enabled given the provided ruleset. If so, the filter runs against the event. - In sinsp_evttype_filter, clean up use of auto loops to use const references whenever possible and use std:: for stl objects in the header file. * Allow scopes to be initially empty. (#754) Allow an initially empty scope by skipping the add() call in the constructor unless both key and value are non-empty. Also, in get()/get_ref() log a warning if the scope-as-string is empty. * libscap: fix compilation errors on OS X * libsinsp: fix compilation when the library is compiled outside sysdig g_logger was not found in user_event.h. By including sinsp_int.h we get the logger declaration. 
* Exclude 4.10 kernels from builder * fix topports_server chisel and wrong IP family (#760) * K8s improvements (#712) * skip unnecessary data copying (perf enhancement) * handle absence of entities without error logs (non-critical enhancement) * check http status in socket_handler * remove unused variables and add some comments * add all sources to project * separate http_reason; improve curl http status log * remove unused vars; remove null filter for watch * @bertocci review fixes * remove check for null parser * add assert for drop mode (#762) * Travis mac build (#761) * split travis steps on multiple scripts * cleanup * mac os build files * adding mac to os list in travis yaml file * install_if_not_present * compile flags and tests * make script compatible with osx * install coreutils * temp remove linux for faster iteration * debug * verbose debug * Regression tests are failing on Mac build #614 * uncomment rm * revert -pc and fix topports_server script #614 * enable linux * report dev changes for linux build to this branch * Detect aname loops (#753) * Whitespace diffs. Checking in separate from other changes. * Combine parent state traversal w/ loop detection Replace the ad-hoc parent thread state traversal that was in several filterchecks as well as in the mesos/coreos code with a central way to traverse parent thread state and detect potential loops at the same time. A new method traverse_parent_state traverses the parent state from the current thread and takes a function that is called for each thread while traversing. This prevents infinite loops like observed in falcosecurity/falco#208. This doesn't address the underlying cause of what caused the thread state to get corrupted in the first place. That's tracked by a separate issue #752. In the 4 filterchecks that used to traverse parent state (proc.sname, proc.loginshellid, proc.aname, proc.apid), replace the direct traversal with a call to traverse_parent_state + an appropriate visitor function.
Update mesos's get_env_mesos_task_id, which used to do a combination of recursion and get_parent_task_id to traverse parent state, with a visitor and traverse_parent_state. It stops as soon as any of the environment variables for a thread are found. This version doesn't explicitly skip pid 1, but I don't think that was strictly necessary as init wouldn't have those environment variables anyway. Also replace a similar process in coreos to find rkt pods. * Add regression tests for parent state loops Add a new trace file parent_state_loop.scap to the traces zip that has a series of processes with malformed parent state containing a loop. Add 3 new sysdig command lines that test filterchecks/outputs that are known to traverse parent thread state. Although they should *not* cause an infinite loop, add a timeout to the sysdig command line just to make sure it is terminated somewhat quickly. * Support for updated cpu hotplug API in 4.10 kernel (#744) * Basics for kernel 4.10 hotplug support - in progress * Further 4.10 support changes, works with 4.10 now, untested pre 4.10 Fixes a bug where enabling a cpu that the module has never seen online will kernel panic. This should exist in the stock version too. 
  (needs testing)
* Misc fixes and cleanup for hotplug code
  Switch to nocalls cpuhp variants
  Hold the consumer mutex when initializing rings in the hotplug callback
  Record the hotplug event on all consumers
  Fix pre-4.10 code to use NOTIFY_OK instead of NOTIFY_DONE
* Fixed pre-4.10 compiler warning
* Remove potential vmalloc in atomic context
  This also re-adds cpu_online to fix some logic inconsistencies in using a mix of cpu_online, capture_enabled, and buffer to determine if a CPU is online
* Fix up debugging code for release
* Make sure to unlock mutex during error paths
* use gtimeout when testing on osx
* userspace/libsinsp: add sinsp dependencies when using bundled projects
  Signed-off-by: Riccardo Schirone <sirmy15@gmail.com>
* Path argument placed on exit event for `mkdir` and `rmdir`. (#757)
* Path argument placed on exit event for `mkdir` and `rmdir`.
* Both old and new mkdir/rmdir events exist, for compatibility reasons.
* Revert "Exclude 4.10 kernels from builder"
  This reverts commit a5a89eb.
* Cache evt formatters (#771)
* Whitespace diffs. Committing separate from other changes.
* Add a caching event formatter.
  New object sinsp_evt_formatter_cache manages a set of sinsp_evt_formatter objects. It avoids the overhead of recreating sinsp_evt_formatter objects for each event.
  We'll use this initially in falco, and probably other places later.
* Fix memory leak. (#772)
  On reset(), delete m_callbaks if it exists.
* fix stopwatch utility reset
* fix 394
* deletion event for non-running k8s pods not received #399; lower the log level for not found deleted entities to debug
* userspace: fix some compiler warnings
  userspace/libsinsp: allow copy elision of temporary objects
  userspace/libsinsp: do not use arrays as pointers
  userspace/libsinsp: remove unused field
  userspace/libscap: remove unused function warning
* driver: fix driver compilation on armv6l
* Add functional header. (#780)
  Add <functional> so references to std::function have a definition.
* K8s pods fix (#781)
* Revert "deletion event for non-running k8s pods not received #399; lower the log level for not found deleted entities to debug"
  This reverts commit b5d83f0.
* Don't require phase==Running on k8s events since it's not always the case
* Revert "Revert "Exclude 4.10 kernels from builder""
  This reverts commit 91da239.
* Temporarily disable CoreOS Alpha
* Bugfix on rkt detection, fixes #748
* Avoid calling clock() for each event but rely on event timestamp for -M option, fixed #783
* Cleanup mount points when CoreOS builder fails
* Forward urladdr as is instead of recreating it
* Revert "Temporarily disable CoreOS Alpha"
  This reverts commit 2bfb57b.
* Revert "Revert "Revert "Exclude 4.10 kernels from builder"""
  This reverts commit ec4eba7.
* fix k8s pretty-printed state JSON handling and sysdig state fetch error due to overwritten completed flag (#791)
* Bline (#759)
* minor friendliness changes
* minor refactories
* few more event listeners
* listener callback for clone()
* merge dev
* some inlining
* small interface change
* the set_output_format chisel API call now supports base64 and jsonbase64
* calculate a simple hash for each process that falco can use
* minor typo
* save container IP with the right endianness
* improve local address detection by matching against the full list of container addresses
* falco process hash includes the arguments if the process is a scripting language
* extract the image ID from the docker API
* save/load the container ID from trace files
* fix a merge issue
* Add container image id filtercheck. (#661)
  Add support for displaying container image ids via the filtercheck container.image.id. Only supported for docker containers right now.
* minor changes required by the agent
* compile error
* some logging for debugging purposes
* a bit more debug info
* a bit more debug info
* a bit more debug info
* a bit more debug info
* a bit more debug info
* a bit more debug info
* more debug info
* more debug info
* debug info fix
* decrease container verbosity
* more debug info
* dump to memory functionality implemented
* proper support for tracers in memory dumps
* bugfix: potential buffer underrun
* fixed a bug when converting sinsp IPv6 FDs to scap
* compression experiments
* cleanups
* cleanups
* fix a comment
* a couple of helper functions for memory dumps
* don't restart event numbering when reading merged captures + FD initialization bugfix
* remove some logging
* small changes to support memory dumping
* dump a circular capture file when a command is run in the cassandra container
* cleanups
* a bit of infrastructure for a notification event
* notification event type
* apply the filter in the successive segments of a merged capture only if there actually is a filter
* temporarily enable dump of any execve
* some debug info
* less aggressive logging
* Restore scap_savefile
* heuristic to determine if a thread is part of a shell pipe
* fixes to the pipe detection heuristic
* propagate bash pipe flags in the execve parser
* a couple of helper functions
* make sure the analyzer thread info is accessed only if available
* EOLs
* cleanups
* cleanups
* removed an unused variable
* fix docker watch (broken after pretty-print k8s fix)
* Start building standalone falco kernel modules. (#789)
* Start building standalone falco kernel modules.
  falcosecurity/falco#215 pointed out a problem with compatibility between latest sysdig kernel module and falco 0.5.0. The (newer) driver had different events than falco was expecting, causing a crash.
  To fix this, I'm changing falco to package its own driver.
  It was already building its own driver, but the remaining changes are to change the device name from sysdig to falco, module falco-probe, etc.
  These changes will allow for automatically building the falco-probe kernel module on a variety of kernel platforms and running sysdig-probe-loader (under the name falco-probe-loader) to get a module as needed.
  While doing this, merge the nearly identical build_{falco,sysdig,sysdigcloud} functions into build_probe. It now does the work of checking out the right code based on the PROBE_* variables, runs make driver from the main code repository, and verifies it can be loaded.
* Add autoconf for falco builds.
  The falco builds need autoconf so add it to the set of installed yum packages.
* Parse processes tty (#792)
* Extract tty from /proc + kernel
* typo
* Proper include for 2.6.32
* A couple more initializations
* Fix markdown syntax errors in README (mostly links) (#796)
  sysdig-CLA-1.0-signed-off-by: Jan Bölsche <jan@lagomorph.de>
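The "Combine parent state traversal w/ loop detection" entry describes traverse_parent_state only in prose. A minimal C++ sketch of a parent walk with a visitor and loop detection follows, assuming a hypothetical simplified ThreadInfo struct with a direct parent pointer (sysdig's real sinsp_threadinfo is not reproduced here). It keeps the starting-point check described earlier in this thread and adds Floyd's two-pointer cycle detection, a swapped-in technique that also catches loops that never pass back through the starting thread; whether sysdig does exactly this is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for a thread-table entry. The real thread state
// resolves the parent through a thread table; a raw pointer keeps this short.
struct ThreadInfo {
    int64_t tid;
    std::string comm;
    ThreadInfo* parent;  // nullptr at the root of the process tree
};

// The visitor returns false to stop the walk early (e.g. once it found a match).
using Visitor = std::function<bool(const ThreadInfo&)>;

// Walks the ancestors of `start` (not `start` itself), calling `visit` on each.
// Returns true if a loop in the parent chain was detected and the walk aborted.
bool traverse_parent_state(const ThreadInfo& start, const Visitor& visit) {
    const ThreadInfo* slow = &start;  // advances one parent per iteration
    const ThreadInfo* fast = &start;  // advances two parents per iteration
    while (true) {
        slow = slow->parent;
        if (slow == nullptr) {
            return false;  // reached the root: the chain is well formed
        }
        if (slow == &start) {
            return true;   // came back to where we started: loop
        }
        if (!visit(*slow)) {
            return false;  // visitor asked to stop; no loop seen so far
        }
        if (fast != nullptr) fast = fast->parent;
        if (fast != nullptr) fast = fast->parent;
        if (fast != nullptr && fast == slow) {
            return true;   // pointers met inside a cycle not containing start
        }
    }
}

// Hypothetical demo: rebuilds the malformed chain reported in this issue,
// 21394 (cut) -> 21389 -> 21421 (mongostats2stat) -> 21400 (sh) -> 21394,
// and searches it for an ancestor that is not there, as proc.aname might.
bool demo_falco_208_loop() {
    ThreadInfo cut{21394, "cut", nullptr};
    ThreadInfo m1{21389, "mongostats2stat", nullptr};
    ThreadInfo m2{21421, "mongostats2stat", nullptr};
    ThreadInfo sh{21400, "sh", nullptr};
    cut.parent = &m1;
    m1.parent = &m2;
    m2.parent = &sh;
    sh.parent = &cut;

    bool found_bash = false;
    bool loop = traverse_parent_state(cut, [&](const ThreadInfo& t) {
        if (t.comm == "bash") {
            found_bash = true;
            return false;
        }
        return true;
    });
    return loop && !found_bash;  // terminates and reports the loop
}
```

Without a loop check, a search like the one in demo_falco_208_loop never finds a match and never reaches a root, so it spins forever; that matches the infinite loop in compare_full_aname reported in this issue.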
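The "Add a caching event formatter" entry describes a memoization pattern: create one formatter per distinct format string and reuse it instead of rebuilding it for every event. A minimal C++ sketch, assuming hypothetical Formatter and FormatterCache classes that stand in for sinsp_evt_formatter and sinsp_evt_formatter_cache; the real interfaces are not shown here, and the single %evt placeholder is purely illustrative.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for sinsp_evt_formatter. Construction is assumed to be
// the expensive step (in a real formatter, parsing the format string).
class Formatter {
public:
    explicit Formatter(const std::string& fmt) : m_fmt(fmt) { ++s_constructed; }

    // Replaces the first "%evt" placeholder with the event text.
    std::string format(const std::string& evt) const {
        std::string out = m_fmt;
        const std::string token = "%evt";
        const auto pos = out.find(token);
        if (pos != std::string::npos) {
            out.replace(pos, token.size(), evt);
        }
        return out;
    }

    static int s_constructed;  // counts constructions to show the cache working

private:
    std::string m_fmt;
};

int Formatter::s_constructed = 0;

// Hypothetical stand-in for sinsp_evt_formatter_cache: one Formatter per
// distinct format string, created on first use and reused afterwards.
class FormatterCache {
public:
    std::string tostring(const std::string& evt, const std::string& fmt) {
        auto it = m_formatters.find(fmt);
        if (it == m_formatters.end()) {
            it = m_formatters.emplace(fmt, std::make_unique<Formatter>(fmt)).first;
        }
        return it->second->format(evt);
    }

private:
    std::map<std::string, std::unique_ptr<Formatter>> m_formatters;
};
```

Keying on the format string means that, once each rule's output format has been seen, the per-event cost is a single map lookup; std::map is used for simplicity, and std::unordered_map would work equally well.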
@pgray reported high falco CPU usage with the attached falco rules file:
falco_rules.yaml.zip
Compared to the 0.5.0 ruleset, a few rules have been disabled, and a few programs have been added to the lists of programs that are expected to do things like spawn shells. The big change is that all the network-related rules (XXX unexpected network inbound/outbound traffic) have been uncommented.
We should double-check the network-related rules to make sure they're efficient.