Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: TIME variables support #1223

Merged
merged 3 commits into from
Dec 9, 2024

Conversation

geoolekom
Copy link
Contributor

Thank you for contributing to Coraza WAF, your effort is greatly appreciated
Before submitting check if what you want to add to coraza list meets quality standards before sending pull request. Thanks!

Make sure that you've checked the boxes below before you submit PR:

Thanks for your contribution ❤️

This pull request addresses issue #1220 by introducing support for the following time-related variables in Coraza WAF:

  • TIME
  • TIME_DAY
  • TIME_EPOCH
  • TIME_HOUR
  • TIME_MIN
  • TIME_MON
  • TIME_SEC
  • TIME_WDAY
  • TIME_YEAR

These variables enhance Coraza's functionality by enabling detailed logging and time-based rule creation, improving compatibility with ModSecurity. Additionally, this PR includes tests to ensure the correct implementation and behavior of these new variables.

@geoolekom geoolekom requested a review from a team as a code owner November 18, 2024 12:46
@geoolekom geoolekom force-pushed the feature-time-variables branch from 6ebf213 to d3b9c23 Compare November 18, 2024 12:46
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 82.35294% with 18 lines in your changes missing coverage. Please review.

Project coverage is 81.67%. Comparing base (64f7b2a) to head (b952670).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/corazawaf/rule_multiphase.go 0.00% 18 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1223    +/-   ##
========================================
  Coverage   81.66%   81.67%            
========================================
  Files         168      168            
  Lines        9655     9757   +102     
========================================
+ Hits         7885     7969    +84     
- Misses       1519     1537    +18     
  Partials      251      251            
Flag Coverage Δ
coraza.rule.case_sensitive_args_keys 81.63% <82.35%> (+<0.01%) ⬆️
coraza.rule.multiphase_valuation 81.67% <82.35%> (+<0.01%) ⬆️
coraza.rule.no_regex_multiline 81.61% <82.35%> (+<0.01%) ⬆️
default 81.67% <82.35%> (+<0.01%) ⬆️
examples+ 16.50% <20.58%> (+0.04%) ⬆️
examples+coraza.rule.case_sensitive_args_keys 81.63% <82.35%> (+<0.01%) ⬆️
examples+coraza.rule.multiphase_valuation 81.50% <82.35%> (+<0.01%) ⬆️
examples+coraza.rule.no_regex_multiline 81.53% <82.35%> (+<0.01%) ⬆️
examples+memoize_builders 81.63% <82.35%> (+<0.01%) ⬆️
examples+no_fs_access 80.95% <82.35%> (+0.01%) ⬆️
ftw 81.67% <82.35%> (+<0.01%) ⬆️
memoize_builders 81.76% <82.35%> (+<0.01%) ⬆️
no_fs_access 81.12% <82.35%> (+0.01%) ⬆️
tinygo 81.64% <82.35%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fzipi fzipi changed the title TIME variables support feat: TIME variables support Nov 18, 2024
tx.variables.time.Set(timestamp.Format(time.TimeOnly))
tx.variables.timeDay.Set(strconv.Itoa(timestamp.Day()))
tx.variables.timeEpoch.Set(strconv.FormatInt(timestamp.Unix(), 10))
tx.variables.timeHour.Set(strconv.Itoa(timestamp.Hour()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am inclined to say we can come up with a map for this instead of doing the conversion since these values are deterministic. Still worth to write a benchmark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you propose to use a TIME_MAP variable with keys such as hour, day, month, year, and so on? Did I get this right?

If so, this approach would further widen the compatibility gap between ModSecurity and Coraza, whereas the main goal of this PR is to reduce it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I wasn't clear enough. I was advocating for having a map in memory e.g.

var hourConversion map[int]string = map[int]string{1: "1", 2: "2", ...}

instead of doing the strconv.Itoa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of benchmarking, I run BenchmarkTransactionCreation - because all the values are set in the constructor.

Before changes:

cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
BenchmarkTransactionCreation
BenchmarkTransactionCreation-8   	   53013	     21910 ns/op
PASS

After changes:

BenchmarkTransactionCreation
BenchmarkTransactionCreation-8   	   51748	     22462 ns/op
PASS

However, the results vary widely from run to run. Am I missing something? Maybe, you could explain in more detail, what should I benchmark?

@@ -1360,6 +1378,18 @@ func makeTransaction(t testing.TB) *Transaction {
return tx
}

func makeTransactionTimestamped(t testing.TB) *Transaction {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder by testing.TB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same interface as in makeTransaction and makeTransactionMultipart to ensure uniformity.

@jcchavezs
Copy link
Member

jcchavezs commented Nov 18, 2024

First of all, thanks a lot @geoolekom for taking the time to put up this PR, we truly appreciate users that are willing to contribute. That said, I am not fully convinced on the need of this. I believe the case you point out is more like a logger concern as we are only interpolating the variable to display the timestamp. Interpolating a variable isn't cheap, nor it is allocating all the variables we allocated and converted by default from int to string.

Notice native environments are OK with clock calls but other environments like Wasm don't get it that simple.

I would say if we could come up with a solid use case where the time is part of the rule I am in to land the variables (only those showing value for the rule) but if the concern is timestamp in the logs we should definitively address that in the logger. What do you think?

@geoolekom
Copy link
Contributor Author

Thanks so much for your detailed feedback @jcchavezs! I really appreciate you taking the time to review this PR and share your thoughts. Let me clarify the rationale behind these changes and explore possible solutions.

  1. Modsec compatibility. The main idea behind this PR is to keep Coraza as close as possible to ModSecurity. If we skip these variables, it’ll create a bigger compatibility gap, which I’d really like to avoid.
  2. Ruleset compatibility. These time-related variables are really important because some of popular rulesets, like Imunify360, rely on them heavily. Dropping support for these would make Coraza incompatible with a bunch of existing rulesets, which would make it a lot harder for users to migrate their setups.
  3. Performance. It would be very helpful if you could share an example of a benchmarking report or methodology that seems convincing to you, so we’re addressing your concerns in the best way possible.

If the benchmarks show a real hit to performance, I’ve got a few ideas to make things more flexible:

  • Add a flag: We could introduce a toggle to enable or disable the timestamp variables, so users can choose based on their needs.
  • Allow setvar directives to define variables outside the TX collection, so users can define what they want.
  • Custom variable registration: We could create a plugins.RegisterVariable function (like plugins.RegisterOperator) to let users register extra variables.

Let me know what you think of these ideas.

@jcchavezs
Copy link
Member

Thanks @geoolekom! I think most of the points raised here make sense. Let me do a couple of experiments as what I have in mind is that this a good opportunity for trying lazy collections (i.e. collections that are filled on demand) given that a transaction will be read in a sync fashion.

tx.variables.timeMin.Set(strconv.Itoa(timestamp.Minute()))
tx.variables.timeSec.Set(strconv.Itoa(timestamp.Second()))
tx.variables.timeWday.Set(strconv.Itoa(int(timestamp.Weekday())))
tx.variables.timeMon.Set(strconv.Itoa(int(timestamp.Month()) - 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having TIME_MON in the range 0-11 instead of 1-12 feels very counterintuitive to me.
ModSecurity behaviour is not consistent, with v2 ranging 1-12 and v3 ranging 0-11. I opened this issue about it: owasp-modsecurity/ModSecurity#3305.

I'm pushing for a unified behaviour between all the engines and I feel that the natural one would be 1-12 range.

Note: If we are going to implement it as 1-12 also Coraza Docs have to be updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I agree. Should I update this PR right now, or wait until Modsecurity issue is closed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might take long

Copy link
Contributor Author

@geoolekom geoolekom Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can update in advance, right? Or should we stick with modsec-v3 compatibility and use 0-11 range?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would not be compliant with one of the two current ModSecurity implementations anyway 😄.
There is already an open PR owasp-modsecurity/ModSecurity#3306 that aligns v3 to use 1-12 range. So I believe that we have pretty much agreed that 1-12 range is the one we should stick with. Let's go with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@geoolekom geoolekom force-pushed the feature-time-variables branch from d3b9c23 to 22be149 Compare November 25, 2024 11:51
@geoolekom geoolekom requested a review from M4tteoP November 25, 2024 11:55
@geoolekom
Copy link
Contributor Author

geoolekom commented Nov 27, 2024

@jcchavezs Real-life examples of rules where TIME variables are truly important.

Track time for filescan

We use a set of rules that stores the DURATION value before a scan is started and after completion. This is required to control the time taken for the scan process.

SecRule &TX:lua_present "@eq 0" "id:33363,nolog,auditLog,block,chain,severity:2,phase:2,t:none,msg:'IM360 WAF: Attempt to upload malware||Scan duration:%{TX.py_scan_duration}||Sizes:%{FILES_SIZES}||Combined size:%{FILES_COMBINED_SIZE}||User:%{SCRIPT_USERNAME}||Filename:%{FILES}||Scanned:%{FILES_TMPNAMES}||SC:%{SCRIPT_FILENAME}||WPU:%{TX.wp_user}||RSV:6.52||T:APACHE||',tag:'service_i360'"
SecRule FILES_TMPNAMES "@inspectFile /opt/alt/python35/share/imunify360/scripts/modsec_scan.py" "t:none,setvar:TX.py_scan_duration=%{DURATION},setvar:TX.py_scan_duration=-%{TX.py_scan_start}"

Dynamic IP Tracking and RBL Validation

This example tracks client IPs and combines them with timestamps for uniqueness, then identifies risky actions by matching request filenames against a known list. It verifies risky actions through an external RBL lookup to confirm malicious or undesirable behavior and then blocks requests based on positive RBL matches..

SecAction "id:33368,phase:2,pass,nolog,severity:5,setvar:tx.rbl_ip=%{TIME_HOUR}-%{TIME_MIN}.%{tx.remote_addr},initcol:ip=%{tx.remote_addr}"

SecRule REQUEST_FILENAME "@pmFromFile risky-actions.list" "id:33315,phase:2,block,severity:2,nolog,auditlog,t:none,msg:'IM360 WAF: RBL block risky actions||RSV:6.52||T:APACHE||MV:%{MATCHED_VAR}',chain,setvar:tx.rbl_perf=1,tag:'service_i360'"
SecRule TX:RBL_IP "@rbl risky-actions.v2.rbl.imunify.com." "t:none,chain"
SecRule TX:RBL_IP "!@rbl nxdomain.v2.rbl.imunify.com." "t:none"

@geoolekom geoolekom force-pushed the feature-time-variables branch from 22be149 to 3e0131e Compare November 27, 2024 11:36
@geoolekom
Copy link
Contributor Author

@jcchavezs Could you please elaborate, what could we do to get this feature merged?

jcchavezs
jcchavezs previously approved these changes Dec 3, 2024
@jcchavezs jcchavezs dismissed their stale review December 3, 2024 18:37

I wasn't meant to approve but to comment.

@jcchavezs
Copy link
Member

I worked a PoC on lazy collections in geoolekom#1

@@ -242,6 +242,7 @@ func (w *WAF) newTransaction(opts Options) *Transaction {
tx.variables.duration.Set("0")
tx.variables.highestSeverity.Set("0")
tx.variables.uniqueID.Set(tx.id)
tx.setTimeVariables()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't better to just remove this call and generate the value for each variable at access time instead? Kind of being more an init than set.

The lazy collections idea is generic and will work. I would like to see if we add extra memory/benchmarks.

Another option is to have specific time collections that receive the timestamp, and the function to apply when setting and only evaluate when getting. I guess in the end is more or less the same?

@jcchavezs jcchavezs enabled auto-merge (squash) December 9, 2024 08:49
@jcchavezs jcchavezs merged commit f3d06a2 into corazawaf:main Dec 9, 2024
71 checks passed
@jcchavezs jcchavezs mentioned this pull request Dec 9, 2024
4 tasks
@jcchavezs
Copy link
Member

We merged this PR as it was and with a couple of performance improvements in #1242 because lazy collections wasn't bringing enough improvements for this particular use case that it did not feel right to land such a concept for low benefit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants