Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid backtracing in ceylon interpolation regex #1773

Merged
merged 1 commit into from
Dec 14, 2021

Conversation

thewoolleyman
Copy link
Contributor

Fix backtracing in ceylon interpolation regex.

Addresses the third remaining Regex Denial Of Service vulnerability reported in https://hackerone.com/reports/1283484. The other two mentioned in that report have already been addressed as part of #1690.

@thewoolleyman
Copy link
Contributor Author

FYI @dblessing

@thewoolleyman
Copy link
Contributor Author

More Details

(The following examples can be reproduced on a gitlab.com wiki page)

A high CPU load can be caused using the following ceylon regex:

```ceylon  
"````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````  

This ReDos example in the description of this issue renders in less than a second with the fix in this PR - it was 40+ seconds before.

This fix was based on the suggestions in the "Quantifiers in Sequence" section of this page: https://www.regular-expressions.info/redos.html

I also tested the following valid ceylon markdown, and it still seems to work the same with the fix:

```ceylon
print("Hello, ``person.firstName`` ``person.lastName``, the time is ``Time()``.");
print("1 + 1 = ``1 + 1``");
```

@tancnle
Copy link
Collaborator

tancnle commented Dec 13, 2021

Thank you for your contribution @thewoolleyman ❤️ We would love to see this backtracking issue addressed in rouge ASAP. I have added some suggestions for your consideration. Please let me know what you think 🙏🏼 .

@tancnle
Copy link
Collaborator

tancnle commented Dec 13, 2021

thoughts: With the change in this PR, the interpolated string still seems less ideal. They are all clumped together as Literal::String rather than with interpolation Literal::String::Interpolation. This is, however, another issue that can be dealt with in a separate PR.
Screen Shot 2021-12-13 at 2 31 37 pm

@tancnle tancnle self-assigned this Dec 13, 2021
@tancnle tancnle added the author-action The PR has been reviewed but action by the author is needed label Dec 13, 2021
@@ -51,7 +51,7 @@ class Ceylon < RegexLexer

rule %r("(\\\\|\\"|[^"])*"), Literal::String
rule %r('\\.'|'[^\\]'|'\\\{#[0-9a-fA-F]{4}\}'), Literal::String::Char
rule %r(".*``.*``.*"', String::Interpol
rule %r("[^`]*``[^`]*``[^`]*"', String::Interpol
Copy link
Collaborator

@tancnle tancnle Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a flaw with the initial rule which is caused by this line and the next being mangled into one regex (notice the extraneous single quote ' and missing closing bracket). For better or worse, that bug might not expose the backtracking issue as it is reported 🤔

Given that we addressed the wrapping issue above:

Using the reported string, I have verified on https://regex101.com/debugger

  • ".*``.*``.*" -> Match 1 failed in 106 086 step(s)
  • "[^`]*``[^`]*``[^`]*"-> Match 1 failed in 95 step(s)

Could we add the string interpolation example in the spec/visual/samples/ceylon for future reference?

diff --git spec/visual/samples/ceylon spec/visual/samples/ceylon
index b6d7a122..20cf4608 100644
--- spec/visual/samples/ceylon
+++ spec/visual/samples/ceylon
@@ -32,3 +32,6 @@ module com.acme.test "1.0.0" {
 }
 
 package com.example;
+
+print("Hello, ``person.firstName`` ``person.lastName``, the time is ``Time()``.");
+print("1 + 1 = ``1 + 1``");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I've added this change in the latest push.

Comment on lines 54 to 55
rule %r("[^`]*``[^`]*``[^`]*"', String::Interpol
rule %r(\.)([a-z_]\w*)) do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: What do you think about addressing both the wrapping and backtracking issue?

Suggested change
rule %r("[^`]*``[^`]*``[^`]*"', String::Interpol
rule %r(\.)([a-z_]\w*)) do
rule %r("[^`]*``[^`]*``[^`]*"), Literal::String::Interpol
rule %r((\.)([a-z_]\w*)) do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually for interpolation it's best to try and push a separate state, so you can highlight the contents of the interpolation as well. It can check for the closing interpolation tokens and then mixin the main expression state.

Copy link
Collaborator

@tancnle tancnle Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jneen. Great point. In fact, I have a follow-up commit locally to move this string interpolation to state (similar to how the ruby lexer deal with this). Once this merges, I can create another PR to address interpolation more appropriately. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tancnle I've fixed the wrapping as suggested in the latest push.

@thewoolleyman
Copy link
Contributor Author

Thank you for your contribution @thewoolleyman ❤️ We would love to see this backtracking issue addressed in rouge ASAP. I have added some suggestions for your consideration. Please let me know what you think 🙏🏼 .

@tancnle thanks for the prompt response!

I've made the two suggested changes to my commit, and pushed a rebased branch.

Let me know if there's anything else needed on my end.

@tancnle
Copy link
Collaborator

tancnle commented Dec 14, 2021

Thanks @thewoolleyman. The PR looks good to me 🚀

@tancnle tancnle merged commit db5c4ad into rouge-ruby:master Dec 14, 2021
@tancnle tancnle removed their assignment Dec 14, 2021
@tancnle tancnle removed the author-action The PR has been reviewed but action by the author is needed label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants