Unescape regex captures #1446

jkowens · 2018-06-11T20:09:19Z

This adds tests to make sure named parameters and regex captures are unescaped, and also provides a potential fix for #1443. I'm not familiar enough with Mustermann to know if it should handle unescaping captures.

tenderlove · 2018-06-14T19:05:09Z

lib/sinatra/base.rb

@@ -1028,7 +1028,7 @@ def process_route(pattern, conditions, block = nil, values = [])

      regexp_exists = pattern.is_a?(Mustermann::Regular) || (pattern.respond_to?(:patterns) && pattern.patterns.any? {|subpattern| subpattern.is_a?(Mustermann::Regular)} )
      if regexp_exists
-        captures           = pattern.match(route).captures
+        captures           = pattern.match(route).captures.map { |c| force_encoding URI_INSTANCE.unescape(c) if c }


I can't seem to get the tests running here, but I think this might not be correct. It seems like force_encoding will set the strings to UTF-8, but you can encode any arbitrary bytes with URI encoding. force_encoding means that we'll end up with strings that are tagged as UTF-8 but aren't actually UTF-8 data.

This problem is actually demonstrated by the test in this PR. \x80 is not valid unicode:

irb(main):005:0> x = '/foo%80/bar%e2%80%8cbaz' => "/foo%80/bar%e2%80%8cbaz" irb(main):006:0> y = URI.unescape x => "/foo\x80/bar‌baz" irb(main):007:0> y.force_encoding "UTF-8" => "/foo\x80/bar‌baz" irb(main):008:0> y.valid_encoding? => false

Hm good point. I was trying to take into account how things were done in v1.4.8:

https://github.com/sinatra/sinatra/blob/v1.4.8/lib/sinatra/base.rb#L1006

Wow! Good job tracking that down!!

I think just not calling force_encoding is the right thing to do (that's what we're doing in a monkey patch), but it looks like calling force_encoding is what Sinatra used to do, so maybe everyone has written their apps based on that behavior.

I'm not sure what is the best path forward (pun intended but also a true statement).

It turns out URI::Parser#unescape already calls force_encoding, so that was redundant anyway. You're right though, the results are tagged UTF-8 but valid_encoding? returns false.

Yes, it will call force_encoding, but it uses the encoding of the string that was passed in. When we read the path from the socket, the socket will return a string that is encoded in binary, not utf-8.

For example, a path encoded with UTF-8:

irb(main):002:0> path = "/foo%80/bar%e2%80%8cbaz" => "/foo%80/bar%e2%80%8cbaz" irb(main):003:0> path.encoding => #<Encoding:UTF-8> irb(main):004:0> x = URI.unescape path => "/foo\x80/bar‌baz" irb(main):005:0> x.encoding => #<Encoding:UTF-8> irb(main):006:0> x.valid_encoding? => false

Now if the path is binary:

irb(main):007:0> path = "/foo%80/bar%e2%80%8cbaz".b => "/foo%80/bar%e2%80%8cbaz" irb(main):008:0> path.encoding => #<Encoding:ASCII-8BIT> irb(main):009:0> x = URI.unescape path => "/foo\x80/bar\xE2\x80\x8Cbaz" irb(main):010:0> x.encoding => #<Encoding:ASCII-8BIT> irb(main):011:0> x.valid_encoding? => true

The test differs from production here because the default encoding of strings inside a file is UTF-8 (so the string literal will be tagged with UTF-8), but the strings we read from the socket will be tagged as binary.

Doing this might be more closely represent production:

get '/foo%80/bar%e2%80%8cbaz'.b

Ah, gotcha!

tenderlove · 2018-06-14T19:06:25Z

test/routing_test.rb

+  it 'unescapes named parameters and splats' do
+    mock_app {
+      get '/:foo/*' do |a, b|
+        assert_equal "foo\x80", params['foo']


Here we should also do:

assert_predicate params['foo'], :valid_encoding?

I suspect it will fail.

You were right on, this assertion fails.

tenderlove · 2018-06-14T19:47:01Z

I think in Rails, we default to force encoding things to UTF-8 since that is the most common case. At GitHub we have to deal with non UTF-8 data, so I added a thing to Rails such that a particular controller / action can override the encoding.

Maybe the right thing to do here is to call force_encoding, but also provide a way to override which encoding gets used on a particular route.

Thanks for looking in to this!

jkowens · 2018-06-14T20:31:31Z

After a little more digging, I see Sinatra provides an option to set default_encoding for the app:

set :default_encoding,  'ascii-8bit'

This doesn't help though if you only need it for particular routes, I'll try to see what that takes 👍

tenderlove

For what it's worth, this looks good to me. 😀

jkowens · 2018-06-14T21:47:08Z

I just realized I was confusing code for handling "regular" routes with regex routes. I think we're probably good on the regex front with this PR, but I realized a relatively recent commit is conflicting a bit:

b058835

I think we'll need to decide on a consistent approach for handling both. @namusyaka any thoughts?

jkowens · 2018-06-14T21:49:33Z

test/routing_test.rb

+    mock_app {
+      get '/:foo/*' do |a, b|
+        assert_equal "foo\x80", params['foo']
+        assert_predicate params['foo'], :valid_encoding?


This fails because params are force encoded with the default encoding here:

https://github.com/sinatra/sinatra/blob/master/lib/sinatra/base.rb#L1026

Just droping the permalink version of that link here as the line it points to might change in the future:

sinatra/lib/sinatra/base.rb

Line 1026 in 1b35eaa

force_encoding(params)

jkowens · 2018-06-15T03:39:08Z

@namusyaka I don't know all of the history of #1412, but the question has come up if we should try to force encode route params the default_encoding if it potentially results in an invalid encoding?

namusyaka · 2018-06-15T16:23:09Z

@jkowens Sorry, I'm busy for a few days.
I'm going to review this carefuly next week.

Thanks for your patch.

namusyaka

@tenderlove @jkowens Thank you for your contributions.
I'm really pleasure to see your discussions and fixes.
The current change set looks good to me.

If you have potential invalid encoding case, please open a new issue.

## 2.0.4 / 2018-09-15 * Don't blow up when passing frozen string to `send_file` disposition [#1137](sinatra/sinatra#1137) by Andrew Selder * Fix ubygems LoadError [#1436](sinatra/sinatra#1436) by Pavel Rosick�«ò * Unescape regex captures [#1446](sinatra/sinatra#1446) by Jordan Owens * Slight performance improvements for IndifferentHash [#1427](sinatra/sinatra#1427) by Mike Pastore * Improve development support and documentation and source code by Will Yang, Jake Craige, Grey Baker and Guilherme Goettems Schneider ## 2.0.3 / 2018-06-09 * Fix the backports gem regression [#1442](sinatra/sinatra#1442) by Marc-Andr�«± Lafortune ## 2.0.2 / 2018-06-05 * Escape invalid query parameters [#1432](sinatra/sinatra#1432) by Kunpei Sakai * The patch fixes [CVE-2018-11627](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11627). * Fix undefined method error for `Sinatra::RequiredParams` with hash key [#1431](sinatra/sinatra#1431) by Arpit Chauhan * Add xml content-types to valid html_types for Rack::Protection [#1413](sinatra/sinatra#1413) by Reenan Arbitrario * Encode route parameters using :default_encoding setting [#1412](sinatra/sinatra#1412) by Brian m. Carlson * Fix unpredictable behaviour from Sinatra::ConfigFile [#1244](sinatra/sinatra#1244) by John Hope * Add Sinatra::IndifferentHash#slice [#1405](sinatra/sinatra#1405) by Shota Iguchi * Remove status code 205 from drop body response [#1398](sinatra/sinatra#1398) by Shota Iguchi * Ignore empty captures from params [#1390](sinatra/sinatra#1390) by Shota Iguchi * Improve development support and documentation and source code by Zp Yuan, Andreas Finger, Olle Jonsson, Shota Iguchi, Nikita Bulai and Joshua O'Brien

Add tests to make sure named params and captures are unescaped

e5cd58c

tenderlove reviewed Jun 14, 2018

View reviewed changes

jkowens force-pushed the fix-1443 branch from cfd5205 to 50c7ca5 Compare June 14, 2018 20:05

jkowens force-pushed the fix-1443 branch from 50c7ca5 to 1f84f02 Compare June 14, 2018 20:40

tenderlove approved these changes Jun 14, 2018

View reviewed changes

jkowens force-pushed the fix-1443 branch from 1f84f02 to 420b6a8 Compare June 14, 2018 21:21

jkowens commented Jun 14, 2018

View reviewed changes

Unescape regex captures. Fixes sinatra#1443.

529f791

jkowens force-pushed the fix-1443 branch from 420b6a8 to 529f791 Compare June 15, 2018 12:25

namusyaka added this to the v2.0.4 milestone Jun 24, 2018

namusyaka approved these changes Aug 4, 2018

View reviewed changes

namusyaka merged commit 9590706 into sinatra:master Aug 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unescape regex captures #1446

Unescape regex captures #1446

jkowens commented Jun 11, 2018 •

edited

Loading

tenderlove Jun 14, 2018

jkowens Jun 14, 2018

tenderlove Jun 14, 2018

jkowens Jun 14, 2018

tenderlove Jun 14, 2018

jkowens Jun 14, 2018

tenderlove Jun 14, 2018

jkowens Jun 14, 2018

tenderlove commented Jun 14, 2018

jkowens commented Jun 14, 2018

tenderlove left a comment

jkowens commented Jun 14, 2018

jkowens Jun 14, 2018

dentarg Jun 14, 2018

jkowens commented Jun 15, 2018

namusyaka commented Jun 15, 2018

namusyaka left a comment

Unescape regex captures #1446

Unescape regex captures #1446

Conversation

jkowens commented Jun 11, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tenderlove commented Jun 14, 2018

jkowens commented Jun 14, 2018

tenderlove left a comment

Choose a reason for hiding this comment

jkowens commented Jun 14, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkowens commented Jun 15, 2018

namusyaka commented Jun 15, 2018

namusyaka left a comment

Choose a reason for hiding this comment

jkowens commented Jun 11, 2018 •

edited

Loading