-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
httpserver/all: Clean up and standardize request URL handling #1633
Conversation
The HTTP server now always creates a context value on the request which is a copy of the request's URL struct. It should not be modified by middlewares, but it is safe to get the value out of the request and make changes to it locally-scoped. Thus, the value in the context always stores the original request URL information as it was received. Any rewrites that happen will be to the request's URL field directly. The HTTP server no longer cleans /sanitizes the request URL. It made too many strong assumptions and ended up making a lot of middleware more complicated, including upstream proxying (and fastcgi). To alleviate this complexity, we no longer change the request URL. Middlewares are responsible to access the disk safely by using http.Dir or, if not actually opening files, they can use httpserver.SafePath(). I'm hoping this will address issues with #1624, #1584, #1582, and others.
@abiosoft: I still can't figure out exactly what this is for. 😅
caddyhttp/browse/browse.go
Outdated
if !strings.HasSuffix(r.URL.Path, "/") { | ||
staticfiles.RedirectToDir(w, r) | ||
return 0, nil | ||
u := r.Context().Value(httpserver.OriginalURLCtxKey).(url.URL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This maybe not what we want. If rewrite
occurred before, r.URL.Path
contains the result and we should use it. Otherwise rewrite
middleware will disfunction when combining with browser
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good catch. This is tricky. The reason I chose to get the path of the original URL is because we end up causing a redirect with it; the redirect is external, but the rewrite is internal.
I wonder... should we check for a trailing slash using r.URL.Path
and then issue the redirect by modifying the original path stored in the context? (What do you think is most correct?)
I can go back to just using the rewritten path (r.URL.Path) - I haven't heard any issues about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original logic is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tw4452852 Alrighty -- updated.
for _, ext := range e.Extensions { | ||
if resourceExists(e.Root, urlpath+ext) { | ||
_, err := os.Stat(httpserver.SafePath(e.Root, urlpath) + ext) | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am nitpicking, but I guess this should be fine
if _, err := os.Stat(httpserver.SafePath(e.Root, urlpath) + ext); err == nil {
r.URL.Path = urlpath + ext
break
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the two-line version in this case because there's already a lot of logic in that first line. Thanks though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of minor notes
caddyhttp/browse/browse.go
Outdated
if u.Path == "" { | ||
u.Path = "/" | ||
} | ||
if u.Path[len(u.Path)-1] != '/' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to use strings.HasSuffix
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷♂️ Beats me, the standard library does it this way. Maybe a single byte comparison is faster than needing do all the logic of strings.HasSuffix.
if !strings.HasSuffix(r.URL.Path, "/") { | ||
RedirectToDir(w, r) | ||
// ensure there is a trailing slash | ||
if u.Path[len(u.Path)-1] != '/' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here, any reason not to use strings.HasSuffix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: no particular reason. Std lib does it this way. /shrug
I think this provides a much more straightforward way of dealing with |
caddyhttp/httpserver/server.go
Outdated
// | ||
// If opening a file, use http.Dir instead. | ||
func SafePath(siteRoot, reqPath string) string { | ||
if filepath.Separator != '/' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filepath.ToSlash
helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, indeed. Updated!
@tw4452852 Feel free to approve (or request changes to) the PR when you have a chance. I can get this out later today if so. (Will probably do so anyway.) Thanks so much for your time! |
LGTM. |
1. What issue(s) is this pull request related to?
I'm hoping this will address issues with #1624, #1584, #1582, #256 (by not breaking it), and others.
2. What does this change do, exactly?
The HTTP server now always creates a context value on the request which
is a copy of the request's URL struct. It should not be modified by
middlewares, but it is safe to get the value out of the request and make
changes to it locally-scoped. Thus, the value in the context always
stores the original request URL information as it was received. Any
rewrites that happen will be to the request's URL field directly.
The HTTP server no longer cleans /sanitizes the request URL. It made too
many strong assumptions and ended up making a lot of middleware more
complicated, including upstream proxying (and fastcgi). To alleviate
this complexity, we no longer change the request URL. Middlewares are
responsible to access the disk safely by using http.Dir or, if not
actually opening files, they can use httpserver.SafePath().
3. What documentation changes (if any) go along with this PR?
Nothing of note, except instructions for middleware authors:
Middlewares should NOT do this:
But this is safe:
If not opening files, you can use
SafePath()
:But
SafePath()
is good for when you don't need to open the file.4. Checklist
I'm considering having SafePath() take the entire *http.Request rather than just a string (so right now you have to pass req.URL.Path) - but it's not a big deal.
This is a big change, but the highlights are:
httpserver.OriginalURLCtxKey
("original_url") which stores a copy of the original url.URL value before any middlewares even see the request.