Description
Long story short
My client needs to send multipart/form-data to an API that expects field names with []
in the name. The server does not accept the submission with default set_content_disposition
parameters due to wrong quoting.
Expected behaviour
Content-Disposition: form-data; name="files[]"; filename="filename"
Actual behaviour
Content-Disposition: form-data; name="files%5B%5D"; filename="filename"; filename*=utf-8''filename
Steps to reproduce
Client code is like
with aiohttp.MultipartWriter('form-data') as mpw:
f = mpw.append(file)
f.set_content_disposition("form-data", name="files[]", filename="filename")
res = await self.session.post(url, data=mpw)
Your environment
aiohttp==3.5.4 async client, Ubuntu 18.04, python 3.6.8.
Analysis
Returning Values from Forms: multipart/form-data says
In most multipart types, the MIME header fields in each part are
restricted to US-ASCII; for compatibility with those systems, file
names normally visible to users MAY be encoded using the percent-
encoding method in Section 2, following how a "file:" URI
[URI-SCHEME] might be encoded.NOTE: The encoding method described in [RFC5987], which would add a
"filename*" parameter to the Content-Disposition header field, MUST
NOT be used.
It would seem the current implementation misinterpreted this to mean all field values are to be percent-encoded. But the RFC7578 is clear that the encoding is only to be used on file names. Furthermore, the filename*=
form from MIME Parameter Value and Encoded Word Extensions should be used only for the other fields, but as the filename is already via percent-encoding to within US-ASCII, filename*=
is not to be used on the filename.
For converting from unicode string to bytes for the percent-encoding, user will need to specify charset in some cases, as in the RFC:
The encoding used for the file names is typically UTF-8, although
HTML forms will use the charset associated with the form.
Thus, in some cases, an additional charset
parameter is needed in set_content_disposition
. Is it needed in other functions?
The RFCs refer to RFC822 for quoted-string definition, which is currently obsoleted by Internet Message Format RFC5322.
qtext = %d33 / ; Printable US-ASCII
%d35-91 / ; characters not including
%d93-126 / ; "\" or the quote character
obs-qtext
qcontent = qtext / quoted-pair
quoted-string = [CFWS]
DQUOTE *([FWS] qcontent) [FWS] DQUOTE
[CFWS]
quoted-pair = ("\" (VCHAR / WSP)) / obs-qp
And from Augmented BNF for Syntax Specifications: ABNF
VCHAR = %x21-7E
; visible (printing) characters
WSP = SP / HTAB
; white space```
The quoted-pair quoting of quoted-string is missing in the current implementation.
There is also a rather far-fetched case of extremely long values causing the line length limit of 998 characters to be exceeded https://tools.ietf.org/html/rfc5322#section-2.1.1 and requiring using the Folding White Space (FWS).
I can not tell if there would be any compatibility impact of just changing the percent-quoting to the correct quoted-pair quoting. Should the quote_fields
parameter concern the percent-encoding of filename or the quoted-pair of all fields?
The current behavior seems to be result of discussion in #916 to fix #903.
Activity
kohtala commentedon Aug 31, 2019
I noticed, the
filename*
is specified in RFC 6266 Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP). It however states clearlyRFC2388 is now obsoleted by the RFC7578.
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
kohtala commentedon Oct 10, 2019
I checked with Firefox, and any form fields names containing "[]" are sent without being enoded. Also filenames are sent in 8-bit utf-8 without any encoding. Rereading #916, it would seem the quote_fields=False option has been added to get correct behavior. Any API expecting the default quote_fields=True behavior can not work with submissions from html forms. With my patch and quote_fields=False, it is further fixed to quote any quotes in the file names.
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
Fix aio-libs#4012 encoding of content-disposition parameters
5 remaining items