Bug in the URL regex used to validate AnyUrl fields #1115
Closed
Description
Bug
Please complete:
- OS: Ubuntu 18.04
- Python version (import sys; print(sys.version)): 3.7
- Pydantic version (import pydantic; print(pydantic.VERSION)): 1.1
Please read the docs and search through issues to confirm your bug hasn't already been reported.
Where possible, please include a self-contained code snippet describing your bug:
import pydantic

class Item(pydantic.BaseModel):
    url: pydantic.HttpUrl = ''

# Raises a validation error, although the URL is valid
item = Item(url='http://twitter.com/@handle')
The above snippet throws an error because of a bug in the url_regex used by all AnyUrl subclasses. Because of the @ in the path, the validator treats twitter.com/ as the username in this URL. This can be fixed trivially by adding / to the characters that cannot appear in the username (or, for that matter, the password).
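As a cross-check on the expected parse, here is a minimal sketch using the standard library's urllib.parse.urlsplit, which ends the authority component at the first / and therefore never reads user info from the path. The second URL (user, secret, example.com) is a made-up example, not taken from the report.

```python
from urllib.parse import urlsplit

# URL from the report: the @ sits in the path, so there is no user info.
parts = urlsplit('http://twitter.com/@handle')
print(parts.hostname, parts.path, parts.username)

# Hypothetical URL with real user info, for contrast: the @ before the
# first / separates the credentials from the host.
with_auth = urlsplit('http://user:secret@example.com/@handle')
print(with_auth.username, with_auth.password, with_auth.hostname, with_auth.path)
```

This only illustrates where the authority ends; it does not replace pydantic's own validation.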
So in pydantic/networks.py, the following definition would change from:
url_regex = re.compile(
    r'(?:(?P<scheme>[a-z0-9]+?)://)?'  # scheme
    r'(?:(?P<user>[^\s:]+)(?::(?P<password>\S*))?@)?'  # user info
    r'(?:'
    r'(?P<ipv4>(?:\d{1,3}\.){3}\d{1,3})|'  # ipv4
    r'(?P<ipv6>\[[A-F0-9]*:[A-F0-9:]+\])|'  # ipv6
    r'(?P<domain>[^\s/:?#]+)'  # domain, validation occurs later
    r')?'
    r'(?::(?P<port>\d+))?'  # port
    r'(?P<path>/[^\s?]*)?'  # path
    r'(?:\?(?P<query>[^\s#]+))?'  # query
    r'(?:#(?P<fragment>\S+))?',  # fragment
    re.IGNORECASE,
)
to:
url_regex = re.compile(
    r'(?:(?P<scheme>[a-z0-9]+?)://)?'  # scheme
    r'(?:(?P<user>[^\s:/]+)(?::(?P<password>[^\s/]*))?@)?'  # user info
    r'(?:'
    r'(?P<ipv4>(?:\d{1,3}\.){3}\d{1,3})|'  # ipv4
    r'(?P<ipv6>\[[A-F0-9]*:[A-F0-9:]+\])|'  # ipv6
    r'(?P<domain>[^\s/:?#]+)'  # domain, validation occurs later
    r')?'
    r'(?::(?P<port>\d+))?'  # port
    r'(?P<path>/[^\s?]*)?'  # path
    r'(?:\?(?P<query>[^\s#]+))?'  # query
    r'(?:#(?P<fragment>\S+))?',  # fragment
    re.IGNORECASE,
)
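The difference between the two patterns can be checked in isolation, without pydantic. The sketch below rebuilds both regexes from the fragments quoted above and matches the URL from the report. The names SCHEME, REST, old_regex and new_regex are introduced here for illustration and are not part of pydantic.

```python
import re

# Pieces shared by both versions, copied from the patterns quoted above.
SCHEME = r'(?:(?P<scheme>[a-z0-9]+?)://)?'
REST = (
    r'(?:'
    r'(?P<ipv4>(?:\d{1,3}\.){3}\d{1,3})|'
    r'(?P<ipv6>\[[A-F0-9]*:[A-F0-9:]+\])|'
    r'(?P<domain>[^\s/:?#]+)'
    r')?'
    r'(?::(?P<port>\d+))?'
    r'(?P<path>/[^\s?]*)?'
    r'(?:\?(?P<query>[^\s#]+))?'
    r'(?:#(?P<fragment>\S+))?'
)

# Only the user info part differs: the proposed version excludes '/'.
OLD_USER = r'(?:(?P<user>[^\s:]+)(?::(?P<password>\S*))?@)?'
NEW_USER = r'(?:(?P<user>[^\s:/]+)(?::(?P<password>[^\s/]*))?@)?'

old_regex = re.compile(SCHEME + OLD_USER + REST, re.IGNORECASE)
new_regex = re.compile(SCHEME + NEW_USER + REST, re.IGNORECASE)

url = 'http://twitter.com/@handle'
old = old_regex.match(url).groupdict()
new = new_regex.match(url).groupdict()

# Current pattern: the user group swallows everything up to the '@'.
print(old['user'], old['domain'], old['path'])
# Proposed pattern: no user info, and the '@' stays in the path.
print(new['user'], new['domain'], new['path'])
```

With the current pattern the user group captures twitter.com/ and the domain becomes handle. With the proposed pattern the user group is empty, the domain is twitter.com and the path is /@handle.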