While implementing some pinging functions I’ve come across a weird behavior of the urljoin function from urlparse:
# weird_urljoin.py file
from urlparse import urljoin
print urljoin('http://url.com/pathname', '?q=joinable')
What do you think will be printed when this code is executed? Whatever you do, you probably wouldn’t think the following would happen:
python weird_urljoin.py
"http://url.com/?q=joinable"
Yeah, while testing whether the URLs would hit, I was building them with urljoin and getting a 400 status code on every request without knowing why. I was actually printing the URL I was trying to hit, but the change was so subtle that I didn’t notice the path had been dropped.
Turns out that this behavior is expected, since the function clearly says it implements the RFC 1808 standard (and actually implements RFC 2396, if we check the implementation).
Anyway, if you check the RFCs I’ve linked above, you will note that they’ve both been obsoleted by RFC 3986. So I decided to see if this was “fixed” in a newer Python version [1], and well, it was! (If you’re going to check it yourself, note that urljoin is now part of urllib.parse, which makes a lot more sense.)
After a quick investigation, I came across Issue 1432, which basically goes over all I’ve just written here.
If you’re a Python 2.6 user, you may be inclined to say that urljoin('http://url.com/pathname', '?q=joinable') works alright, and it does. On 2.6. On Python 2.5 the result is the same as in my first example. Check out this simple test and its output:
try:
    from urlparse import urljoin  # Python 2
except ImportError:
    from urllib.parse import urljoin  # Python 3
print(urljoin('http://url.com/pathname', '?q=joinable'))
print(urljoin('http://url.com/pathname?', 'q=joinable'))
# On Python 2.5.4:
# >>> http://url.com/?q=joinable
# >>> http://url.com/q=joinable
# On Python 2.6.3:
# >>> http://url.com/pathname?q=joinable
# >>> http://url.com/q=joinable
# On Python 3.1.1:
# >>> http://url.com/pathname?q=joinable
# >>> http://url.com/q=joinable
The behavior is the same on 2.6 and 3.1, but on 2.5 it gets slightly more messed up. It’d be swell if the fix from Issue 1432 got backported to 2.5 or, even better, if AppEngine started running on 2.6, so I could finally stop using 2.5. There are some great improvements that I’d love to use here.
Okay, what to do if I’m developing for 2.6 but want to keep compatibility with 2.5? I’d just use string formatting to join the urls and the parameters. Actually, I think that’s the best approach even when not keeping compatibility with older versions:
endpoint = '%(url)s?%(data)s' % data_dict
Seems way more readable to me than:
endpoint = urljoin(url, '?%(data)s' % data_dict)
And we don’t need to care whether the URL is fully qualified. So, concluding this article, I’m giving up one piece of advice I posted previously, at least when doing quick coding, since you must pay great attention when using urljoin.
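Here is a minimal sketch of the string-formatting approach. The helper name build_endpoint and the params dict are my own illustration, not something from the code above, and it assumes the URL has no query string or fragment of its own:

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def build_endpoint(url, params):
    """Append url-encoded params to url as a query string.

    Hypothetical helper: assumes `url` carries no query string or fragment.
    """
    data_dict = {'url': url, 'data': urlencode(params)}
    return '%(url)s?%(data)s' % data_dict


endpoint = build_endpoint('http://url.com/pathname', {'q': 'joinable'})
print(endpoint)  # http://url.com/pathname?q=joinable
```

Since urljoin is not involved at all, the result should not depend on which RFC the running Python version implements.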
Even with all these considerations, there’s one thing you must always do: test. A LOT.
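As a rough idea of what such a test could look like, the sketch below pins the two joins from this article to the results I got on 2.6 and 3.1. The names CASES and failing_cases are made up for this example; on Python 2.5 I’d expect the first case to be reported as a failure:

```python
try:
    from urlparse import urljoin  # Python 2
except ImportError:
    from urllib.parse import urljoin  # Python 3

# (base, reference, expected) triples, using the results observed on
# Python 2.6 and 3.1 earlier in this article.
CASES = [
    ('http://url.com/pathname', '?q=joinable',
     'http://url.com/pathname?q=joinable'),
    ('http://url.com/pathname?', 'q=joinable',
     'http://url.com/q=joinable'),
]


def failing_cases(cases=CASES):
    """Return (base, reference, expected, actual) for every mismatch."""
    failures = []
    for base, reference, expected in cases:
        actual = urljoin(base, reference)
        if actual != expected:
            failures.append((base, reference, expected, actual))
    return failures


# An empty list means urljoin behaves as expected on this interpreter.
print(failing_cases())
```

Running something like this on every interpreter you target makes a subtle change in the joined URL show up right away instead of as a mysterious 400.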
[1] I’ve been messing with Python 3k for a long while now and whenever I have to do some quick tests in the interactive console, I go for Python 3 instead of Python 2.5, which is used on AppEngine. This is sub-optimal, I know, but I get to train my mind for the changes being introduced by the new Python version.