One man’s URL encoding is another man’s bug

I got an email last night from someone who was using my sample code to open URLs in the default browser on OSX. However, he was having trouble when his URL contained a space.

Easy peasy, I thought, just URL-encode it. (Actually, easy peasy isn’t the phrase I used, but I’m trying to keep this relatively clean.) I jumped into Delphi for what I thought would be a quick 5-minute change to the code, and finished 2 hours later with it finally working, having learnt more than I really wanted to know about URL encoding along the way.

If you don’t care for the details, I’ve updated the sample code to handle URLs with spaces (and other characters that need encoding). It’s on GitHub and available as a zip file. Hope this helps.

In case there is anybody still reading, here’s some of what I learned along the way.

If you’re not sure what URL Encoding is, here’s a quick example. Type the following URL into your browser:

http://www.malcolmgroves.com/test page.html

Hit enter and let the page load. Then look in the address bar. Chances are the URL has been changed to:

http://www.malcolmgroves.com/test%20page.html

Notice the space has been replaced with %20? 20 is the hex ASCII value of the space character. This is URL Encoding at work, and is also referred to as Percent Encoding. If you can’t sleep, RFC 3986 lays it all out for you, but in short, the characters in a URL fall into two groups: reserved and unreserved. Unreserved characters can be reproduced unchanged in a URL, but reserved characters have to be encoded when they appear in particular contexts. The reserved characters and their corresponding encoded values are:
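
Character:  !    #    $    &    '    (    )    *    +    ,    /    :    ;    =    ?    @    [    ]
Encoded:    %21  %23  %24  %26  %27  %28  %29  %2A  %2B  %2C  %2F  %3A  %3B  %3D  %3F  %40  %5B  %5D

(Thank you, Wikipedia.)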

Note the space character isn’t listed there. It’s neither reserved nor unreserved, so it isn’t allowed in a URL at all and always has to be encoded, but we’re getting off-track.
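To make the mechanism concrete, here’s a minimal sketch in Python (an illustration, not the Delphi code from this post) of percent-encoding a single piece of a URL, such as one path segment or one query value. It assumes RFC 3986’s unreserved set (letters, digits, `-`, `.`, `_`, `~`) and UTF-8 for non-ASCII text. Everything outside that set, reserved or not, gets encoded, which is only right when you know the whole string is data inside one component.

```python
import string

# RFC 3986 unreserved characters: these never need encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def pct_encode(text: str) -> str:
    """Percent-encode every byte of text's UTF-8 form that isn't unreserved."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # '%' followed by the byte value as two uppercase hex digits.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(pct_encode("test page.html"))  # test%20page.html
print(pct_encode("a/b?c"))           # a%2Fb%3Fc
```

Because the sketch has no idea where in a URL its input came from, it encodes `/` and `?` unconditionally; that’s fine for a single value but wrong for a whole URL.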

Here’s the original code, with no encoding:

At this stage I still thought this was an easy fix, so the first thing I looked at was NSURL. Surely this must have a method to correctly load itself from an unencoded string? No, but I thought I’d found the next best thing: NSString has a method called stringByAddingPercentEscapesUsingEncoding. A bit wordy, but it looked like just the thing I was after. However, encoding the above URL gave me this:

http%3A%2F%2Fwww.malcolmgroves.com%2Ftest+page.html

Not only did it replace the space with a + (see this section on Wikipedia), it also replaced the forward slashes with %2F and the colon with %3A. Leaving aside the plus, at first glance you might think the rest is right, given the table above. The important thing to realise is that reserved characters only need to be encoded in particular contexts, not whenever they appear. For example, a forward slash is perfectly valid as a separator between the segments of your URL path, but a slash that is part of the value in a name=value pair in a URL query string is data, not structure, and is normally encoded as %2F.

So, our encoding method needs to be aware of the different contexts within a URL, and only encode reserved characters that appear outside their valid contexts. Realising that, it was pretty clear why NSString wasn’t doing that for me. It’s a generic string; it doesn’t know whether its contents are a URL, an English sentence or some other random text. It just blindly encodes what it’s given.
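The difference between blind and context-aware encoding can be shown with a short Python sketch. Here `urllib.parse.quote_plus` stands in for a context-blind encoder (it is not NSString, just something that behaves similarly on this input), and `quote` with different `safe` sets shows the same kind of character being kept in one context and encoded in another. The path and value strings are hypothetical.

```python
from urllib.parse import quote, quote_plus

url = "http://www.malcolmgroves.com/test page.html"

# Context-blind: the whole URL is treated as one opaque value, so
# quote_plus() encodes ':' and '/' too, and turns the space into '+'.
blind = quote_plus(url)
print(blind)  # http%3A%2F%2Fwww.malcolmgroves.com%2Ftest+page.html

# Context-aware: a reserved character gets different treatment
# depending on where it sits.
path = quote("/test page.html", safe="/")  # '/' here separates segments: keep it
value = quote("fish&chips", safe="")       # '&' here is data, not a pair separator: encode it
print(path)   # /test%20page.html
print(value)  # fish%26chips
```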

I then remembered that the HTTPApp unit has a function called HTTPEncode. Even before I tried it, looking at the source told me it would do exactly the same thing, and indeed it did. Like NSString, this would be fine for encoding just the params for our URL, but not the whole thing. In fact, the docs (when I eventually got around to reading them) point out that this function is for encoding values to be included in an HTTP message header, not a URL.
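For the params, a blind encoder is exactly what you want, as long as it is applied to each name and value before they’re joined together. As a sketch, Python’s `urllib.parse.urlencode` (a stand-in for illustration, not Delphi’s HTTPEncode) works this way: it encodes each piece on its own and only then adds the real `=` and `&` delimiters. The parameter names and values are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical query parameters whose values contain reserved characters.
params = {"q": "a/b c", "x": "1&2"}

# Each name and value is encoded separately (form-style, so a space becomes '+'),
# then the encoded pieces are joined with '=' and '&'.
query = urlencode(params)
print(query)  # q=a%2Fb+c&x=1%262
```

Note that this form-style encoding is where the `+` for a space comes from; it belongs in query strings, not in the path.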

I googled a bit and found plenty of sample source using exactly the same blind string-replace technique.

Then I remembered Indy. Indy has a TIdURI class in the IdURI unit; maybe that would do it properly? A quick glance at the source showed that the TIdURI.URLEncode class function seemed to break the URL down into its component parts and encode them separately. This was promising. A quick test showed that passing the original URL into this method gave me back exactly what I expected:

http://www.malcolmgroves.com/test%20page.html

Thankfully, it also works on OSX.
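The approach TIdURI.URLEncode appears to take, splitting the URL into components and encoding each by its own rules, can be sketched in Python with `urllib.parse.urlsplit` and `quote`. This is a minimal sketch under stated assumptions, not Indy’s implementation: it assumes the input is not already encoded, leaves the scheme and host alone, and chooses which delimiters to keep per component (`/` in the path, `=` and `&` in the query, none in the fragment).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode each part of an unencoded URL according to its context.

    Assumption: the input is not already encoded. A literal '%' is treated
    as data and becomes %25, so an existing %20 would turn into %2520.
    """
    parts = urlsplit(url)
    # Path: '/' separates segments, so keep it; escape everything else unsafe.
    path = quote(parts.path, safe="/")
    # Query: keep '=' and '&' as the name=value and pair delimiters.
    query = quote(parts.query, safe="=&")
    # Fragment: no structure assumed, so encode all reserved characters.
    fragment = quote(parts.fragment, safe="")
    # Scheme and authority (host, port) pass through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(encode_url("http://www.malcolmgroves.com/test page.html"))
# http://www.malcolmgroves.com/test%20page.html
```

One limitation worth knowing: once values have been joined into a query string, a splitter can’t tell an `&` that separates pairs from an `&` inside a value, so values still need encoding before the query is assembled.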

So, I’ve updated the original source to look like this:

and now it happily handles spaces in URLs.
