
Sunday, April 19, 2009

Spaces in URLs

Over the past few weeks we've been experimenting with different rules for handling space characters in URLs. Technically, spaces are not allowed in URLs; they should be encoded as %20. Nevertheless, many websites use spaces in file names.

We looked at how common browsers treat spaces. In every case, browsers translate interior spaces to %20. The treatment of leading and trailing spaces varies; the most forgiving behavior is to remove them. We therefore now remove leading and trailing spaces and keep interior spaces, encoding them as %20. Here are a couple of examples to illustrate:

<a href=" leadingSpace.html"> is treated as <a href="leadingSpace.html">

while

<a href="interior Space.html"> becomes <a href="interior%20Space.html">
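The rule described above can be sketched in a few lines of Python. This is only an illustration, not our actual implementation: the function name `normalize_href` is hypothetical, and the sketch assumes that only the plain space character needs handling (tabs, newlines, and other characters are left untouched).

```python
def normalize_href(href: str) -> str:
    """Hypothetical helper illustrating the space-handling rule.

    Assumption: only the ASCII space character (U+0020) is treated;
    other whitespace and other unsafe characters are not considered.
    """
    # Remove leading and trailing spaces, the most forgiving browser behavior.
    trimmed = href.strip(" ")
    # Keep interior spaces, but encode each one as %20.
    return trimmed.replace(" ", "%20")


if __name__ == "__main__":
    print(normalize_href(" leadingSpace.html"))   # leadingSpace.html
    print(normalize_href("interior Space.html"))  # interior%20Space.html
```

Trimming happens before encoding, so leading and trailing spaces are dropped rather than turned into %20.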