Just as there are many different methods of access to resources, there are several schemes for describing the location of such resources. URLs are used to “locate” resources, by providing an abstract identification of the resource location. Having located a resource, a system may perform a variety of operations on the resource, as might be characterized by such words as “access”, “update”, “replace”, “find attributes”. In general, only the “access” method needs to be specified for any URL scheme.
A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme. Scheme names consist of a sequence of characters. The lower case letters “a”–“z”, digits, and the characters plus (“+”), period (“.”), and hyphen (“-”) are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow “HTTP” as well as “http”).
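As a minimal sketch of the rule above, the following Python splits a URL at its first colon and normalizes the scheme name to lower case. The function name `split_scheme` and the regular expression are illustrative assumptions, not part of this specification.

```python
import re

# Allowed scheme characters per the text: letters, digits, "+", ".", "-".
# Upper case is accepted on input and folded to lower case.
_SCHEME_RE = re.compile(r"[A-Za-z0-9+.-]+")


def split_scheme(url):
    """Return (scheme, scheme_specific_part), split at the first colon.

    Hypothetical helper: raises ValueError if there is no colon or the
    text before it contains characters not allowed in a scheme name.
    """
    scheme, colon, rest = url.partition(":")
    if not colon or not _SCHEME_RE.fullmatch(scheme):
        raise ValueError("no valid scheme in %r" % (url,))
    return scheme.lower(), rest
```

For example, `split_scheme("HTTP://example.com/")` treats “HTTP” the same as “http”, and the remainder of the string is left for scheme-specific interpretation.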
URL Character Encoding Issues
URLs are sequences of characters, i.e., letters, digits, and special characters. A URL may be represented in a variety of ways: e.g., ink on paper, or a sequence of octets in a coded character set. The interpretation of a URL depends only on the identity of the characters used.
In most URL schemes, the sequences of characters in different parts of a URL are used to represent sequences of octets used in Internet protocols. For example, in the ftp scheme, the hostname, directory name and file names are such sequences of octets, represented by parts of the URL. Within those parts, an octet may be represented by the character which has that octet as its code within the US-ASCII coded character set.
In addition, octets may be encoded by a character triplet consisting of the character “%” followed by the two hexadecimal digits that form the hexadecimal value of the octet. Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.
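The triplet form can be illustrated with a short Python sketch. The helper names `encode_octet` and `decode_triplets` are assumptions made for this example; the sketch only shows the “%” plus two hexadecimal digits mapping and does not decide which octets need encoding.

```python
import string


def encode_octet(octet):
    """Return the "%XX" triplet for one octet (0-255)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet: %r" % (octet,))
    return "%{:02X}".format(octet)


def decode_triplets(text):
    """Turn URL text back into octets, expanding every "%XX" triplet.

    Characters outside a triplet stand for the octet that is their
    US-ASCII code.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            digits = text[i + 1:i + 3]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("malformed triplet at offset %d" % i)
            out.append(int(digits, 16))
            i += 3
        else:
            if ord(ch) > 0x7F:
                raise ValueError("non-US-ASCII character at offset %d" % i)
            out.append(ord(ch))
            i += 1
    return bytes(out)
```

Both upper and lower case hexadecimal digits are accepted when decoding in this sketch; the encoder emits upper case.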
No corresponding graphic US-ASCII:
URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded.
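The octet ranges named above translate directly into a predicate. This is a sketch only, and the function name `has_no_graphic_char` is hypothetical.

```python
def has_no_graphic_char(octet):
    """True for octets with no graphic US-ASCII character.

    Covers the ranges given in the text: 00-1F and 7F (control
    characters) and 80-FF (not used in US-ASCII). The space character
    (20) is not covered here; it is treated as unsafe instead.
    """
    return octet <= 0x1F or octet >= 0x7F
```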
Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs.
All unsafe characters must always be encoded within a URL. For example, the character “#” must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.
Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters “;”, “/”, “?”, “:”, “@”, “=” and “&” are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.
Usually, a URL has the same interpretation when an octet is represented by a character and when it is encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL. Thus, only alphanumerics, the special characters “$-_.+!*'(),”, and reserved characters used for their reserved purposes may be used unencoded within a URL. On the other hand, characters that are not required to be encoded (including alphanumerics) may be encoded within the scheme-specific part of a URL, as long as they are not being used for a reserved purpose.
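One way to apply these rules, sketched here under stated assumptions, is to encode each piece of data on its own and let the caller insert reserved characters only where they serve their reserved purpose. The names `encode_data` and `join_path` are illustrative, and the use of “/” as a path separator assumes a hierarchical scheme such as ftp or http.

```python
import string

# Characters the text allows unencoded everywhere: alphanumerics plus
# the special characters $-_.+!*'(),
ALWAYS_ALLOWED = set(string.ascii_letters + string.digits + "$-_.+!*'(),")


def encode_data(octets):
    """Encode a bytes value for use as data inside one part of a URL.

    Everything outside ALWAYS_ALLOWED becomes a "%XX" triplet,
    including the characters that a scheme may reserve, so data can
    never be mistaken for a delimiter.
    """
    return "".join(
        chr(o) if chr(o) in ALWAYS_ALLOWED else "%{:02X}".format(o)
        for o in octets
    )


def join_path(segments):
    """Join already-separate path segments (bytes) with a literal "/".

    The "/" written here is used for its reserved purpose, so it stays
    unencoded; a "/" inside a segment is data and gets encoded.
    """
    return "/".join(encode_data(segment) for segment in segments)
```

With this split, `join_path([b"pub", b"a/b"])` keeps the separator literal while the slash inside the second segment is encoded, which is the case where encoding changes the meaning of the URL.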
Hierarchical schemes and relative links
In some cases, URLs are used to locate resources that contain pointers to other resources. In some cases, those pointers are represented as relative links where the expression of the location of the second resource is in terms of “in the same place as this one except with the following relative path”. Relative links are not described in this document. However, the use of relative links depends on the original URL containing a hierarchical structure against which the relative link is based. Some URL schemes (such as the ftp, http, and file schemes) contain names that can be considered hierarchical; the components of the hierarchy are separated by “/”.
The mapping for some existing standard and experimental protocols is outlined in the BNF syntax definition. Notes on particular protocols follow. The schemes covered are: