URL
A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it
Defined in RFC 1738 in 1994 by Tim Berners-Lee, a URL is a specific type of Uniform Resource Identifier (URI) (RFC 3986)
URL is a subset of URI
URLs are used to reference:
- Web pages (http)
- File transfer (ftp)
- Email (mailto)
- Database access (JDBC)
- And many other applications
URL Syntax
Every HTTP URL conforms to the syntax of a generic URI (consisting of a hierarchical sequence of 5 components):
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
Where the authority component divides into 3 subcomponents:
authority = [userinfo "@"] host [":" port]
The URI comprises:
Scheme: A non-empty scheme component followed by a colon (:):
- Begins with a letter
- Followed by any combination of letters, digits, plus (+), period (.), or hyphen (-)
- Canonical form is lowercase (schemes are case-insensitive)
- E.g. http, https, ftp, mailto, file, data, and irc
Authority: An optional component preceded by 2 slashes (//):
- userinfo: (optional) subcomponent that may consist of a user name and an optional password preceded by a colon (:), followed by an at symbol (@)
  - username:password is deprecated for security reasons (it uses plain text)
- host: A subcomponent consisting of either a registered name or an IP address
  - IPv4 addresses must be in dot-decimal notation
  - IPv6 addresses must be enclosed in brackets ([])
- port: optional subcomponent preceded by a colon (:)
Path: A component consisting of a sequence of path segments separated by a slash (/):
- Always defined and may be empty (0 length)
- In http and https URIs, the last part of a path is named pathinfo (optional)
- For more information, see the Wikipedia article on URL
Query: An optional component preceded by a question mark (?) containing a query string of non-hierarchical data:
- Its syntax is not well-defined
- By convention, it is often a sequence of attribute–value pairs separated by a delimiter
Fragment: An optional component preceded by a hash (#):
- The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI.
- When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will scroll this element into view.
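The scheme rule and the authority subcomponents described above can be checked with a small Python sketch. The regular expression is written by hand from the scheme rule, and the ftp URL is a hypothetical example chosen only to exercise userinfo, a bracketed IPv6 host, and a port; neither comes from the original notes.

```python
import re
from urllib.parse import urlsplit

# Scheme rule: one letter, then any mix of letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if the whole string satisfies the scheme rule."""
    return SCHEME_RE.fullmatch(scheme) is not None


# Hypothetical URL (an assumption for illustration only).
parts = urlsplit("ftp://alice@[2001:db8::1]:2121/pub/readme.txt")

print(is_valid_scheme("https"))  # a letter followed by letters
print(is_valid_scheme("1http"))  # starts with a digit, so it is rejected
print(parts.username)            # userinfo before the "@"
print(parts.hostname)            # IPv6 host with the brackets removed
print(parts.port)                # port after the final ":" as an int
```

urlsplit only separates the pieces; it does not check that the host is reachable or that the scheme is registered.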
Example URL http://www.example.com:81/a/b.html?user=Alice&year=2049#heading is divided into different parts:
Scheme: http (Protocol)
Authority: Domain Name + Port
- Domain Name: www.example.com (Hostname)
- Port Number: 81
Path: /a/b.html (Path to the file or API endpoint, or dynamically generated)
Query: ?user=Alice&year=2049 (Parameters)
Fragment: #heading (Anchor)
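As a quick check of this breakdown, the same example URL can be split with Python's standard urllib.parse module. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.example.com:81/a/b.html?user=Alice&year=2049#heading"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 81
print(parts.path)      # /a/b.html
print(parts.query)     # user=Alice&year=2049  (without the leading "?")
print(parts.fragment)  # heading               (without the leading "#")

# The query follows the common attribute=value convention with "&" as delimiter.
params = dict(parse_qsl(parts.query))
print(params)          # {'user': 'Alice', 'year': '2049'}
```

Note that the "?" and "#" delimiters are not part of the returned query and fragment, and that all query values come back as strings.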
Ways to specify a URL
Full URL: <a href='http://stanford.edu/news/2019/'>2019 News</a>
Relative URL: <a href='september'>September News</a>
- Resolved against the current page, here http://stanford.edu/news/2019/
- Same as: http://stanford.edu/news/2019/september
Absolute URL: <a href='/events'>Events</a>
- It removes everything back to the root and appends /events, unlike a relative URL
- Same as: http://stanford.edu/events
Fragment URL: <a href='#section3'>Jump to Section 3</a>
- Scrolls to the element with id='section3' (or the legacy <a name='section3' />) within the page
- Same as: http://stanford.edu/events#section3 (when the current page is http://stanford.edu/events)
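The resolution rules above can be reproduced with urllib.parse.urljoin, which resolves a reference against a base URL much as a browser does for href values. This is a sketch that assumes the current page is http://stanford.edu/news/2019/ for the first two links and http://stanford.edu/events for the fragment link.

```python
from urllib.parse import urljoin

base = "http://stanford.edu/news/2019/"

# Relative reference: appended to the directory of the current page.
relative = urljoin(base, "september")
print(relative)  # http://stanford.edu/news/2019/september

# Leading "/": the path is replaced from the root of the site.
rooted = urljoin(base, "/events")
print(rooted)    # http://stanford.edu/events

# Fragment-only reference: same page, different fragment.
fragment = urljoin("http://stanford.edu/events", "#section3")
print(fragment)  # http://stanford.edu/events#section3

# Without the trailing slash on the base, "2019" is treated as the last
# path segment and is replaced rather than extended.
no_slash = urljoin("http://stanford.edu/news/2019", "september")
print(no_slash)  # http://stanford.edu/news/september
```

The last call shows why the trailing slash on the base matters when relative links are resolved.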
Some HTML tags that can include URLs:
- <img>
- <video>, <audio>
- <canvas>
- <link>, <style>
- <script>
TRAILING FORWARD SLASH
If you add a trailing / to the URL?
Internationalized URL
An Internationalized Resource Identifier (IRI) is a form of URL that includes Unicode characters. All modern browsers support IRIs
The domain name in the IRI is known as an Internationalized Domain Name (IDN)
Web and Internet software automatically convert the domain name into punycode usable by the Domain Name System; for example, the Chinese URL http://例子.卷筒纸 becomes http://xn--fsqu00a.xn--3lr804guic/. The xn-- indicates that the character was not originally ASCII
- Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames
The user can also specify the URL path name in the local writing system. If not already encoded, it is converted to UTF-8, and any characters not part of the primary URL character set are escaped as hexadecimal using percent-encoding
- For example, the Japanese URL http://example.com/引き割り.html becomes http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html. The target computer decodes the address and displays the page
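Both conversions can be tried from Python's standard library. This is a rough sketch: the built-in idna codec follows the older IDNA 2003 rules, so it may differ from what a modern browser does for some names, although for the hostname used in these notes it should give the same punycode form.

```python
from urllib.parse import quote, unquote

# Host name: each non-ASCII label is converted to punycode with an "xn--" prefix.
host = "例子.卷筒纸"
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)

# Path: encoded as UTF-8, then every byte outside the safe set becomes %XX.
path = quote("/引き割り.html")
print(path)  # /%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html

# The receiving side reverses the percent-encoding to recover the original path.
decoded = unquote(path)
print(decoded)  # /引き割り.html
```

quote leaves "/" and "." untouched by default, which is why only the Japanese characters are escaped.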