Real Computer Science begins where we almost stop reading ...: Uniform Resource Locator

Friday, 29 March 2013

Uniform Resource Locator

URL stands for Uniform Resource Locator, the global address of documents and other resources on the World Wide Web. The first part of the address is called a protocol identifier and it indicates what protocol to use, and the second part is called a resource name and it specifies the IP address or the domain name where the resource is located. The protocol identifier and the resource name are separated by a colon and two forward slashes.

For example:

http://www.tallysolutions.com/website/html/PartnerDetails/622894.php

The URL above specifies a Web page that should be fetched using the HTTP protocol.

Elements of a URL

Every URL is made up of some combination of the following: the scheme name (commonly called protocol), followed by a colon, then, depending on scheme, a hostname (alternatively, IP address), a port number, the pathname of the file to be fetched or the program to be run, then (for programs such as CGI scripts) a query string[4][5], and with HTML files, an anchor (optional) for where the page should start to be displayed.
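As a quick illustration of the elements listed above, here is a minimal sketch that splits a URL with Python's standard urllib.parse module. The URL is a made-up example (www.example.com, port 8080, the query string and the fragment are all assumptions), chosen only so that every element is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen so that every element described above appears once.
url = "http://www.example.com:8080/cgi/script.cgi?id=42&lang=en#results"
parts = urlsplit(url)

print(parts.scheme)    # scheme name (protocol)
print(parts.hostname)  # hostname (or IP address)
print(parts.port)      # port number, None when the URL omits it
print(parts.path)      # pathname of the file or program
print(parts.query)     # query string passed to the program
print(parts.fragment)  # anchor within the page
```

urlsplit only separates the pieces; it does not check that the host exists or that the path points to anything.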

Scheme

The scheme represents the protocol, and for our purposes will either be http or https. https represents a connection to a secure web server.

<scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme. Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
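The rule above can be sketched in a few lines of Python. The function name split_scheme is hypothetical, and the check covers only the characters named in the paragraph above; it is an illustration, not a complete URL validator.

```python
import re

# Characters allowed in a scheme name per the paragraph above:
# lower case letters, digits, "+", "." and "-".
SCHEME_CHARS = re.compile(r"[a-z0-9+.-]+")

def split_scheme(url):
    """Return (scheme, scheme_specific_part), treating the scheme case-insensitively."""
    scheme, colon, rest = url.partition(":")
    scheme = scheme.lower()  # "HTTP" is accepted as equivalent to "http"
    if not colon or not SCHEME_CHARS.fullmatch(scheme):
        raise ValueError("no valid scheme in {!r}".format(url))
    return scheme, rest

print(split_scheme("HTTP://www.tallysolutions.com/"))
```

A string with no colon, or with a disallowed character before the first colon, raises ValueError.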

Host

The hostname part of the URL should be a valid Internet hostname such as www.tallysolutions.com. It can also be an IP address such as 204.29.207.217.

Port Number

The port number is optional. It can be omitted when the service runs on the default port for the scheme: 80 for http servers and 443 for https servers.
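A small sketch of how a client might fall back to the default port when the URL does not give one. The helper name effective_port and the DEFAULT_PORTS table are assumptions made for this example; 443 for https is the standard default, although the text above only names port 80.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this post.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None for schemes not in the table

print(effective_port("http://www.tallysolutions.com/"))
print(effective_port("http://localhost:8080/"))
```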

Path Information

The path points to a particular directory on the specified server. The path is relative to the document root of the server, not necessarily to the root of the file system on the server. In general a server does not show its entire file system to clients. Indeed it may not really expose a file system at all. (Amazon's URLs, for example, mostly point into a database.) Rather it shows only the contents of a specified directory. This directory is called the server root, and all paths and filenames are relative to it. Thus on a Unix workstation all files that are available to the public might be in /var/public/html, but to somebody connecting from a remote machine this directory looks like the root of the file system.

The filename points to a particular file in the directory specified by the path. It is often omitted in which case it is left to the server's discretion what file, if any, to send. Many servers will send an index file for that directory, often called index.html. Others will send a list of the files in the directory. Others may send an error message.
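To make the idea of a server root concrete, here is a rough sketch of how a server could map a URL path onto its own file system. It assumes the /var/public/html directory from the example above and the common index.html convention. The function name to_server_path is hypothetical, and real servers apply many more rules than this.

```python
import posixpath
from urllib.parse import urlsplit, unquote

DOCUMENT_ROOT = "/var/public/html"  # the public directory from the example above
INDEX_FILE = "index.html"           # assumed default file for directory requests

def to_server_path(url):
    """Map the path of a URL to a file name below DOCUMENT_ROOT."""
    url_path = unquote(urlsplit(url).path) or "/"
    wants_directory = url_path.endswith("/")
    # Normalise "." and ".." so the result can never climb above the root.
    normalized = posixpath.normpath("/" + url_path.lstrip("/"))
    local = DOCUMENT_ROOT if normalized == "/" else DOCUMENT_ROOT + normalized
    if wants_directory:
        # Filename omitted: this sketch chooses to serve the index file.
        local = posixpath.join(local, INDEX_FILE)
    return local

print(to_server_path("http://localhost/cgi/script.cgi"))
print(to_server_path("http://localhost/docs/"))
```

Serving an index file is only one of the behaviours described above; a server could equally return a directory listing or an error.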

Fragment identifier

The fragment identifier is used to reference a named anchor or ID in an HTML document. It follows a "#" at the end of the URL. A named anchor is created in an HTML document with an A element with a NAME attribute like this one:

<a name="anchor" >Here is the content you're after...</a>
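A URL that points at the anchor above ends with #anchor. The sketch below separates the fragment from the rest of the URL using urldefrag from Python's standard library; the host and page name are made up for the example.

```python
from urllib.parse import urldefrag

# Hypothetical page URL that refers to the anchor named "anchor".
url = "http://localhost/page.html#anchor"
page, fragment = urldefrag(url)

print(page)      # the part sent to the server
print(fragment)  # the part the browser uses to find the anchor
```

The fragment is interpreted by the browser after the page has been fetched; it is not part of the HTTP request.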

Absolute and Relative URLs

Absolute URL

URLs that include the hostname are called absolute URLs. An example of an absolute URL is:

http://localhost/cgi/script.cgi

Relative URL

URLs without a scheme, host, or port are called relative URLs. These can be further broken down into full and relative paths:

Full paths

Relative URLs with an absolute path are sometimes referred to as full paths (even though they can also include a query string and fragment identifier). Full paths can be distinguished from URLs with relative paths because they always start with a forward slash. Note that in all these cases, the paths are virtual paths, and do not necessarily correspond to a path on the web server's filesystem. An example of a full path is /index.html.

Relative paths

Relative URLs that begin with a character other than a forward slash are relative paths. Examples of relative paths include script.cgi and ../images/photo.jpg.
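A relative URL only makes sense against a base URL, usually the URL of the page that contains it. The sketch below resolves the examples from this section with urljoin from Python's standard library. Using the absolute URL from earlier as the base is an assumption made for the illustration, and other.cgi is a made-up file name.

```python
from urllib.parse import urljoin

# Base URL: the absolute URL example from above.
base = "http://localhost/cgi/script.cgi"

full_path = urljoin(base, "/index.html")           # full path: replaces the whole path
sibling = urljoin(base, "other.cgi")               # relative path: same directory
parent = urljoin(base, "../images/photo.jpg")      # relative path: up one directory

print(full_path)
print(sibling)
print(parent)
```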

URL Character Encoding Issues

URLs are sequences of characters, i.e., letters, digits, and special characters. A URL may be represented in a variety of ways: e.g., ink on paper, or a sequence of octets in a coded character set. The interpretation of a URL depends only on the identity of the characters used.

In most URL schemes, the sequences of characters in different parts of a URL are used to represent sequences of octets used in Internet protocols. For example, in the ftp scheme, the host name, directory name and file names are such sequences of octets, represented by parts of the URL. Within those parts, an octet may be represented by the character which has that octet as its code within the US-ASCII [20] coded character set.

In addition, octets may be encoded by a character triplet consisting of the character "%" followed by the two hexadecimal digits (from "0123456789ABCDEF") which form the hexadecimal value of the octet. (The characters "abcdef" may also be used in hexadecimal encodings.)
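The triplet form is easy to reproduce by hand. Here is a minimal sketch; the function names encode_octet and decode_triplet are made up for this example.

```python
import string

def encode_octet(octet):
    """Return the "%XX" triplet for an octet in the range 0-255."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet: {!r}".format(octet))
    return "%{:02X}".format(octet)

def decode_triplet(triplet):
    """Return the octet value of a "%XX" triplet; lower case hex digits are accepted."""
    if (len(triplet) != 3 or triplet[0] != "%"
            or any(c not in string.hexdigits for c in triplet[1:])):
        raise ValueError("not a valid triplet: {!r}".format(triplet))
    return int(triplet[1:], 16)

print(encode_octet(0x20))     # the space character
print(decode_triplet("%2f"))  # the octet for "/"
```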

Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

No corresponding graphic US-ASCII

URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded.

Unsafe

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

Reserved

Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.

Usually a URL has the same interpretation when an octet is represented by a character and when it is encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL.
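The effect is easiest to see with "/", which http reserves as the path separator. In the sketch below the helper name segments and both URLs are made up for the illustration: the first URL has two path segments, while the second has a single segment that happens to contain a slash.

```python
from urllib.parse import urlsplit, unquote

def segments(url):
    """Split the path on the reserved "/" first, then decode each segment."""
    raw = urlsplit(url).path.split("/")[1:]  # drop the empty piece before the leading "/"
    return [unquote(piece) for piece in raw]

print(segments("http://localhost/reports/2013"))    # "/" used for its reserved purpose
print(segments("http://localhost/reports%2F2013"))  # "/" encoded, so it is plain data
```

Splitting before decoding matters here: decoding first would make the two URLs indistinguishable.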

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. On the other hand, characters that are not required to be encoded (including alphanumerics) may be encoded within the scheme-specific part of a URL, as long as they are not being used for a reserved purpose.
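Putting the rules of this section together, a simple encoder for a piece of data that goes inside a URL could look like the sketch below. It is an illustration under stated assumptions: the name encode_component is made up, non-ASCII text is turned into octets with UTF-8 (the text above does not say which character set to use), and reserved characters are always encoded because the input is treated as data rather than as URL syntax.

```python
import string

# Characters that may appear unencoded, per the paragraph above:
# alphanumerics and the special characters $-_.+!*'(),
UNENCODED = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

def encode_component(text):
    """Percent-encode every octet that is not in the unencoded set."""
    out = []
    for octet in text.encode("utf-8"):  # assumption: UTF-8 for non-ASCII text
        ch = chr(octet)
        out.append(ch if ch in UNENCODED else "%{:02X}".format(octet))
    return "".join(out)

print(encode_component("a b#c"))
print(encode_component("~user"))
```

Python's own urllib.parse.quote does a similar job but follows a newer rule set; for example, it leaves "~" unencoded, whereas the list above treats it as unsafe.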

