A browser is an HTTP client because it sends requests
to an HTTP server (Web server), which then sends responses back to the
client. The standard (and default) port for HTTP servers to listen on is
80, though they can use any port.
HTTP is used to transmit resources, not just
files. A resource is some chunk of information that can be identified by
a URL (it's the R in URL). The most common kind of resource is a file,
but a resource may also be a dynamically-generated query result, the
output of a CGI script, a document that is available in several
languages, or something else. Almost all HTTP resources are currently either
files or server-side script output.
Like most network protocols, HTTP uses the
client-server model: An HTTP client opens a connection and sends a
request message to an HTTP server; the server then returns a response
message, usually containing the resource that was requested. In HTTP/1.0,
the server closes the connection after delivering the response. HTTP is
also a stateless protocol, i.e. the server does not maintain any
information about the client between transactions.
The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of:
- an initial line
- zero or more header lines
- a blank line (i.e. a CRLF by itself)
- an optional message body (e.g. a file, or query data, or query output)
<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

<optional message body goes here, like file contents or query data;
it can be many lines long, or even binary data $&*%@!^$@ >
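The layout above can be illustrated with a short Python sketch that splits a raw message into its initial line, header lines, and body. The function name `split_http_message` and the sample message are assumptions made for illustration; a real parser would also need to handle malformed input.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = []
    for line in lines[1:]:
        # Each header line has the form "Header-Name: value".
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return initial_line, headers, body


# A made-up response message, used only to exercise the function.
message = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
initial_line, headers, body = split_http_message(message)
print(initial_line)
print(headers)
print(body)
```

The body is kept as bytes because, as noted above, it may be binary data.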
Initial Request Line
The initial line is different for the request
than for the response. A request line has three parts, separated by
spaces: a method name, the local path of the requested resource, and the
version of HTTP being used. A typical request line is:
GET /path/to/file/index.html HTTP/1.0
Important Points:
1).GET is the most common HTTP method; it says
"give me this resource". Other methods include POST and HEAD-- more on
those later. Method names are always uppercase.
2).The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general).
3).The HTTP version always takes the form "HTTP/x.x", uppercase.
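As a rough sketch of the three-part request line described above, the following Python function splits a request line and checks the two rules mentioned in the list (uppercase method, "HTTP/x.x" version). The name `parse_request_line` is hypothetical, and the checks are deliberately minimal.

```python
def parse_request_line(line: str):
    """Return (method, path, version) from a request line."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError("expected: METHOD path HTTP/x.x")
    method, path, version = parts
    if not method.isupper():
        raise ValueError("method names are always uppercase")
    if not version.startswith("HTTP/"):
        raise ValueError("version must look like HTTP/x.x")
    return method, path, version


print(parse_request_line("GET /path/to/file/index.html HTTP/1.0"))
# -> ('GET', '/path/to/file/index.html', 'HTTP/1.0')
```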
Initial Response Line (Status Line)
The initial response line, called the status
line, also has three parts separated by spaces: the HTTP version, a
response status code that gives the result of the request, and an
English reason phrase describing the status code. Typical status lines
are:
HTTP/1.0 200 OK
or
HTTP/1.0 404 Not Found
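A status line can be taken apart in much the same way as a request line. The sketch below assumes a well-formed line containing all three parts. The function name `parse_status_line` is made up for this example. Note that the reason phrase may itself contain spaces, so the split is limited to the first two spaces.

```python
def parse_status_line(line: str):
    """Return (version, status_code, reason_phrase) from a status line."""
    # Split on the first two spaces only: "Not Found" must stay together.
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.0 200 OK"))
# -> ('HTTP/1.0', 200, 'OK')
print(parse_status_line("HTTP/1.0 404 Not Found"))
# -> ('HTTP/1.0', 404, 'Not Found')
```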
Header Lines
Header lines provide information about the
request or response, or about the object sent in the message body. The
header lines are in the usual text header format, which is: one line per
header, of the form "Header-Name: value", ending with CRLF. It's the
same format used for email and news postings.
The Message Body
An HTTP message may have a body of data sent
after the header lines. In a response, this is where the requested
resource is returned to the client (the most common use of the message
body), or perhaps explanatory text if there's an error. In a request,
this is where user-entered data or uploaded files are sent to the
server.
If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular:
1).The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif.
2).The Content-Length: header gives the number of bytes in the body.
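One detail worth illustrating is that Content-Length counts bytes, not characters. The following Python sketch builds a small response and computes the length from the encoded body. The function name `build_response`, the UTF-8 encoding, and the `charset` parameter are assumptions chosen for the example.

```python
def build_response(body_text: str, content_type: str = "text/html") -> bytes:
    """Build a minimal HTTP/1.0 response with body-describing headers."""
    body = body_text.encode("utf-8")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}; charset=utf-8\r\n"
        # Length of the encoded body in bytes, not the character count.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# "café" is 4 characters but 5 bytes in UTF-8, so the body is 12 bytes.
response = build_response("<p>caf\u00e9</p>")
print(response)
```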
HTTP Request Methods
HTTP/1.0 allows an open-ended set of methods to
be used to indicate the purpose of a request. The three most often used
methods are GET, HEAD, and POST.
The GET Method
Information from a form using the GET method is
appended onto the end of the action URI being requested. Your CGI
program will receive the encoded form input in the environment variable
QUERY_STRING.
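To make the encoding concrete, here is a small Python sketch of both sides: how form input ends up appended to the URI, and how a CGI program might decode QUERY_STRING. The form fields, the `/cgi-bin/search.cgi` path, and the helper name `read_get_form` are all hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Browser side: form fields are URL-encoded and appended after a "?".
form_input = {"name": "Ada Lovelace", "lang": "en"}
query_string = urlencode(form_input)
url = "/cgi-bin/search.cgi?" + query_string  # hypothetical action URI
print(url)
# -> /cgi-bin/search.cgi?name=Ada+Lovelace&lang=en


def read_get_form(environ):
    """CGI side: decode the form input from the QUERY_STRING variable."""
    return parse_qs(environ.get("QUERY_STRING", ""))


# In a real CGI program, environ would be os.environ.
print(read_get_form({"QUERY_STRING": query_string}))
# -> {'name': ['Ada Lovelace'], 'lang': ['en']}
```

Each value comes back as a list because a form may send the same field name more than once.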
The GET method is used to ask for a specific
document - when you click on a hyperlink, GET is being used. GET should
probably be used when a URL access will not change the state of a
database (by, for example, adding or deleting information) and POST
should be used when an access will cause a change. Many database
searches have no visible side-effects and make ideal applications of
query forms using GET. The semantics of the GET method change to a
"conditional GET" if the request message includes an If-Modified-Since
header field. A conditional GET method requests that the identified
resource be transferred only if it has been modified since the date
given by the If-Modified-Since header.
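The server-side decision for a conditional GET might look roughly like the Python sketch below. It returns 304 (Not Modified) when the resource has not changed since the given date and 200 otherwise. The function name `conditional_get_status` is an assumption, and ignoring an unparsable date is one reasonable policy, not the only one.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified: datetime, if_modified_since):
    """Return 304 if the resource is unchanged since the header date, else 200."""
    if if_modified_since is None:
        return 200  # plain GET: always send the resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: treat the request as a plain GET
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(1994, 10, 29, 19, 43, 31, tzinfo=timezone.utc)
print(conditional_get_status(modified, "Sat, 29 Oct 1994 19:43:31 GMT"))  # -> 304
print(conditional_get_status(modified, "Fri, 28 Oct 1994 00:00:00 GMT"))  # -> 200
```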
The HEAD method
The HEAD method is used to ask only for
information about a document, not for the document itself. A HEAD
request is usually much faster than a GET, because only the headers are
transferred and not the body. It's often used by clients that use
caching, to see if the document has changed since it was last accessed.
If it has not, then the local copy can be reused; otherwise the updated
version must be retrieved with a GET.
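The difference between the two methods can be sketched in a few lines of Python: the server sends the same headers for HEAD as for GET, but leaves out the body. The function name `respond` and the sample page are assumptions made for illustration.

```python
def respond(method: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Build a response; for HEAD, send the headers but omit the body."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        # Content-Length describes the body a GET would return.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    if method == "HEAD":
        return head
    return head + body


page = b"<h1>Hello</h1>"
print(len(respond("HEAD", page)), len(respond("GET", page)))
```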
The POST Method
This method transmits all form input information
in the message body of the request, after the header lines, rather than
in the URI. Your CGI program will receive the encoded form input on stdin.
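A CGI program reading POST data might look roughly like this Python sketch. It assumes the server sets the CONTENT_LENGTH environment variable, as the CGI convention specifies, and that the form uses the default URL encoding. The helper name `read_post_form` and the sample form data are hypothetical.

```python
import io
from urllib.parse import parse_qs


def read_post_form(environ, stdin):
    """Read exactly CONTENT_LENGTH bytes from stdin and decode the form."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = stdin.read(length)
    return parse_qs(raw.decode("ascii"))


# Simulated request body; a real CGI program would pass os.environ
# and sys.stdin.buffer instead.
posted = b"name=Ada+Lovelace&lang=en"
fake_environ = {"CONTENT_LENGTH": str(len(posted))}
form = read_post_form(fake_environ, io.BytesIO(posted))
print(form)
# -> {'name': ['Ada Lovelace'], 'lang': ['en']}
```

Reading only CONTENT_LENGTH bytes matters because the server is not required to signal end-of-file on stdin after the body.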
CGI Server Responses
Like client requests, server responses always
contain HTTP headers and an optional body. The structure of the headers
for the response is the same as for requests. The first header line has a
special meaning, and is referred to as the status line. The remaining
lines are name-value header field lines.
The Status Line
The first line of the header is the status line,
which includes the protocol and version just as in HTTP requests, except
that this information comes at the beginning instead of at the end.
This string is followed by a space and the three-digit status code, as
well as a text version of the status.
Status codes are grouped into five different classes according to their first digit:
1xx
These status codes were introduced for HTTP 1.1
and are used at a low level during HTTP transactions. You won't use
100-series status codes in CGI scripts.
2xx
200-series status codes indicate that all is well with the request.
3xx
300-series status codes generally indicate some
form of redirection. The request was valid, but the browser should find
the content of its response elsewhere.
4xx
400-series status codes indicate that there was an error and the server is blaming the browser for doing something wrong.
5xx
500-series status codes also indicate there was
an error, but in this case the server is admitting that it or a CGI
script running on the server is the culprit.
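Since the class is determined by the first digit alone, it can be computed with integer division. The Python sketch below maps a status code to a short label. The labels and the name `status_class` are informal choices for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a three-digit status code from its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```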
Server Headers
After the status line, the server sends its HTTP
headers. Some of these server headers are the same headers that browsers
send with their requests.
The common server headers are:
Content-Base: Specifies the base URL for resolving all relative URLs within the document
Content-Length: Specifies the length (in bytes) of the body
Content-Type: Specifies the media type of the body
Date: Specifies the date and time when the response was sent
ETag: Specifies an entity tag for the requested resource
Last-Modified: Specifies the date and time when the requested resource was last modified
Location: Specifies the new location for the resource
Server: Specifies the name and version of the web server
Set-Cookie: Specifies a name-value pair that the browser should provide with future requests
WWW-Authenticate: Specifies the authorization scheme and realm
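As an illustration of how a few of these headers might be produced, the Python sketch below assembles Date, Server, Last-Modified, Content-Type, and Content-Length for a response. The function name `server_headers` and the server name `ExampleServer/0.1` are invented for the example. Timestamps are passed in as seconds since the epoch so the output is reproducible.

```python
from email.utils import formatdate


def server_headers(body: bytes, content_type: str, modified_ts: float, now_ts: float):
    """Return a list of (name, value) header pairs for a response."""
    return [
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        ("Date", formatdate(now_ts, usegmt=True)),
        ("Server", "ExampleServer/0.1"),  # hypothetical server name
        ("Last-Modified", formatdate(modified_ts, usegmt=True)),
        ("Content-Type", content_type),
        ("Content-Length", str(len(body))),
    ]


for name, value in server_headers(b"hello", "text/plain", 0, 86400):
    print(f"{name}: {value}")
```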