Friday, 29 March 2013

Browser Requests and CGI Server Responses

A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.
HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL (it's the R in URL). The most common kind of resource is a file, but a resource may also be a dynamically-generated query result, the output of a CGI script, a document that is available in several languages, or something else. In practice, almost all HTTP resources are currently either files or server-side script output.
Like most network protocols, HTTP uses the client-server model: An HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. In HTTP/1.0, the server closes the connection by default after delivering the response. HTTP is also a stateless protocol, i.e. the server does not maintain any information about a client between transactions.
The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of:
  • an initial line
  • zero or more header lines
  • a blank line (i.e. a CRLF by itself)
  • an optional message body (e.g. a file, or query data, or query output)
Put another way, the format of an HTTP message is:
<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

<optional message body goes here, like file contents or query data;
it can be many lines long, or even binary data $&*%@!^$@ >
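To make the layout concrete, here is a minimal Python sketch that splits a raw message into its initial line, headers, and body. The function name parse_http_message and the sample message are made up for illustration, and the sketch assumes every line ends with CRLF and that no header is repeated.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


# Hypothetical sample response, used only to exercise the function.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
initial_line, headers, body = parse_http_message(raw)
print(initial_line)
print(headers)
print(body)
```

The body is kept as bytes on purpose, since it may be binary data rather than text.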
Initial Request Line
The initial line is different for the request than for the response. A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:
GET /path/to/file/index.html HTTP/1.0
Important Points:
1).GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD (more on those later). Method names are case-sensitive, and the standard ones are always uppercase.
2).The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general).
3).The HTTP version always takes the form "HTTP/x.x", uppercase.
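The three parts of a request line can be pulled apart with a plain split on spaces. This is only a rough sketch: parse_request_line is a hypothetical helper, and it assumes exactly one space between the parts.

```python
def parse_request_line(line: str):
    """Return (method, path, version) from an HTTP request line."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {line!r}")
    method, path, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"bad HTTP version: {version!r}")
    return method, path, version


method, path, version = parse_request_line("GET /path/to/file/index.html HTTP/1.0")
print(method)   # GET
print(path)     # /path/to/file/index.html
print(version)  # HTTP/1.0
```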
Initial Response Line (Status Line)
The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:
HTTP/1.0 200 OK
or
HTTP/1.0 404 Not Found
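A status line can be split the same way, with one catch: the reason phrase may itself contain spaces, so the split should stop after two separators. A small sketch, with parse_status_line as an illustrative name:

```python
def parse_status_line(line: str):
    """Return (version, status_code, reason) from an HTTP status line."""
    # maxsplit=2 keeps a multi-word reason phrase such as "Not Found" intact.
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.0 200 OK"))
print(parse_status_line("HTTP/1.0 404 Not Found"))
```

Programs should act on the numeric code; the reason phrase is there for human readers.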
Header Lines
Header lines provide information about the request or response, or about the object sent in the message body. The header lines are in the usual text header format, which is: one line per header, of the form "Header-Name: value", ending with CRLF. It's the same format used for email and news postings.
The Message Body
An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.
If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular:
1).The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif.
2).The Content-Length: header gives the number of bytes in the body.
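One detail that is easy to get wrong is that Content-Length counts bytes, not characters. The sketch below builds a small response by hand to show the difference; build_response is a hypothetical helper and the UTF-8 charset is an assumption of the example.

```python
def build_response(body_text: str, content_type: str = "text/html; charset=utf-8") -> bytes:
    """Build a raw HTTP/1.0 response whose headers describe the body."""
    body = body_text.encode("utf-8")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        # Length of the encoded body in bytes, not len(body_text).
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


text = "<p>café</p>"
response = build_response(text)
print(len(text))                     # 11 characters
print(len(text.encode("utf-8")))     # 12 bytes, because "é" takes two bytes in UTF-8
print(response.decode("utf-8"))
```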

HTTP Request Methods
HTTP/1.0 allows an open-ended set of methods to be used to indicate the purpose of a request. The three most often used methods are GET, HEAD, and POST.
The GET Method
Information from a form using the GET method is appended onto the end of the action URI being requested. Your CGI program will receive the encoded form input in the environment variable QUERY_STRING.
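As a rough illustration in Python, a CGI program could decode the GET form input like this. The helper name read_get_form and the sample query are invented for the example; parse_qs comes from the standard urllib.parse module.

```python
import os
from urllib.parse import parse_qs


def read_get_form(environ=os.environ):
    """Decode the form fields that the server placed in QUERY_STRING."""
    query = environ.get("QUERY_STRING", "")
    # parse_qs undoes the URL encoding and returns a list of values per
    # field, because a field name may appear more than once.
    return parse_qs(query)


# Simulated environment for a request such as /search?name=Ada+Lovelace&lang=en&lang=fr
fake_environ = {"QUERY_STRING": "name=Ada+Lovelace&lang=en&lang=fr"}
form = read_get_form(fake_environ)
print(form["name"])  # ['Ada Lovelace']
print(form["lang"])  # ['en', 'fr']
```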
The GET method is used to ask for a specific document: when you click on a hyperlink, GET is being used. GET should be used when a URL access will not change the state of a database (by, for example, adding or deleting information), and POST should be used when an access will cause a change. Many database searches have no visible side-effects and make ideal applications of query forms using GET. The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since header field. A conditional GET requests that the identified resource be transferred only if it has been modified since the date given by the If-Modified-Since header.
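The decision a server makes for a conditional GET can be sketched in a few lines. The function name conditional_get_status is hypothetical. The sketch relies on the standard HTTP behaviour of answering 304 (Not Modified) with no body when the resource is unchanged, and it assumes that an unparseable date should simply be ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Return 200 if the resource must be sent, 304 if the client's copy is current."""
    if if_modified_since is None:
        return 200  # ordinary GET, always send the resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # assumption: ignore a date we cannot parse
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return 200 if last_modified > since else 304


# Hypothetical resource, last changed at 10:00 GMT.
last_modified = datetime(2013, 3, 29, 10, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status("Fri, 29 Mar 2013 12:00:00 GMT", last_modified))  # 304
print(conditional_get_status("Fri, 29 Mar 2013 08:00:00 GMT", last_modified))  # 200
```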
The HEAD method
The HEAD method is used to ask only for information about a document, not for the document itself. The response to a HEAD request carries the same headers as the response to a GET but no body, so much less data is transferred. It's often used by clients that use caching, to see if the document has changed since it was last accessed. If it has not, the local copy can be reused; otherwise the updated version must be retrieved with a GET.
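Seen from the server side, the difference between the two methods is small. In this toy sketch (respond_to is a made-up name, and the status is always 200 for simplicity), both methods produce identical headers and only GET appends the body.

```python
def respond_to(method: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Return a raw HTTP/1.0 response; a HEAD request gets the headers only."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        # Content-Length describes the body that a GET would return.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    if method == "HEAD":
        return head
    return head + body


page = b"<p>Hello</p>\n"
get_response = respond_to("GET", page)
head_response = respond_to("HEAD", page)
print(len(get_response) - len(head_response))  # 13, the size of the omitted body
```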
The POST Method
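This method transmits all form input in the body of the request message, after the header lines, rather than as part of the requested URI. Your CGI program will receive the encoded form input on stdin, and the environment variable CONTENT_LENGTH tells it how many bytes to read.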

CGI Server Responses

Like client requests, server responses always contain HTTP headers and an optional body. The structure of the headers for the response is the same as for requests. The first header line has a special meaning, and is referred to as the status line. The remaining lines are name-value header field lines.
The Status Line
The first line of the header is the status line, which includes the protocol and version just as in HTTP requests, except that this information comes at the beginning instead of at the end. This string is followed by a space and the three-digit status code, as well as a text version of the status.
Status codes are grouped into five different classes according to their first digit:
1xx
These status codes were introduced in HTTP/1.1 and are used at a low level during HTTP transactions. You won't use 100-series status codes in CGI scripts.
2xx
200-series status codes indicate that all is well with the request.
3xx
300-series status codes generally indicate some form of redirection. The request was valid, but the browser should find the content of its response elsewhere.
4xx
400-series status codes indicate that there was an error and the server is blaming the browser for doing something wrong.
5xx
500-series status codes also indicate there was an error, but in this case the server is admitting that it or a CGI script running on the server is the culprit.
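Because the class is determined by the first digit alone, a client can classify any status code with integer division. A small sketch follows; the function name and the short class labels are chosen for the example.

```python
def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its class by the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```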
Server Headers
After the status line, the server sends its HTTP headers. Some of these server headers are the same headers that browsers send with their requests.
The common server headers are:
Content-Base: Specifies the base URL for resolving all relative URLs within the document
Content-Length: Specifies the length (in bytes) of the body
Content-Type: Specifies the media type of the body
Date: Specifies the date and time when the response was sent
ETag: Specifies an entity tag for the requested resource
Last-Modified: Specifies the date and time when the requested resource was last modified
Location: Specifies the new location for the resource
Server: Specifies the name and version of the web server
Set-Cookie: Specifies a name-value pair that the browser should provide with future requests
WWW-Authenticate: Specifies the authentication scheme and realm
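A CGI script normally does not write the status line itself. It prints a few headers, a blank line, and then the body, and the web server fills in the status line and headers such as Date and Server. The sketch below assumes that behaviour, along with the CGI convention of a Status: header for reporting anything other than 200. The helper name format_cgi_response is made up for the example.

```python
import sys


def format_cgi_response(body, content_type="text/html", status=None, extra_headers=()):
    """Return the text a CGI script writes to stdout: headers, blank line, body."""
    lines = []
    if status is not None:
        # CGI convention: the server turns this header into the real status line.
        lines.append(f"Status: {status}")
    lines.append(f"Content-Type: {content_type}")
    for name, value in extra_headers:
        lines.append(f"{name}: {value}")
    # The blank line marks the end of the headers.
    return "\n".join(lines) + "\n\n" + body


sys.stdout.write(format_cgi_response("<p>Hello from CGI</p>\n"))
```

A redirect would look like format_cgi_response("", status="302 Found", extra_headers=[("Location", "/new/place.html")]), where the target path is again only an example.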

