Friday, 29 March 2013

XML

XML stands for EXtensible Markup Language. It is a markup language much like HTML. It was designed to carry data, not to display data. Its tags are not predefined. You must define your own tags. XML is designed to be self-descriptive.

Why do we need XML?
Data exchange
1).XML is used to aid the exchange of data. It makes it possible to define data in a clear way.
2).Both the sending and the receiving party use XML to understand what kind of data has been sent. Because both sides read the same tags, everybody applies the same interpretation to the data.
Replacement for EDI
1).EDI (Electronic Data Interchange) has for several years been the standard way to exchange data between businesses.
2).EDI is expensive because it uses a dedicated communication infrastructure, and the definitions it uses are far from flexible.
3).XML is a good replacement for EDI. It uses the Internet for the data exchange, and it is very flexible.
More possibilities
1).XML makes communication easy. It's a great tool for transactions between businesses.
2).But it offers many more possibilities. You can define other languages with XML. A good example is WML (Wireless Markup Language), the language used in WAP communications. WML is just an XML dialect.
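To make the data-exchange idea concrete, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. The sender describes an order with self-chosen tags and serializes it; the receiver parses the same bytes and reads each value by tag name. The order, product and quantity tags and the build_order/read_order functions are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_order(order_id: str, product: str, quantity: int) -> bytes:
    """Sender side (hypothetical): describe the data with descriptive, self-chosen tags."""
    order = ET.Element("order", id=order_id)
    ET.SubElement(order, "product").text = product
    ET.SubElement(order, "quantity").text = str(quantity)
    return ET.tostring(order, encoding="utf-8")


def read_order(payload: bytes) -> dict:
    """Receiver side (hypothetical): the tag names tell it what each value means."""
    order = ET.fromstring(payload)
    return {
        "id": order.get("id"),
        "product": order.findtext("product"),
        "quantity": int(order.findtext("quantity")),
    }


payload = build_order("A-17", "stock rate feed", 3)
print(payload.decode("utf-8"))
print(read_order(payload))
```

Neither side needs to know how the other stores the order internally; they only have to agree on the tags.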




What it can do

With XML you can:
  • Define data structures
  • Make these structures platform independent
  • Process XML defined data automatically
  • Define your own tags
With XML you cannot:
  • Define how your data is shown. To show data, you need other techniques, such as CSS or XSL.

Define your own tags
In XML, you define your own tags.
If you need a tag <TUTORIAL> or <STOCKRATE>, that's no problem.

DTD or Schema
If you want your tags to follow fixed rules, you define which tags exist and how they may be combined. This definition is stored in a DTD (Document Type Definition). You can define your own DTD or use an existing one. Defining a DTD actually means defining an XML language. An alternative to a DTD is XML Schema.

Showing the results
Often it's not necessary to display the data in an XML document. For instance, the data can be stored in a database right away. If you do want to show the data, you can, but XML itself is not capable of doing so. XML documents can be made visible with the aid of a language that defines the presentation. XSL (eXtensible Stylesheet Language) was created for this purpose, but the presentation can also be defined with CSS (Cascading Style Sheets).
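The following minimal Python sketch illustrates storing XML data in a database without displaying it. It uses the standard-library sqlite3 and xml.etree.ElementTree modules with an in-memory database. The cars document is sample data (the Mercedes record is invented for illustration), and store_cars is a hypothetical helper name.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; any XML with repeated records works the same way.
XML_DATA = """<cars>
  <car><brand>Ferrari</brand><type>v40</type><color>RED</color></car>
  <car><brand>Mercedes</brand><type>c200</type><color>SILVER</color></car>
</cars>"""


def store_cars(xml_text: str, conn: sqlite3.Connection) -> int:
    """Insert one table row per <car> element and return the number of rows stored."""
    conn.execute("CREATE TABLE IF NOT EXISTS car (brand TEXT, car_type TEXT, color TEXT)")
    rows = [
        (car.findtext("brand"), car.findtext("type"), car.findtext("color"))
        for car in ET.fromstring(xml_text).iter("car")
    ]
    conn.executemany("INSERT INTO car VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
print(store_cars(XML_DATA, conn))
print(conn.execute("SELECT brand, color FROM car ORDER BY brand").fetchall())
```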

Tags
XML tags are created like HTML tags. There's a start tag and a closing tag.
<TAG>content</TAG>
The closing tag uses a slash after the opening bracket, just like in HTML.
The start tag, the content and the closing tag together form an element; the name between the brackets (here TAG) is the element's name.

Syntax
The following rules apply to XML tags:
1).Tags are case sensitive. The tag <TRAVEL> differs from the tags <Travel> and <travel>.
2).Every start tag needs a matching closing tag, unless it is written as an empty tag (see below).
3).All tags must be nested properly: <b><i>text</i></b> is correct, <b><i>text</b></i> is not.
4).Comments can be used like in HTML: <!-- This is a comment -->
5).Between the start tag and the closing tag XML expects the content. <amount>135</amount> is a valid element named amount that has the content 135.
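One way to see these rules in action is to feed samples to an XML parser, which rejects anything that breaks them. This Python sketch uses the standard-library xml.etree.ElementTree parser, which raises ParseError for text that is not well-formed; is_well_formed is a hypothetical helper name. The parser only checks syntax here, not any DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text obeys XML's basic syntax rules, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "<TRAVEL>Paris</TRAVEL>": True,                 # matching start and closing tag
    "<TRAVEL>Paris</travel>": False,                # tags are case sensitive
    "<TRAVEL>Paris": False,                         # closing tag missing
    "<b><i>text</b></i>": False,                    # improper nesting
    "<amount>135<!-- a comment --></amount>": True, # comments are allowed
}
for text, expected in samples.items():
    print(f"{text!r}: {is_well_formed(text)}")
```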

Empty tags
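Besides a start tag and a closing tag, you can use an empty tag. An empty tag does not have a closing tag; instead, a slash is placed before the closing bracket. This syntax differs from HTML, where a tag such as <br> needs no slash:
Empty Tag : <TAG/>
<TAG/> means exactly the same as <TAG></TAG>.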
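A short Python check with xml.etree.ElementTree suggests how a parser treats the two notations: both give an element with no children and no text. The element name separator is just an example.

```python
import xml.etree.ElementTree as ET

empty_short = ET.fromstring("<separator/>")
empty_long = ET.fromstring("<separator></separator>")

# Both forms produce an element with no children and no text.
for elem in (empty_short, empty_long):
    print(elem.tag, len(elem), elem.text)

# When serializing an element without content, ElementTree uses the short empty-tag form.
print(ET.tostring(ET.Element("separator"), encoding="unicode"))
```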

Elements and children
With XML tags you define the type of data. But often data is more complex and consists of several parts. A car, for example, could be described with a single element, <car>mercedes</car>, but a more detailed model might look like this:
<car>
<brand>Ferrari</brand>
<type>v40</type>
<color>RED</color>
</car>
Besides the element car three other elements are used: brand, type and color. Brand, type and color are sub-elements of the element car. In the XML-code the tags of the sub-elements are enclosed within the tags of the element car. Sub-elements are also called children.
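In a program, the sub-elements are reached by walking from the parent to its children. A minimal Python sketch with xml.etree.ElementTree, using the car model above:

```python
import xml.etree.ElementTree as ET

CAR_XML = """<car>
<brand>Ferrari</brand>
<type>v40</type>
<color>RED</color>
</car>"""

car = ET.fromstring(CAR_XML)
# Iterating over an element yields its direct children (the sub-elements).
children = {child.tag: child.text for child in car}
print(car.tag, children)
```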

Relationship between HTML, SGML, and XML !

First you should know that SGML (Standard Generalized Markup Language) is the basis for both HTML and XML. SGML is an international standard (ISO 8879) that was published in 1986.
Second, you need to know that XHTML is XML. "XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML 4 with the power of XML."
Thirdly, XML is NOT a language in itself; it is a set of rules for creating XML-based languages. Thus, XHTML 1.0 uses the tags of HTML 4.01 but follows the rules of XML.

The Document
A typical document is made up of three layers:
  • Structure
  • Content
  • Style
Structure
Structure is the document's title, author, paragraphs, topics, chapters, head, body, etc.
Content
Content is the actual information that makes up a title, author, paragraphs, etc.
Style
Style is how the content within the structural elements is displayed, such as font color, type and size, text alignment, etc.

Markup
HTML, SGML, and XML all mark up content using tags. The difference is that SGML and XML mainly deal with the relationship between content and structure: the structural tags that mark up the content are not predefined (you can make up your own language), and style is kept TOTALLY separate. HTML, on the other hand, mixes content with both structural and stylistic tags, and those tags are predefined by the HTML language.
By mixing structure, content and style you limit yourself to one form of presentation and in HTML's case that would be in a limited group of browsers for the World Wide Web.
By separating structure and content from style, you can take one file and present it in multiple forms. XML can be transformed to HTML/XHTML and displayed on the Web, or the information can be transformed and published to paper, and the data can be read by any XML aware browser or application.

SGML (Standard Generalized Markup Language)
Historically, electronic publishing applications such as Microsoft Word, Adobe PageMaker or QuarkXPress "marked up" documents in a proprietary format that was only recognized by that particular application. The document markup for both structure and style was mixed in with the content and was published to only one medium, the printed page.
These programs and their proprietary markup had no capability to define the appearance of the information for any medium other than paper, and did not describe the actual content of the document very well beyond paragraphs, headings and titles. The file format could not be read or exchanged by other programs; it was useful only within the application that created it.
Because SGML is a nonproprietary international standard it allows you to create documents that are independent of any specific hardware or software. The document structure (what elements are used and their relationship to each other) is described in a file called the DTD (Document Type Definition). The DTD defines the relationships between a document's elements creating a consistent, logical structure for each document.
SGML is good for handling large-scale, long-term information management needs and has been around for more than a decade as the language of defense contractors and the electronic publishing industry. Because SGML is very large, powerful, and complex it is hard to learn and understand and is not well suited for the Web environment.

XML (Extensible Markup Language)
XML is a "restricted form of SGML" which removes some of the complexity of SGML. Like SGML, XML retains the flexibility of describing customized markup languages with a user-defined document structure (DTD) in a non-proprietary file format, for both storage and exchange of text and data on and off the Web.
As mentioned before, XML separates structure and content from style and the structural markup tags can actually describe the content because they can be customized for each XML based markup language. A good example of this is the Math Markup Language (MathML) which is an XML application for describing mathematical notation and capturing both its structure and content.
Until MathML, the ability to communicate mathematical expressions on the Web was limited to mainly displaying images (JPG or GIF) of the scientific notation or posting the document as a PDF file. MathML allows the information to be displayed on the Web, and makes it available for searching, indexing, or reuse in other applications.

HTML (Hypertext markup Language)
HTML is a single, predefined markup language that forces Web designers to use its limiting and lax syntax and structure. The HTML standard was not designed with other platforms in mind, such as Web TVs, mobile phones or PDAs. Its structural markup does little to describe the content beyond paragraph, list, title and heading.
XML breaks the restricting chains of HTML by allowing people to create their own markup languages for exchanging information. The tags can be descriptive of the content, and authors decide how the document will be displayed using style sheets (CSS and XSL). Because of XML's consistent syntax and structure, documents can be transformed and published to multiple forms of media, and content can be exchanged with other XML applications.
HTML was useful in the part it has played in the success of the Web, but it has been outgrown as the Web requires more robust, flexible languages to support its expanding forms of communication and data exchange.

In Short
XML will never completely replace SGML, because SGML is still considered better for long-term storage of complex documents. However, with the creation of XHTML 1.0, XML-based markup replaced HTML as the W3C's recommended markup language for the Web.
Even though XHTML has not made the HTML that currently exists on the Web obsolete, HTML 4.01 was intended to be the last version of HTML, although the W3C has since resumed work on HTML with HTML5. XHTML (an XML application) is the foundation for a universally accessible, device-independent Web.

Ways to use XML !

To define your own XML language you use a DTD (Document Type Definition). A DTD contains the rules for a particular type of XML document; in effect, it is the DTD that defines the language.
Elements
A DTD describes elements. It uses the following syntax:
The text <!ELEMENT (with no space after <!), followed by the name of the element, followed by a description of the element.
For example:
<!ELEMENT brand (#PCDATA)>
This DTD description defines the XML tag <brand>.
Data
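The description (#PCDATA) stands for parsed character data: text that the program reading the XML document parses, so any markup and entity references in it (such as &lt;) are interpreted. CDATA stands for character data: text that the parser does not interpret as markup. In a DTD, CDATA is used as an attribute type; (#CDATA) is not a valid element description. Inside a document, character data can be written in a CDATA section, <![CDATA[ ... ]]>, whose content is passed on as-is rather than hidden.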
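A small Python sketch with xml.etree.ElementTree shows the practical difference, assuming a hypothetical formula element: in ordinary parsed content a < must be escaped as &lt;, while inside a CDATA section (<![CDATA[ ... ]]>) the parser does not treat < or & as markup. Both versions end up with the same text.

```python
import xml.etree.ElementTree as ET

# Inside #PCDATA content, "<" would start a tag, so it must be escaped as &lt;.
parsed = ET.fromstring("<formula>a &lt; b &amp;&amp; c</formula>")

# Inside a CDATA section the parser does not treat < or & as markup.
raw = ET.fromstring("<formula><![CDATA[a < b && c]]></formula>")

print(parsed.text)
print(raw.text)
print(parsed.text == raw.text)
```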
Sub elements
An element that contains sub elements is described thus:
<!ELEMENT car (brand, type) >
<!ELEMENT brand (#PCDATA) >
<!ELEMENT type (#PCDATA) >
This means that the element car has two sub elements, brand and type, each of which contains parsed character data.
Number of sub elements
If you use <!ELEMENT car (brand, type) >, the sub elements brand and type must each occur exactly once inside the element car, in that order. To change the number of possible occurrences the following indications can be used:
  • + must occur at least once but may occur more often
  • * may occur any number of times, or be omitted altogether
  • ? may occur once or not at all
The indication is placed directly after the sub element name.
For example:
<!ELEMENT animal (color+) …
Making choices
With the sign '|' you define a choice between two sub elements. You enter the sign between the names of the sub elements.
<!ELEMENT animal (wingsize|legsize) >
Empty elements
Empty elements get the description EMPTY.
For example:
<!ELEMENT separator EMPTY>
that could define a separator line to be shown if the XML document appears in a browser.
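A DTD content model behaves much like a regular expression over the names of an element's children: a comma means "followed by", | means a choice, and +, * and ? count occurrences. The Python sketch below makes that idea concrete for simple element-only models. It is an illustration, not a DTD validator: it ignores mixed content such as (#PCDATA) and attributes, and model_to_regex/matches_model are hypothetical names. For real validation a validating parser is needed; Python's standard library does not validate against DTDs (the third-party lxml package offers a DTD class for this).

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model: str):
    """Translate a simple element-only content model such as "(brand, type)",
    "(color+)" or "(wingsize|legsize)" into a regex over child names,
    where each child name is followed by one space."""
    no_space = re.sub(r"\s", "", model)
    # Wrap every element name as "(?:name )"; ( ) | + * ? keep their regex meaning.
    wrapped = re.sub(r"[A-Za-z_][\w.-]*", lambda m: "(?:" + m.group(0) + " )", no_space)
    # A comma in a DTD sequence simply means "followed by", so it can be dropped.
    return re.compile(wrapped.replace(",", ""))


def matches_model(element: ET.Element, model: str) -> bool:
    """Check the children of `element` against the content model."""
    if model == "EMPTY":
        return len(element) == 0 and not (element.text or "").strip()
    children = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(children) is not None


print(matches_model(ET.fromstring("<car><brand/><type/></car>"), "(brand, type)"))
print(matches_model(ET.fromstring("<car><type/><brand/></car>"), "(brand, type)"))
print(matches_model(ET.fromstring("<animal><color/><color/></animal>"), "(color+)"))
print(matches_model(ET.fromstring("<animal><legsize/></animal>"), "(wingsize|legsize)"))
print(matches_model(ET.fromstring("<separator/>"), "EMPTY"))
```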
DTD: external
A DTD can be an external document that's referred to. Such a file contains only the declarations, for example the <!ELEMENT> declarations shown above.
In the XML document you make clear that you'll use this DTD with the line:
<!DOCTYPE name of root-element SYSTEM "address">
where the address is a URL that points to the DTD. This line should be typed after the line <?xml version="1.0"?>
DTD: internal
A DTD can also be included in the XML document itself. After the line <?xml version="1.0"?> you must type <!DOCTYPE name of root-element [ followed by the element definitions. The DTD part is closed with ]>
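Here is a minimal complete document with an internal DTD, read with Python's standard library. Note the assumption: xml.dom.minidom and xml.etree.ElementTree read the DOCTYPE but do not validate the document against it, so this only shows the layout of an internal DTD, not enforcement of its rules.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# A complete document with an internal DTD between "[" and "]>".
DOC = """<?xml version="1.0"?>
<!DOCTYPE car [
<!ELEMENT car (brand, type)>
<!ELEMENT brand (#PCDATA)>
<!ELEMENT type (#PCDATA)>
]>
<car><brand>Ferrari</brand><type>v40</type></car>"""

# The DOM keeps the DOCTYPE; its name is the declared root element.
dom = xml.dom.minidom.parseString(DOC)
print(dom.doctype.name)

# ElementTree accepts the DTD and returns the element tree.
root = ET.fromstring(DOC)
print(root.tag, [child.tag for child in root])
```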

 

Embedding XML into HTML document !

One serious proposal is for HTML documents to support the inclusion and processing of XML data. This would allow an author to embed within a standard HTML document some well delimited, well defined XML object. The HTML document would then be able to support some functions based on the special XML markup. This strategy of permitting "islands" of XML data inside an HTML document would serve at least two purposes:
1).To enrich the content delivered to the web and support further enhancements to the XML-based content models.
2).To enable content developers to rely on the proven and known capabilities of HTML while they experiment with XML in their environments.
The result would look like this:
<html>
<body>
<!-- some typical HTML document with
<h1>, <h2>, <p>, etc. -->
<xml>
<!-- The <xml> tag introduces some XML-compliant markup for some specific purpose. The markup is then explicitly terminated with the </xml> tag. The user agent would invoke an XML processor only
on the data contained in the <xml></xml> pair. Otherwise the user agent would process the containing document as an HTML document. -->
</xml>
<!-- more typical HTML document markup -->
</body>
</html>
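A user agent following this proposal would pass only the island to its XML processor. The Python sketch below imitates that step with a regular expression and xml.etree.ElementTree. It is a simplification under stated assumptions: a hypothetical extract_xml_islands function, <xml> tags without attributes, and exactly one root element per island; a real browser would use a full HTML parser.

```python
import re
import xml.etree.ElementTree as ET

PAGE = """<html>
<body>
<h1>Cars</h1>
<xml>
<car><brand>Ferrari</brand><color>RED</color></car>
</xml>
<p>More HTML.</p>
</body>
</html>"""


def extract_xml_islands(html: str) -> list:
    """Hypothetical user-agent step: hand only the <xml>...</xml> content to an XML parser."""
    islands = re.findall(r"<xml>(.*?)</xml>", html, flags=re.DOTALL | re.IGNORECASE)
    return [ET.fromstring(island.strip()) for island in islands]


for island in extract_xml_islands(PAGE):
    print(island.tag, island.findtext("brand"))
```

The rest of the page is never given to the XML parser, which matters because ordinary HTML is often not well-formed XML.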

 

Converting XML to HTML for Display !

There exist several ways to convert XML to HTML for display on the Web.
Using HTML alone
If your XML file is of a simple tabular form only two levels deep then you can display XML files using HTML alone.
Using HTML + CSS
This is a substantially more powerful way to transform XML to HTML than HTML alone, but lacks the full power and flexibility of the methods listed below.
Using HTML with JavaScript
Fully general XML files of any type and complexity can be processed and displayed using a combination of HTML and JavaScript. The advantage of this approach is that any possible transformation and display can be carried out, because JavaScript is a fully general purpose programming language. The disadvantage is that it often requires large, complex, and very detailed programs using recursive functions (functions that call themselves), which are very difficult for most people to grasp.
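The recursive idea can be sketched briefly. The text describes JavaScript; to keep the examples in one language this sketch uses Python's xml.etree.ElementTree instead, and element_to_html/xml_to_html are hypothetical names. It turns any XML tree into nested HTML lists, ignoring attributes and mixed text for brevity.

```python
import html
import xml.etree.ElementTree as ET


def element_to_html(elem: ET.Element) -> str:
    """Recursively turn an element and all its descendants into nested HTML list items."""
    label = html.escape(elem.tag)
    text = (elem.text or "").strip()
    if len(elem) == 0:
        # Leaf element: show its name and text content.
        return f"<li>{label}: {html.escape(text)}</li>"
    # Element with children: the function calls itself once per child.
    items = "".join(element_to_html(child) for child in elem)
    return f"<li>{label}<ul>{items}</ul></li>"


def xml_to_html(xml_text: str) -> str:
    return "<ul>" + element_to_html(ET.fromstring(xml_text)) + "</ul>"


print(xml_to_html("<car><brand>Ferrari</brand><type>v40</type></car>"))
```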
Using XSL and Xpath
XSL (eXtensible Stylesheet Language) is considered the best way to convert XML to HTML. The advantages are that the language is very compact, very sophisticated HTML can be displayed with relatively small programs, it is easy to re-purpose XML to serve a variety of purposes, it is non-procedural in that you generally specify only what you wish to accomplish rather than detailed instructions on how to achieve it, and it greatly reduces or eliminates the need for recursive functions. The disadvantages are that it requires a very different mindset to use, and the language is still evolving, so many XSL processors in Web servers are out of date and newer ones must sometimes be invoked from the DOS command line.

 

Displaying XML Document using CSS !

CSS stands for Cascading Style Sheets. Styles define how to display HTML elements and are normally stored in style sheets. Styles were added to HTML 4.0 to separate a document's presentation from its content. External style sheets can save a lot of work; they are stored in CSS files. Multiple style definitions will cascade into one.
A Cascading Style Sheet is a file that contains instructions for formatting the elements in an XML document.
Creating and linking a CSS file to your XML document is one way to tell the browser how to display each of the document's elements. An XML document with an attached CSS file can be opened directly in Internet Explorer. You don't need to use an HTML page to access and display the data.
There are two basic steps for using CSS to display an XML document:
  • Create the CSS file.
  • Link the CSS sheet to XML document.
Creating CSS file
A CSS file is a plain text file with a .css extension that contains a set of rules telling the web browser how to format and display the elements in a specific XML document. You can create a CSS file with your favorite text editor, such as Notepad or WordPad, or with any other text or HTML editor, as shown below:
general.css
employees
{
background-color: #ffffff;
width: 100%;
}
employee
{
display: block; margin-bottom: 30pt; margin-left: 0;
}
name
{
color: #FF0000;
font-size: 20pt;
}
city,state,zipcode
{
color: #0000FF;
font-size: 20pt;
}

Linking
To link to a style sheet you use an XML processing instruction to associate the style sheet with the current document. This statement should occur before the root element of the document.
<?xml-stylesheet type="text/css" href="styles/general.css"?>
The two attributes of the tag are as follows:
href
The URL for the style sheet.
type
The MIME type of the document being linked, which in this case is text/css.
MIME stands for Multipurpose Internet Mail Extensions. It is a standard which defines how to make systems aware of the type of content being included in e-mail messages; the same content types are used on the Web.
The CSS file is attached to the XML document as shown below:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!--This xml file represents the details of the employees-->
<?xml-stylesheet type="text/css" href="styles/general.css"?>
<employees>
<employee id="1">
<name>
<firstName>Girdhar</firstName>
<lastName>Gopal</lastName>
</name>
<city>Nissing</city>
<state>Haryana</state>
<zipcode>132024</zipcode>
</employee>
<employee id="2">
<name>
<firstName>Gopal</firstName>
<lastName>Girdhar</lastName>
</name>
<city>Kurukshetra</city>
<state>Haryana</state>
<zipcode>136119</zipcode>
</employee>
</employees>
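A browser finds the style sheet by reading the xml-stylesheet processing instruction before the root element. The Python sketch below imitates that lookup with xml.dom.minidom, which keeps processing instructions among the document's child nodes. find_stylesheets is a hypothetical helper, and its regular expression only handles double-quoted pseudo-attributes.

```python
import re
import xml.dom
import xml.dom.minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="styles/general.css"?>
<employees><employee id="1"><name><firstName>Girdhar</firstName></name></employee></employees>"""


def find_stylesheets(xml_text: str) -> list:
    """Return the pseudo-attributes of every xml-stylesheet processing instruction."""
    dom = xml.dom.minidom.parseString(xml_text)
    sheets = []
    for node in dom.childNodes:
        if (node.nodeType == xml.dom.Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # node.data holds the text after the target, e.g. 'type="..." href="..."'.
            sheets.append(dict(re.findall(r'(\w+)="([^"]*)"', node.data)))
    return sheets


print(find_stylesheets(DOC))
```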

 

Displaying XML Document using XSL !

XSL (eXtensible Stylesheet Language) is a language for expressing stylesheets. It consists of two parts:
  • A language for transforming XML documents (XSLT)
  • An XML vocabulary for specifying formatting semantics
An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.
Like CSS, an XSL stylesheet is linked to an XML document and tells the browser how to display each of the document's elements. An XML document with an attached XSL stylesheet can be opened directly in Internet Explorer. You don't need to use an HTML page to access and display the data.
There are two basic steps for using XSL to display an XML document:
  • Create the XSL file.
  • Link the XSL sheet to XML document.
Creating XSL file
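An XSL file is a plain text XML file with a .xsl extension. It contains templates telling the web browser how to transform the elements of a specific XML document into HTML for display. You can create an XSL file with your favorite text editor, such as Notepad or WordPad, or with any other text or HTML editor, as shown below:
general.xsl
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Build one HTML page from the root of the XML document -->
<xsl:template match="/">
<html>
<body style="background-color: #ffffff; width: 100%;">
<!-- One block per employee, styled like the CSS example -->
<xsl:for-each select="employees/employee">
<div style="margin-bottom: 30pt; margin-left: 0;">
<p style="color: #FF0000; font-size: 20pt;">
<xsl:value-of select="name/firstName"/>
<xsl:text> </xsl:text>
<xsl:value-of select="name/lastName"/>
</p>
<p style="color: #0000FF; font-size: 20pt;">
<xsl:value-of select="city"/>, <xsl:value-of select="state"/>
<xsl:text> </xsl:text>
<xsl:value-of select="zipcode"/>
</p>
</div>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>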

Linking
To link to a style sheet you use an XML processing instruction to associate the style sheet with the current document. This statement should occur before the root element of the document.
<?xml-stylesheet type="text/xsl" href="styles/general.xsl"?>
The two attributes of the tag are as follows:
href
The URL for the style sheet.
type
The MIME type of the document being linked, which in this case is text/xsl.
MIME stands for Multipurpose Internet Mail Extensions. It is a standard which defines how to make systems aware of the type of content being included in e-mail messages.
The XSL file is attached to the XML document as shown below:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!--This xml file represents the details of the employees-->
<?xml-stylesheet type="text/xsl" href="styles/general.xsl"?>
<employees>
<employee id="1">
<name>
<firstName>Girdhar</firstName>
<lastName>Gopal</lastName>
</name>
<city>Nissing</city>
<state>Haryana</state>
<zipcode>132024</zipcode>
</employee>
<employee id="2">
<name>
<firstName>Gopal</firstName>
<lastName>Girdhar</lastName>
</name>
<city>Kurukshetra</city>
<state>Haryana</state>
<zipcode>136119</zipcode>
</employee>
</employees>

 

The Future of XML !

The future of XML is still unclear because XML users hold conflicting views. Some say that the future is bright and holds promise, while others say that it is time to take a break from the continuous increase in the volume of specifications.
In the past five years, there have been substantial accomplishments in XML. XML has made it possible to manage large quantities of information which don't fit in relational database tables, and to share labeled structured information without sharing a common Application Program Interface (API). XML has also simplified information exchange across language barriers.
But as a result of these accomplishments, XML is no longer simple. It now consists of a growing collection of complex, connected and disconnected specifications. As a result, usability has suffered, because it takes longer to develop XML tools. These users are now rooting for something simpler. They argue that even though the number of specifications has increased, there is no clear improvement in quality. They think it might be better to let things be, or even to look for alternative approaches beyond XML, which would make XML easier to use in the future; otherwise, they argue, further growth in specifications will cause instability.
The other side paints a completely different picture. They are ready for further progress in XML. There have been discussions for a new version, XML 2.0. This version has been proposed to contain the following characteristics:
  • Elimination of DTDs
  • Integration of namespaces, XML Base and the XML Information Set into the base standard
Research is also being carried out into the properties and use cases for binary encoding of the XML information set.
Future of XML Applications
The future of XML applications lies with the Web and Web publishing. Web applications are no longer limited to traditional documents: browsers now integrate games, word processors and more. Because XML is rooted in Web publishing, its use is expected to grow along with it.

 
