Welcome

To my corner on the net... Warning, this is a techie blog! Non-techie people may suffer bouts of epilepsy on viewing this blog. The author cannot be held responsible.

More on XML (Task 3)

Saturday, 29 October 2011

What is XML well-formedness and how could you measure it?

An XML document that strictly adheres to the XML syntax rules is termed a "well-formed" XML document. To be well-formed, a document must at least adhere to the rules listed below:

  • All XML documents must contain at least one element.
  • Every element must be delimited by an opening tag and a matching closing tag.
  • The whole XML document must be enclosed in a single pair of opening and closing tags (the root element) to be well-formed.
  • Nesting is also very important. Overlapping tags are not allowed: an element opened inside another element must be closed before its parent is closed.
  • Each tag is delimited by the angle brackets "<" and ">"; no other type of bracket is allowed.
  • XML is case sensitive. This means that, for example, <BOOK> is not the same as <book>. The XML spec does not require tags to be in lower case, although lower case is a common convention; DTD keywords such as ELEMENT, ATTLIST, #IMPLIED and #REQUIRED, on the other hand, must be written in upper case. Custom elements may use any mixture of character cases, but it has to be ensured that the opening and closing tags use precisely the same mixture.
  • Empty elements have to be closed with a slash. For example, the <BR> tag, which is a line break in HTML, must be written as <br/> in XML.
  • Attribute values must always be quoted (enclosed in quotes).
Modern browsers are equipped with a built-in XML parser. If any of the above rules is broken by the XML markup, the parser will not be able to parse the document and will raise an error. Parser and checker applications may also be employed by the developer to verify that the XML syntax follows the rules above, which is how well-formedness can be measured in practice. The job of the parser is to read the XML file and build the hierarchical tree structure that is inherent to all XML documents.
There are many examples of XML parsers. The Xerces Java Parser may be used to check the integrity of business data in XML. XP is a parser written in Java that offers high performance, and XT is an XSLT processor built on top of it. XML Parser for Java is also popular. Like the others mentioned above, it has the advantage of being cross-platform and will run on any operating system that has a Java runtime.
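
To make the rules above a little more concrete, here is a minimal sketch of a well-formedness check. It uses Python's standard xml.etree.ElementTree module rather than any of the Java parsers mentioned above, and the helper name is_well_formed and the sample snippets are my own inventions for illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical snippets, each breaking one of the rules listed above
# (except the first, which follows all of them).
samples = {
    "properly nested": "<book><title>Animal Farm</title></book>",
    "overlapping tags": "<book><title>Animal Farm</book></title>",
    "case mismatch": "<BOOK><title>Animal Farm</title></book>",
    "unquoted attribute": "<book id=1><title>Animal Farm</title></book>",
    "two root elements": "<book/><book/>",
    "empty tag not closed": "<book><br></book>",
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    status = "well-formed" if ok else f"rejected ({error})"
    print(f"{label}: {status}")
```

Only the first snippet should be accepted. Note that this checks well-formedness only; it says nothing about whether a document is valid against a DTD.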


What is XML validation and what is the output?
Validation goes a step further than well-formedness. It refers to checking that an XML document, besides following the markup and syntax rules of the XML spec, also conforms to its DTD (Document Type Definition), which is another set of rules that defines which tags may appear in the document and how. Since XML is extensible and customizable, the definition in a DTD is required to make sense of the data contained. The DTD must thus be included in any discussion of validation because it describes what is allowed in the structure of a document, such as the names that can be used for element types, how often an element may or must be used and the order of the elements. It also includes the rules on how elements may be nested. The DTD also specifies which attributes are used with which elements and whether they can be omitted or not.

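A valid XML document must have its elements in the specified order, the first element being the “root” element. To be valid it must also include the correct DOCTYPE declaration, which tells the parser which DTD to validate the document against. The output of validation is either confirmation that the document is valid or a report of the validity errors found by the parser.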

The DTD appears as part of the prolog of an XML document and can be put inside the DOCTYPE declaration, which contains the name of the root element. Consequently, the name of the root element used in the XML needs to be identical to the name specified in the DOCTYPE declaration. Alternatively, the DOCTYPE declaration can point to an external DTD. Using an external DTD allows the same DTD to be re-used in multiple XML documents, where it serves as a guide in the validation of all the documents. This is of course preferable to repeating the DTD in every single XML file. It also offers substantial advantages in terms of streamlining the XML data.

Example calling an external DTD:

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
    <title id="t1">Cooking at Home</title>
    <genre>&COO;</genre>
    <year>2010</year>

    <title id="t2">The Killer</title>
    <genre>&THR;</genre>
    <year>2000</year>
</book>

Example showing DTD in same document:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE book [

    <!ENTITY COO "Cookery">
    <!ENTITY THR "Thrillers">

    <!ELEMENT book (title, genre, year)+>
    <!ELEMENT title (#PCDATA)>
    <!ATTLIST title
     xml:lang NMTOKEN "EN"
     id ID #IMPLIED>
    <!ELEMENT genre (#PCDATA)>
    <!ELEMENT year (#PCDATA)>
]>

<book>
    <title id="t1">Cooking at Home</title>
    <genre>&COO;</genre>
    <year>2010</year>

    <title id="t2">The Killer</title>
    <genre>&THR;</genre>
    <year>2000</year>
</book>
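
As a rough illustration of what a parser does with the internal DTD, the sketch below feeds a cut-down copy of the example to Python's standard xml.etree.ElementTree module and reads back the genre elements. This is only a sketch under my own assumptions: the standard-library parser is non-validating, so it expands the COO and THR entities but does not check the element order or the attribute rules. A validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the internal-DTD example; only the entity declarations
# are kept because this parser does not validate the other rules.
DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE book [
    <!ENTITY COO "Cookery">
    <!ENTITY THR "Thrillers">
]>
<book>
    <title>Cooking at Home</title>
    <genre>&COO;</genre>
    <title>The Killer</title>
    <genre>&THR;</genre>
</book>"""

book = ET.fromstring(DOCUMENT)

# Each entity reference has been replaced by the text declared in the DTD.
genres = [genre.text for genre in book.findall("genre")]
print(genres)
```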



How can an XML document be best presented?
XML does not include a styling system “out of the box”, and thus the data contained in it, if displayed as-is, will not look like much and will not make much sense to the viewer. Other languages have been created to satisfy this need, such as XSL, XUL, XIML and the better-known CSS.

CSS is the most commonly used way to style XML and is widely used in web design today. CSS allows the developer to separate content from styling by creating style sheets which contain instructions about how the display system (usually a browser) will display the data. A style sheet can be re-used across many XML documents and is usually used to group the styling of multiple documents in one easily accessible place. CSS in its latest incarnation, version 3, allows the developer to change the style and colour of text and to include images, for example. It does have some shortcomings, such as the lack of simple arithmetic when specifying styles. Where computed values are needed, they are usually generated on the server, for example with PHP.

CSS also allows for styling through the use of classes. Classes allow the grouping of a number of elements by making them “belong” to a specific class. For example, a class may be created to display something in red and at a font size of 15 pixels. Any element to which this class is applied will be rendered by the browser in the specified format. Elements can also be singled out individually through the use of the “id” attribute.

Lists and tables can also be styled through CSS. The way lists are displayed is fully customizable through CSS, and table spacing and padding as well as the colour of cells, rows and columns may also be specified.

Hypertext links can be styled in the same way as other elements through CSS, which supports the various states of a link, for example differentiating between links that are being hovered over and links that have been visited.




What is the importance of the first line of an XML document?  
XML documents begin with an XML declaration, which identifies the document as an XML document and specifies a version number, 1.0 being the most commonly used. Example:


 <?xml version="1.0"?>


The XML declaration, together with the optional DOCTYPE declaration and any processing instructions or comments that appear before the root element, is called the “prolog”. The XML declaration can also specify the encoding of the document, as follows:


<?xml version="1.0" encoding="utf-8"?>


The above specifies that the document is encoded in UTF-8 (8-bit Unicode Transformation Format). Other encodings, such as UTF-16 and ISO-8859-1, can also be specified.


When working with XML documents, they have to be saved in the editor in the same encoding that is specified in the declaration, or there is a risk that applications such as browsers will reject the document or render its characters incorrectly.
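
The effect of a mismatch between the declared encoding and the encoding the file was actually saved in can be sketched with Python's standard xml.etree.ElementTree module. The helper parse_saved_as and the sample text are hypothetical and only there for illustration; real applications may react differently to the same mismatch.

```python
import xml.etree.ElementTree as ET

# Template document; the accented character makes encoding problems visible.
TEXT = '<?xml version="1.0" encoding="{enc}"?><name>Café</name>'


def parse_saved_as(declared, saved_as):
    """Parse a document that declares `declared` but is stored as `saved_as` bytes."""
    raw = TEXT.format(enc=declared).encode(saved_as)
    return ET.fromstring(raw).text


# Declaration and saved bytes agree: the text comes back intact.
print(parse_saved_as("utf-8", "utf-8"))
print(parse_saved_as("ISO-8859-1", "iso-8859-1"))

# Declared ISO-8859-1 but saved as UTF-8: parsing succeeds, text is garbled.
print(parse_saved_as("ISO-8859-1", "utf-8"))

# Declared UTF-8 but saved as ISO-8859-1: the parser rejects the document.
try:
    parse_saved_as("utf-8", "iso-8859-1")
except ET.ParseError as err:
    print("mismatch rejected:", err)
```

The third case is arguably the more dangerous one, because no error is raised and the damaged text simply flows through to whatever displays it.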


The prolog of the XML document can also include the DOCTYPE declaration, which can specify that the document uses an external DTD, which is then used to validate the document. The “standalone” attribute can also be specified in the XML declaration to indicate that the document can be parsed without referring to external markup declarations.


Finally, processing instructions may also be included in the prolog of an XML document. These instructions tell the application that processes the document about anything else that needs to be done when working with it.



What are the differences between elements and attributes and what are their different uses?


In XML, elements are used to describe and contain the data, while attributes carry additional information about an element, such as an identifier or, as in HTML, a hint on how the element should be presented. Attributes are always included in the opening tag. Web developers are very used to this syntax in HTML, for example:

<table border="1">

<table> is the actual element, while border="1" is an attribute that tells the browser that the table must be shown with a border of 1 pixel.
There is a defined set of rules for declaring elements and attributes in the DTD. Here are some of them:
The element declaration:

<!ELEMENT element-name (regular-expression)>

For example: <!ELEMENT book (author, publishdate, genre)>

Here the comma specifies that the child elements must appear in the given order. Using the “pipe” symbol (vertical bar) specifies an OR condition. For example:

<!ELEMENT book (green | red | blue)>

This specifies that the element book must contain exactly one of the child elements green, red or blue.

The asterisk * is used to specify zero or more such as :

 <!ELEMENT car (spoiler*)>

The above specifies that the car may have any number of spoilers or none at all.
The plus (+) symbol specifies that the element must occur at least once (one or more).
For example:

<!ELEMENT car (spoiler+)>

A car must have one or more spoilers.


The question mark (?) can also be used and specifies “zero or one”, which means that the element can occur once or not at all.
Empty elements may also be declared, which means that in a valid XML document an element of this type cannot have any content:

<!ELEMENT tintedglass EMPTY>
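
Because the content of an element declaration is essentially a regular expression over child element names, the occurrence symbols can be tried out with an ordinary regex engine. The sketch below is only an illustration under my own assumptions: the content models are translated to Python regular expressions by hand, and the helper children_match is hypothetical, not part of any XML library.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name is followed by a space so
# that the DTD operators map directly onto regular-expression operators.
CONTENT_MODELS = {
    "(spoiler*)": r"(spoiler )*",
    "(spoiler+)": r"(spoiler )+",
    "(spoiler?)": r"(spoiler )?",
    "(green | red | blue)": r"(green |red |blue )",
}


def children_match(xml_text, model):
    """Check the direct children of the root element against a content model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return re.fullmatch(CONTENT_MODELS[model], sequence) is not None


# A car with no spoilers satisfies "zero or more" but not "one or more".
print(children_match("<car/>", "(spoiler*)"))
print(children_match("<car/>", "(spoiler+)"))

# Two spoilers are too many for "zero or one".
print(children_match("<car><spoiler/><spoiler/></car>", "(spoiler?)"))

# Exactly one of the alternatives is allowed by the OR condition.
print(children_match("<book><red/></book>", "(green | red | blue)"))
```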


Attributes in DTDs
Attributes are declared in a DTD to achieve the following:

  • Define default values
  • Define sets of allowed and valid values
  • Create references between elements
  • Define fixed values

These rules and definitions cannot be specified in the XML itself and this is what makes the DTD important.

The DTD is then used by the parser to validate the document. The attributes of an element are declared in a single list using ATTLIST:

<!ATTLIST element-name attribute-specification … attribute-specification>

The element should be declared in the same DTD, and more than one attribute can be specified in a single ATTLIST declaration.
Each attribute specification has the form “name type default”, where name is the chosen attribute name. An attribute name may only appear once in an attribute list, but the same attribute name can be used with different elements.
The CDATA attribute type is the most commonly used and specifies ‘character data’.
Attribute keywords specify whether an attribute is compulsory (#REQUIRED), optional (#IMPLIED) or constant (#FIXED). Here is an example of a “required” or compulsory attribute:

<!ATTLIST author type CDATA #REQUIRED>
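
To see a default value at work, the following sketch declares a default for the type attribute in an internal DTD and lets Python's standard xml.etree.ElementTree module parse a small document. The library element and the attribute values are made up for illustration. I am also assuming here that the underlying parser applies defaults declared in the internal DTD; being non-validating, it would not enforce #REQUIRED.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first author omits the type attribute, so the
# default declared in the ATTLIST should be supplied by the parser.
DOCUMENT = """<!DOCTYPE library [
    <!ATTLIST author type CDATA "novelist">
]>
<library>
    <author>George Orwell</author>
    <author type="poet">Sylvia Plath</author>
</library>"""

for author in ET.fromstring(DOCUMENT):
    print(author.text, "->", author.get("type"))
```

An explicitly written attribute always wins over the default, which is why only the first author picks up the declared value.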

What is AJAX and what are its purposes?

AJAX (Asynchronous JavaScript and XML) is well known for allowing web developers to create web pages, parts of which can be updated with fresh information without having to reload the whole page, which is what happens with plain HTML pages. The term AJAX was coined in 2005, and the approach originates from a tradition of client-side technologies such as JavaScript, DHTML (Dynamic HTML) and Java applets.

The element that sets AJAX apart is ‘XMLHttpRequest’. This object allows browsers to send requests to the server and receive replies in the background, without requiring any action from the user. The server may respond with an XML document or a simple stream of characters, which can then be processed by client scripts and, through the DOM (Document Object Model), be used to update the page. ‘XMLHttpRequest’ does come with some security considerations which browsers guard against. To mitigate these risks, cross-domain requests are forbidden by default: a script may not send requests to a domain other than the one that originated the page.
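
The cross-domain restriction boils down to comparing where the page came from with where the request is going. The sketch below is a simplified illustration of that comparison written in Python rather than in the browser's own code. The function names and the example.com URLs are hypothetical, and real browsers apply more rules than this.

```python
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port


def same_origin(page_url, request_url):
    """True if a request from page_url to request_url stays on the same origin."""
    return origin(page_url) == origin(request_url)


page = "http://example.com/page.html"
for target in (
    "http://example.com:80/data.xml",
    "https://example.com/data.xml",
    "http://api.example.com/data.xml",
):
    verdict = "allowed" if same_origin(page, target) else "blocked"
    print(target, "->", verdict)
```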

Ajax can be considered as a combination of:
  • A standards-based presentation using XHTML and CSS
  • Dynamic display and interaction using the document object model
  • Data interchange and manipulation using XML and XSLT (eXtensible Stylesheet Language Transformation)
  • Asynchronous data retrieval using ‘XMLHttpRequest’
  • JavaScript binding everything together

Ajax code is written either directly in JavaScript or with the help of special APIs and libraries, such as the Google Ajax API or jQuery, which includes a rich Ajax implementation.

Ajax also has some well-known disadvantages, such as its limited set of capabilities. It does not, for instance, support multimedia or interaction with hardware such as printers and web cameras.

Perhaps the most obvious limitation is Ajax’s dependence on an internet connection. Since Ajax needs constant updates from the server it obviously does not work without a connection to the server.

Many times “browser-side caching” has to be employed to ensure that an Ajax application does not “lock up” while it is waiting for data from the server. There could be serious performance implications if browser caching is not used extensively.

Finally, a developer needs to be conversant with JavaScript to be able to develop sites in Ajax, and for this reason Ajax is often considered a second-tier skill that builds on JavaScript rather than a programming language in its own right.





XML Introduction

Tuesday, 18 October 2011

About this blog
I did not start this blog specifically for the CMT3315 course work but instead decided to continue adding to the blog that we had to set up in the last module for the sake of continuity.


1) What is the “X” in XML and what is its significance? 
The X in XML stands for eXtensible. The significance of XML being extensible is that it is flexible and very customisable, and can be changed to meet the requirements of the data structures it is being used to represent.

2) What is a markup language?
A system of embedded codes to make an electronic document display on a web page as required.


What is a mark-up language and how is it used? 
A "Mark-up Language" is the formalization of a set of rules that describes how information or data is to be laid out, structured, and formatted. XML is a typical mark-up language in which special tags are used to "mark up" or identify portions of text that have special meaning to the application which will use them.


3) What does SGML stand for?  
SGML stands for Standard Generalized Markup Language.

The significance of SGML
Most of today's internet technologies such as HTML and XML are based on SGML, which certainly adds to SGML's significance. SGML defines the use of tags to mark up elements that will later be interpreted in different ways according to the context they are used in.

4) What is the relationship between SGML and XML? 
SGML, XML and HTML are all mark-up languages. SGML is a standard that defines the data structure of a document. XML is ultimately derived from SGML and is intended for special-purpose applications such as storing information about a music collection. With XML, you can specify your own tags as long as they are well structured. HTML is an application of SGML with a limited, fixed set of tags and is used to display web pages in a browser.




5) The relationship between SGML and  HTML
As stated above, HTML is an application or implementation of SGML. In practice, HTML is the SGML application that is used to build web pages for display in internet browsers.




6) Which of the following statements are true?
 The answer is C: special-purpose markup languages have been derived from XML. RDF (Resource Description Framework) is commonly serialized as XML, XSL and XSLT are well-known applications of XML, and XIML is an XML-based interface representation language.



7)   Which of the following statements DOES NOT apply to XML?  
The answer is B. As explained above, XML is derived from SGML and is a simplified subset of it, so it cannot possibly be its predecessor. On the other hand, all the other statements are true. XML is in fact a set of rules for encoding documents electronically and it does define the knowledge structure in an encoded document. The release dates of both languages are also telling: SGML has its roots in the 1970s and became an ISO standard in 1986, while XML was developed in 1996.


8)   Which of the following statements DOES NOT apply to XML? 
XML is certainly not written in Java, which makes D the answer to this question. XML is written as plain text in a normal text file. Its "code" simply provides a method to encode documents electronically, so it has nothing to do with Java whatsoever. The other suggested answers are all true. XML does underpin office applications and the WWW, and it does support the creation of new markup languages such as XSL and XSLT.


A simple XML example :
<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
<author>George Orwell</author>
<name>Animal Farm</name>
<datepublished>1945</datepublished>
<isbn>78237652764</isbn>
</book>
 


9)   What does XIML stand for?
XIML stands for Extensible Interface Markup Language and therefore the answer is C. 


XIML was originally created to tackle the lack of standardized methods of representing interaction data. Interaction data refers to the data that pertains to user interfaces. XIML is a markup language based on XML, which is extensible (hence the ‘X’, from ‘eXtensible’).

 


10) What is the purpose of XIML? 
The answer is A because, as explained in 9 above, XIML is a universal language for user interfaces. In fact, it is an implementation of XML that explicitly deals with user interfaces.

XIML is a technology for interactive user interface development usually used in web design. It also provides seamless multimedia integration. XIML enables the user to create a full-featured, multimedia-rich site without requiring the user to be a proficient programmer or be versed in Flash. Its simplicity aims to provide a non-technical user with the ability to create complex multimedia projects on the Web.




11) What is XUL? 
The answer is B, in that XUL is an XML-based language for describing user interfaces.


Why is XUL needed?
XUL is Mozilla’s XML-based User Interface Language. XUL is used to enrich the functionality of cross-platform applications. These applications can be executed whether or not the user is connected to the internet. XUL allows these programs to be easily customized by adding alternative text, graphics, and presentation. This allows them to be branded and localized for specific markets.


XUL inherently allows web developers that are already familiar with Dynamic HTML to become familiar with XUL quickly.  XUL is intuitively oriented towards familiar application objects such as labels, windows, and buttons rather than pages, heading levels, and hypertext links which we are accustomed to in HTML. 


XUL is based on existing web standards such as HTML 4.0, CSS, DOM and JavaScript. Another advantage of XUL is that it is portable and can run on any operating system. This platform independence gives it a clear edge. Because XUL offers separation of content from styling, the layout and appearance of XUL products can be altered without affecting the application business logic. This also allows easier localization for different languages and locales.



12) What is XSL? 
The answer is D as XSL is a styling language. XSL stands for  Extensible Stylesheet Language and as its name implies it is used for styling purposes.

Why is XSL needed?
XSL is used to specify stylesheets that will ultimately be used to display the data in an XML document in a specific and predefined way. XSL consists of a language for transforming XML documents (XSLT) and an XML vocabulary for specifying formatting semantics. XSL also offers the following:

  • Paging and scrolling
  • Selectors and tree constructors
  • An extended page layout model
  • A comprehensive area model
  • Internationalization and writing modes
  • Linking


13) Below is some XML. The last line is missing.  What should it be? 
The answer is B because in every XML document the proper XML syntax rules should be followed. Perhaps the most fundamental rule is being broken here, namely, that every XML tag should have a corresponding closing tag. In the example the <note> tag is missing a corresponding </note> tag and this is of course illegal in XML. 

<?xml version="1.0"?>
<note>
<to>Class</to>
<from>Ray</from>
<heading>Reminder</heading>
<body>Don't forget to complete your Blog!</body>



14) Below is some XML.  What is the missing line?
The answer is d. Again as in 13 above, a cardinal rule of XML is being broken because the <caption> tag  does not have a corresponding closing tag.
<?xml version="1.0" encoding='UTF-8'?>
<painting>
<img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
<caption>This is Raphael's "Foligno" Madonna, painted in
<date>1511</date>-<date>1512</date>
</painting>