XML Part 6

Discuss the roles of entities in XML, giving examples of what problems can occur when they are not used properly.

In XML, entities are declarations that associate a name with a value, so that the value can be referenced by name. Entities are advantageous because they replace frequently typed (or included) text. Usually an entity is defined to contain text, and during parsing each reference is replaced by that text, which ensures that the same text appears everywhere and also makes the document shorter. The parser also checks entity names: a reference to a name that has not been declared is reported as an error.

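Entities may be:

Parsed entities, whose content is replacement text that the parser processes as part of the document
Unparsed entities, whose content may or may not be text and is not processed by the parser
General entities, which are referenced within the content of the document itself
Parameter entities, which are parsed entities used only within the DTD.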

Entity declarations can also be shared by more than one XML file, which adds to the advantages outlined above. Using entities encourages the re-use of text, and entities are often used to define standardized warning messages and in applications that require language and locale customizations. When this paradigm is implemented on a large scale, some sort of “Entity Management System” would need to be employed to keep things clear. XML itself does not include such a system and it would have to be added on.

Entity references are written in XML documents using the ampersand character (“&”), followed by the entity name and a semicolon (;). Entities must have unique names, and when using external entities the SYSTEM keyword is used to identify and locate a file.

Alternatively, the “PUBLIC” keyword supplies a public identifier, which is then followed by the system identifier. Below is an example of an internal entity called ‘bgc’ with the value ‘borggardencenter.com’, followed by a reference to it in the XML document:

<!ENTITY bgc "borggardencenter.com">

<companyName>&bgc;</companyName>
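
The expansion described above, and one of the problems that arises when entities are misused, can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration: the DOCTYPE wrapper and the undeclared name `whc` are assumptions added here, not part of the original example.

```python
import xml.etree.ElementTree as ET

# Internal entity declared in the DTD subset, then referenced in the content.
declared_doc = (
    '<!DOCTYPE companyName [\n'
    '  <!ENTITY bgc "borggardencenter.com">\n'
    ']>\n'
    '<companyName>&bgc;</companyName>'
)
company = ET.fromstring(declared_doc)
print(company.text)  # the reference has been replaced by the entity's text

# Referencing an entity that was never declared makes the parse fail.
undeclared_doc = '<companyName>&whc;</companyName>'
undeclared_error = None
try:
    ET.fromstring(undeclared_doc)
except ET.ParseError as exc:
    undeclared_error = exc
    print("parse failed:", exc)
```

A misspelled entity name therefore stops the whole document from loading, which is why entity names have to be managed carefully.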

One of the problems with XML is that its own syntax already uses a number of characters from the standard character set, such as “<”, “>” and the ampersand itself. These characters delimit the markup, so the parser will interpret them as markup rather than data. To use them as part of the data, an entity reference is used. HTML web developers will recognise these pre-defined entities, which cover the following symbols:

&lt; Left angle bracket (<)

&gt; Right angle bracket (>)

&apos; Single quote character (')

&quot; Double quote character (")

&amp; Ampersand (&)
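
As a small sketch of how these pre-defined entities are applied in practice, the Python standard library offers xml.sax.saxutils.escape and unescape. The sample strings are invented for illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = "Fish & Chips <2 for 1>"

# escape() replaces &, < and > by default.
escaped = escape(raw)
print(escaped)

# Quote characters are only replaced when asked for explicitly.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped_quotes = escape("it's \"x\"", quotes)
print(escaped_quotes)

# unescape() reverses the default replacements.
restored = unescape(escaped)
```

Forgetting this step is a typical source of errors: a bare “&” or “<” in the data makes the document ill-formed.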

Characters that are not accessible through the keyboard often need to be included in documents. These are handled by character references, a special case of reference that identifies a character by its Unicode code point.

A character reference can be written in decimal notation, such as &#2513;, or in hexadecimal notation, such as &#x413E;.
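
A short Python sketch of both notations follows. The character used (U+00E9, e with an acute accent) and the helper name `char_refs` are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

# The same character written as a decimal and as a hexadecimal reference.
word = ET.fromstring("<word>caf&#233; / caf&#xE9;</word>")
print(word.text)

def char_refs(ch):
    """Return the decimal and hexadecimal character references for ch."""
    code = ord(ch)
    return f"&#{code};", f"&#x{code:X};"

refs = char_refs("\u00e9")
print(refs)
```

Both forms need the closing semicolon, and the hexadecimal form needs the “x” after the “#”.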

As already mentioned, external entities also help to reduce the size of documents and to separate a document into logical divisions that are stored as separate external entities. This makes the data more manageable and easier to handle, and provides a mechanism for creating re-usable components that can be used across multiple documents.

External binary data, such as image files, can also be included as an external entity. The NDATA keyword denotes such data and names a notation that tells the application what kind of data it is; the parser does not process the data itself but passes the entity's location and notation on to the application. NDATA stands for notation data and marks the entity as unparsed, meaning that it should not be handled by the parser as standard textual data. The notation (here gif) must itself be declared with a NOTATION declaration. Example:

<!ENTITY logo SYSTEM "mylogo.gif" NDATA gif>
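
How a parser reports such an unparsed entity, rather than reading it, can be sketched with Python's xml.parsers.expat module. The surrounding document (the `page` element, its `image` attribute and the NOTATION declaration) is an assumption added to make the example complete.

```python
import xml.parsers.expat

doc = """<!DOCTYPE page [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "mylogo.gif" NDATA gif>
  <!ATTLIST page image ENTITY #REQUIRED>
]>
<page image="logo"/>"""

unparsed = {}

def on_entity_decl(name, is_parameter, value, base,
                   system_id, public_id, notation):
    # An unparsed entity has no replacement text, only a location
    # (system_id) and the name of its notation.
    if notation is not None:
        unparsed[name] = (system_id, notation)

parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity_decl
parser.Parse(doc, True)
print(unparsed)
```

The parser never opens mylogo.gif; it is up to the application to decide what to do with the file name and the notation.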

Entities also aid the construction of a DTD and as many as required can be specified in a DTD.

Example with an internal DTD

<?xml version="1.0" standalone="yes" ?>

<!DOCTYPE publisher [

<!ELEMENT publisher (#PCDATA)>

<!ENTITY hc "Harper Collins">

<!ENTITY mp "Macmillan Publishers">

<!ENTITY yp "Yale Publishers">

]>

Corresponding XML (a document has a single root element, so each line below is an alternative):

<publisher>&hc;</publisher>

<publisher>&mp;</publisher>

<publisher>&yp;</publisher>

Example of referencing External entity files :

<?xml version="1.0" ?>

<!DOCTYPE publisher [

<!ELEMENT publisher (#PCDATA)>

<!ENTITY hc SYSTEM "data1.xml">

<!ENTITY mp SYSTEM "data2.xml">

<!ENTITY yp SYSTEM "data3.xml">

]>

<publisher>&hc;</publisher>

<publisher>&mp;</publisher>

<publisher>&yp;</publisher>

Review the correct use of character sets in XML, discussing the advantages and disadvantages of different sets, with examples. How would you select your character sets to present Chinese characters?

In XML, the character set determines which characters are permitted in a document, and character sets are broadly of two types, “broad” or “restrictive”. A restrictive character set could, for example, be used to restrict the text in a document to upper case. A “broad” character set, on the other hand, includes many more characters, such as Arabic script and other non-Latin scripts.

ASCII

The best known and most widely used character set is ASCII, in which each character is represented by a “character encoding value”. In ASCII the character code value for a capital “A” is 65, for “B” it is 66, and so on. ASCII is based on a 7-bit encoding scheme, which means that only 128 different values are possible and hence 128 characters. Extended 8-bit sets (often loosely called “ANSI”) lift this limitation by using 8 bits instead of 7, thereby providing 256 different characters. ASCII does not support languages written in non-Latin alphabets such as Cyrillic and Arabic, which is perhaps its biggest limitation. It also lacks a great many symbols, such as mathematical and technical symbols. On the other hand, ASCII does have its advantages: its widespread use is probably the biggest, together with the fact that many documents already in electronic archives are written in ASCII. Its lack of proper symbols makes it unsuitable for technical works, but it does include the basic arithmetic symbols, which are enough for most non-technical documents.
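
The code values and the main limitation mentioned above can be checked with a few lines of Python. The sample strings are invented for illustration.

```python
# Code values of the first two capital letters.
code_a, code_b = ord("A"), ord("B")
print(code_a, code_b)

# Plain English text with basic arithmetic symbols fits in 7 bits.
ascii_bytes = "Total: 2 + 2 = 4".encode("ascii")
fits_in_7_bits = all(b < 128 for b in ascii_bytes)

# Cyrillic text cannot be represented in ASCII at all.
try:
    "Привет".encode("ascii")
    cyrillic_ok = True
except UnicodeEncodeError:
    cyrillic_ok = False
print(fits_in_7_bits, cyrillic_ok)
```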

Unicode

Unicode is the solution to the problems experienced with ASCII, namely the limited number of characters that it includes. Unicode is the preferred character set for XML and includes enough characters to cover the world’s languages and alphabets. The two most widely used encoding schemes for Unicode are UTF-8 and UTF-16. UTF-8 uses 8-bit code units and is compatible with 7-bit ASCII: ASCII characters take a single byte, and all other characters are represented by sequences of two to four bytes.
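
A quick way to see the difference between the two encoding schemes is to count the bytes each one needs. This Python sketch uses four sample characters chosen for illustration: a Latin letter, an accented letter, a Chinese character and an emoji.

```python
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# Bytes needed per character in each encoding scheme.
utf8_lengths = [len(s.encode("utf-8")) for s in samples]
utf16_lengths = [len(s.encode("utf-16-le")) for s in samples]
print(utf8_lengths)
print(utf16_lengths)

# For ASCII characters the UTF-8 bytes are the ASCII bytes.
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")
```

UTF-8 is the more compact choice for mostly Latin text, while UTF-16 needs two bytes for most Chinese characters where UTF-8 needs three.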

Unicode covers over 107,000 characters in more than 90 scripts. These include European, Asian, American and African scripts, and even some for languages which are no longer spoken.

Unicode was designed to be compatible with the widely used ASCII and the first 256 characters are identical to ISO-8859-1 of which 128 characters are the same as in ASCII. This compatibility ensures easy conversion of documents from ASCII to Unicode.
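
This compatibility can be illustrated in Python. The sketch below also shows one caveat: the code points match ISO-8859-1, but the bytes above 127 still have to be re-encoded when moving to UTF-8.

```python
ascii_data = b"Hello, XML"

# Pure ASCII bytes mean the same thing in all three encodings.
decoded = {enc: ascii_data.decode(enc)
           for enc in ("ascii", "iso-8859-1", "utf-8")}
all_equal = len(set(decoded.values())) == 1

# Every ISO-8859-1 byte maps to the Unicode code point of the same number.
latin1_text = bytes(range(256)).decode("iso-8859-1")
code_points_match = [ord(c) for c in latin1_text] == list(range(256))

# A non-ASCII ISO-8859-1 byte is not valid UTF-8 on its own.
try:
    b"\xe9".decode("utf-8")
    latin1_byte_is_utf8 = True
except UnicodeDecodeError:
    latin1_byte_is_utf8 = False
```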

Unicode also has full support for a very large number of mathematical, technical, scientific, cultural and artistic expressions.

The main disadvantage of Unicode, if it can be classified as a disadvantage, is its late adoption: older software does not always support it. Some database systems, for example, do not allow it in object names such as field names.

UTF-8 encoding supports both simplified and traditional Chinese text. In an XML document the encoding is stated in the XML declaration, for example <?xml version="1.0" encoding="UTF-8"?>, while in an HTML page a “meta” tag specifies that UTF-8 encoding is going to be used, such as:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

It is also possible to use GB encoding to display Chinese characters although this is mostly used to display simplified Chinese characters. In this case the “meta” tag would be :

<meta http-equiv="Content-Type" content="text/html; charset=gb18030"/>

Finally Chinese text may also be displayed using “Big5” encoding but this is usually used for Traditional Chinese. In this case the “meta” tag would be :

<meta http-equiv="Content-Type" content="text/html; charset=big5"/>
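
The three options can be compared with a short Python sketch. The sample text (the two characters for “Chinese language”, which exist in all three encodings) is chosen for illustration.

```python
text = "\u4e2d\u6587"  # the two characters for "Chinese language"

# Bytes needed for the same text in each encoding.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "gb18030", "big5")}
print(sizes)

# Each encoding round-trips correctly when the same name is used both ways.
round_trips = all(text.encode(enc).decode(enc) == text for enc in sizes)

# Decoding with the wrong encoding silently produces different text.
misread = text.encode("gb18030").decode("big5", errors="replace")
misread_differs = misread != text
```

The last lines show why the declared character set has to match the bytes actually stored: a mismatch does not necessarily raise an error, it just displays the wrong characters.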

Describe and discuss the logical modelling of data in the context of XML.

The Logical Data Model (LDM) in XML deals with the actual implementation of a conceptual model in a database and represents the normalized design of the common data model in information systems. The LDM refers to the representation of the data in a particular organisation and to the management of enterprise data in that organisation's information systems.

There are several kinds of LDM, namely the relational model, the object-oriented model and the Extensible Markup Language (XML) model. The relational model defines data in terms of traditional rows and columns and allows relationships to be defined between tables. The object-oriented model defines data in terms of classes and objects, which have attributes and associations. Finally, the XML model represents data in terms of elements, attributes and tags, and is quickly becoming the preferred way to build components of information systems.

Using XML it is possible to model information systems in natural and intuitive ways. This paradigm attempts to ensure that time and effort are spent on what needs to be done rather than on how to do it.

XML supports “Heterogeneity” where each "record" can include different data fields which is a considerable advantage as in a real scenario data is not organized into tables, rows, and columns. This allows the data to be displayed as it is in the real world and does away with many of the restrictions attributed to traditional database systems. Coupled with this, the XML data model also supports Extensibility where new types of data can be added at will and don't need to be thought out beforehand. Again, this is an advantage over traditional systems.
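
Heterogeneity can be sketched in Python by reading two “records” that do not share the same fields. The catalog content and element names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <item><title>XML Basics</title><pages>120</pages></item>
  <item><title>Garden Gnome</title><weight>2.5</weight><colour>red</colour></item>
</catalog>
""")

# Each item becomes a dictionary with whatever fields it happens to have.
records = [{field.tag: field.text for field in item} for item in catalog]
for record in records:
    print(record)
```

No table definition had to be changed to accommodate the second item's extra fields, which is the extensibility point made above.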

The advantage of using XML’s Logical Data Model is that the model is inherently self-describing, which means that applications can be made to adapt themselves automatically to the data. XML can be considered a universal information structure which itself contains the information needed to build the data structures, thus doing away with separate database design mechanisms.

In order to model information effectively using XML, the natural patterns in the data first have to be identified.

What is the role of namespaces in XML? What benefits does the use of namespaces confer and what problems may be avoided?

In XML, namespaces allow one master schema document to include references to other schemas as required by the application. This allows and encourages the use of several schemas within an XML document.
Namespaces are collections of names of elements, element types, and attributes in a schema. Combining schemas presents the possibility that collisions may occur because, for example, an element in one schema may have the same name as an element in another schema, resulting in a conflict. When one considers that there may be many elements in different schemas, the possibility of this occurring is a real danger. Namespaces are also referred to as “vocabularies”, as they contain a collection of names and definitions just like a traditional vocabulary.

With namespaces, prefixes may be added to associate names with schemas and thus make them unique. This enables elements, types, and attributes to be referenced from a particular schema precisely and without room for confusion. Names from XML schema documents usually employ the “xs:” or “xsd:” prefix.

The example below shows how components from other bookstore schemas can be included in the XML document by first identifying the namespaces and defining prefixes for those namespaces in the schema element of the schema document :

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://xmlfd.com/ns/bookstore"
xmlns:ob="http://www.oregonbooksellers.com/oregonBooks"
xmlns:nwb="http://www.northwestbooks.net/new"
xmlns:fic="http://www.fictionwriters.org/fiction"
elementFormDefault="qualified">

Source : XML for Dummies by Wiley Publishing

The xsd:import elements are then added to indicate where the schema documents for
these namespaces can be found, as shown below :

<xsd:import namespace="http://www.oregonbooksellers.com/oregonBooks"
schemaLocation="books.xsd"/>
<xsd:import namespace="http://www.northwestbooks.net/new"
schemaLocation="newbooks.xsd"/>
<xsd:import namespace="http://www.fictionwriters.org/fiction"
schemaLocation="fiction.xsd"/>
<xsd:element name="books">

Source : XML for Dummies by Wiley Publishing

The first line of the schema element declares the XML Schema namespace and associates the xsd: prefix with it, so that names such as schema, element and attribute, as defined by the XML Schema specification, can be used:

xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"

The second line creates a namespace for the elements defined in this schema document. Specifying a target namespace allows these components to be used in other schema documents, as follows:

targetNamespace="http://xmlfd.com/ns/bookstore"

The next three lines in the schema element associate prefixes with additional namespaces, so they can be used as components from those schemas in the schema document:

xmlns:ob="http://www.oregonbooksellers.com/oregonBooks"
xmlns:nwb="http://www.northwestbooks.net/new"
xmlns:fic="http://www.fictionwriters.org/fiction"

By default only global elements, which are children of the schema element, are associated with the target namespace. Local elements may be added to the target namespace as follows:

elementFormDefault="qualified"

Finally, the next three lines in the schema document point to the location of the external schemas associated with the three imported namespaces:

<xsd:import namespace="http://www.oregonbooksellers.com/oregonBooks"
schemaLocation="books.xsd"/>
<xsd:import namespace="http://www.northwestbooks.net/new"
schemaLocation="newbooks.xsd"/>
<xsd:import namespace="http://www.fictionwriters.org/fiction"
schemaLocation="fiction.xsd"/>

The prefix is then included to access elements from the other namespaces, as follows:

<xsd:element name="book">
<xsd:complexType>
<xsd:sequence maxOccurs="unbounded">
<xsd:element ref="author"/>
<xsd:element ref="ob:title"/>
<xsd:element ref="nwb:publisher"/>
<xsd:element ref="fic:price"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="author" type="xsd:string"/>

Examples are sourced from : XML for Dummies by Wiley Publishing
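
The name collision that namespaces avoid can also be sketched from the instance-document side in Python. The namespace URIs are the ones used in the schema example, while the `books` wrapper and the two titles are invented for this illustration.

```python
import xml.etree.ElementTree as ET

OB = "http://www.oregonbooksellers.com/oregonBooks"
FIC = "http://www.fictionwriters.org/fiction"

# Two elements share the local name "title" but live in different namespaces.
doc = ET.fromstring(
    f'<books xmlns:ob="{OB}" xmlns:fic="{FIC}">'
    '<ob:title>Trees of Oregon</ob:title>'
    '<fic:title>The Long Night</fic:title>'
    '</books>'
)

# The prefix-to-URI map tells find() which "title" is meant.
ns = {"ob": OB, "fic": FIC}
ob_title = doc.find("ob:title", ns).text
fic_title = doc.find("fic:title", ns).text

# Internally each tag is stored with its full namespace URI.
tags = [child.tag for child in doc]
print(tags)
```

Without the prefixes the two `title` elements would be indistinguishable; with them each resolves to a different fully qualified name.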

What is XPath and what does it do? Evaluate the strengths and weaknesses of this concept, with examples.

XPath looks at an XML document as a tree of nodes. Its primary purpose is to provide a path notation for navigating through the tree structure of the document. XPath is a major element in W3C’s XSLT standard, and XQuery and XPointer are both built on XPath expressions. (http://www.w3schools.com/xpath/default.asp)
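
A few path expressions can be tried in Python, whose xml.etree.ElementTree module supports a limited subset of XPath (not the full language). The library document below is invented for this illustration.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""
<library>
  <book lang="en"><title>XML Basics</title><price>20</price></book>
  <book lang="mt"><title>Il-Bidu</title><price>15</price></book>
  <magazine><title>Web Weekly</title></magazine>
</library>
""")

# Child steps: titles of books directly under the root.
book_titles = [t.text for t in library.findall("./book/title")]

# Descendant step: every title anywhere in the tree.
all_titles = [t.text for t in library.findall(".//title")]

# Predicate on an attribute value.
english = [t.text for t in library.findall("./book[@lang='en']/title")]

# Positional predicate (positions start at 1).
second = library.find("./book[2]/title").text
```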

XPath, by definition, looks at the logical structure of an XML document rather than its surface syntax. This logical structure, known as the data model, is defined in [XQuery 1.0 and XPath 2.0 Data Model (Second Edition)]. XPath is designed to be embedded in a host language such as XSL Transformations (XSLT) or XQuery.

As an XML query language, XPath has a subset that can be used for testing whether or not a node matches a pattern; this use of XPath is described in XSL Transformations (XSLT).

XPath is concise and simple while at the same time powerful, and it was specifically designed for XML. XPath provides a single homogeneous syntax, and queries are compact and easy to read, key in and understand. Furthermore, queries can be easily embedded in applications, scripts and XML, and their inherent simplicity, especially for the commonly used queries, means they are parsed easily and quickly. With XPath you can uniquely identify any node in an XML document, and equally easily specify any path that can occur in an XML document and any set of conditions for the nodes in the path.

XPath queries do not return repeated nodes, and query conditions can be evaluated at any level of a document; queries are not required to navigate from the top node of a document. XPath also encourages queries to be declarative rather than procedural, which means that the emphasis is on “what should be found” rather than “how to find it”. XPath is also context-independent, which means that it can be used in many contexts.
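
The declarative style and the ability to start from any node can be sketched in Python, again using the XPath subset in xml.etree.ElementTree. The document and the helper `titles_by_walking` are invented for this comparison.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    '<library>'
    '<shelf><book><title>XML Basics</title></book></shelf>'
    '<shelf><box><book><title>Old Maps</title></book></box></shelf>'
    '</library>'
)

def titles_by_walking(node):
    """Procedural version: spell out how to visit every node."""
    found = []
    for child in node:
        if child.tag == "book":
            title = child.find("title")
            if title is not None:
                found.append(title.text)
        found.extend(titles_by_walking(child))
    return found

walked = titles_by_walking(library)

# Declarative version: state what is wanted and let the engine find it.
queried = [t.text for t in library.findall(".//book/title")]

# The same kind of query evaluated from a node lower down the tree.
second_shelf = library.findall("shelf")[1]
from_shelf = [t.text for t in second_shelf.findall(".//title")]
```

Both approaches return the same titles, but the path expression stays one line long however deeply the books are nested.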

XPath also has some disadvantages, the most notable of which are perhaps the extra resources required in terms of parser processing time and the additional learning curve for developers. Cross-browser compatibility and version dependency issues also arise and can cause problems.

Tuesday, 22 November 2011
