Code Butchering: XML

Thursday, October 15, 2009

[.NET] XmlDocument VS XmlReader

To cut a long story short I recently learned (thanks to a co-worker) that XmlDocument performance sucks.

It's very handy to manipulate XML stuff with XmlDocument because it comes with all the DOM stuff, and for example you can use XPath for node selection like this:

// load up the xml doc
XmlDocument doc = new XmlDocument();
doc.Load(filename);

// get a list of all the nodes you need with xPath
XmlNodeList nodeList = doc.SelectNodes("root/whatever/nestedWhatever");

The above is quite cool and it works just fine if you're not loading hundreds of files for a total size of hundreds of MBs, in which case you'll notice a lethal blow to performance.

If you need speed, go with XmlReader, a bare-bones class that scans your XML file the old way: forward only, node after node. The bad thing is that you lose all the nice DOM stuff, so you have to parse elements by hand, pulling out the inner text and/or attribute values yourself. An example:

// "using" makes sure the reader (and the underlying file) is closed when done
using (XmlReader xmlReader = XmlReader.Create(fileName))
{
    while (xmlReader.Read())
    {
        // keep reading until we see my element
        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "myElementName")
        {
            // get attributes (or innerText) from the Xml element here
            string whatever = xmlReader.GetAttribute("whatever");
            // do stuff
        }
    }
}

I can't be bothered benchmarking as it bores the sh*t out of me - but performance increases a whole lot with XmlReader, and if you want figures to look at you can find plenty on google (this guy here did a pretty good job for example) and here's another good overview of XML classes (from Scott Hanselman's blog).

Anyway - here comes the common sense advice - whatever you're doing, start with XmlDocument; if it's too slow for your needs, switch over to XmlReader and you'll be grand.
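
The forward-only idea is not specific to .NET. As a rough sketch of the same technique in Java, the StAX XMLStreamReader that ships with the JDK plays the role XmlReader plays above: it walks the document once, event by event, without building a tree. The class name ForwardOnlyScan and the method collectAttribute are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, named for this example only
public class ForwardOnlyScan {

    // Scans the XML once, front to back, and collects the value of
    // attributeName from every element called elementName
    public static List<String> collectAttribute(String xml, String elementName,
                                                String attributeName) throws Exception {
        List<String> values = new ArrayList<String>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // next() moves to the following event; there is no way back
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    // a null namespace means "don't check the namespace"
                    String value = reader.getAttributeValue(null, attributeName);
                    if (value != null) {
                        values.add(value);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }
}
```

Just like with XmlReader, nothing is kept in memory except what you decide to store, which is where the speed comes from.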

Monday, January 28, 2008

[JAVA] Easy management of XML content (Serializing, Deserializing, XSD ing, XSL ing)

problem: actually it is not a problem, it is more a collection of tips for managing a DOM document (using the Xerces and Xalan open source libraries from the Apache Group): parsing an XML file, serializing and deserializing, applying an XSL or validating against a defined XSD. You don't need a degree to figure out how to do it, but since there are thousands and thousands of classes inside those libraries, it would have been hard without Google.

solution: ok, where is the beginning?
What we'll do first is simply read an XML file. Xerces, through the standard JAXP DocumentBuilder API, gives you a simple way to open one:

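import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

private Document parseXmlFromFile(String filePath) {
    try {
        // get the factory
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // using the factory, get an instance of document builder
        DocumentBuilder db = dbf.newDocumentBuilder();
        // parse using the builder to get a DOM representation of the XML file;
        // wrapping the path in a File keeps it from being read as a URI
        return db.parse(new File(filePath));
    } catch (Exception e) {
        // covers ParserConfigurationException, SAXException and IOException
        e.printStackTrace();
        return null;
    }
}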

Sometimes, though, all you have is a string that contains your XML data (maybe from an HTTP request or a web service), and you just want to deserialize it into a Document object:


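import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

private Document deserialize(String xml) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // ask the parser to drop comments and ignorable white space
        dbf.setIgnoringComments(true);
        dbf.setIgnoringElementContentWhitespace(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // read the XML from the string instead of from a file
        InputSource source = new InputSource(new StringReader(xml));
        return db.parse(source);
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}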

Ok, I know, it's pretty much the same as the previous code; the only difference is that now you take your data from a StringReader object rather than from the path of the file. I also set two properties on the factory, one that ignores comments and the other that ignores useless white space (for example formatting spaces); note that the latter only takes effect when the parser is validating, because it needs the content model to know which white space is ignorable.

And what if you want to obtain the string equivalent to the content of a Document? Don't be scared, there is a solution:


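import java.io.StringWriter;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;

private String serialize(Document document) {
    try {
        // derive the output settings from the document itself
        OutputFormat format = new OutputFormat(document);
        StringWriter stringOut = new StringWriter();
        XMLSerializer serial = new XMLSerializer(stringOut, format);
        serial.asDOMSerializer();
        serial.serialize(document.getDocumentElement());
        return stringOut.toString();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}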

...and voilà your string is right there!
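
If you'd rather not depend on the Xerces-specific XMLSerializer, the same round trip can be sketched with nothing but the JAXP classes that ship with the JDK: an "identity" Transformer (one created without a style sheet) copies the DOM straight into a string. The class name JaxpRoundTrip is made up for the example, and leaving out the XML declaration is just my choice here, not a requirement.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only
public class JaxpRoundTrip {

    // String -> DOM, same idea as the deserialize method above
    public static Document deserialize(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // DOM -> String using a Transformer with no style sheet (identity copy)
    public static String serialize(Document document) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // skip the <?xml ...?> header so only the content is returned
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }
}
```

Calling JaxpRoundTrip.serialize(JaxpRoundTrip.deserialize(someXml)) should give you back the markup you started with, give or take formatting details that depend on the transformer implementation.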

And what if you need to apply an XSL style sheet to your wonderful XML?
No trouble at all, just use this simple code:


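import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// urlSchema is the URL of the XSL style sheet to apply
public String transform(Document xml, String urlSchema) {
    try {
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer(new StreamSource(urlSchema));
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(xml), new StreamResult(writer));
        return writer.toString();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}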
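
To try a transformation without putting a style sheet on a server, here is a small sketch that takes both the XML and the XSL as strings. The class name XslDemo is invented for the example; the JAXP calls are the same ones used in the method above, only fed from StringReader objects.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named for this example only
public class XslDemo {

    // Applies the style sheet held in xsl to the document held in xml
    public static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter writer = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(writer));
        return writer.toString();
    }
}
```

For instance, a style sheet with a text output method and a single template that selects /greeting/@to turns the made-up document <greeting to='butcher'/> into the plain string butcher.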

Finally, dear fellow butcher, if you want to validate your XML data against an XSD definition, just copy and paste this code:


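import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public boolean validate(String xml, String schemaURL) {
    try {
        String schemaLang = "http://www.w3.org/2001/XMLSchema";
        SchemaFactory factory = SchemaFactory.newInstance(schemaLang);
        // compile the XSD found at schemaURL
        Schema schema = factory.newSchema(new StreamSource(schemaURL));
        Validator validator = schema.newValidator();
        StringReader reader = new StringReader(xml);
        // throws an exception if the content is not valid
        validator.validate(new StreamSource(reader));
        return true;
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
}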

This method returns true/false depending on the result of the validation process. As written it swallows the exception, so the reason only shows up in the stack trace; if you want to know why the XML content is not valid, let the exception propagate and catch it outside the method (for validity problems it will be a SAXParseException).
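
If a plain true/false is not enough, one possible variation is to hand the Validator an ErrorHandler and collect every problem it reports. This is only a sketch: the class name ValidationReport and the "line N: message" format are made up for the example, and the schema is passed as a string rather than a URL to keep it self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, named for this example only
public class ValidationReport {

    // Returns one message per validation problem; an empty list means "valid"
    public static List<String> validate(String xml, String xsd) throws Exception {
        final List<String> problems = new ArrayList<String>();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            // warnings and recoverable errors are recorded, validation goes on
            public void warning(SAXParseException e) { problems.add(describe(e)); }
            public void error(SAXParseException e) { problems.add(describe(e)); }
            // fatal errors (e.g. malformed XML) stop the validation
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            problems.add(describe(e));
        }
        return problems;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

The upside of this design is that one pass reports all the recoverable errors instead of stopping at the first one; the exact wording of each message depends on the parser implementation.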

I showed how to use DOM Classes to manage the whole XML content of a specific file/string, without discovering all the whys and becauses of each line; I'm pretty sure you can find several other ways to do what I coded but this fits my usual needs.

Wednesday, December 12, 2007

[Javascript] XML Loading

Problem: while writing a custom visual XML editor (based on a specific and complex XSD schema definition), the first task was parsing an XML string or a remote XML file into a DOM-compliant document object.

Solution: the first step is to tell Explorer apart from Mozilla-compliant browsers. Surprisingly, this can be done by checking how the browser handles XML content, that is to say which objects it can use.
To do that, I used these global variables:

var isIE = (window.ActiveXObject)?true:false;
var isMozilla = (document.implementation.createDocument)?true:false;

That means if you can instantiate an ActiveXObject, you are Explorer, while if you can execute the createDocument method you are certainly Mozilla compliant.
I have no documentation on the reason for these differences (except for the fact that Explorer relies entirely on ActiveX objects).

Next step depends on the way you want to open XML data: from a file or from a string.
In the first case this is the code:

var xmlDoc  = null ;

function importXML(url)
{

 if (isMozilla)
 {
  xmlDoc=document.implementation.createDocument("","",null);
  xmlDoc.onload = onContentLoad;
 }
 else if (isIE)
 {
  xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
  xmlDoc.onreadystatechange = function () {
   if (xmlDoc.readyState == 4) onContentLoad()
  };
  }
 else
 {
  alert('Your browser can\'t handle this script');
  return;
 }
 xmlDoc.load(url);
}

function onContentLoad()
{
/* handle the loaded content */
}

As you can see I used a global variable "xmlDoc" which is the DOM document: this is fine in a procedural coding style, but in my opinion it would be better to put it into OOP style (so "xmlDoc" becomes a member of a class).
The function simply creates a different object for IE or Mozilla and, since the file is supposed to be in another domain and can take some time to load, the loading is asynchronous: once it completes, the "onContentLoad" function is called, in which you do whatever you want to initialize your application.

I used this method to load some XML templates or create a new XML document, useful for creating new fragments of the defined document (instead of hard-coding XML logic into the javascript code, it is better to load a specific fragment, so that if your schema definition changes, your javascript still works).

I use a Java Servlet in my application that answers HTTP requests: one of these requests is a "loadSchema" request, which returns an entire XML document as a string (the document itself describes the schema, not an XSD schema, of a specific application behavior): so the problem is to parse a string into an XML document.
This is the code:

function loadXML(xmlString)
{
 if (isIE)
 {
  xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
  xmlDoc.loadXML(xmlString);
 }
 else if (isMozilla)
 {
  var domParser = new DOMParser();
  xmlDoc = domParser.parseFromString(xmlString, "application/xml");
 }

 // xmlDoc is still null if neither branch applied
 if (xmlDoc == null)
  alert("Error.");
}

Code is quite simple, so no explanation is needed.

Finally, this is the code to handle a "GET" HTTP request via Javascript (of course there can be many ways to customize this code, there are several options, but guys...this is butchering!):

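var xmlHttp = null;

// GetXmlHttpObject() is not shown in this post; this is one common way
// to write it (an assumption, adapt it to your needs)
function GetXmlHttpObject()
{
 if (window.XMLHttpRequest) return new XMLHttpRequest();
 if (window.ActiveXObject) return new ActiveXObject("Microsoft.XMLHTTP");
 return null;
}

function requestSchema(url)
{
 xmlHttp = GetXmlHttpObject();
 if (xmlHttp == null)
 {
  alert("Browser does not support HTTP Request");
  return;
 }
 xmlHttp.onreadystatechange = stateChangedRequestSchema;
 xmlHttp.open("GET", url, true);
 xmlHttp.send(null);
}

function stateChangedRequestSchema()
{
 // 4 (or "complete") means the whole response has arrived
 if (xmlHttp.readyState == 4 || xmlHttp.readyState == "complete")
  loadXML(xmlHttp.responseText);
}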

The "requestSchema" function simply makes the GET request and leaves it to "stateChangedRequestSchema" to handle the load event (it calls "loadXML", passing the content received by the request, which is supposed to be a valid XML document....but you have to write the code to check that it is!).

Once the content is loaded into the DOM object, you can use all the DOM-compliant methods to navigate your document and do whatever you want. A good and quick javascript DOM reference can be found here.

You should not be surprised or highly shocked if handling XML content is so simple: HTML is a close relative of XML (XHTML literally is XML) and browsers already expose pages through the DOM, so it would have been fucking idiotic not to support XML parsing natively.

Bye Bye, hope to help you!