Handling XML¶
This page provides an introduction to reading and writing XML effectively within Mantid. In Mantid, we use Poco to handle XML serialisation. A useful introductory presentation can be found here.
This page is a work in progress.
Examples¶
Parsing¶
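#include <Poco/AutoPtr.h>
#include <Poco/DOM/DOMParser.h>
#include <Poco/DOM/Document.h>
#include <Poco/DOM/Element.h>
#include <string>

// cuboidStr is assumed to be a std::string holding the XML to parse
Poco::XML::DOMParser pParser;
Poco::AutoPtr<Poco::XML::Document> pDoc;
try {
  pDoc = pParser.parseString(cuboidStr);
} catch (...) {
  // Handle the failure as appropriate; pDoc is still null here,
  // so do not fall through to the code below
}
// Get pointer to root element (owned by pDoc, do not delete it)
Poco::XML::Element *pRootElem = pDoc->documentElement();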
Iterating over an element’s children¶
Poco::XML::Element *pRootElem = pDoc->documentElement();
// Iterate over every child node (non-recursively)
for (Poco::XML::Node *pNode = pRootElem->firstChild(); pNode != nullptr;
     pNode = pNode->nextSibling()) {
  // Children may also be text or comment nodes, so check the type
  auto *pElem = dynamic_cast<Poco::XML::Element *>(pNode);
  if (pElem) {
    // We can now do something with this element safely
  }
}
Inspecting an element¶
Poco::XML::Element *pElem = pDoc->documentElement(); // or any other valid element
// Reasonably quick operations
const std::string tag = pElem->tagName();     // for <foo>bar</foo> the tag is 'foo'
const std::string value = pElem->innerText(); // for the above example: 'bar'
Poco::XML::Node *pNext = pElem->nextSibling();
Poco::XML::Node *pPrev = pElem->previousSibling();
Poco::XML::Node *pChild = pElem->firstChild(); // avoid lastChild, it's expensive
Avoid NodeList¶
There are numerous functions that return a Poco::XML::NodeList. You should avoid using them and the list they return as best you can.
NodeList is a very inefficient container. Its item method has a cost equivalent to the value of i given to it, and its length method has a cost of n, where n is the number of nodes in the list.
This means that running the following is horrendously slow:
// NEVER DO THIS
Poco::AutoPtr<Poco::XML::NodeList> pElems = pElem->getElementsByTagName("foo");
for (int i = 0; i < pElems->length(); ++i) { // length costs N, and is called N times (N² cost)
  // item costs at least i and is called N times, with i from 0 -> N-1 (N² + N cost)
  Poco::XML::Node *pNode = pElems->item(i);
}
// NEVER DO THIS
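To make the costs annotated in the loop above concrete, here is a toy cost model. It is only a sketch: ToyNodeList, indexedLoopSteps and siblingWalkSteps are hypothetical names, and the struct simply charges i steps per item(i) call and n steps per length() call, as described above. It is not Poco's implementation.

```cpp
#include <cstddef>

// Toy stand-in for a list whose item(i) walks i links and whose length()
// walks all n links. This is a cost model only, not Poco's NodeList.
struct ToyNodeList {
  std::size_t n;     // number of nodes in the list
  std::size_t steps; // total links walked so far

  std::size_t length() {
    steps += n; // counting the nodes touches every one of them
    return n;
  }
  std::size_t item(std::size_t i) {
    steps += i; // reaching node i walks past the i nodes before it
    return i;
  }
};

// Steps used by the indexed loop: length() runs on every loop test,
// item(i) runs once per iteration.
std::size_t indexedLoopSteps(std::size_t n) {
  ToyNodeList list{n, 0};
  for (std::size_t i = 0; i < list.length(); ++i)
    list.item(i);
  return list.steps;
}

// Steps used by a firstChild()/nextSibling() style walk: one link per node.
std::size_t siblingWalkSteps(std::size_t n) {
  std::size_t steps = 0;
  for (std::size_t i = 0; i < n; ++i)
    ++steps;
  return steps;
}
```

Under this model, a list of 1000 nodes costs 1,500,500 steps through the indexed loop, against 1000 steps for the sibling walk.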
Even if the compiler is smart enough to optimise pElems->length() to a single call, the repeated item calls alone still give performance on the order of N² at best. Instead, we should always use the iteration example given earlier, as that runs in N time instead of N².