The QtXmlPatterns module implements PyQt's XQuery support.
An introduction to PyQt's XQuery support.
To import the module use, for example, the following statement:
from PyQt4 import QtXmlPatterns
XQuery is a pragmatic language that allows XML to be queried and created in fast, concise and safe ways.
    <bibliography>
    {
        doc("library.xml")/bib/book[publisher = "Addison-Wesley" and @year > 1991]/
            <book year="{@year}">{title}</book>
    }
    </bibliography>
The query opens the file library.xml and selects each book element that is a child of the top element bib, whose year attribute is greater than 1991, and whose publisher is Addison-Wesley. For each match it constructs a new book element and attaches it to the parent element called bibliography.
XQuery is tailor-made for selecting and aggregating information in safe and efficient ways. Hence, if an application selects and navigates data, XQuery may be a good candidate for implementing that quickly and with few bugs. With QAbstractXmlNodeModel, these advantages are not constrained to XML files, but can be applied to other data as well.
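To make the selection logic concrete, here is a rough plain-Python equivalent of the query, written with the standard library's xml.etree.ElementTree rather than QtXmlPatterns. It is only a sketch: the sample data in LIBRARY_XML and the function name build_bibliography are invented for illustration, and the comparison on publisher looks at the first publisher child only, which is narrower than XQuery's general comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for library.xml.
LIBRARY_XML = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1990">
    <title>An Older Title</title>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher>
  </book>
</bib>
"""

def build_bibliography(library_xml):
    """Mirror the query: keep Addison-Wesley books with a year after 1991."""
    bib = ET.fromstring(library_xml.strip())      # the top element, bib
    bibliography = ET.Element("bibliography")     # the constructed parent element
    for book in bib.findall("book"):              # book children of bib
        if (book.findtext("publisher") == "Addison-Wesley"
                and int(book.get("year")) > 1991):
            # <book year="{@year}">{title}</book>
            new_book = ET.SubElement(bibliography, "book", year=book.get("year"))
            new_book.append(book.find("title"))
    return bibliography

result = build_bibliography(LIBRARY_XML)
print(ET.tostring(result, encoding="unicode"))
```

The XQuery version expresses the same filter and construction in a single path expression, which is the conciseness the text refers to.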
Maybe XQuery can be summarized as follows:
On top of that, the language is designed to be high-level, so that it is easy to analyze what the user is computing. This makes it easier to optimize both the speed and the memory use of XML operations.
Queries can be evaluated through an ordinary Qt C++ API or through a command line interface.
Applications that use QtXmlPatterns' classes need to be configured to be built against the QtXmlPatterns module. To include the definitions of the module's classes, use the following directive:
#include <QtXmlPatterns>
To link against the module, add this line to your qmake .pro file:
QT += xmlpatterns
QtXmlPatterns is part of the Qt Desktop Edition, Qt Open Source Edition and the Qt Console Edition. Note that QtXmlPatterns is disabled when Qt is built with exceptions disabled, or with a compiler that doesn't support member templates, such as MSVC 6.
See QXmlQuery for how to use the C++ API.
A command line utility called xmlpatterns is installed and available like the other command line utilities such as moc or uic. It takes a single argument that is the filename of the query to execute:
xmlpatterns myQuery.xq
The query will be run and the output written to stdout.
Pass in the -help switch to get a brief description printed to the console, such as how to bind variables using the command line.
The command line utility's interface is stable for scripting, but descriptions and help messages are not designed for the purpose of automatic parsing, and can change in undefined ways in a future release of Qt.
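Because the utility is meant to be scriptable, it can be driven from Python with the standard subprocess module. The following is a minimal sketch, assuming Python 3, that xmlpatterns is on PATH, and that its output is UTF-8; the helper names build_command and run_query_file are invented for illustration. Only the single-argument form described above is used.

```python
import shutil
import subprocess

def build_command(query_file, executable="xmlpatterns"):
    """Return the argument list for running one query file."""
    return [executable, query_file]

def run_query_file(query_file):
    """Run the query and return what xmlpatterns wrote to stdout."""
    if shutil.which("xmlpatterns") is None:
        raise RuntimeError("xmlpatterns was not found on PATH")
    completed = subprocess.run(
        build_command(query_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,                  # raise CalledProcessError on a non-zero exit
    )
    # Assumption: the serialized result is UTF-8 encoded.
    return completed.stdout.decode("utf-8")
```

A call such as run_query_file("myQuery.xq") would then return the query result as a string. Since help and error messages are not meant to be parsed, the sketch relies only on the exit status and on stdout.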
See A Short Path to XQuery for a round of XQuery.
XQuery and Qt have different data models. All data in XQuery takes the form of sequences of items, where an item is either a node or an atomic value. Atomic values are the primitives found in W3C XML Schema, and nodes are usually XML nodes, although they might represent other things using QXmlNodeModelIndex and QAbstractXmlNodeModel.
Atomic values, when not being serialized, are represented with QVariant. The mappings are as follows.
From XQuery | To Qt |
---|---|
xs:integer | QVariant.LongLong |
xs:string | QVariant.String |
xs:double | QVariant.Double |
xs:float | QVariant.Double |
xs:boolean | QVariant.Bool |
xs:decimal | QVariant.Double |
xs:hexBinary | QVariant.ByteArray |
xs:base64Binary | QVariant.ByteArray |
xs:time | Not supported because xs:time has a zone offset, and QTime does not. Use xs:dateTime, or convert the value to xs:string. |
xs:date | QVariant.DateTime |
xs:dateTime | QVariant.DateTime |
xs:gYear | QVariant.DateTime |
xs:gYearMonth | QVariant.DateTime |
xs:gMonthDay | QVariant.DateTime |
xs:gDay | QVariant.DateTime |
xs:gMonth | QVariant.DateTime |
xs:string* | QVariant.StringList |
xs:anyURI | QVariant.Url |
xs:untypedAtomic | QVariant.String |
xs:ENTITY | QVariant.String |
xs:QName | QXmlName. Note that the returned QXmlName can only be used with the QXmlQuery instance that it was created with. |
From Qt | To XQuery |
---|---|
QVariant.LongLong | xs:integer |
QVariant.Int | xs:integer |
QVariant.UInt | xs:nonNegativeInteger |
QVariant.ULongLong | xs:unsignedLong |
QVariant.String | xs:string |
QVariant.Double | xs:double |
QVariant.Bool | xs:boolean |
QVariant.Double | xs:decimal |
QVariant.ByteArray | xs:base64Binary |
QVariant.Date | xs:date. The QDate is assumed to be in timezone UTC. |
QVariant.Time | QTime cannot properly represent xs:time. Convert QTime to a QDateTime with a valid arbitrary date, and bind the time as a QDateTime instead. |
QVariant.DateTime | xs:dateTime |
QVariant.StringList | xs:string* |
QVariant.Url | xs:string |
QVariantList | A sequence of atomic values, whose type is the same as the first item in the QVariantList instance. If the items in the QVariantList are not all of the same type, behavior is undefined. |
Any other type | It is not supported and will either lead to undefined behavior, or a nonexistent variable binding, depending on context. |
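The workaround for time values in the table above can be sketched in plain Python. This assumes the value starts out as a Python datetime.time object; the helper name time_as_datetime and the placeholder date are invented for illustration, and the standard datetime types stand in for QTime and QDateTime here.

```python
import datetime

# Arbitrary but valid placeholder date; only the time part carries meaning.
PLACEHOLDER_DATE = datetime.date(2000, 1, 1)

def time_as_datetime(value, placeholder=PLACEHOLDER_DATE):
    """Attach a placeholder date so the value can be handled as a date-time."""
    return datetime.datetime.combine(placeholder, value)

opening = datetime.time(9, 30, 0)
bindable = time_as_datetime(opening)
print(bindable.isoformat())   # 2000-01-01T09:30:00
```

A query receiving such a value would see an xs:dateTime and would need to ignore the date part, so the placeholder date should be a documented convention between the application and its queries.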
XQuery is a language designed for, and modeled on, XML. However, it doesn't have to be constrained to that. By sub-classing QAbstractXmlNodeModel one can write queries on top of any data that can be modeled as XML.
By default when QtXmlPatterns is asked to open files or to produce content, this is done using an internal representation. For instance, in this query:
    <result>
        <para>The following Acne removers have shipped, ordered by shipping date(oldest first):</para>
        {
            for $i in doc("myOrders.xml")/orders/order[@product = "Acme's Acne Remover"]
            order by xs:date($i/@shippingDate) descending
            return $i
        }
    </result>
an efficient internal representation is used for the file myOrders.xml. However, by sub-classing QAbstractXmlNodeModel one can write a query on any data, by mapping XML elements and attributes to the custom data model. For instance, one could write a QAbstractXmlNodeModel sub-class that mirrors the file system hierarchy like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <directory name="home">
        <file name="myNote.txt" mimetype="text/plain" size="8" extension="txt" uri="file:///home/frans/myNote.txt">
            <content asBase64Binary="TXkgTm90ZSE=" asStringFromUTF-8="My Note!"/>
        </file>
        <directory name="src">
            ...
        </directory>
        ...
    </directory>
and hence have a convenient way to navigate the file system:
    <html>
        <body>
            {
                $myRoot//file[@mimetype = 'text/xml' or @mimetype = 'application/xml']
                /
                (if(doc-available(@uri))
                 then ()
                 else <p>Failed to parse file {@uri}.</p>)
            }
        </body>
    </html>
One approach to this has been to convert the data model to XML text and then read it in with an XML tool, but that has disadvantages. It is inefficient, the XML representation is separate from the actual data model, and two representations need to be maintained in memory simultaneously.
With QAbstractXmlNodeModel this conversion is not necessary, nor are two representations kept at the same time, since QXmlNodeModelIndex is a small, efficient, stack-allocated value. Also, since the XQuery engine asks the QAbstractXmlNodeModel for the actual data, the model can create elements, attributes and data on demand, depending on what the query actually requests. For instance, in the file system model above, the model doesn't have to read in the whole file system or encode the content of a file until it is actually asked for.
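The on-demand idea can be illustrated without any Qt classes. The sketch below is plain Python, not a QAbstractXmlNodeModel sub-class: the class LazyFileNode, its attribute method, and the reads counter are all hypothetical, and for brevity the content attribute is placed directly on the file node. It shows that cheap attributes are answered from metadata, while the file's bytes are fetched and encoded only when the content attribute is requested.

```python
import base64

class LazyFileNode:
    """Hypothetical node for one file; content is fetched only when asked for."""

    def __init__(self, name, read_bytes):
        self.name = name
        self._read_bytes = read_bytes   # callable that fetches the file's bytes
        self.reads = 0                  # how many times the content was fetched

    def attribute(self, attr_name):
        """Return an attribute value, touching file content only if required."""
        if attr_name == "name":
            return self.name
        if attr_name == "extension":
            return self.name.rpartition(".")[2] if "." in self.name else ""
        if attr_name == "asBase64Binary":
            self.reads += 1
            return base64.b64encode(self._read_bytes()).decode("ascii")
        raise KeyError(attr_name)

node = LazyFileNode("myNote.txt", lambda: b"My Note!")
print(node.attribute("extension"), node.reads)        # txt 0
print(node.attribute("asBase64Binary"), node.reads)   # TXkgTm90ZSE= 1
```

A query that only filters on names or extensions would never trigger a read in this scheme, which is the saving the paragraph above describes.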
In other words, with QAbstractXmlNodeModel it's possible to have one data model, and then use the power of the XQuery language on top.
Some examples of possible data models could be:
The documentation for QAbstractXmlNodeModel has the details for implementing this.
Since QtXmlPatterns isn't constrained to XML but can use custom data directly, it turns XQuery into a mapping layer between different custom models, or between custom models and XML. Once QtXmlPatterns can understand the data, simple queries can be used to select from it, or to simply write it out as XML using QXmlQuery.serialize().
Consider a word processor application that needs to be able to import and export different formats. Instead of having to write C++ code that converts between the different formats, one writes a query that goes from one type of XML, such as MathML, to another XML format: the one for the document representation that the DocumentRepresentation class below exposes.
In the case of CSV files, which are text, a QAbstractXmlNodeModel sub-class is used again in order to expose the comma-separated file as XML, such that a query can operate on it.
XQuery is subject to query injection in the same manner that SQL is. If a query is constructed by concatenating strings where some of the strings are from user input, the query can be altered by carefully crafting malicious strings, unless they are properly escaped.
The best defense against these attacks is typically to never construct queries from user-written strings, but instead to supply the user's data through variable bindings. This avoids all query injection attacks.
See "Avoid the dangers of XPath injection" by Robi Sen, or "Blind XPath Injection" by Amit Klein, for deeper discussions.
Like all systems, QtXmlPatterns has limits. Generally, these are not checked. This is not a problem for regular use, but it does mean that a malicious query can be constructed relatively easily that causes code to crash or to exhibit undefined behavior.
QtXmlPatterns aims to be a conformant XQuery implementation. In addition to supporting minimal conformance, it supports the serialization and full-axis features. As of this writing, 97% of the tests in W3C's test suite for XQuery pass, and this is expected to improve over time. Areas where conformance is incomplete and where behavior may change in future releases are:
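The difference between the two approaches can be sketched as follows. The first function shows the hazard using nothing but string concatenation. The second keeps the query text constant and passes the value as a variable; it assumes that PyQt4 is installed, that QXmlItem accepts a Python string directly, and that a file named library.xml with the structure used earlier exists, none of which is verified here. The function names and the sample input are invented for illustration.

```python
def unsafe_query(publisher):
    """Build the query text by concatenation; user input becomes query code."""
    return 'doc("library.xml")/bib/book[publisher = "' + publisher + '"]/title'

# The query text never changes; the value travels separately as $publisher.
SAFE_QUERY = (
    'declare variable $publisher external;\n'
    'doc("library.xml")/bib/book[publisher = $publisher]/title'
)

def run_safe_query(publisher):
    """Evaluate SAFE_QUERY with the value bound as a variable (needs PyQt4)."""
    # Imported lazily so the rest of this sketch runs without PyQt4.
    from PyQt4.QtXmlPatterns import QXmlItem, QXmlQuery
    query = QXmlQuery()
    # Assumption: a Python string is converted to a QVariant automatically.
    query.bindVariable("publisher", QXmlItem(publisher))
    query.setQuery(SAFE_QUERY)
    if not query.isValid():
        raise ValueError("the query did not compile")
    return query.evaluateToString()

malicious = 'x" or "1" = "1'
print(unsafe_query(malicious))
# doc("library.xml")/bib/book[publisher = "x" or "1" = "1"]/title
```

With concatenation, the crafted input rewrites the predicate so that it matches every book. With a binding, the same input is only ever compared as a string value and cannot change the structure of the query.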
XML 1.0 and XML Namespaces 1.0 are supported, as opposed to the 1.1 versions. When strings are fed into the query using QStrings, the characters must be XML 1.0 characters. Otherwise, the behavior is undefined. This is not checked.
Since XPath 2.0 is a subset of XQuery 1.0, XPath 2.0 is supported too.
The specification discusses conformance further: XQuery 1.0: An XML Query Language. W3C's XQuery testing effort, the XML Query Test Suite, can be of interest as well.
Currently fn:collection() does not access any data set, and there is no API for providing data through the collection. As a result, evaluating fn:collection() returns the empty sequence. We hope to provide functionality for this in a future release of Qt.
XML files are opened with support for xml:id. In practice this means that elements that have an xml:id attribute can be looked up fairly quickly with the fn:id() function. See xml:id Version 1.0 for details.
Note: Only queries encoded in UTF-8 are supported.
When QtXmlPatterns attempts to load XML resources, such as via XQuery's fn:doc() function, the following schemes are supported:
Scheme Name | Description |
---|---|
file | Local files. |
data | The bytes are encoded in the URI itself. For instance, data:application/xml,%3Ce%2F%3E is <e/>. |
ftp | Resources retrieved via FTP. |
http | Resources retrieved via HTTP. |
https | Resources retrieved via HTTPS. This will succeed if no SSL errors are encountered. |
qrc | Qt Resource files. Expressing it as an empty scheme, :/..., is not supported. |
URIs are first passed to QAbstractUriResolver (see QXmlQuery.setUriResolver()) for possible rewrites.
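To see how the example in the data row of the table above maps to <e/>, the percent-encoded payload can be decoded with Python's standard library. This is only an illustration of the encoding, not of how QtXmlPatterns loads such URIs; the helper name decode_data_uri is invented, and base64 payloads are deliberately left out of the sketch.

```python
from urllib.parse import unquote_to_bytes

def decode_data_uri(uri):
    """Return (media_type, payload_bytes) for a percent-encoded data: URI."""
    scheme, _, rest = uri.partition(":")
    if scheme != "data":
        raise ValueError("not a data: URI")
    header, comma, payload = rest.partition(",")
    if not comma:
        raise ValueError("missing ',' separator")
    if header.endswith(";base64"):
        raise ValueError("base64 payloads are not handled in this sketch")
    return header, unquote_to_bytes(payload)

print(decode_data_uri("data:application/xml,%3Ce%2F%3E"))
# ('application/xml', b'<e/>')
```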
PyQt 4.10.1 for MacOS | Copyright © Riverbank Computing Ltd and Nokia 2012 | Qt 4.8.4 |