eXist - An Introduction To Open Source Native XML Database

I am going to introduce you to eXist (www.exist-db.org), a free, open source (GNU LGPL license), native XML database.

Query4
Return the quotations (qt), nested at any unspecified depth, of every entry whose headword (hw) is "lime." This query can be expressed in XQuery as:

XQuery:
for $ent in /dictionary/e
where $ent//hw = "lime"
return
$ent//qt

Query4 results
<qt>notornis according to the thin notornis poach permanentl</qt>
<qt>dependencies x-ray thinly bold sheaves;daringly</qt>
<qt>ruthless, ironic sheaves mold silently fluffy patterns-carefully busy
dependencies through the careful, quick waters use within the sly dependencies;permanent,
busy decoys beside the bold T</qt>
<qt>multipliers poach ironically about the ironic multipliers;i</qt>
<qt>quiet waters toward the daring, fluffy braids belive without the boldly ironic
pains.ironic, silent frays engage ideas?idle forges at th</qt>
<qt>fluffy sauternes will dazzle finally-blithe realms upon the closely close
theodolites boost stealthly behind the forges-ideas might poach;theodolite</qt>
<qt>slow, daring epitaphs around the sly, ironic foxes shall have to x-ray bravely brave, stealth</qt>
...

The query returned 60 items. Compilation took 16 msec and execution took 172 msec, so answering this query took about 0.2 seconds in total.
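To make the semantics of Query4 concrete, here is a minimal Python sketch that applies the same logic to a tiny in-memory document using the standard library's ElementTree. The element names dictionary, e, hw and qt come from the query. The sample document, the intermediate elements and the function name quotations_for are invented for illustration and say nothing about how eXist evaluates the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature dictionary. Only the element names dictionary, e,
# hw and qt come from the article; the nesting and text are made up.
SAMPLE = """
<dictionary>
  <e>
    <hwg><hw>lime</hw></hwg>
    <ss><s><q><qt>first lime quotation</qt></q></s></ss>
    <ss><s><q><qt>second lime quotation</qt></q></s></ss>
  </e>
  <e>
    <hwg><hw>lemon</hw></hwg>
    <ss><s><q><qt>a lemon quotation</qt></q></s></ss>
  </e>
</dictionary>
""".strip()


def quotations_for(root, headword):
    """Return every qt below each e entry that has a matching hw descendant."""
    results = []
    for entry in root.findall("e"):                               # /dictionary/e
        if any(hw.text == headword for hw in entry.iter("hw")):   # $ent//hw = "lime"
            results.extend(entry.iter("qt"))                      # return $ent//qt
    return results


root = ET.fromstring(SAMPLE)
lime_quotes = [qt.text for qt in quotations_for(root, "lime")]
print(lime_quotes)
```

Because hw and qt are matched at any depth below e, the entry layout can change without changing the query, which is the point of the "unspecified level" wording.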

eXist Internals
eXist uses a path join algorithm for efficient query processing. It relies on a numbering scheme that assigns a unique identifier to each node. These identifiers encode the structural relationships between nodes, such as parent-child and ancestor-descendant, so any two nodes can be tested for these relationships from their identifiers. The walkthrough below demonstrates how the path join algorithm evaluates a query.
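As an illustration of how identifiers can carry structural information, the sketch below numbers the nodes of a complete k-ary tree in level order, with the root as 1. This particular scheme, the fan-out k and the function names are assumptions chosen for the example; the article does not spell out the exact scheme eXist uses.

```python
def parent_id(node_id, k):
    """Parent of node_id in a complete k-ary tree numbered in level order.

    The root has identifier 1 and no parent (None).
    """
    if node_id <= 1:
        return None
    return (node_id - 2) // k + 1


def child_ids(node_id, k):
    """Identifiers reserved for the k children of node_id."""
    first = k * (node_id - 1) + 2
    return range(first, first + k)


def is_ancestor(ancestor, descendant, k):
    """True if ancestor lies on the path from descendant up to the root."""
    current = parent_id(descendant, k)
    while current is not None:
        if current == ancestor:
            return True
        current = parent_id(current, k)
    return False


# With k = 3 the root's children are 2, 3, 4 and node 2's children are 5, 6, 7.
print(list(child_ids(1, 3)), list(child_ids(2, 3)))
print(parent_id(7, 3), is_ancestor(1, 7, 3), is_ancestor(3, 7, 3))
```

Both tests need only arithmetic on the two identifiers, so no stored document has to be read to decide whether two nodes are related.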

Explanation of /dictionary/e[.//hw = "flower"]//qt Using the Path Join Algorithm

  1. Perform an index lookup for the "dictionary" root element, followed by an index lookup for "e" elements. Use the path join algorithm on "dictionary" and "e" unique identifiers (using the parent-child relationship). The results of this step are a series of unique "e" identifiers that are children of "dictionary."
  2. Perform an index lookup for "hw" and select all identifiers from the result set whose text value is "flower" (alternatively, it's possible to check for hw='flower' at the end).
  3. Run the path join algorithm on the "e" identifiers from step 1 and the "hw" identifiers from step 2. The result set will contain "e" identifiers that have an ancestor-descendant relationship with the set of "hw" identifiers.
  4. Perform an index lookup on "qt." Run the path join algorithm on the resulting "qt" identifiers and the "e" identifiers from step 3 (the nodes must have an ancestor-descendant relationship). Any "qt" identifiers that remain represent nodes that satisfy the original query.
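
The four steps above can be sketched in Python as follows. This is a simplified illustration, not eXist's implementation: it assumes a level-order numbering of a complete tree with fan-out K = 3, a hand-built element index (INDEX) and a hypothetical table of headword text (HW_TEXT). A production join would work on sorted identifier lists rather than walking parent chains.

```python
K = 3  # assumed fan-out of the complete tree used for numbering


def parent_id(node_id, k=K):
    """Parent in a level-order numbered complete k-ary tree (root = 1)."""
    return None if node_id <= 1 else (node_id - 2) // k + 1


def ancestors(node_id):
    """Yield the identifiers on the path from node_id's parent to the root."""
    current = parent_id(node_id)
    while current is not None:
        yield current
        current = parent_id(current)


# Hypothetical element index: element name -> node identifiers.
# Layout: dictionary(1) -> e(2), e(3); e(2) -> hw(5), q(6); q(6) -> qt(17);
#         e(3) -> hw(8), q(9); q(9) -> qt(26).
INDEX = {"dictionary": [1], "e": [2, 3], "hw": [5, 8], "qt": [17, 26]}
HW_TEXT = {5: "flower", 8: "lime"}  # text value of each hw node


def child_join(parents, children):
    """Keep the children whose parent identifier is in parents."""
    parent_set = set(parents)
    return [c for c in children if parent_id(c) in parent_set]


def ancestors_with_descendant(candidates, descendants):
    """Keep the candidates that are an ancestor of some descendant."""
    reachable = set()
    for d in descendants:
        reachable.update(ancestors(d))
    return [c for c in candidates if c in reachable]


def descendants_with_ancestor(candidates, ancestor_ids):
    """Keep the candidates that have some ancestor in ancestor_ids."""
    wanted = set(ancestor_ids)
    return [c for c in candidates if any(a in wanted for a in ancestors(c))]


def query(headword):
    e_ids = child_join(INDEX["dictionary"], INDEX["e"])          # step 1
    hw_ids = [h for h in INDEX["hw"] if HW_TEXT[h] == headword]  # step 2
    e_ids = ancestors_with_descendant(e_ids, hw_ids)             # step 3
    return descendants_with_ancestor(INDEX["qt"], e_ids)         # step 4


print(query("flower"))
```

Every step works on identifiers taken from the index. The only document content consulted is the headword text in step 2.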

Storing Binary Data
In addition to XML files, eXist can store binary resources, whereas most native XML databases can store only XML. Real-world data is a mix of XML and non-XML, so binary storage can be very handy. For example, the popular image format JPEG 2000 can contain several XML boxes, which are used to store metadata. Depending on business needs, a developer may want to extract the XML data from a JPEG 2000 image and store the XML and the remaining non-XML data separately but in the same database. A major advantage of this approach is faster querying of the XML data associated with an image: the XML is extracted from the JPEG 2000 image only once, rather than each time a new query arrives. By providing binary data storage, eXist makes this procedure possible.
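
The extraction step could look roughly like the Python sketch below. It relies on the JP2 box layout: a 4-byte big-endian length, a 4-byte type, and an 8-byte extended length when the length field is 1. It collects the payload of every top-level box of type "xml ". The function names, the UTF-8 decoding and the synthetic sample bytes are assumptions made for illustration; boxes nested inside superboxes are not scanned, and storing the results in eXist is not shown.

```python
import struct

XML_BOX_TYPE = b"xml "  # four-character type code of a JP2 XML box


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in a JP2 byte string."""
    offset = 0
    while offset + 8 <= len(data):
        length, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if length == 1:  # real length is in the 8-byte extended field
            if offset + 16 > len(data):
                break
            (length,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif length == 0:  # box extends to the end of the file
            length = len(data) - offset
        if length < header or offset + length > len(data):
            break  # malformed box: stop rather than guess
        yield box_type, data[offset + header:offset + length]
        offset += length


def extract_xml_boxes(data):
    """Return the decoded content of every top-level XML box (UTF-8 assumed)."""
    return [payload.decode("utf-8")
            for box_type, payload in iter_boxes(data)
            if box_type == XML_BOX_TYPE]


def make_box(box_type, payload):
    """Build a box with a plain 8-byte header (helper for the demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-in for a JPEG 2000 file: signature box, one XML box and a
# dummy codestream box. A real image would be read from disk instead.
sample = (make_box(b"jP  ", b"\r\n\x87\n")
          + make_box(b"xml ", b"<meta><title>demo</title></meta>")
          + make_box(b"jp2c", b"\x00\x01"))
xml_docs = extract_xml_boxes(sample)
print(xml_docs)
```

Each extracted string could then be stored as an XML document, with the original image stored as a binary resource alongside it.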

Last Note
Keep in mind that eXist is an evolving product. There are ongoing improvements and bug fixes. Learn about these by checking the eXist home page at www.exist-db.org. Wolfgang Meier founded this project in late 2000. There are many contributing developers.

Acknowledgements
I would like to thank Saaid Baraty and Glenn Hoffman for providing helpful suggestions and useful comments.

More Stories By Selim Mimaroglu

Selim Mimaroglu is a PhD candidate in computer science at the University of Massachusetts in Boston. He holds an MS in computer science from that school and has a BS in electrical engineering.

Most Recent Comments
XML News Desk 11/29/05 07:22:28 PM EST

eXist - An Introduction To Open Source Native XML Database. In this article I am going to introduce you to the open source, free (GNU LGPL license), native XML database eXist (www.exist-db.org). Data is important, no question about it. Data that can't be queried is not very useful. Users expect to have good query response time. From my personal experience and testing, I am confident in saying that eXist is a fairly good database. It has very good query response time, it is very user friendly, it's easy to set up and operate, and it's written in Java, therefore it is platform independent.