Welcome!

Open Source Authors: Maureen O'Gara, Jeremy Geelan, Liz McMillan, Reuven Cohen, Lavenya Dilip

Related Topics: Open Source

Open Source: Article

Mixing Data & Data Structures in an Object Database

An introduction to the trie

The trie is anchored by its root node. So, to locate a word in the trie you search for the first letter in the root. If you find the letter, you follow the corresponding node pointer to the child node. On the child, you look for the second letter in the word. If found you move deeper in the tree.

The search process terminates in one of three ways.

  • You reach the end of the word, and the data-pointer references the word you're looking for.
  • You read the end of the word, and the data-pointer is null. In that case, the word is not in the trie.
  • You encounter a node whose node-pointer is null, but you're not done with the word. Again, the word is not in the trie.
Our implementation of a trie is composed of three classes. The first, the trie class, is the outermost encompassing class. The "outside world" interacts directly with objects of the Trie class, which encapsulates all the mechanisms needed to manage the trie.

The trie class includes a single data member:

private TriePnode root;

This root member is the node at which all searches begin. To be precise, the root node will hold all the first letters of all the words in the trie. The root node reference is of class TriePnode, which looks like this:

public class TriePnode
{
      private char[] chars;
      private TriePnode[] pnodes;
      private TrieDnode[] dnodes;

      ... TriePnode methods ...

}

The body of the trie structure is made up of TriePnode objects. Each TriePnode carries the array of characters that hold the letters of the words (as shown in Figure 1), the pnodes array (the pointers "down" into the lower levels of the trie), and the dnodes array (the references to the actual data objects).

Finally, the leaves of the trie are TrieDnode objects. The TrieDnode class is more or less a wrapper that includes a ThesaurusEntry item as its single member.

So now, we have all the pieces that we need to build our Thesaurus. We have the structures that hold the words and connections to the synonyms, and we have the structures that support searching.

Time for Persistence
We now have all the pieces needed to build our thesaurus: structures to hold the words, connections among synonyms, and a trie for locating words. Now what we need is someplace to put everything. This is where our object database, db4o, steps on the stage.

db4o is an open source object database from db4objects available for downloading at www.db4objects.com. There are two versions of db4o, one for Java, and another for the .NET platform. (A Mono version is also available, though it's largely identical to the .NET version.) db4o can be used either as an embedded database library that executes in the same process space as your application, or in client/server form for remote or multi-user access to a database server. We'll be using db4o in its "monolithic" form (part of a single application).

db4o will do more for us than simply serve as a persistence mechanism for our thesaurus. It will give us everything we'll need to get all our classes into a database in the first place. Put another way, we can "bootstrap" our data into the database using db4o's API then let db4o slip into the background as we use the navigation capabilities we've built into trie and ThesaurusEntry to fetch words and synonyms.

We'll be adding words "the hard way" - i.e., using hard coding - so you can see how the class methods work. Obviously, if we're going to add a lot of words, we'd write code to read a vocabulary from a file.

So, to add an entry for the word "heat" as both a noun and a transitive verb, our code would look like this:

ThesaurusEntry te;
POSSynonyms ps;
TrieDnode te;
Trie thesaurusTrie;
...
te = new ThesaurusEntry("heat");
ps = new POSSynonyms(PartsOfSpeech.NOUN);
te.addPOSSynonym(ps);
ps = new POSSynonyms(PartsOfSpeech.VERT_T);
te.addPOSSynonym(ps);
td = new TrieDnode(te);
thesaurusTrie.insert(te.getWord(), td);
...

The code should be pretty easy to follow. First, we create a new ThesaurusEntry for the word "heat." Next, we create POSSynonym objects for noun and transitive verb. (PartsOfSpeech is a helper class we defined to enumerate the parts of speech.) The POSSynonym objects are added to the ThesaurusEntry, which is encapsulated in a TrieDnode. Finally, we insert the word into the trie, with the ThesaurusEntry node associated.


More Stories By Rick Grehan

Rick Grehan is a QA engineer at Compuware's NuMega Labs, where he has worked on Java and .NET projects. He is also a contributing editor for InfoWorld Magazine. His articles have appeared in Embedded Systems Programming, EDN, The Microprocessor Report, BYTE, Computer Design, and other journals. He is also the coauthor of three books on computer programming.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.