Welcome!

Open Source Authors: Liz McMillan, Maureen O'Gara, RealWire News Distribution, Jeremy Geelan, Reuven Cohen

Related Topics: Open Source

Open Source: Article

Mixing Data & Data Structures in an Object Database

An introduction to the trie

Similar code can be used to put more words into the trie. Notice, however, that this code doesn't create synonym connections. That happens with a bit of Java as follows:

...
td = thesaurusTrie.search("heat");
te = td.getData();
ps = te.getAPOSSynonym(PartsOfSpeech.NOUN);
ps.addSynonym(thesaurusTrie.search("warmth").getData());
ps.addSynonym(thesaurusTrie.search("glow").getData());
...

This code assumes that we already have the words "heat," "warmth," and "glow" stored in the trie and that they have NOUN entries associated. We locate the "heat" ThesaurusEntry and create connections from it to "warmth" and "glow."

Of course, we'll want connections pointing the other way too. So, we'd want code like:

...
td = myTrie.search("warmth");
te = td.getData();
ps = te.getAPOSSynonym(PartsOfSpeech.NOUN);
ps.addSynonym(myTrie.search("heat").getData());
...

and something similar for "glow." (Note that all this code could be considerably shortened if we dispensed with all the intermediate objects. We've shown them, however, to make the mechanics clearer.)

But, wait, this is all still in memory. How do we get it into our database? Actually, with db4o, nothing could be simpler. The code looks like this:

...
db = Db4o.openFile("thesaurus.yap");

... <code to create the trie here> ...

db.set(thesaurusTrie);
db.commit();
db.close();
...

We've left out the imports, and we've indicated where all the code goes to build the trie. But once the trie is built, the only method we have to call to put the trie in the database is db.set(). (We also commit() the transaction and close() the database, but the real workhorse is db.set().)

db4o implements persistence by reachability. That means that when we first store an object in the database with the set() method, db4o will crawl through our object's tree, locating all referenced objects, and make them persistent too. In other words, when we store thesaurusTrie, everything is references is stored too. dbo puts the whole kit and caboodle in the database for us in one shot.

Reading Our Thesaurus
When we wrote our thesaurus, we did so by building all the data structures in memory then storing them all in the database. We could, if we wanted, read everything into memory for searching the database. However, we may want to be more memory-frugal. We may want to only load those parts of the data structures we need to satisfy a search.

db4o lets us do that by controlling the "activation depth" of objects read from the database. Setting the activation depth tells db4o how deep into an object tree it should go when it retrieves objects from the database.

You can see how this work if you look at the code for the search() method that we have overloaded to work with the db4o database:

public TrieDnode search(String key,
      ObjectContainer db) {
TriePnode t;
TrieDnode d;
char c;
int index;

// Empty trie?
if((t=root)==null) return(null);

int slen = key.length();
for(int i=0; i<slen; i++) {
      c = key.charAt(i);
      if((index=t.isCharOnNode(c))==-1) return(null);
      if(i==slen-1) break;
      t = t.getPnodePointer(index);
      db.activate(t, 2);
}
d = t.getDnode(index);
db.activate(d, 2);
return(d);
}

The algorithm begins at the root and verifies that the first character of the word exists on the root node. If so, the algorithm fetches the TriePnode corresponding to that character's location in the node. Then the algorithm calls db.activate(t,2). This tells the database to fetch references at least two deep so that not only is the node itself fetched, but the content of the arrays in the node are fetched as well.

Similarly, after the call to getDnode() - which fetches the data node - we call db.activate(d, 2) to fetch the content of the TrieDnode's ThesaurusEntry.

With our database-enabled search algorithm in hand, we can now construct a simple routine to fetch the synonyms for a particular word.

...
td = myTrie.search(args[0],db);
if(td==null) {
      ... word not found ...
}

te = td.getData();
for(int i=0; i<td.numberOfPOSes(); i++) {
      ps = te.getIthPOSSynonym(i);
      db.activate(ps,2);
      System.out.println(strPOS(ps.getPOS()));
      for(int j=0; j<ps.numberOfSynonyms(); j++)
           System.out.println(ps.getSynonyms(j).getWord());
}
...

The search algorithm returns a TrieDnode. We use getData() to fetch the ThesaurusEntry then step through all the parts of speech for that. Finally, we step through all the synonyms for each part of speech and display the word. The result would look something like this:

c:\ SearchDatabase heat
NOUN
warmth
glow
VERB

Which shows that the word "heat" is recorded in the thesaurus as both a noun and a verb, and the noun form of the word is associated with the synonyms "warmth" and "glow." There are no synonyms associated (yet) with the verb form.

Persistent Thesaurus
Of course, the charm of Visual Thesaurus is its almost life-like user interface and the ease with which one can explore a network of related terms. We will have to leave that implementation to the reader.

Our goal was to illustrate how easily an object database could be used to persist the data structures behind such a UI. The really nice advantage of using an object database like db4o is the fact that the structure that we have defined in our classes is the same structure that exists in the database. We didn't have to write any translation code to move between our object structure and a relational representation. db4o's easy-to-understand API made our construction work that much easier.

More Stories By Rick Grehan

Rick Grehan is a QA engineer at Compuware's NuMega Labs, where he has worked on Java and .NET projects. He is also a contributing editor for InfoWorld Magazine. His articles have appeared in Embedded Systems Programming, EDN, The Microprocessor Report, BYTE, Computer Design, and other journals. He is also the coauthor of three books on computer programming.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.