By Rick Grehan | May 9, 2007 11:00 AM EDT | Reads: 14,324
Similar code can be used to put more words into the trie. Notice, however, that this code doesn't create synonym connections. That happens with a bit of Java as follows:
...
td = thesaurusTrie.search("heat");
te = td.getData();
ps = te.getAPOSSynonym(PartsOfSpeech.NOUN);
ps.addSynonym(thesaurusTrie.search("warmth").getData());
ps.addSynonym(thesaurusTrie.search("glow").getData());
...
This code assumes that we already have the words "heat," "warmth," and "glow" stored in the trie and that they have NOUN entries associated. We locate the "heat" ThesaurusEntry and create connections from it to "warmth" and "glow."
Of course, we'll want connections pointing the other way too. So, we'd want code like:
...
td = thesaurusTrie.search("warmth");
te = td.getData();
ps = te.getAPOSSynonym(PartsOfSpeech.NOUN);
ps.addSynonym(thesaurusTrie.search("heat").getData());
...
and something similar for "glow." (Note that all this code could be considerably shortened if we dispensed with all the intermediate objects. We've shown them, however, to make the mechanics clearer.)
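The two-way linking described above can be folded into one helper. What follows is only a minimal in-memory sketch: the Entry class, the PartOfSpeech enum, and the linkBothWays() method are hypothetical stand-ins invented for illustration, not the article's ThesaurusEntry and POSSynonym classes.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the article's classes.
class SynonymLinkSketch {

    enum PartOfSpeech { NOUN, VERB }

    static class Entry {
        final String word;
        // One synonym list per part of speech, created on first use.
        final Map<PartOfSpeech, List<Entry>> synonyms =
                new EnumMap<>(PartOfSpeech.class);

        Entry(String word) { this.word = word; }

        List<Entry> synonymsFor(PartOfSpeech pos) {
            return synonyms.computeIfAbsent(pos, p -> new ArrayList<>());
        }
    }

    // Connect a and b in both directions, skipping links that already exist.
    static void linkBothWays(Entry a, Entry b, PartOfSpeech pos) {
        if (a == b) return; // a word is not its own synonym
        if (!a.synonymsFor(pos).contains(b)) a.synonymsFor(pos).add(b);
        if (!b.synonymsFor(pos).contains(a)) b.synonymsFor(pos).add(a);
    }

    public static void main(String[] args) {
        Entry heat = new Entry("heat");
        Entry warmth = new Entry("warmth");
        Entry glow = new Entry("glow");

        linkBothWays(heat, warmth, PartOfSpeech.NOUN);
        linkBothWays(heat, glow, PartOfSpeech.NOUN);

        // "warmth" now points back at "heat" without a second code block.
        for (Entry e : warmth.synonymsFor(PartOfSpeech.NOUN)) {
            System.out.println("warmth -> " + e.word);
        }
    }
}
```

Because the helper checks for an existing link first, calling it twice for the same pair leaves each list with a single entry.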
But, wait, this is all still in memory. How do we get it into our database? Actually, with db4o, nothing could be simpler. The code looks like this:
...
db = Db4o.openFile("thesaurus.yap");
... <code to create the trie here> ...
db.set(thesaurusTrie);
db.commit();
db.close();
...
We've left out the imports, and we've indicated where all the code goes to build the trie. But once the trie is built, the only method we have to call to put the trie in the database is db.set(). (We also commit() the transaction and close() the database, but the real workhorse is db.set().)
db4o implements persistence by reachability. That means that when we first store an object in the database with the set() method, db4o will crawl through our object's tree, locating all referenced objects, and make them persistent too. In other words, when we store thesaurusTrie, everything it references is stored too. db4o puts the whole kit and caboodle in the database for us in one shot.
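To make the idea of reachability concrete, here is a toy traversal that collects every object reachable from a root. It is an illustration of the concept only, not db4o's implementation; the Node class and the reachableFrom() method are hypothetical names made up for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of "persistence by reachability" (not db4o code).
class ReachabilitySketch {

    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Collect every node that can be reached by following references from
    // root. The identity-based set ensures each object is visited once,
    // so cyclic links (synonyms pointing at each other) do not loop forever.
    static Set<Node> reachableFrom(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!seen.add(n)) continue; // already visited
            for (Node r : n.refs) pending.push(r);
        }
        return seen;
    }

    public static void main(String[] args) {
        Node trie = new Node("thesaurusTrie");
        Node heat = new Node("heat");
        Node warmth = new Node("warmth");
        Node orphan = new Node("never referenced");

        trie.refs.add(heat);
        heat.refs.add(warmth);
        warmth.refs.add(heat); // synonym links form a cycle

        Set<Node> stored = reachableFrom(trie);
        System.out.println(stored.size());           // 3
        System.out.println(stored.contains(orphan)); // false
    }
}
```

In this model, storing the root is enough to sweep up "heat" and "warmth," while an object that nothing points to is left out.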
Reading Our Thesaurus
When we wrote our thesaurus, we did so by building all the data structures in memory and then storing them all in the database. We could, if we wanted, read everything back into memory before searching. However, we may want to be more memory-frugal and load only those parts of the data structures we need to satisfy a search.
db4o lets us do that by controlling the "activation depth" of objects read from the database. Setting the activation depth tells db4o how deep into an object tree it should go when it retrieves objects from the database.
You can see how this works if you look at the code for the search() method that we have overloaded to work with the db4o database:
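public TrieDnode search(String key, ObjectContainer db) {
    TriePnode t = root;
    int index = -1;

    // Empty trie or empty key: nothing to find.
    if (t == null || key.length() == 0) return null;

    int slen = key.length();
    for (int i = 0; i < slen; i++) {
        char c = key.charAt(i);
        // Give up as soon as a character is missing from the current node.
        if ((index = t.isCharOnNode(c)) == -1) return null;
        // The last character selects a data node, not another pointer node.
        if (i == slen - 1) break;
        t = t.getPnodePointer(index);
        // The key is longer than any word stored along this path.
        if (t == null) return null;
        // Load the next node, and the arrays it holds, from the database.
        db.activate(t, 2);
    }

    TrieDnode d = t.getDnode(index);
    // Load the data node and the ThesaurusEntry it holds.
    if (d != null) db.activate(d, 2);
    return d;
}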
The algorithm begins at the root and verifies that the first character of the word exists on the root node. If so, the algorithm fetches the TriePnode corresponding to that character's location in the node. Then the algorithm calls db.activate(t,2). This tells the database to activate objects two levels deep, so that not only is the node itself fetched, but the contents of the arrays in the node are fetched as well.
Similarly, after the call to getDnode() - which fetches the data node - we call db.activate(d, 2) to fetch the content of the TrieDnode's ThesaurusEntry.
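The effect of an activation depth can be pictured with a small in-memory model. This is a sketch of the concept only and does not use db4o; the Node class, its active flag, and this activate() method are hypothetical stand-ins for what the database does when it fills in objects.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of depth-limited activation (not db4o code).
class ActivationDepthSketch {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        boolean active; // false means "placeholder only, contents not loaded"

        Node(String name) { this.name = name; }

        Node add(Node child) { children.add(child); return this; }
    }

    // Mark node as loaded, then its children, stopping after 'depth' levels.
    // The depth limit also guarantees termination if the graph has cycles.
    static void activate(Node node, int depth) {
        if (node == null || depth <= 0) return;
        node.active = true;
        for (Node child : node.children) activate(child, depth - 1);
    }

    public static void main(String[] args) {
        // A chain four levels deep: root -> h -> e -> a
        Node a = new Node("a");
        Node e = new Node("e").add(a);
        Node h = new Node("h").add(e);
        Node root = new Node("root").add(h);

        activate(root, 2);

        System.out.println(root.active); // true
        System.out.println(h.active);    // true
        System.out.println(e.active);    // false
        System.out.println(a.active);    // false
    }
}
```

With a depth of two, only the starting object and the objects it refers to directly are loaded; anything further down stays a placeholder until a later activate() call reaches it, which is why the search activates each node as it walks down the trie.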
With our database-enabled search algorithm in hand, we can now construct a simple routine to fetch the synonyms for a particular word.
...
td = myTrie.search(args[0], db);
if (td == null) {
    ... word not found ...
}
te = td.getData();
for (int i = 0; i < te.numberOfPOSes(); i++) {
    ps = te.getIthPOSSynonym(i);
    db.activate(ps, 2);
    System.out.println(strPOS(ps.getPOS()));
    for (int j = 0; j < ps.numberOfSynonyms(); j++)
        System.out.println(ps.getSynonyms(j).getWord());
}
...
The search algorithm returns a TrieDnode. We use getData() to fetch the ThesaurusEntry, then step through all of its parts of speech. Finally, we step through all the synonyms for each part of speech and display each word. The result would look something like this:
c:\ SearchDatabase heat
NOUN
warmth
glow
VERB
This shows that the word "heat" is recorded in the thesaurus as both a noun and a verb, and that the noun form of the word is associated with the synonyms "warmth" and "glow." There are no synonyms associated (yet) with the verb form.
Persistent Thesaurus
Of course, the charm of Visual Thesaurus is its almost life-like user interface and the ease with which one can explore a network of related terms. We will have to leave that implementation to the reader.
Our goal was to illustrate how easily an object database could be used to persist the data structures behind such a UI. The really nice advantage of using an object database like db4o is the fact that the structure that we have defined in our classes is the same structure that exists in the database. We didn't have to write any translation code to move between our object structure and a relational representation. db4o's easy-to-understand API made our construction work that much easier.
Copyright © 2007 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Rick Grehan
Rick Grehan is a QA engineer at Compuware's NuMega Labs, where he has worked on Java and .NET projects. He is also a contributing editor for InfoWorld Magazine. His articles have appeared in Embedded Systems Programming, EDN, The Microprocessor Report, BYTE, Computer Design, and other journals. He is also the coauthor of three books on computer programming.