| By Rick Grehan | Article Rating: |
|
| May 9, 2007 11:00 AM EDT | Reads: |
14,323 |
The trie is anchored by its root node. So, to locate a word in the trie you search for the first letter in the root. If you find the letter, you follow the corresponding node pointer to the child node. On the child, you look for the second letter in the word. If found you move deeper in the tree.
The search process terminates in one of three ways.
- You reach the end of the word, and the data-pointer references the word you're looking for.
- You read the end of the word, and the data-pointer is null. In that case, the word is not in the trie.
- You encounter a node whose node-pointer is null, but you're not done with the word. Again, the word is not in the trie.
The trie class includes a single data member:
private TriePnode root;
This root member is the node at which all searches begin. To be precise, the root node will hold all the first letters of all the words in the trie. The root node reference is of class TriePnode, which looks like this:
public class TriePnode
{
private char[] chars;
private TriePnode[] pnodes;
private TrieDnode[] dnodes;
... TriePnode methods ...
}
The body of the trie structure is made up of TriePnode objects. Each TriePnode carries the array of characters that hold the letters of the words (as shown in Figure 1), the pnodes array (the pointers "down" into the lower levels of the trie), and the dnodes array (the references to the actual data objects).
Finally, the leaves of the trie are TrieDnode objects. The TrieDnode class is more or less a wrapper that includes a ThesaurusEntry item as its single member.
So now, we have all the pieces that we need to build our Thesaurus. We have the structures that hold the words and connections to the synonyms, and we have the structures that support searching.
Time for Persistence
We now have all the pieces
needed to build our thesaurus: structures to hold the words,
connections among synonyms, and a trie for locating words. Now what we
need is someplace to put everything. This is where our object database,
db4o, steps on the stage.
db4o is an open source object database from db4objects available for downloading at www.db4objects.com. There are two versions of db4o, one for Java, and another for the .NET platform. (A Mono version is also available, though it's largely identical to the .NET version.) db4o can be used either as an embedded database library that executes in the same process space as your application, or in client/server form for remote or multi-user access to a database server. We'll be using db4o in its "monolithic" form (part of a single application).
db4o will do more for us than simply serve as a persistence mechanism for our thesaurus. It will give us everything we'll need to get all our classes into a database in the first place. Put another way, we can "bootstrap" our data into the database using db4o's API then let db4o slip into the background as we use the navigation capabilities we've built into trie and ThesaurusEntry to fetch words and synonyms.
We'll be adding words "the hard way" - i.e., using hard coding - so you can see how the class methods work. Obviously, if we're going to add a lot of words, we'd write code to read a vocabulary from a file.
So, to add an entry for the word "heat" as both a noun and a transitive verb, our code would look like this:
ThesaurusEntry te;
POSSynonyms ps;
TrieDnode te;
Trie thesaurusTrie;
...
te = new ThesaurusEntry("heat");
ps = new POSSynonyms(PartsOfSpeech.NOUN);
te.addPOSSynonym(ps);
ps = new POSSynonyms(PartsOfSpeech.VERT_T);
te.addPOSSynonym(ps);
td = new TrieDnode(te);
thesaurusTrie.insert(te.getWord(), td);
...
The code should be pretty easy to follow. First, we create a new ThesaurusEntry for the word "heat." Next, we create POSSynonym objects for noun and transitive verb. (PartsOfSpeech is a helper class we defined to enumerate the parts of speech.) The POSSynonym objects are added to the ThesaurusEntry, which is encapsulated in a TrieDnode. Finally, we insert the word into the trie, with the ThesaurusEntry node associated.
Published May 9, 2007 Reads 14,323
Copyright © 2007 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Rick Grehan
Rick Grehan is a QA engineer at Compuware's NuMega Labs, where he has worked on Java and .NET projects. He is also a contributing editor for InfoWorld Magazine. His articles have appeared in Embedded Systems Programming, EDN, The Microprocessor Report, BYTE, Computer Design, and other journals. He is also the coauthor of three books on computer programming.
- 4th International Cloud Computing Conference & Expo Starts Today
- Publishing Synergy: Blog, Twitter and Ulitzer
- Performance Tuning Essentials for Java
- Cloud Expo New York Call for Papers Deadline December 15
- Google Wave
- IBM Hardware Chief, Intel VC Exec Arrested in Insider Trading Scam
- Cloud Computing Can Revitalize Your Career as Software Developer
- SOA World Magazine "Readers' Choice Awards" Voting Is Now Open
- Oracle+MySQL Opponents Take to the Barricades
- Virtualization Expo Call for Papers Deadline December 15
- Oracle Faces Growing Price for MySQL
- SpringSource Moving to Spring 3.0
- 4th International Cloud Computing Conference & Expo Starts Today
- Deputy CIO of the CIA to Keynote 1st Annual GovIT Expo
- Publishing Synergy: Blog, Twitter and Ulitzer
- Performance Tuning Essentials for Java
- Cloud Expo New York Call for Papers Deadline December 15
- Cloud Computing Expo: Exclusive Q&A with Yahoo! SVP Cloud Computing
- Google Wave
- IBM Hardware Chief, Intel VC Exec Arrested in Insider Trading Scam
- Cloud Computing Can Revitalize Your Career as Software Developer
- Oracle-Sun: IBM Reportedly Behind Delay
- Citrix Aims To Cripple VMware’s Cloud Designs
- Oracle Trashes HP Relationship for Sun
- After Ubuntu, Windows Looks Increasingly Bad, Increasingly Archaic, Increasingly Unfriendly
- SCO CEO Posts Open Letter to the Open Source Community
- Simula Labs Launches Hosted Delivery Platform To Enable Enterprise Open Source Adoption
- Where Are RIA Technologies Headed in 2008?
- Source Claims SCO Will Sue Google
- How Open Is "Open"? – Industry Luminaries Join the Debate
- Latest SCO News is Plain Weird
- IBM Tells SCO Court It Can't Find AIX-on-Power Code
- SCO Claims Linux Lifted ELF
- Flashback: Investing in 'Professional Open Source' - Exclusive 2004 Interview with David Skok, Matrix Partners
- HP Starts Pushing Desktop Linux
- Linux Business Week Exclusive: Linux Kernel To Be Re-Written To Counter Microsoft FUD






























