Welcome!

Open Source Authors: Maureen O'Gara, Jeremy Geelan, Liz McMillan, Reuven Cohen, Lavenya Dilip

Related Topics: Open Source

Open Source: Article

Mixing Data & Data Structures in an Object Database

An introduction to the trie

The first element of this class, theWord, holds the string of the word itself. The second is the array of POSSynonyms.

The result lets us create a network of interconnected synonyms. You can get a feel for what it might look like if you examine Figure 2. The ThesaurusEntry object for the word "heat" would include a posSyns array pointing to a POSSynonyms object for the verb synonyms, and another for the noun synonyms. The verb POSSynonyms object would, in turn, include a synonyms' array that points to ThesaurusEntry objects corresponding to the words boil, seethe, simmer, and so on. The POSSynonyms object would include a synonyms' array pointing to ThesaurusEnty objects for warmth, glow, temperature, and so on.

Building a Trie
So much for the storage of words and connections to synonyms. Now we need a data structure that's suitable for locating a word so a user can do a search. Note that this data structure only has to locate words. We don't require lexicographical ordering (as would be provided in, say, a binary tree or a B-Tree). While there are several data structures that would serve for this task, for this example, we'll use a trie.

A trie is a tree-like data structure. However, whereas most tree data structures store one or more entire keys (in our case, keys are words) in each node, a trie chops the key apart and stores pieces of it on each node. So our trie will store one letter on each node.

This will be easier to understand if you look at the diagram in Figure 3. Each node in a trie contains three elements and each element is an array:

  • A character array. This holds the letters of the words that the trie stores.
  • A node-pointer array. Members of this array point to child nodes in the trie.
  • A data-pointer array. Members of this array point to the actual data that the trie indexes.

More Stories By Rick Grehan

Rick Grehan is a QA engineer at Compuware's NuMega Labs, where he has worked on Java and .NET projects. He is also a contributing editor for InfoWorld Magazine. His articles have appeared in Embedded Systems Programming, EDN, The Microprocessor Report, BYTE, Computer Design, and other journals. He is also the coauthor of three books on computer programming.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.