Linux Cover Story — Multidimensional Tagging

Finding information naturally

Multidimensional tagging, a key component in social sharing sites, can potentially help enterprises manage large stores of information. In this article, I'll examine the ways that multidimensional tagging will be implemented using Open Source tools.

As storage costs have continued to decrease, organizations create and retain more information. It is easy - and useful - to keep information online, just in case it's needed. The challenge has become managing information, not simply storing it. Structured systems, such as relational database management systems (RDBMS), have well-developed tools for indexing, locating, and retrieving information. Unfortunately, not all information fits well into structured systems. Graphics, presentations, spreadsheets, and word processor documents have all proliferated, no longer bound by storage costs. This is especially true of desktop systems, where 200GB hard drives are common.

Finding exactly the information needed at the right moment has become difficult. A typical desktop drive has tens of thousands of files. A file server or NAS device could easily have hundreds of thousands or even millions of files. Even if system files and applications are discounted, that's still a lot of files to wade through.

File Systems and Search Engines Provide Some Relief
File systems impose a certain degree of order on unstructured information. By allowing information to be placed in hierarchical directories, file systems group files that share some relationship into meaningful categories. The downside to categorization using a file system is complexity for the user. The user is forced to descend through more and more layers of directories to find what he wants. Just remembering where the correct directory is becomes a chore. Information organization via the file system becomes less useful when information is spread across an enterprise. Incompatible systems, individual ways of building directory structures, and sheer scale make this a difficult way to manage information assets across an organization, even a small one.

Search engines provide some respite from our information organizational woes. By indexing keywords and storing references to their source in a database, it's possible to look quickly through all the available files. The vendors of search engine technology, drawing from vast experience in indexing hundreds of millions of Web sites, provide tools that let users find information spread across a huge number of desktop disks and enterprise storage systems.

Tagging Provides Necessary Clues
The trouble with the typical search engine is that the user must first know what he's looking for. He has to have some idea what keywords are indexed for a particular piece of information. If the indexed words don't match the words that a user thinks apply to the information, then the search engine won't find what the user is looking for. A marketing brochure, for example, may not have any words in it that say "marketing brochure," but that's how the user thinks of it. The way the information is categorized in the user's head doesn't always match the strict keyword index of the engine.

This has already become something of a problem for Web sites that allow users to share large amounts of information. Cues are needed to help visitors who come to a site find what interests them without knowing the exact nature of the content. In typical Internet fashion, an organic solution has arisen called multidimensional tagging, labeling, social tagging, folksonomy, or just simply tagging. It's multidimensional because a single piece of information can have many different tags, reflecting the different dimensions that users apply to it. Tagging lets users assign a set of categories to a piece of information when they create it. The tag system, also known as the tag cloud, grows as people use the information and see relationships in it that the original author might have missed. Users categorize information according to how they view the information, which makes it useful for groups of people who don't always think alike, such as engineers and marketing people.
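The many-to-many relationship between items and tags described above can be sketched in a few lines of Python. This is only an illustration, not how any particular site stores its tags: the class name `TagIndex`, its methods, and the sample file and user names are all assumptions made for the example. Tag weights in the cloud are counted here as the number of distinct users who applied a tag to an item, which is one plausible choice among several.

```python
from collections import Counter, defaultdict


class TagIndex:
    """Hypothetical in-memory index linking items (e.g. file names) to free-form tags."""

    def __init__(self):
        # tag name -> set of (item, user) pairs that applied it
        self._applied = defaultdict(set)

    def tag(self, item, user, *tags):
        """Record that `user` applied each of `tags` to `item`."""
        for raw in tags:
            name = raw.strip().lower()  # normalize so "Brochure" == "brochure"
            if name:
                self._applied[name].add((item, user))

    def items_for(self, tag):
        """All items carrying `tag`, regardless of who applied it."""
        pairs = self._applied.get(tag.strip().lower(), ())
        return {item for item, _ in pairs}

    def tags_for(self, item):
        """All tags any user has applied to `item` (its 'dimensions')."""
        return {
            name
            for name, pairs in self._applied.items()
            if any(tagged == item for tagged, _ in pairs)
        }

    def cloud(self):
        """Tag weights: how many (item, user) applications each tag has."""
        return Counter({name: len(pairs) for name, pairs in self._applied.items()})


# The author tags the file once; other users add the categories they see in it.
index = TagIndex()
index.tag("q3-flyer.pdf", "author", "brochure")
index.tag("q3-flyer.pdf", "marketing_user", "Marketing", "brochure")
index.tag("q3-flyer.pdf", "engineer", "spec-sheet")
```

Because each user's tags accumulate on the same item, the engineer's "spec-sheet" and the marketer's "marketing" both lead to the same file.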

Tagging is a key feature in social sharing sites such as Yahoo's del.icio.us and Flickr, as well as YouTube and Userscripts.org. Whether it's sharing interesting Web pages, photos, video, or Greasemonkey scripts, all of these sites rely on user categorization. Without tagging, no one would find anything of interest and the site would fail. Unlike simple storage sites (such as Yahoo Photos and Yahoo Briefcase), they require a way of presenting information to users that lets them find it quickly. Users can find information even when they're not really sure what they want. Social sites don't abandon search engines. Instead, they integrate searching with tagging to provide a breadth of information retrieval options. Most let users search the tag cloud as well as scanned keywords, providing a rich search environment.
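As a rough sketch of what integrating searching with tagging could look like, the Python function below treats a query term as a match if it appears either in a keyword index or in the tag index. The function name `search`, the dictionary layout, and the sample data are assumptions for illustration only; real sites rank and weight results in ways this sketch ignores.

```python
def search(query, keyword_index, tag_index):
    """Return items matching every query term via a keyword OR a tag.

    Both indexes are assumed to map a lowercase term to a set of item names.
    """
    terms = query.lower().split()
    if not terms:
        return set()
    result = None
    for term in terms:
        # A term counts as matched if either index knows it for the item.
        matches = keyword_index.get(term, set()) | tag_index.get(term, set())
        result = matches if result is None else result & matches
    return result


# Words scanned from the documents themselves (hypothetical sample data).
keyword_index = {
    "summer": {"q3-flyer.pdf"},
    "pricing": {"q3-flyer.pdf", "price-list.xls"},
}
# Tags applied by users; the brochure never contains the words "marketing brochure".
tag_index = {
    "marketing": {"q3-flyer.pdf"},
    "brochure": {"q3-flyer.pdf"},
}

keyword_only = search("marketing brochure", keyword_index, {})
combined = search("marketing brochure", keyword_index, tag_index)
mixed = search("brochure pricing", keyword_index, tag_index)
```

Here the keyword-only search comes back empty, while the combined search finds the flyer, and a query that mixes a tag with a scanned keyword also works.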

Tagging: New to the Enterprise
Tagging technology for the enterprise environment is new and not widely deployed in products. That's unfortunate. Not only is it extremely useful for finding information, it's also a natural way to do it. This is especially true for people used to social sharing sites. The tagging methodology facilitates the efficient sharing of information across many users and a large file space. It's exactly what enterprises need to make best use of the great stores of unstructured information on their corporate networks.

There are three ways that tagging is being implemented in corporate environments: integrated into applications; as a part of a standalone information management system; and, eventually, as a file system feature. The first kind of implementation is readily available. Image management systems, even ones directed at the desktop environment like Google's Picasa, include tagging as a core feature. The next version of the Thunderbird e-mail client (version 2.0) is expected to include e-mail tagging, augmenting its current search capabilities. Of course, once users get used to tagging for managing certain types of information, they will wonder why they can't use it for all the information that they need to access. They'll expect tag clouds that span all kinds of information in the enterprise.

Tagging is also being implemented in targeted information management tools. Tools for searching large stores of information in a corporate network are still at an early stage, but tagging should be expected to make an appearance in information management and search engine tools in the near future. Consider this: Yahoo uses tagging in Flickr and del.icio.us as well as the upcoming My Web 2.0. Is it a stretch to expect it to implement tagging in the corporate search arena? The same is true for Google, which uses tagging on its Blogger site, Gmail Web-based mail service, and Picasa image management tool.

Finally, tagging can be expected to become a feature of the file system and operating system. Some aspects of tagging already exist in operating systems. The ability to attach keywords to files in Microsoft Windows is an example of file system-level tagging. These keywords are currently read by Microsoft's desktop search engine, creating a crude multidimensional tagging feature. Of course, entering and displaying tags is clunky and tags can't be displayed in and of themselves, rendering it more of a hack than a real feature. It does, however, point the way to future features of the operating system. Fully integrated into an operating system as normal metadata and using standard visual cues such as those used on social sites, tagging will become a typical part of most corporate environments.

Tools Exist, File System Hooks Don't
The tools for corporate tagging capabilities already exist in the Open Source community. Most of them are found in the software behind social bookmarking sites, which is often built on the LAMP stack and typically written in a common scripting language such as Perl or Python, or in Java. One such Open Source tool is unalog. Ostensibly a social bookmarking system, it's written in Python and the source is readily available on SourceForge. While the core tools exist, the hooks into the file system are still mostly missing.

A somewhat different but innovative approach is evident with Flickrfs or the Flickr File System. Based on FUSE, it creates a virtual file system with tagging for the Flickr digital photo management service. A fusion of file system and service, Flickrfs lets Linux users access the Flickr service as if it were any other mounted Linux file system. Photos can be accessed through the same tags available on Flickr using standard Linux commands such as cp. Flickrfs represents another way that tagging may come to information management - as a specific application or service but integrated into the normal file system.
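To make the idea of reaching files through tags with ordinary paths more concrete, here is a small Python sketch of how a tag-based virtual directory might resolve a path. It does not use FUSE and is not Flickrfs's actual layout or code: the `/tags/...` path scheme, the function name `list_virtual_dir`, and the sample photo names are all hypothetical. Each additional path component narrows the listing to items that carry every tag named so far.

```python
def list_virtual_dir(path, items_by_tag):
    """List the contents of a hypothetical tag-based virtual directory.

    "/tags"             -> all known tag names
    "/tags/a"           -> items tagged "a"
    "/tags/a/b"         -> items tagged with both "a" and "b"
    `items_by_tag` is assumed to map a tag name to a set of item names.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts or parts[0] != "tags":
        raise FileNotFoundError(path)

    selected = parts[1:]
    if not selected:
        return sorted(items_by_tag)

    for tag in selected:
        if tag not in items_by_tag:
            raise FileNotFoundError(path)

    # Keep only the items that carry every tag named in the path.
    common = set.intersection(*(items_by_tag[tag] for tag in selected))
    return sorted(common)


photos = {
    "sunset": {"beach.jpg", "harbor.jpg"},
    "vacation": {"beach.jpg", "hotel.jpg"},
}

all_tags = list_virtual_dir("/tags", photos)
sunsets = list_virtual_dir("/tags/sunset", photos)
vacation_sunsets = list_virtual_dir("/tags/sunset/vacation", photos)
```

In a real virtual file system, a listing function along these lines would sit behind the directory-read operation, so that a command such as cp could copy from a tag path as if it were a normal directory.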

Conclusion
Multidimensional tagging provides an opportunity to let users manage information more in line with their natural way of thinking. By sharing tags across the enterprise, users will spend less time looking for information and more time making use of it. Unlike other collaborative systems, users do all the work without legions of editors making decisions that users find mystifying. The social sites on the Internet have shown this to be a viable information management model. It's a matter of how and when, not if, these features become available to the corporate enterprise.

More Stories By Tom Petrocelli

Tom Petrocelli, president of Technology Alignment Partners, is a veteran of over 21 years in the technology arena. His background encompasses software engineering, marketing, IT, sales, and general management. He has worked in various industries including defense, digital signal processing, call center/CRM, networking, data storage, and storage networking. Tom is also the author of a new book entitled Data Protection and Information Lifecycle Management, published by Prentice Hall.
