By Amyuni Tech | February 23, 2010 03:51 PM EST
In a previous article, we reviewed the benefits and uses of PDF metadata. Specifically, we looked at the emergence of XMP as a potential standard for metadata frameworks and at how it can help PDF developers store, exchange, track, and retrieve information. In this article we will explore the injection of custom XMP metadata tags into existing PDF/A documents.
PDF/A and XMP: Inevitable Convergence
In 2001, developers recognized the need to add their own customized XMP metadata tags to PDF documents. They understood that by adding their own information, they could make a document easier to retrieve and include data that would not change regardless of where or how it was processed.
However, the introduction of the PDF/A standard has challenged some developers and forced them to rethink how they incorporate their XML customizations, especially into PDF/A documents. They realized that although there were many ways to add custom XML tags, there were only a few ways to keep the new data valid without violating PDF/A's format restrictions.
Because of the rising popularity of PDF/A, Amyuni Technologies saw the need to provide developers with a tool that would help them solve specific PDF challenges, such as working within the confines of the ISO archiving standard. This tool is the PDF Analyzer.
PDF/A: Driving Archiving and Document Conformance
ERP purchasing applications are one example of where customized XML information can be inserted into PDF/A documents. Often, developers use these applications to generate invoices or purchase orders in PDF/A for archiving and retrieval purposes. Although these files already contain XMP metadata, information extracted from the text content itself can be added to make them more useful.
Items such as P.O. numbers, contact details, or the names of sales persons, departments, authors, and projects are all valuable pieces of information developers can use to create reports and enhance document retrieval. Developers can automate PDF Analyzer to:
- Verify the internal structure of incoming PDF/A purchase orders from an ERP application for PDF/A compliance and repair them if necessary (and possible).
- Locate and extract specific text strings (names, addresses, dates, etc.) from the PDF/A documents and convert them into customized XMP metadata (Figure 1).
Figure 1: Text String Extraction and Conversion

- Convert these text strings into XML and use that information to create customized XMP extension schemas.
- Reinsert the new customized XMP extension schemas into the XMP streams of the PDF/A documents.
- Resave the document as PDF/A while preserving its adherence to the ISO specifications.
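The extraction and conversion steps in this list can be sketched in plain Python. This is a minimal illustration that uses only the standard library, not the PDF Analyzer API. The sample text, the regular expressions, the field names, and the `http://example.com/ns/po/` namespace with its `po` prefix are all hypothetical, and the sketch assumes the page text has already been extracted from the PDF/A document.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PO_NS = "http://example.com/ns/po/"  # hypothetical custom namespace

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("po", PO_NS)  # hypothetical prefix

# Hypothetical patterns; a real purchase order layout will differ.
FIELD_PATTERNS = {
    "Number": re.compile(r"P\.O\. Number:\s*(\S+)"),
    "SalesPerson": re.compile(r"Sales Person:\s*(.+)"),
    "Department": re.compile(r"Department:\s*(.+)"),
}


def extract_fields(text):
    """Return the first match of each pattern found in the text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


def build_custom_description(fields):
    """Wrap the extracted fields in an rdf:Description element."""
    desc = ET.Element(f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    for name, value in fields.items():
        prop = ET.SubElement(desc, f"{{{PO_NS}}}{name}")
        prop.text = value
    return desc


# Stand-in for text extracted from a PDF/A purchase order.
sample_text = """Purchase Order
P.O. Number: PO-2010-0042
Sales Person: Jane Doe
Department: Procurement
"""

fields = extract_fields(sample_text)
description_xml = ET.tostring(build_custom_description(fields), encoding="unicode")
print(description_xml)
```

The resulting `rdf:Description` element is only the custom-property fragment. Placing it inside the document's XMP stream and resaving the file as PDF/A would still be the job of a PDF library or tool.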
Figure 2 outlines how this automated text conversion process would operate, for either a single PDF/A document or a batch of them:
Figure 2: Text String to XMP Metadata Workflow
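In PDF/A-1, custom properties are expected to be declared in an extension schema description stored in the same XMP packet, which is what keeps the added metadata valid. The sketch below builds such a description with Python's standard library. It is an illustration under stated assumptions, not the PDF Analyzer API: the `pdfaExtension`, `pdfaSchema`, and `pdfaProperty` namespace URIs are the ones commonly published for PDF/A-1 and should be checked against the specification, while the schema title, the `http://example.com/ns/po/` namespace, the `po` prefix, and the property names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EXT = "http://www.aiim.org/pdfa/ns/extension/"
SCHEMA = "http://www.aiim.org/pdfa/ns/schema#"
PROP = "http://www.aiim.org/pdfa/ns/property#"

for ns_prefix, ns_uri in [
    ("rdf", RDF),
    ("pdfaExtension", EXT),
    ("pdfaSchema", SCHEMA),
    ("pdfaProperty", PROP),
]:
    ET.register_namespace(ns_prefix, ns_uri)


def resource_item(parent):
    """Add an rdf:li element with parseType="Resource" to a container."""
    return ET.SubElement(parent, f"{{{RDF}}}li", {f"{{{RDF}}}parseType": "Resource"})


def text_child(parent, ns, name, text):
    """Add a namespaced child element holding plain text."""
    child = ET.SubElement(parent, f"{{{ns}}}{name}")
    child.text = text
    return child


def build_extension_schema(title, namespace_uri, prefix, properties):
    """Describe one custom schema and its Text-valued properties."""
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    schemas = ET.SubElement(desc, f"{{{EXT}}}schemas")
    bag = ET.SubElement(schemas, f"{{{RDF}}}Bag")
    schema = resource_item(bag)
    text_child(schema, SCHEMA, "schema", title)
    text_child(schema, SCHEMA, "namespaceURI", namespace_uri)
    text_child(schema, SCHEMA, "prefix", prefix)
    prop_list = ET.SubElement(schema, f"{{{SCHEMA}}}property")
    seq = ET.SubElement(prop_list, f"{{{RDF}}}Seq")
    for name, description in properties:
        item = resource_item(seq)
        text_child(item, PROP, "name", name)
        text_child(item, PROP, "valueType", "Text")
        text_child(item, PROP, "category", "external")
        text_child(item, PROP, "description", description)
    return desc


# Hypothetical schema for purchase order metadata.
extension = build_extension_schema(
    "Purchase Order Schema",
    "http://example.com/ns/po/",
    "po",
    [
        ("Number", "Purchase order number"),
        ("SalesPerson", "Name of the sales person"),
    ],
)
extension_xml = ET.tostring(extension, encoding="unicode")
print(extension_xml)
```

Each property used in the custom namespace needs a matching entry in this description, so generating both from the same list of field names is one way to keep them in sync.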

When companies implement PDF/A, it’s often because they have high volumes of documents that require archiving. Insurance companies, banks, medical institutions, and manufacturing companies are just some examples of organizations where the PDF Analyzer's automation and XMP customization capabilities can bring a higher degree of efficiency to documentation workflows.
Learn more about PDF Analyzer at www.pdfanalyzer.com.
Copyright © 2010 SYS-CON Media, Inc. — All Rights Reserved.
Franc Gagnon is a technical copywriter for Amyuni Technologies, a PDF solution provider. Amyuni products are integrated into applications used worldwide. The company's software tools are the PDF engines behind several leading business applications created by large software companies such as Intuit, Sage, CaseWare and many more.