Wright Information Indexing Services

Software is heavily involved in the indexing process: the indexer needs to understand the file formats the publisher requires, and needs to use software that can create those formats or that coding. Wright Information can provide indexes in, or work with, the following formats and tools:

  • Word doc, docx, and rtf formats
  • XML coding
  • HTML coding
  • EndNote databases
  • FileMaker Pro databases
  • Tab-delimited files
  • Comma-delimited files
  • Ebook formats
  • Standalone indexing tools
  • Embedded indexing tools (InDesign, Word, FrameMaker)
  • Tagging tools (a form of embedding with placeholder tags)
  • Taxonomy, thesaurus, and controlled-language software packages
  • Folksonomy and tag-related tools (tag cloud development)
  • Other software tools, including keywording, weighted-text search tools, automated indexing software (concordance software), abstracting and citation-control software, and web indexing software

We can work with you to determine what formats you need, and help figure out when in the process the indexing should be created.

Which software package and technique is used depends on variables such as budget, eventual re-usability of the source material, translation needs, time constraints, media used to publish the material, file sizes and transferal issues, and individual preferences. All of these terms can sound confusing, so here's an overview of what software can accomplish in indexing.

Standalone indexing tools, usually used for back-of-the-book indexes, allow indexers to work from page-numbered galleys. The indexing is completely separate from the published material. Wright Information uses CINDEX for these tasks, which allows formatting of RTF files and the generation of a database of entries for any translation purposes.

Embedded indexing tools allow indexing codes to be embedded in the electronic text of a book or file, and allow the index's locators to be updated as the text changes.

Text rolls from page to page in electronic files. An index entry embedded near the word "The" on page 4 (in the above example) will wind up on page 5 (in the following example) if new text is added on page 3, forcing "The" onto page 5. Because the codes flow with the text, the index can be regenerated at any time to match new text arrangements.
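The reflow behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the {{idx:...}} marker syntax is hypothetical (each real tool uses its own codes), markers are assumed to be separated from words by spaces, and a "page" is simply a fixed number of words.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax for this sketch; real tools each use their own codes.
TOKEN = re.compile(r"\{\{idx:[^}]+\}\}|\S+")
MARKER = re.compile(r"\{\{idx:([^}]+)\}\}")


def build_index(text, words_per_page):
    """Return {term: [page numbers]} for markers embedded in the text."""
    index = defaultdict(set)
    words_seen = 0
    for token in TOKEN.findall(text):
        marker = MARKER.fullmatch(token)
        if marker:
            # A marker sits just before the next word, so it lands on that word's page.
            page = words_seen // words_per_page + 1
            index[marker.group(1)].add(page)
        else:
            words_seen += 1  # markers themselves take up no room on the page
    return {term: sorted(pages) for term, pages in sorted(index.items())}


original = "one two three {{idx:cats}} four five six"
revised = "some new words added " + original  # new text inserted earlier in the file

print(build_index(original, words_per_page=3))  # {'cats': [2]}
print(build_index(revised, words_per_page=3))   # {'cats': [3]}
```

Because the marker travels with the text, rebuilding the index after the edit picks up the new page number without any re-indexing.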

When inserting codes, or "embedding" codes, indexers must work in the same files as the publishers.
InDesign embedded codes:

InDesign's codes are invisible until you open the Index palette or the Index Marker dialog box. FrameMaker's codes are invisible unless you use IxGen, Emdex, or the structured XML window. Word's codes can have their visibility switched on or off. (For more about InDesign-specific ebook indexing, see our page on InDesign.)

Word's embedded codes:

Each embedded indexing module has its own characteristics, coding mechanisms, and foibles. Some can import and export text with indexing intact; some cannot. We can recommend processes that work with each program.

Tagging tools allow indexing codes to be embedded in the electronic text after the indexing is complete. The publisher or the indexer inserts numbered dummy tags in the files, and then builds the index separately. The final step uses macros to insert the indexing at each tag in the files. Many of these tools are developed in-house to fit the publishing group's needs.
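As a rough sketch of that final macro step, the Python below swaps numbered dummy tags for index codes. The <<n>> placeholder and {{idx:...}} marker syntaxes are hypothetical, chosen only for illustration; in-house tools define their own.

```python
import re

# Hypothetical placeholder syntax: <<1>>, <<2>>, ...
TAG = re.compile(r"<<(\d+)>>")


def insert_entries(text, entries_by_tag):
    """Replace each numbered dummy tag with the index codes written for it."""

    def replace(match):
        number = int(match.group(1))
        if number not in entries_by_tag:
            # Leave unindexed tags in place so they are easy to spot later.
            return match.group(0)
        return "".join("{{idx:" + entry + "}}" for entry in entries_by_tag[number])

    return TAG.sub(replace, text)


tagged_text = "Cats <<1>> sleep. Dogs <<2>> bark. <<3>>"
entries = {1: ["cats"], 2: ["dogs", "pets:dogs"]}

print(insert_entries(tagged_text, entries))
# Cats {{idx:cats}} sleep. Dogs {{idx:dogs}}{{idx:pets:dogs}} bark. <<3>>
```

Keeping the index entries in a separate table keyed by tag number is what lets the index be written away from the publisher's files and merged in at the end.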

A sample of index entries for Microsoft's tagged text system:

A sample of text tagged in preparation for Microsoft's tagged indexing system:

Taxonomy, thesaurus, and controlled-vocabulary tools aid in building controlled languages and sets of keywords for metadata and web sites.

A screen shot of the TermTree interface:

Folksonomies, tag clouds, and tagging tools vary in nearly every application and web site in which they are used.

A folksonomy is a list of labels or tags that users generate. A tag can be a label for a picture, a title for a file, or a category for a blog posting. When several people combine their tags, the results can be displayed as a folksonomy. Usually these labels are displayed in a tag cloud, allowing a reader to easily see which term has the most information attached to it, and what other words are being used.

Folksonomies work best when administrators can merge similar words and edit the vocabulary. For personal use, folksonomies are most powerful when you can retain your own tags that have meaning for you. The ideal software solution would allow terms to be merged at a broad level while still letting users keep their own data sets. Drop-down or automatic-fill boxes for tags can suggest already-approved tags to help with consistency.

At some point, all folksonomies become chaotic, and control and cleanup should be done to make them usable again.
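The merge-and-count idea above can be sketched as follows. This is a minimal illustration, not a description of any particular product: the merge table, user names, and tags are invented for the example.

```python
from collections import Counter


def tag_cloud(user_tags, merge_map):
    """Count tags across users after merging variants; users' own tags are untouched."""
    counts = Counter()
    for tags in user_tags.values():
        for tag in tags:
            normalized = tag.strip().lower()
            # An administrator's merge table maps variant spellings to one approved term.
            counts[merge_map.get(normalized, normalized)] += 1
    return counts.most_common()


# Invented sample data for illustration only.
user_tags = {
    "ann": ["Cats", "kitten"],
    "bob": ["cat", "dogs"],
    "cy": ["dog"],
}
merge_map = {"cats": "cat", "kitten": "cat", "dogs": "dog"}

print(tag_cloud(user_tags, merge_map))  # [('cat', 3), ('dog', 2)]
```

The counts would drive the size of each term in a tag cloud, while the original per-user tags remain available for personal use.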

Ebook indexing tools: Indexes in ebooks are startlingly unused at this point in ebook development. To answer this need, the American Society for Indexing's Digital Trends Task Force has focused on education and on creating standards for indexes in ebooks. The vision is that search in ebooks can integrate with indexing, and that the indexing can inform the search, making it better and more productive. We feel the user should still be able to browse the index when needed, but a dead chapter in the back of an ebook does no one any good.

With the PDF format, active indexes are easily generated if your indexing is embedded into the files in a tool like FrameMaker or InDesign. Once the entries are embedded, you can output a PDF and the index links will be active. Word files are not converted quite so easily, but a tool like Sonar Activate can turn a Word-generated index into an active PDF index with little trouble.

With the other formats, such as ePub or mobi, your layout tool may not output the index entries needed to make the index active once it goes into an ebook format. Older versions of InDesign, for instance, do not output index entries when generating the ePub format. InDesign CC and later versions do, although you need to work with the CSS style sheet to get the indents formatted correctly. See our page on InDesign for details.

We are happy to work with you to find a way to activate the index in your ebook. We can embed anchor points into HTML, XHTML, or XML, and link the indexing to those points instead of pages (or both, if you need to output print as well). We can coordinate with ePub conversion houses to actively link your index, and we can work with you to make that process seamless. We also have scripts to help indexes work with InDesign EPUB output for versions that do not output the links natively.
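To illustrate linking index entries to anchor points rather than pages, here is a small Python sketch that writes an HTML list of links. The file names, anchor ids, and list layout are assumptions made for the example; they are not the output of any specific conversion tool or of our scripts.

```python
from html import escape


def render_index(entries):
    """entries maps a heading to a list of (file, anchor id) targets."""
    lines = ["<ul>"]
    for heading in sorted(entries, key=str.lower):
        links = ", ".join(
            f'<a href="{escape(path, quote=True)}#{escape(anchor, quote=True)}">{n}</a>'
            for n, (path, anchor) in enumerate(entries[heading], start=1)
        )
        lines.append(f"  <li>{escape(heading)}: {links}</li>")
    lines.append("</ul>")
    return "\n".join(lines)


# Hypothetical file names and anchor ids.
entries = {
    "cats": [("ch01.xhtml", "idx-cats-1"), ("ch03.xhtml", "idx-cats-2")],
    "R&D": [("ch02.xhtml", "idx-rd-1")],
}

print(render_index(entries))
```

Each link points at an anchor id embedded in the text, so the index keeps working however the reading device repaginates the book.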

Web indexing software aids in building HTML web indexes. Wright Information uses a variety of proprietary tools as needed by the client to build metadata sets, Web-based indexes, and compiled scripted Web indexes. More and more web sites are including an A-Z index to help users find information. These indexes will not link to every document on a web site, but rather to portal pages. By a portal page, we mean a page that is the main location for a governmental department, or the lead-in page for a body of knowledge. By linking to portals, the web index does not need to track ever-changing documents, and can survive the updating that is necessary for web site information.

Other software tools:
Keywording is used primarily in online help materials. It can be hard-coded jumps, similar to HTML jumps, or it can be inserted as embedded coding and compiled into a list by the software. Wright Information has worked in a variety of help authoring tools.

Weighted-text search tools, similar to the intelligence in agents or Microsoft help systems, involve building terminology sets that help the intelligence work. An example would be helping an agent identify the difference between a cell in an Excel spreadsheet and a cell in a jail. Often terminology sets are built specifically for the information system, outlining all the synonyms and special meanings that a particular product uses. Indexing thought and practice come into play in the building of these terminology sets.
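A toy version of such a terminology set, using the "cell" example above, might look like the Python below. The senses, context words, and weights are invented for illustration; real systems build and tune these sets per product, and this sketch does not describe how any particular agent or help system works internally.

```python
# Invented terminology set: each sense lists context words with weights.
TERMINOLOGY = {
    "cell (spreadsheet)": {"excel": 3, "spreadsheet": 3, "formula": 2, "column": 1, "row": 1},
    "cell (jail)": {"jail": 3, "prison": 3, "inmate": 2, "guard": 1},
}


def best_sense(query, terminology=TERMINOLOGY):
    """Pick the sense whose weighted context words best match the query."""
    words = query.lower().split()
    scores = {
        sense: sum(weights.get(word, 0) for word in words)
        for sense, weights in terminology.items()
    }
    best = max(scores, key=scores.get)
    # With no supporting context words, the sketch declines to guess.
    return best if scores[best] > 0 else None


print(best_sense("format a cell in excel"))  # cell (spreadsheet)
print(best_sense("inmate locked in cell"))   # cell (jail)
print(best_sense("cell"))                    # None
```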

Automated indexing software builds a concordance, or a word list, from processed files. Although the manufacturers often claim these packages build indexes, the actual results are a list of words and phrases, sometimes useful in the beginning stages of building an index. Usability tests of these packages have shown that the word lists omit many key ideas and phrases, and cannot fine-tune terminology for easy retrieval or build the hierarchies of ideas that professional indexing can. Free-text search, also produced automatically by software, is useful in some environments, but tests have shown that retrieval is much better with a human-generated index. Wright Information owns software that will generate concordances, but doesn't use it for a finished index.
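To show what a concordance amounts to, here is a minimal Python sketch that lists every non-trivial word with the pages it appears on. The stopword list and sample pages are invented for the example, and commercial packages are more elaborate, but the output is the same kind of thing: a word list, not an index.

```python
import re

# A tiny, invented stopword list; real packages ship much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def concordance(pages):
    """pages maps page number to text; returns {word: sorted page numbers}."""
    result = {}
    for page, text in pages.items():
        for word in set(re.findall(r"[a-z']+", text.lower())):
            if word not in STOPWORDS:
                result.setdefault(word, set()).add(page)
    return {word: sorted(found) for word, found in sorted(result.items())}


pages = {1: "The cat sat.", 2: "A cat and a dog."}

print(concordance(pages))  # {'cat': [1, 2], 'dog': [2], 'sat': [1]}
```

Nothing here groups "cat" and "dog" under a broader idea such as pets, or distinguishes a passing mention from a real discussion, which is the work a human indexer adds.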

Abstracting and citation-control software aids in building abstracts with associated keywords. Wright Information uses EndNote for abstracting needs.

Contact Wright Information at jancw@wrightinformation.com.