Extraction Settings

This article is relevant only if you still use the Acrolinx Classic edition. In all Sidebar editions, you configure extraction settings directly in Content Profiles in the Acrolinx Dashboard. Your Guidance Package comes with default Content Profiles for different file types and editors.


Extraction settings store the filter and segmentation settings for a specific document type. You can update the extraction settings for existing document types or create new extraction settings for new document types.

After you set up your checking profiles, you add extraction settings to them. You can also update the extraction settings and document type identifiers to ensure that users receive the correct filter and segmentation settings.

You create and edit extraction settings files by updating the filter and segmentation settings in the plug-in options dialog box. When you connect to the Acrolinx server, you might be prompted for your user ID. You can use your own user ID or a separate user ID that you have created to administer checking profiles. Ensure that your chosen user ID meets the following requirements:

  • The user ID has the Profile Administrator role.
  • The user ID belongs to a user group that has the relevant checking profile assigned to it.

    For example, to edit DITA tasks, assign a checking profile that has extraction settings for DITA tasks to the user group.
    Tip: To change extraction settings for any checking profile, you can assign all checking profiles to the "profile administrator" role and give yourself this role.

Creating New Extraction Settings for Checking Profiles

If you just started working with checking profiles, you might not have saved or migrated extraction settings. You can easily create new extraction settings in the Acrolinx plug-ins. After you create new settings in the plug-in interface, your revision appears in the Dashboard. You can then save the new extraction settings and assign them to checking profiles.

To create new extraction settings, follow these steps:

  1. Start an Acrolinx plug-in, open a document that has the relevant document type, and edit the filter and segmentation settings.
    • If you use a legacy plug-in, you would edit the filter and segmentation settings in the plug-in options.
      For example, suppose that you want to add extraction settings for DITA tasks.

      • You would open a DITA task and edit the filter and segmentation settings in the plug-in options.
      • After you click OK, an entry for your revision should appear in the change history in the Dashboard.


    • If you use a Windows SDK-based plug-in with full support for checking profiles, you would update the filter and segmentation settings in the extraction settings editor.
      For example, suppose that you want to add extraction settings for DITA tasks.

      • You would open a DITA task, open the Check tab of the plug-in options, and click  to open the Edit Extraction Settings dialog box.
      • You select an extraction setting, add a name for the settings revision, and edit the filter and segmentation settings.
      • After you click OK, an entry for your revision should appear in the change history in the Dashboard.
    Attention: If you create extraction settings for a Microsoft Office product, remember that font and style names are language-dependent. For example, if you create extraction settings on an English version of Word, you can’t use these extraction settings on a German version of Word. If your company uses Office products with different locales, create the extraction settings for each locale separately.
  2. Navigate to Guidance Settings > Checking Profiles > Extraction Settings and confirm that your revision appears in the change history.
  3. Save the revision as a new extraction setting and assign it to checking profiles.

Updating Saved Extraction Settings

If you have saved extraction settings that are used in a checking profile, you can easily update them. After you revise the settings in the plug-in interface, your revision appears in the Dashboard. You can then overwrite the saved extraction settings with your new revision.

To update saved extraction settings, follow these steps:

  1. Confirm that an applicable checking profile is assigned to your user group.

    An applicable checking profile is a profile which contains the extraction settings that you want to update.

    For example, suppose that you’re a profile administrator and want to update the settings for DITA tasks. You have a checking profile called "EN Techpubs" which contains extraction settings for DITA tasks.

    You would confirm that the checking profile "EN Techpubs" is assigned to the Acrolinx role "profile administrator".
    Tip: If you’re unsure which checking profile to look for, open the Dashboard and navigate to Guidance Settings > Checking Profiles > Usage Reports > Guidance Settings. Sort by resource type and find the extraction settings that you want to change. The Checking Profiles column displays all checking profiles that use these extraction settings.

  2. Start an Acrolinx plug-in, open a document that has the relevant document type, and edit the filter and segmentation settings.
    • If you use a legacy plug-in, you would update the filter and segmentation settings in the plug-in options.
      For example, suppose that you want to revise the extraction settings for DITA tasks.

      • You would open a DITA task and update the filter and segmentation settings in the plug-in options.
      • After you click OK, an entry for your revision should appear in the change history in the Dashboard.


    • If you use a Windows SDK-based plug-in with full support for checking profiles, you would update the filter and segmentation settings in the extraction settings editor.
      For example, suppose that you want to revise the extraction settings for DITA tasks.

      • You would open a DITA task, open the Check tab of the plug-in options, and click  to open the Edit Extraction Settings dialog box.
      • You select a filter mode, add a name for the settings revision, and edit the filter and segmentation settings.
      • After you click OK, an entry for your revision should appear in the change history in the Dashboard.


  3. Navigate to Guidance Settings > Checking Profiles > Extraction Settings and confirm that your revision appears in the change history.
  4. Save the revision as a new extraction setting and assign it to checking profiles.
  5. Ask another user who works with the same extraction settings to open a similar document and check that they can see your changes.
    Remember: If you want to update the new extraction setting in the plug-in again, you must first restart the editor application. The extraction settings editor displays the new revision only after a restart of the editor.

Deleting Saved Extraction Settings and Revisions

You might delete saved extraction settings if they become obsolete. For example, during migration, the server might create extraction settings for document types that you no longer use. Deleting the obsolete settings ensures that they aren’t added to checking profiles. You can also delete revisions of the extraction settings from the change history. For example, you might delete a revision that contains a mistake to prevent users from saving it as new extraction settings.

To delete saved extraction settings and revisions, follow these steps:

  1. Navigate to Guidance Settings > Checking Profiles > Extraction Settings.
  2. To delete a revision, click the delete icon next to the revision in the Change History table.
  3. To delete saved extraction settings, click the delete icon next to the settings name in the Saved Extraction Settings table.

Creating Two Sets of Extraction Settings for the Same Document Type

In some circumstances, different user groups might require different filter and segmentation settings for the same document type. For example, one user group might check text in an element that another user group prefers to exclude from checking. In this case, you must save two sets of extraction settings for the same document type. After you’ve saved the extraction settings, you assign the settings to the relevant checking profiles.

Each of these checking profiles should be assigned to a different user group. For example, take the following scenario:

  • Users in the "Transport" division want to have the draft-comment element excluded when checking DITA tasks.

    Users in the "Transport" division have the checking profile "en-transport" assigned to their user group.


  • Users in the "Appliances" division want to check draft comments when checking DITA tasks.

    Users in the "Appliances" division have the checking profile "en-appliances" assigned to their user group.


In this scenario, you would save one set of extraction settings that is configured to exclude draft comments from tasks and another set of settings where draft comments are included. You would then add the first set of extraction settings to the checking profile "en-transport" and the second set to "en-appliances".
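The practical difference between the two sets is which elements end up being checked. The following Python sketch is a minimal illustration of an element-exclusion filter, not how Acrolinx implements filtering. It assumes that an extraction setting can be reduced to a set of excluded element names; the names EXTRACTION_SETTINGS, PROFILE_SETTINGS, and extract_text are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Hypothetical stand-ins for the two saved extraction settings in the scenario.
# Real extraction settings hold many more filter and segmentation options.
EXTRACTION_SETTINGS: Dict[str, Set[str]] = {
    "DITA Tasks (comments excluded)": {"draft-comment"},
    "DITA Tasks (comments included)": set(),
}

# Each division's checking profile points to one of the two settings.
PROFILE_SETTINGS: Dict[str, str] = {
    "en-transport": "DITA Tasks (comments excluded)",
    "en-appliances": "DITA Tasks (comments included)",
}


def extract_text(xml_source: str, excluded: Set[str]) -> List[str]:
    """Collect text segments, skipping excluded elements and everything inside them."""
    segments: List[str] = []

    def walk(elem: ET.Element) -> None:
        if elem.tag in excluded:
            return  # drop the element's own text and all of its descendants
        if elem.text and elem.text.strip():
            segments.append(elem.text.strip())
        for child in elem:
            walk(child)
            # Text after a child's end tag belongs to the parent, so keep it
            # even when the child itself was excluded.
            if child.tail and child.tail.strip():
                segments.append(child.tail.strip())

    walk(ET.fromstring(xml_source))
    return segments


TASK = """<task id="t1"><title>Replace the filter</title><taskbody>
<context>Turn off the pump first.<draft-comment>Check torque value</draft-comment></context>
</taskbody></task>"""

for profile, settings_name in PROFILE_SETTINGS.items():
    print(profile, extract_text(TASK, EXTRACTION_SETTINGS[settings_name]))
# en-transport ['Replace the filter', 'Turn off the pump first.']
# en-appliances ['Replace the filter', 'Turn off the pump first.', 'Check torque value']
```

Both settings describe the same document type; only the checking profile assigned to each user group decides which one a user receives.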

To create two sets of extraction settings for the same document type, follow these steps:

  1. If you do not have any initial extraction settings, save the first set of extraction settings and add those settings to your first checking profile.
  2. Start an Acrolinx plug-in and add the changes that you need for the second set of extraction settings.
  3. Navigate to Guidance Settings > Checking Profiles > Extraction Settings and select your revision from the change history.
  4. In the Saved Extraction Settings panel, select Save with new name and enter a name for the settings.

    Ensure that the name clearly distinguishes these settings from the first set of settings.

    For example, you could have two sets of extraction settings called "DITA Tasks (comments included)" and "DITA Tasks (comments excluded)".

  5. Add the second set of extraction settings to a second checking profile.
  6. Ask users from both groups to check their filter and segmentation settings and confirm that they each receive the correct settings.

How the Plug-ins Get Extraction Settings from the Server

If your users don’t receive the extraction settings that you expect, you might have to troubleshoot how the extraction settings are allocated. To troubleshoot effectively, it’s important to understand how the plug-ins get the right extraction settings from the Acrolinx server.

Document Type Identifiers

The Acrolinx plug-ins use different document type identifiers to get the right extraction settings from the Acrolinx server. For example, all plug-ins have a unique signature that the Acrolinx server can also use to determine the document type.

For instance, if you check a document with the Acrolinx Plug-in for Microsoft Word, the server can assign the identifier "Word document" to your document and get the right extraction settings.

If you work with XML, the Acrolinx server can also use the information about the Document Type Definition (DTD) that is included in each XML file.
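The DTD information sits in the document's DOCTYPE declaration. To show where values such as the root element, public ID, and system ID come from, here is a minimal Python sketch that reads them from an XML string. It is an illustration under simplifying assumptions (double-quoted literals only, any internal DTD subset ignored), not the way the Acrolinx server reads these values; xml_doctype_identifiers is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# Matches <!DOCTYPE root PUBLIC "public-id" "system-id"> or <!DOCTYPE root SYSTEM "system-id">.
# Simplification: only double-quoted literals are recognized.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[^\s>\[]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"\s+"(?P<public_system>[^"]*)"'
    r'|\s+SYSTEM\s+"(?P<system>[^"]*)")?'
)


def xml_doctype_identifiers(xml_source: str) -> Dict[str, str]:
    """Return rootElement, publicId, and systemId values found in an XML string."""
    match = DOCTYPE_RE.search(xml_source)
    if match is None:
        # No DOCTYPE declaration: only the root element name is available.
        return {"rootElement": ET.fromstring(xml_source).tag}
    ids = {"rootElement": match.group("root")}
    if match.group("public") is not None:
        ids["publicId"] = match.group("public")
        ids["systemId"] = match.group("public_system")
    elif match.group("system") is not None:
        ids["systemId"] = match.group("system")
    return ids


concept = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">\n'
    '<concept id="c1"><title>Example</title></concept>'
)
print(xml_doctype_identifiers(concept))
# {'rootElement': 'concept', 'publicId': '-//OASIS//DTD DITA Concept//EN', 'systemId': 'concept.dtd'}
```

A simple XML file without a DOCTYPE yields only a root element name, which is why the rootElement identifier is useful for such files.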

Example Scenario

Suppose that you have several sets of extraction settings that are saved in Acrolinx and included in one checking profile that is assigned to you. When you run a check or open the Acrolinx options, Acrolinx takes the following steps to find the right extraction settings for your document:

  1. The Acrolinx plug-in sends information about the document type to the Acrolinx server.

    This information is divided into identifiers. One identifier could be the plug-in signature and another could be the public ID.

    To learn more about these identifiers, review the full list of document type identifiers.

  2. The server checks this information against the information about the currently saved extraction settings.
    • If all identifiers from the plug-in match the identifiers in a set of saved extraction settings, the server makes these settings available to you. For example, some plug-ins send only one identifier, such as the plug-in signature:
      • If you have saved extraction settings that also contain the same plug-in signature and nothing else, the server makes these settings available to you.
      • If your saved extraction settings contain the same plug-in signature but also another identifier such as the public ID, you won’t receive these extraction settings.

        This behavior occurs because there’s a mismatch between the identifiers that the plug-in provides and the identifiers that the server has stored.

      If you use server-side extraction and you have several matching sets of extraction settings, the server selects the first set in the list of matches and applies those settings to the document.

    • If the identifiers from the plug-in don’t match the combination of identifiers in any of your saved extraction settings, you won’t receive any extraction settings at all.

      In this case, you must define new extraction settings in the plug-in interface, save them, and assign them to your checking profile.
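The matching rules in this scenario can be modeled in a few lines of Python. This is a minimal sketch under our reading of the text, not the server's implementation: it assumes that "match" means the plug-in's identifiers and the stored identifiers form exactly the same set, and that the saved settings are checked in list order. The function names and the signature value are hypothetical.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

# Document type identifiers, for example {"clientSignature": "...", "publicId": "..."}
Identifiers = Mapping[str, str]
SavedSettings = Sequence[Tuple[str, Identifiers]]  # (settings name, stored identifiers)


def find_matches(sent: Identifiers, saved: SavedSettings) -> List[str]:
    """Return the names of all saved settings whose identifiers equal the ones sent.

    Assumption: a match requires identical identifier sets, so stored settings
    with an extra identifier (such as publicId) don't match a plug-in that
    sends only its signature.
    """
    return [name for name, stored in saved if dict(stored) == dict(sent)]


def select_settings(sent: Identifiers, saved: SavedSettings) -> Optional[str]:
    """Return the first match, or None when no saved settings match."""
    matches = find_matches(sent, saved)
    return matches[0] if matches else None


SIGNATURE = "example-plugin-signature"  # hypothetical value
saved = [
    ("Signature only", {"clientSignature": SIGNATURE}),
    ("DITA concepts", {"clientSignature": SIGNATURE,
                       "publicId": "-//OASIS//DTD DITA Concept//EN"}),
]

print(select_settings({"clientSignature": SIGNATURE}, saved))         # Signature only
print(select_settings({"clientSignature": "other-signature"}, saved))  # None
```

When several sets match, select_settings takes the first one, mirroring the server-side extraction behavior described above; how the server orders its list of matches isn't covered by this sketch.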

Document Type Identifiers

In Acrolinx 4.0 and later, the following document type identifiers are available to match settings to documents. The Acrolinx server can use one or more of these identifiers to assign extraction settings to document types. Each Acrolinx plug-in sends at least one of these identifiers to the server when you run a check.

When you save extraction settings, the server saves the identifiers that it has received from the plug-in as part of the saved extraction settings. The server can then use these identifiers to find your extraction settings the next time you need them.

You can also use these identifiers to assign context definitions for specific document types.

The following table describes each of the document type identifiers. If you have to edit an extraction settings configuration file directly, you can use this information to correctly enter the identifier. If you work with context definitions, the table also contains instructions on how to add document type identifiers to a context definition file.

TABLE 1. DOCUMENT TYPE IDENTIFIERS

Element Name | Values
legacyFallback

The legacyFallback identifier is sent to the server by legacy plug-ins. A legacy plug-in is any plug-in with version 4.0 or earlier. Use this identifier according to the following examples:

  • Example for context definitions:
    • For XML files, enter the public ID as the element content. For example, to define a context definition for DITA concepts, you would enter the legacyFallback element as follows:
      <docType>
        <legacyFallback>-//OASIS//DTD DITA Concept//EN</legacyFallback>
      </docType>
    • For HTML files, enter the name of the root element as the element content. The root element for HTML documents is nearly always "html", so you would enter the legacyFallback element as follows:
      <docType>
        <legacyFallback>html</legacyFallback>
      </docType>
    • For all remaining document types, enter the plug-in short name as the element content. For example, to define a context definition for Microsoft Word documents, you would enter the legacyFallback element as follows:
      <docType>
        <legacyFallback>acrocheck4word_</legacyFallback>
      </docType>
  • Example for extraction settings:
    "legacyFallback": "-//OASIS//DTD DITA Concept//EN"
type

Describes the type of file that was checked. The value can be text, html, or xml.

To assign settings to HTML files, you would enter the type identifier as follows:

  • Example for context definitions:
    <type>html</type>
  • Example for extraction settings:
     "documentTypeDescription": { "type": "html" },
publicId

The public ID of an XML document type.

For example, to assign settings to DITA concepts, you would enter the publicId identifier as follows:

  • Example for context definitions:
    <publicId>-//OASIS//DTD DITA Concept//EN</publicId>
  • Example for extraction settings:
     "documentTypeDescription": { "publicId": "-//OASIS//DTD DITA Concept//EN" },
systemId

The system ID of an XML document type.

For example, to assign settings to DITA concepts, you would enter the systemId identifier as follows:

  • Example for context definitions:
    <systemId>concept.dtd</systemId>
  • Example for extraction settings:
     "documentTypeDescription": { "systemId": "concept.dtd" },
schemaName

The name of an XML document schema.

For example, to assign settings to XML files that use the schema "note.xsd", you would enter the schemaName identifier as follows:

  • Example for context definitions:
    <schemaName>note.xsd</schemaName>
  • Example for extraction settings:
    "documentTypeDescription": { "schemaName": "note.xsd" },
rootElement

The root element of an XML or HTML document.

You might use this identifier for simple XML documents that don’t have a document type definition or schema.

For example, to assign settings to XML files that begin with the element <product>, you might enter the rootElement identifier as follows:

  • Example for context definitions:
    <rootElement>product</rootElement>
  • Example for extraction settings:
     "documentTypeDescription": { "rootElement": "product" },

You can also use this identifier to assign settings to HTML files. The root element for HTML documents is always html.

templateName

The name of the Microsoft Word template that a Word document is based on.

You might use this identifier if you have different Microsoft Word templates for different types of Word documents.

For example, to assign settings to Word documents that use a template called userGuides.dotm, you might enter the templateName element as follows:

  • Example for context definitions:
    <templateName>userGuides.dotm</templateName>
  • Example for extraction settings:
     "documentTypeDescription": { "templateName": "userGuides.dotm" },
clientSignature

The signature that an Acrolinx plug-in sends when it authenticates with an Acrolinx server.

You might use this identifier for customized Acrolinx integrations such as CMS integrations that don’t send a plug-in short name.

For example, to assign settings to an Acrolinx integration that uses the signature "xmF0YZzgQ233lY2tlcg", you would enter the clientSignature element as follows:

  • Example for context definitions:
    <clientSignature>xmF0YZzgQ233lY2tlcg</clientSignature>
  • Example for extraction settings:
     "documentTypeDescription": { "clientSignature": "xmF0YZzgQ233lY2tlcg" },
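If you write these fragments by hand, a small script can help you avoid quoting and trailing-comma mistakes that a strict JSON or XML parser rejects. The following Python sketch applies the legacyFallback rules from the table and renders both fragment styles with the standard json and xml.etree.ElementTree modules. The function names are hypothetical, and the sketch produces only the single fragments shown in the table, not complete context definition or extraction settings files.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional


def legacy_fallback_value(file_type: str,
                          public_id: Optional[str] = None,
                          root_element: Optional[str] = None,
                          plugin_short_name: Optional[str] = None) -> Optional[str]:
    """Choose the legacyFallback content according to the rules in Table 1."""
    if file_type == "xml":
        return public_id          # XML files: the public ID
    if file_type == "html":
        return root_element       # HTML files: the root element, usually "html"
    return plugin_short_name      # all other document types: the plug-in short name


def legacy_fallback_doctype(value: str) -> str:
    """Render a context definition <docType> element that wraps legacyFallback."""
    doc_type = ET.Element("docType")
    ET.SubElement(doc_type, "legacyFallback").text = value  # special characters are escaped
    return ET.tostring(doc_type, encoding="unicode")


def document_type_description(identifier: str, value: str) -> str:
    """Render a one-identifier documentTypeDescription as a standalone JSON object."""
    return json.dumps({"documentTypeDescription": {identifier: value}})


dita_concept = legacy_fallback_value("xml", public_id="-//OASIS//DTD DITA Concept//EN")
print(legacy_fallback_doctype(dita_concept))
# <docType><legacyFallback>-//OASIS//DTD DITA Concept//EN</legacyFallback></docType>
print(document_type_description("systemId", "concept.dtd"))
# {"documentTypeDescription": {"systemId": "concept.dtd"}}
```

Generating the fragments with library serializers, rather than string concatenation, keeps the output well-formed even when an identifier value contains characters such as < or ".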