Get Started with Content Profiles

Configure How Acrolinx Reads Your Content

Imagine that you have a set of similar XML documents, and you want Acrolinx to recognize and process specific custom elements or attributes in them. In other words, you want to tell Acrolinx how to read your XML documents. Where do you start? Let's look at a sample document and create a Content Profile for it:

<!DOCTYPE HTML PUBLIC "//acrolinx example-v1" "_">
<document>
    <h1>Content Profiles Available From Acrolinx Server 5.3 Help Extend How Acrolinx Reads Your Text</h1>
    <paragraph> Content Profiles are a cool new feature in Acrolinx Server 5.3. In this tutorial, we'll learn how to configure a Content Profile.
        <comment>Note: Previously, you used CSDs to configure server-side segmentation.</comment>
    </paragraph>
</document>

Any changes that you make to your Content Profile take effect immediately.

The GENERAL Tab

Name your Content Profile. Since it's for XML, you can include that in the name, and you might want to be more specific, for example, 'XML - Tutorial'. A good name includes the file type as well as what you want to use the Content Profile for.

Add a description that reminds you and others what you're using the Content Profile for.

The CRITERIA Tab

Define when Acrolinx should use this Content Profile to read your content. In this example, we'll use the document type and the Public ID to identify the content that Acrolinx should read with this Content Profile.

  1. Select the Type. For example, 'XML'.
  2. Enter the Public ID. It's the most reliable way to identify or match XML content. In this example, it's //acrolinx example-v1.


  3. You can already test whether the Content Profile is working. To check which Content Profile Acrolinx used, follow these steps:
    • Run a check on your XML document.
    • Open the Scorecard and expand the Administrative Information section.

      You should see Content Profile: XML - Tutorial.

      If Acrolinx used a different Content Profile, then check the criteria of that Content Profile to troubleshoot.
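This tutorial doesn't describe how Acrolinx evaluates criteria internally, but the idea can be illustrated in a few lines of Python. The sketch below is only an illustration, assuming that a profile matches when both its type and its Public ID equal those of the document. The `profiles` list, `extract_public_id`, and `select_profile` are hypothetical names, not part of any Acrolinx API.

```python
import re

# Hypothetical stand-in for the CRITERIA tab: a profile matches when both
# its document type and its Public ID match the document being checked.
profiles = [
    {"name": "XML - Tutorial", "type": "XML", "public_id": "//acrolinx example-v1"},
]

# Captures the Public ID from a declaration like <!DOCTYPE name PUBLIC "id" "system-id">.
DOCTYPE_PUBLIC = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"')


def extract_public_id(text):
    """Return the Public ID from the DOCTYPE declaration, or None if there isn't one."""
    match = DOCTYPE_PUBLIC.search(text)
    return match.group(1) if match else None


def select_profile(text, doc_type, profiles):
    """Return the name of the first profile whose criteria match, or None."""
    public_id = extract_public_id(text)
    for profile in profiles:
        if profile["type"] == doc_type and profile["public_id"] == public_id:
            return profile["name"]
    return None


sample = '<!DOCTYPE HTML PUBLIC "//acrolinx example-v1" "_">\n<document>...</document>'
print(select_profile(sample, "XML", profiles))  # prints "XML - Tutorial"
```

A document with a different Public ID, or with no DOCTYPE at all, returns None in this sketch, which is the kind of mismatch you'd look for when troubleshooting another profile's criteria.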

The EXTRACTION Tab

And now we come to the heart of your Content Profile. Chances are, you’ve done this before. You’ve defined the extraction settings for elements and you’re familiar with the terms. If you haven’t done it before or if you’d like a refresher, check out the Extraction Settings section in the Content Profiles Reference Guide.

Setting the Starting Element to Include works for most cases. It means that Acrolinx reads everything, so we can begin by excluding what we don't want. In our XML example, take the element ‘<comment>’. You probably don't want Acrolinx to read comments, so let's exclude it.

  1. Select exclude in Filter Mode.
  2. Enter the name of the tag in Element Name. In this example, it's comment.
  3. Press Enter to add it.
  4. Test the Content Profile:
    • Make sure that the comment contains an issue, just like in our example.
    • Run a check, and the issue shouldn't appear anymore.

      It worked!
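To see why excluding an element makes its issues disappear, here is a minimal Python sketch of include/exclude filtering. It assumes that an excluded element hides its own text and its children, but not the text that follows it inside the parent element. It uses the standard `xml.etree.ElementTree` module; `EXCLUDED` and `extract_text` are hypothetical names, and this isn't how Acrolinx itself implements extraction.

```python
import xml.etree.ElementTree as ET

# The tutorial's sample document, without the DOCTYPE line.
SAMPLE = """<document>
    <h1>Content Profiles Available From Acrolinx Server 5.3 Help Extend How Acrolinx Reads Your Text</h1>
    <paragraph> Content Profiles are a cool new feature in Acrolinx Server 5.3. In this tutorial, we'll learn how to configure a Content Profile.
        <comment>Note: Previously, you used CSDs to configure server-side segmentation.</comment>
    </paragraph>
</document>"""

# Hypothetical equivalent of adding "comment" with Filter Mode set to exclude.
EXCLUDED = {"comment"}


def extract_text(elem, excluded=EXCLUDED):
    """Collect text in document order, skipping excluded elements and their children."""
    parts = []
    if elem.tag not in excluded:
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            parts.extend(extract_text(child, excluded))
    # A tail is the text after an element's closing tag. It belongs to the
    # parent, so it is kept even when the element itself is excluded.
    if elem.tail:
        parts.append(elem.tail)
    return parts


root = ET.fromstring(SAMPLE)
# Join the pieces and collapse whitespace so the result is easy to read.
text = " ".join("".join(extract_text(root)).split())
print(text)
```

The printed text contains the title and the paragraph, but nothing from the comment, so a checker working on this text would have nothing to flag there.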

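Next, let's look at the title. Acrolinx recognizes sentences by punctuation, but titles are a special case because they usually don't end with punctuation. Without a break, Acrolinx may read the title and the text that follows it as one long sentence. We can tell Acrolinx where the title ends by using the Break Level.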

  1. Select include in Filter Mode.
  2. Set the Break Level as sentence.
  3. Enter your Element Name, which is h1.
  4. Press Enter to add it to your list.
  5. Test the Content Profile:
    • Run a check.

      You'll notice that the example has a very long title in its h1 element.

      You should get a Sentence too long issue.
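Here is a minimal Python sketch of what a sentence Break Level does, assuming that it simply forces a segment boundary before and after the element. With the break, the title becomes its own segment even though it has no final punctuation; without it, a naive punctuation-based splitter joins the title to the first sentence of the paragraph. `SENTENCE_BREAK`, `segments`, and the regular-expression splitter are hypothetical simplifications, not Acrolinx's segmentation.

```python
import re
import xml.etree.ElementTree as ET

# The tutorial's sample document, without the DOCTYPE line.
SAMPLE = """<document>
    <h1>Content Profiles Available From Acrolinx Server 5.3 Help Extend How Acrolinx Reads Your Text</h1>
    <paragraph> Content Profiles are a cool new feature in Acrolinx Server 5.3. In this tutorial, we'll learn how to configure a Content Profile.
        <comment>Note: Previously, you used CSDs to configure server-side segmentation.</comment>
    </paragraph>
</document>"""

EXCLUDED = {"comment"}   # hypothetical: Filter Mode set to exclude
SENTENCE_BREAK = {"h1"}  # hypothetical: Filter Mode include, Break Level sentence


def segments(root, excluded=EXCLUDED, sentence_break=SENTENCE_BREAK):
    """Split extracted text into sentences, forcing a break around sentence-level elements."""
    chunks = [[]]  # each inner list holds text that no forced break separates

    def start_new_chunk():
        if chunks[-1]:
            chunks.append([])

    def walk(elem):
        if elem.tag not in excluded:
            breaks = elem.tag in sentence_break
            if breaks:
                start_new_chunk()
            if elem.text:
                chunks[-1].append(elem.text)
            for child in elem:
                walk(child)
            if breaks:
                start_new_chunk()
        if elem.tail:
            chunks[-1].append(elem.tail)

    walk(root)
    sentences = []
    for chunk in chunks:
        text = " ".join("".join(chunk).split())
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", text) if s)
    return sentences


root = ET.fromstring(SAMPLE)
for sentence in segments(root):
    print(sentence)
```

Calling `segments(root, sentence_break=set())` instead returns only two sentences, and the first one starts with the title and runs on into "Content Profiles are a cool new feature...".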

Before we move on...

Since we're setting up a new XML Content Profile, it's best practice to select Remove Extra Whitespace. This helps Acrolinx to ignore whitespace that your XML editor might add when pretty-printing your content.

Click Advanced and select Remove Extra Whitespace.
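Pretty-printing adds line breaks and indentation inside elements. Assuming that Remove Extra Whitespace behaves roughly like collapsing every run of whitespace into a single space, the idea looks like this in Python. `remove_extra_whitespace` is a hypothetical helper, not an Acrolinx function, and the wrapped text is a made-up pretty-printed version of the sample paragraph.

```python
def remove_extra_whitespace(text):
    """Collapse runs of spaces, tabs, and newlines into single spaces and trim the ends."""
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(text.split())


# Hypothetical pretty-printed paragraph text, wrapped and indented by an editor.
raw = "\n        Content Profiles are a cool new feature\n        in Acrolinx Server 5.3.\n    "
print(remove_extra_whitespace(raw))  # prints "Content Profiles are a cool new feature in Acrolinx Server 5.3."
```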

The CONTEXT Tab

Much of our Guidance is context-dependent. For example, you want title guidelines such as 'Capitalization' to apply only to titles.

The four most important contexts are:

  • LIST
  • PARAGRAPH
  • TABLE
  • TITLE

When you create a new Content Profile, these contexts already exist, but you need to fill them out.

Let’s do that for TITLE.

  1. Select TITLE from the list.
  2. Click + to add a new MAPPING.
  3. Enter h1.
    This tells Acrolinx that the content in ‘h1’ is a title. If you have other title elements such as h2, h3, or h4, add a mapping for each one.
  4. Test the Content Profile:
    • Run a check.
      The Sentence too long issue for your title should now appear as a Title too long issue.
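Conceptually, a context mapping is a lookup from element names to contexts, which a guideline can consult to decide how it applies. The Python sketch below assumes that a length check names its issue according to whether the element is mapped to TITLE. `CONTEXT_MAPPINGS`, `context_for`, `length_issue`, and the word limit are all hypothetical; this tutorial doesn't describe Acrolinx's actual rules or thresholds.

```python
# Hypothetical equivalent of the CONTEXT tab after this tutorial:
# only TITLE has a mapping; the other contexts are still empty.
CONTEXT_MAPPINGS = {
    "LIST": set(),
    "PARAGRAPH": set(),
    "TABLE": set(),
    "TITLE": {"h1"},  # add "h2", "h3", "h4" here for more title elements
}


def context_for(tag):
    """Return the context that an element is mapped to, or None if it has no mapping."""
    for context, tags in CONTEXT_MAPPINGS.items():
        if tag in tags:
            return context
    return None


def length_issue(tag, text, max_words):
    """Name a length issue by context; max_words is an arbitrary example limit."""
    if len(text.split()) <= max_words:
        return None
    return "Title too long" if context_for(tag) == "TITLE" else "Sentence too long"


title = "Content Profiles Available From Acrolinx Server 5.3 Help Extend How Acrolinx Reads Your Text"
print(length_issue("h1", title, max_words=10))         # prints "Title too long"
print(length_issue("paragraph", title, max_words=10))  # prints "Sentence too long"
```

Keeping the mapping separate from the checks means that adding h2 to TITLE changes how every title guideline treats h2, without touching the guidelines themselves.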

Congratulations! You've now successfully configured a Content Profile.