Define How You Want Acrolinx to Recognize Your Content

Content Reference or Content Fingerprint?

Content Recognition lets you decide how you want Acrolinx to recognize different versions of your content. This gives you more exact and useful analytics.

There are two ways Acrolinx can recognize your content. The approach you choose will depend on your organization's workflows and content types.

  • The Content Reference lets you define exactly how Acrolinx should recognize your content. For example, you could use the document file path or some content-specific metadata. This option gives you fine-grained control since you can define a different Content Reference for each content type via Content Profiles. Use the Content Reference if you know that a document's file path won't change, or that your content includes some stable and unique metadata.
  • The Content Fingerprint lets Acrolinx recognize your content for you. Acrolinx looks at the written content itself, and doesn't rely on file paths or metadata. This is a global option that applies across all your content types. For this reason, it's great for workflows where you copy content between different file types. It also works well for workflows where files might move and metadata might change over time.

By default, Acrolinx uses the Content Reference method to recognize your content, and uses your content's file path as its reference. The rest of this article will walk you through how and why you can modify this behavior.

Content Reference

In this method, Acrolinx uses the file path as the unique reference for your content by default. In other words, the "Source" that you see in a Scorecard is the default reference.

If you're happy with this, then you don't need to change anything. But maybe you have a better way of recognizing your content. 

Why would you need a better way? Well, consider the following scenario.

The Problem

You're working on a Markdown file called "introduction-to-demo-inc-greeblies.md". You've moved it to a few different locations and even changed your mind about the file name just before you published it.

Your checking statistics might evolve as follows:

Check Time | File Path | Content Reference | Score | Improvement
First version | /doc/drafts/introduction-to-demo-inc-greeblies.md | "/doc/drafts/introduction-to-demo-inc-greeblies.md" | 65 | No Previous Data
Revised version | /doc/demo-inc/topics/introduction-to-demo-inc-greeblies.md | "/doc/demo-inc/topics/introduction-to-demo-inc-greeblies.md" | 78 | No Previous Data
Final version | /doc/demo-inc/topics/overview-of-demo-inc-greeblies.md | "/doc/demo-inc/topics/overview-of-demo-inc-greeblies.md" | 84 | No Previous Data

The last column shows "No Previous Data" because Acrolinx treats each version as a separate file rather than different iterations of the same file.

The Solution

To get around this problem, you can tell Acrolinx to use a different reference. As long as you have an attribute that's stable across content versions, Acrolinx will recognize that content.

You can also take the content reference directly from the document as long as you can reach it with an XPath.

For example, the front matter in your Markdown file might have a "slug" parameter. You've set the slug to "greeblies-intro" and it doesn't change between the different versions. You can configure Acrolinx to use the value for the "slug" parameter as the content reference. 
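To make the idea of a stable attribute concrete, here is a minimal Python sketch that reads a "slug" value from Markdown front matter. It only illustrates the concept and isn't how Acrolinx reads your content. The helper name front_matter_value and the sample documents are assumptions made for this example, and the parser only handles simple "key: value" lines between "---" markers.

```python
def front_matter_value(markdown_text, key):
    """Return the value of `key` from a leading '---' front matter block, or None."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        name, separator, value = line.partition(":")
        if separator and name.strip() == key:
            return value.strip().strip("\"'")
    return None


# Hypothetical draft and published versions of the same article.
draft = """---
title: Introduction to Demo Inc. Greeblies
slug: greeblies-intro
---
# Introduction
"""
published = draft.replace("Introduction to", "Overview of")

# The title changes between versions, but the slug stays the same.
reference_draft = front_matter_value(draft, "slug")
reference_published = front_matter_value(published, "slug")
```

Both versions yield the same slug even though the title changed, which is what makes the slug a good candidate for a content reference.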

Your checking statistics would now evolve as follows:

Check Time | File Path | Content Reference | Score | Improvement
First version | /doc/drafts/introduction-to-demo-inc-greeblies.md | "greeblies-intro" | 65 | No Previous Data
Revised version | /doc/demo-inc/topics/introduction-to-demo-inc-greeblies.md | "greeblies-intro" | 78 | +13
Final version | /doc/demo-inc/topics/overview-of-demo-inc-greeblies.md | "greeblies-intro" | 84 | +6

The "Improvement" column shows progress because all scores are attributed to the "greeblies-intro" content reference.
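The difference between the two tables comes down to which key the scores are grouped by. The following Python sketch reproduces the "Improvement" column under that assumption. The function name improvements and the data layout are hypothetical, and the code is an illustration of the arithmetic rather than Acrolinx's implementation.

```python
def improvements(checks):
    """Compute the Improvement column for (content_reference, score) pairs in check order."""
    last_score = {}  # most recent score seen for each content reference
    column = []
    for reference, score in checks:
        if reference in last_score:
            column.append(f"{score - last_score[reference]:+d}")
        else:
            column.append("No Previous Data")
        last_score[reference] = score
    return column


# Grouped by file path: every check has a different reference.
by_path = [
    ("/doc/drafts/introduction-to-demo-inc-greeblies.md", 65),
    ("/doc/demo-inc/topics/introduction-to-demo-inc-greeblies.md", 78),
    ("/doc/demo-inc/topics/overview-of-demo-inc-greeblies.md", 84),
]

# Grouped by slug: every check shares the same reference.
by_slug = [("greeblies-intro", 65), ("greeblies-intro", 78), ("greeblies-intro", 84)]

path_column = improvements(by_path)
slug_column = improvements(by_slug)
```

With file paths as the key, every row has no earlier score to compare against. With the slug as the key, each check is compared with the one before it.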

To configure the Content Reference, follow these steps:

  1. In the Dashboard, navigate to Guidance Settings > Content Profiles, and click the relevant Content Profile. For the Markdown example above, you'd pick the Markdown Content Profile.
  2. Navigate to DETAILS > Content Reference.
  3. Select Use part of the content as content reference, and enter the XPath for your stable attribute.
    1. If you use the "slug" parameter in Markdown front matter, enter an XPath that selects the value of that parameter.
    2. If you use the HTML <title> tag as the content reference, enter an XPath that selects that element.
      [Image: example of the HTML title tag as content reference]
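To illustrate how an XPath can pick a title out of a document, here is a minimal Python sketch that uses the standard library's ElementTree module, which supports a limited XPath subset. The sample page and the expression ".//title" are assumptions for this example. The exact expression that Acrolinx expects for your content might differ, so test it with a real check.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page.
page = """<html>
  <head><title>Introduction to Demo Inc. Greeblies</title></head>
  <body><p>Greeblies are small parts.</p></body>
</html>"""

root = ET.fromstring(page)

# ".//title" selects the first <title> element anywhere below the root.
title_element = root.find(".//title")
content_reference = title_element.text if title_element is not None else None
```

A title only works as a content reference if it stays the same across versions and is unique to each piece of content.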


To test if your content reference is working, check a relevant file, and go to the Scorecard Archive dashboard. In the Document column, you should now see the content reference that you defined instead of the file path.

Content Fingerprint

If you have a reliable, stable attribute that applies to all iterations of each piece of content, the Content Reference is a useful solution. But that isn't always the case.

You might have a workflow where an article starts its life in a word processor like Microsoft Word. Then, it's copied into an XML editor like XMetaL Author and, finally, it makes its way into a CMS like Adobe Experience Manager. You want to see how each step in that process influences your content alignment. But those different versions of your content don't share a file path or any other unique metadata attribute. The content itself is the only thing that links those versions of your article.

Enter the Content Fingerprint. With this method, Acrolinx groups the content that you check based on the similarity of the text. If your content moves or the format changes, Acrolinx will recognize the similarity in the substance of each article. It'll then treat them as different versions of the same article. In cases where there isn't enough content for Acrolinx to make a decision, it'll revert to the default Content Reference method.
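The general idea of grouping by text similarity can be sketched in a few lines of Python. This article doesn't describe how Acrolinx actually compares text, so the example below swaps in the standard library's difflib.SequenceMatcher purely as a stand-in. The threshold, the minimum word count, and the function name group_key are all hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # hypothetical cut-off, not an Acrolinx setting
MIN_WORDS = 5               # hypothetical minimum amount of text to compare


def group_key(text, file_path, known):
    """Return the group for this check. `known` maps group keys to representative text."""
    if len(text.split()) < MIN_WORDS:
        return file_path  # too little content: fall back to the file path
    for key, representative in known.items():
        if SequenceMatcher(None, representative, text).ratio() >= SIMILARITY_THRESHOLD:
            return key  # similar enough to count as another version
    known[file_path] = text  # first time this content is seen
    return file_path


known = {}
word_draft = "Greeblies are small modular parts that decorate the surface of a widget."
xml_version = "Greeblies are small, modular parts that decorate the surface of a widget."
unrelated = "Quarterly revenue grew in every region except the north."

key_word = group_key(word_draft, "intro.docx", known)
key_xml = group_key(xml_version, "topics/intro.xml", known)
key_other = group_key(unrelated, "report.md", known)
key_short = group_key("Too short", "stub.md", known)
```

In this sketch, the Word draft and the XML version land in the same group despite their different file paths, while the unrelated text and the very short text each keep their own file path as the key.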

When you switch between these recognition methods, it fundamentally changes the way Acrolinx classifies your content for Analytics. This can make it difficult to compare data from before and after the switch. To get the most valuable Analytics, we recommend that you don't switch between these methods regularly.

Because the Content Fingerprint method automatically classifies your content for you, it can produce false positives: Acrolinx might treat distinct articles that are very similar as versions of the same article. Before you decide to use this method, consider the nature of your content and decide if it's the best option for you.

To set a Content Fingerprint for all of your content, follow these steps:

  1. In the Dashboard, navigate to Analytics > Administration > Content Recognition.
  2. Select Content Fingerprint.

Once you select Content Fingerprint as the method to recognize your content, this will apply to all future checks. It doesn't apply to existing check data.