Configure Automatic Data Mapping

Analytics is only useful if your data is clean, especially if you have custom fields. However, it's often difficult to get people to fill out custom fields properly. Instead, you can configure Acrolinx to extract data from your document and automatically populate your custom fields. No human intervention is necessary.

For example, suppose that you have custom fields for "product" and "department" that need to be filled out for each document that you check. You can configure Acrolinx so that whenever someone checks a document, the product and department fields are filled out automatically. This feature only works with structured documents. For example, a suitable document could be an XML document that has "product" and "department" attributes in the header section. In this case, you would write XPath expressions for your custom fields so that Acrolinx knows where to find the relevant data.

To configure automatic data mapping, follow these major steps:

Configure Your Document-Specific Fields

As mentioned in the introduction, you can extract data for custom fields such as "product" and "department" from your documents. This feature is only available for document-level fields because document-level data isn't that useful for any other field types. Before you start collecting data, make sure that your document-level fields are configured to receive this data.

To configure custom fields to receive data from your content, follow these steps:

  1. Navigate to Analytics > Administration > Custom Fields and select the DOCUMENTS tab.
  2. In the Input Type column, select the option From Content for each relevant field.
    • If the field doesn't exist yet, click ADD FIELD and choose From Content as the Input Type.

Define How You Want to Get the Data

Once you've set up your fields, you can head over to the Content Profiles section for the next step.

  1. Open the Content Profile for your intended document type.

    For example, you might want to extract data from HTML meta tags. Let's suppose that you already have a Content Profile for HTML files called "Published HTML". Open that Content Profile and select the DOCUMENT INFORMATION tab. You should see the fields that you configured in the previous step. For example, if you configured the "Product" and "Department" fields, you should now see them on the DOCUMENT INFORMATION tab.

  2. In the location field, enter an XPath that defines where to find your data.

    For example, suppose that you want to extract the product name from the following meta tag.

    <meta name="Product.Name" content="Widget Detection API">

    In this case, you would enter the following XPath.

    //meta[@name="Product.Name"]/@content

    Your changes take effect immediately.
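
Before you enter an XPath in the location field, it can help to try it against a sample document locally. The following Python sketch is an illustration only and isn't part of Acrolinx. It assumes a well-formed, XHTML-style sample in which the meta tag is self-closed, because the standard-library XML parser only accepts well-formed XML. Python's xml.etree.ElementTree also supports only a subset of XPath and can't select an attribute with /@content, so the sketch selects the meta element and then reads its content attribute. The helper name extract_meta_content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document (assumption: well-formed XHTML, so the meta tag is self-closed).
SAMPLE = """<html>
  <head>
    <meta name="Product.Name" content="Widget Detection API"/>
  </head>
  <body><p>Hello</p></body>
</html>"""


def extract_meta_content(document, meta_name):
    """Return the content attribute of the first matching meta tag, or None.

    Approximates //meta[@name="..."]/@content with ElementTree's XPath subset.
    """
    root = ET.fromstring(document)
    # Select the element first; ElementTree can't address attributes in a path.
    element = root.find(f".//meta[@name='{meta_name}']")
    return None if element is None else element.get("content")


if __name__ == "__main__":
    print(extract_meta_content(SAMPLE, "Product.Name"))
```

If the helper returns None for your sample, the XPath probably doesn't match the document structure, and Acrolinx is unlikely to find the value either.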


The following table shows some more examples of XPaths for different file formats.

Document Type: DITA
Target Content: The topic title in the following code block.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
    <topic id="topic_hdx_w2s_2p">
        <title>git: Distributed and Shared Access to Content</title>

Corresponding XPath: /topic/title

Document Type: HTML
Target Content: The document name in the following code block.

    <meta name="Product.Version" content="Version 6.2.173">
    <meta name="Document.Name" content="Widget Detection API Developer's Guide">
    <meta name="Document.Id" content="192893721">
    <meta name="Date.Created" content="2019/2/1, 16:08 (GMT)">

Corresponding XPath: //meta[@name="Document.Name"]/@content
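
As a rough local check of the DITA example, the following Python sketch mimics what the absolute XPath /topic/title selects: the title element that is a direct child of a root topic element. It isn't an Acrolinx tool. The sample adds a closing </topic> tag so that it parses, on the assumption that the snippet above is an excerpt of a longer topic. The function name extract_topic_title is hypothetical.

```python
import xml.etree.ElementTree as ET

# DITA sample from the example, closed with </topic> so that it is well formed
# (assumption: the original snippet is an excerpt of a longer topic).
DITA_SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_hdx_w2s_2p">
    <title>git: Distributed and Shared Access to Content</title>
</topic>"""


def extract_topic_title(document):
    """Approximate the absolute XPath /topic/title for a DITA topic."""
    root = ET.fromstring(document)
    # /topic/title only matches when the root element itself is <topic>.
    if root.tag != "topic":
        return None
    # find("title") looks at direct children only, like the /title step.
    title = root.find("title")
    return None if title is None else title.text


if __name__ == "__main__":
    print(extract_topic_title(DITA_SAMPLE))
```

An absolute path like /topic/title is stricter than //title: it won't match titles of nested sections or documents whose root element isn't topic.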


Check Some Documents and Test the Results

Once you've set up your custom field, it's time to see if it works!

To test your data-mapping configuration, follow these steps:

  1. Check a document that will match the Content Profile.

    For example, if you just edited the "Published HTML" Content Profile, you would check an HTML file published on your website. 

    After the check has finished, open the Scorecard and check that the matching Content Profile is the one that you just updated.


  2. Open the Core Platform Dashboard and navigate to Analytics > Scorecard Archive.

  3. In the Dashboard for the Scorecard Archive, select your field:value combination in the Field filter.

    For example, suppose that you checked a document with the following meta tag:

    <meta name="Product.Name" content="My Test Product">

    You want to see if Acrolinx correctly extracted the value "My Test Product". In this case, you would look for "Product: My Test Product" in the Field filter.

    The following screenshot shows how the field filter would look when data was successfully extracted from our example meta tag:

    When you click the filter value, you should see the Scorecard for the file you just checked. This means that your data is being extracted correctly. Acrolinx automatically detected the product name from the metadata in the document.

    If you didn't see the results you expected, check the extraction settings in the Content Profile. Acrolinx can't extract data from elements that are excluded or ignored.
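
When results are missing, it's also worth confirming that the value is actually present in the file you checked. The following Python sketch, which isn't an Acrolinx tool, uses the standard-library html.parser module to read meta tags from regular HTML (where meta tags aren't self-closed) and builds the "Field: value" labels you would expect to see in the Field filter. The class MetaCollector, the function expected_filter_labels, and the FIELD_MAP mapping from meta names to field names (including "Department.Name") are hypothetical; the label format follows the "Product: My Test Product" example above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from the meta tags of an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.meta[name] = attributes.get("content")


# Hypothetical mapping: meta tag name -> custom field name.
FIELD_MAP = {"Product.Name": "Product", "Department.Name": "Department"}


def expected_filter_labels(html, field_map):
    """Return {field: "Field: value"}; the label is None if the meta tag is missing."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    labels = {}
    for meta_name, field in field_map.items():
        value = collector.meta.get(meta_name)
        labels[field] = None if value is None else f"{field}: {value}"
    return labels


if __name__ == "__main__":
    page = '<html><head><meta name="Product.Name" content="My Test Product"></head></html>'
    for field, label in expected_filter_labels(page, FIELD_MAP).items():
        print(field, "->", label)
```

A None label means the source file doesn't contain the meta tag at all, so the problem lies in the document rather than in the Content Profile.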