Content Profiles Reference Guide

Content Profiles let you configure how Acrolinx reads your content. You'll find Content Profiles under Guidance Settings in the Dashboard. In this guide, we'll go through each tab of the Content Profiles feature in the Dashboard and talk about what the fields are.

If you’re using one of the Acrolinx Sidebar integrations, you’ll notice that the Filter and Segmentation options that are available in the Classic edition aren’t there. This is because you can now define how Acrolinx reads your content on the Core Platform so that your writers don’t have to worry about it.

All the changes that you make in your Content Profile are effective immediately.

Content Profile

On the landing page, you'll see a list of all your Content Profiles. Here you can upload or add a new one.

Content Profile name.

Content Profile shortcut menu.

Default system profile. This means that you can't make any changes to this Content Profile, but you can make a copy of it or download it.

If you have 2 identical Content Profiles, Acrolinx uses the one further up the list.

General

Name

Give your Content Profile a descriptive name. You can include the content type (for example, HTML) or the integration (for example, PowerPoint).

Description

Include additional information that might be important or relevant about this Content Profile.

Check all content confidentially

Prevent any text or statistics from being written to the Analytics database. For more information about this feature, see the article Protect Confidential Content.

Criteria

Define when Acrolinx should apply this Content Profile to your content.

Source Reference
Field type: Regular Expression

The location of the file or files that the Content Profile applies to.

You might use this property when there's no other data available to identify the file type.

A file path or a path pattern entered as a regular expression.

For example, to apply the Content Profile to all DITA files in the "user-guides" directory, you would enter the property as follows:

.*/user-guides/.*.dita

If you had multiple directories such as "user-guides" and "faqs", you would separate them with the OR operator "|" like this:

.*/user-guides/.*.dita|.*/faqs/.*.dita
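
If you want to try out a pattern before you enter it, here's a minimal Python sketch. It assumes that the pattern has to match the whole source reference, which the leading ".*" in the examples suggests but this guide doesn't confirm. The helper name applies and the sample paths are made up for illustration, and Python's regex dialect might differ in details from the one Acrolinx uses.

```python
import re

# Patterns copied from the examples above. Note that the dot before "dita"
# is unescaped, so it matches any single character, not only a literal ".".
SINGLE_DIR = r".*/user-guides/.*.dita"
TWO_DIRS = r".*/user-guides/.*.dita|.*/faqs/.*.dita"


def applies(pattern: str, source_reference: str) -> bool:
    """Return True if the pattern matches the entire source reference.

    Assumption: full-string matching. This is not confirmed by the guide.
    """
    return re.fullmatch(pattern, source_reference) is not None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("C:/docs/user-guides/install/setup.dita",
                 "C:/docs/faqs/billing.dita",
                 "C:/docs/user-guides/map.ditamap"):
        print(path, applies(SINGLE_DIR, path), applies(TWO_DIRS, path))
```

If you need the extension to be matched literally, escape the dot as "\." in your own patterns.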

If you want to use more complex regular expressions, there are a few things you should look out for. See our detailed section on regular expressions in matching criteria.

Type

Choose your content type from the dropdown.

Public Id
Field type: String

The public ID of an XML document type.

For example, to assign settings to DITA concepts, you would enter the public ID as follows:

-//OASIS//DTD DITA Concept//EN
System Id
Field type: String

The system ID of an XML document type.

For example, to assign settings to DITA concepts, you would enter the system ID as follows:

concept.dtd
Schema
Field type: String

The name of an XML document schema.

For example, to assign settings to XML files that use the schema "notes", you would enter the schema name as follows:

notes.xsd
Root Element
Field type: String

The root element of an XML or HTML document.

You might use this for simple XML documents that don't have a document type definition or schema.

For example, to assign settings to XML files that begin with the element <product>, you might enter the root element as follows:

product

The root element for HTML documents is always html.
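
If you aren't sure which public ID, system ID, or root element a document has, you can read them from the file itself. The following is a small sketch that uses Python's standard xml.dom.minidom module. The sample document and the function name describe are assumptions for illustration, and the sketch says nothing about how Acrolinx itself reads these values.

```python
from xml.dom import minidom

# Hypothetical DITA concept, for illustration only.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">\n'
    '<concept id="intro"><title>Introduction</title></concept>'
)


def describe(xml_text: str) -> dict:
    """Return the values you could enter as Public Id, System Id, and Root Element."""
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype  # None when the document has no DOCTYPE declaration
    return {
        "public_id": doctype.publicId if doctype else None,
        "system_id": doctype.systemId if doctype else None,
        "root_element": doc.documentElement.tagName,
    }


if __name__ == "__main__":
    print(describe(SAMPLE))
    print(describe("<product><name>Widget</name></product>"))
```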

Advanced
Integration
Field type: String

The signature that an Acrolinx Integration sends when it authenticates with an Acrolinx Core Platform.

You might use this identifier for customized Acrolinx Integrations such as CMS integrations that don't send an integration short name.

Only use this matching criterion if no other matching criteria work for you.

Matching on the signature means that the Content Profile will only match when writers check from a certain editor. It can be useful if the host editor is the only way of determining the document type.

For example, to assign settings to an Acrolinx Integration that uses the signature "xmF0YZzgQ233lY2tlcg", you would enter the integration as follows:

xmF0YZzgQ233lY2tlcg
Language
Field type: String

The language of the check as indicated by the Acrolinx Integration.

You can use this property to change the extraction settings based on the language of the content.

One scenario might be that you're checking a multilingual XML format such as TMX. You would want to include the parts of the file that match the checking language.

For example, to target the German segments in a TMX file, you would enter the language as follows:

de
Writing Guide/Checking Profile
Field type: String

Enter the name of a Writing Guide or Checking Profile.

You could use this criterion if you want a Content Profile to apply to a specific Writing Guide or Checking Profile.

If you have a Writing Guide and a Checking Profile with the same name, this option will select the Checking Profile.

Extraction

Starting Element

Setting the Starting Element to "include" works for most cases. This means that Acrolinx reads everything.

Default Break Level

Define how the text should be broken up, such as sentence break or token break elements.

XML processing instructions have their break level set to none by default. You can always set this to a different value if you need to.

Mark excluded elements

Insert placeholders for excluded elements when processing the text.

This property prevents excluded elements from causing false issues.

Advanced
Remove Extra Whitespace

This option is only present for XML Content Profiles.

Replace sequences of whitespace with a single whitespace character.

This is useful if your XML editor sends content to Acrolinx in a pretty-printed format.

If you're setting up a new XML Content Profile, you should generally select this option.
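
To see what this kind of normalization does to pretty-printed content, here's a minimal sketch. It only illustrates the idea of collapsing whitespace runs. The function name collapse_whitespace is made up, and the exact behavior of Acrolinx might differ.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", text)


if __name__ == "__main__":
    pretty_printed = "Install the\n        product   before you\tcontinue."
    print(collapse_whitespace(pretty_printed))
```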

Allowed External DTDs
Field type: Regular Expression

Define a whitelist of external DTDs.

You need this setting if your Content Profile applies to files that reference external DTDs.

For example, if your Content Profile applies to DocBook articles, and those articles reference the DTD "docbookx.dtd", you need to whitelist that DTD. Otherwise, Acrolinx will ignore it.

For example, to whitelist all DTDs in the "dita" directory, you would enter the property as follows:

.*/dita/.*.dtd

If you had multiple directories such as "dita" and "docbook", you would separate them with the OR operator "|" like this:

.*/dita/.*.dtd|.*/docbook/.*.dtd
Entity Conversion Map
Field type: String

Tell Acrolinx how you've defined your own entities in your DTDs.

Enter your entities in the format: entity = value separated by a new line. For example: copyright=©

Acrolinx replaces the entity with the value. So, for the above example, Acrolinx replaces all instances of '&copyright;' with '©'.

You can enter Unicode characters in two ways:

  • As they are, copied from elsewhere. For example, you can use © as the value, as in the example above.
  • Use the escape notation like '\uxxxx', where 'xxxx' is the hexadecimal Unicode codepoint. For example, you could write copyright=\u00a9, since U+00A9 is the Unicode codepoint for ©.
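
The map format described above can be sketched in a few lines of Python. This is only an illustration of the "entity = value" format and the '\uxxxx' notation. The function names parse_entity_map and replace_entities are hypothetical, and the sketch assumes that entity references in the text are written as '&name;'.

```python
import re


def parse_entity_map(config: str) -> dict[str, str]:
    """Parse 'entity = value' lines, decoding \\uXXXX escapes in the value."""
    mapping = {}
    for line in config.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("=", 1)
        value = re.sub(
            r"\\u([0-9a-fA-F]{4})",
            lambda m: chr(int(m.group(1), 16)),
            value.strip(),
        )
        mapping[name.strip()] = value
    return mapping


def replace_entities(text: str, mapping: dict[str, str]) -> str:
    """Replace '&name;' with its mapped value and leave unknown entities as they are."""
    return re.sub(
        r"&([\w.-]+);",
        lambda m: mapping.get(m.group(1), m.group(0)),
        text,
    )


if __name__ == "__main__":
    entity_map = parse_entity_map("copyright=\\u00a9\ntrade = \u2122")
    print(replace_entities("&copyright; 2024 Acme&trade; &amp; Co.", entity_map))
```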

Extraction Settings

Element Name

Name of the element without any syntax. For example, if your element is <comment>, then enter the following as the Element Name:

Example
comment

You can also match an element and its attributes. Use the following format:

elementName attribute=value

For example, if you have the following element: <paragraph internal="true" title="Title of this paragraph"> </paragraph>, then your Element Name could be:

Example
paragraph internal=true

Use wildcards to match multiple elements or attribute values at once.

For example, the following Element Name matches all elements that include the attribute internal="true":

Example
* internal=true
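
To make the matching rules above more concrete, here's a rough Python sketch. It assumes shell-style wildcards (via the standard fnmatch module) and attribute values without spaces. Both are assumptions for illustration, the function name matches is made up, and the actual matching in Acrolinx might work differently.

```python
from fnmatch import fnmatchcase
import xml.etree.ElementTree as ET


def matches(element_name: str, element: ET.Element) -> bool:
    """Check an element against an 'elementName attribute=value' entry."""
    name_pattern, *attribute_parts = element_name.split()
    if not fnmatchcase(element.tag, name_pattern):
        return False
    for part in attribute_parts:
        attribute, _, value_pattern = part.partition("=")
        actual = element.get(attribute)
        # The attribute must exist and its value must match the pattern.
        if actual is None or not fnmatchcase(actual, value_pattern):
            return False
    return True


if __name__ == "__main__":
    paragraph = ET.fromstring(
        '<paragraph internal="true" title="Title of this paragraph"> </paragraph>'
    )
    for entry in ("paragraph internal=true", "* internal=true", "comment"):
        print(entry, matches(entry, paragraph))
```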

Filter Mode

include

Always check this content regardless of whether it's nested inside an excluded element.

exclude

Never check this content.

empty

Never check this content but include it in the grammatical structure of the sentence.

For example, you might have text that refers to user interface labels. Take the following sentence.

The options "Use slithy toves" and "Gyre and gimble in the wabe" will cause a Jabberwocky to appear.

You could exclude the option names to stop Acrolinx from checking them, but then Acrolinx would read the sentence like this:

The options and will cause a Jabberwocky to appear.

To Acrolinx, this sentence would look like a grammar issue. However, if you were to set the option names to "empty", Acrolinx would read the sentence like this:

The options (empty) and (empty) will cause a Jabberwocky to appear.

In this last example, the grammatical structure is retained and Acrolinx will let it be.

inherited

Use the Filter Mode of the parent element.
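
The four filter modes can be sketched as a small recursive walk over an XML tree. This is a simplified illustration, not how Acrolinx is implemented. The element names uicontrol, draft-comment, and keep are hypothetical, the "(empty)" placeholder is borrowed from the example above, and the sketch doesn't look inside "empty" elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings: element name -> Filter Mode.
FILTER_MODES = {"uicontrol": "empty", "draft-comment": "exclude", "keep": "include"}


def extract(element, filter_modes, parent_mode="include"):
    """Return the text that would be read, given a Filter Mode per element."""
    mode = filter_modes.get(element.tag, "inherited")
    if mode == "inherited":
        mode = parent_mode  # use the Filter Mode of the parent element
    if mode == "empty":
        return "(empty)"  # placeholder keeps the sentence structure intact
    parts = []
    if mode == "include" and element.text:
        parts.append(element.text)
    for child in element:
        # Children are still visited so that nested "include" elements are read.
        parts.append(extract(child, filter_modes, mode))
        # Text after a child belongs to the current element.
        if mode == "include" and child.tail:
            parts.append(child.tail)
    return "".join(parts)


if __name__ == "__main__":
    sentence = ET.fromstring(
        "<p>The options <uicontrol>Use slithy toves</uicontrol> and "
        "<uicontrol>Gyre and gimble in the wabe</uicontrol> "
        "will cause a Jabberwocky to appear.</p>"
    )
    print(extract(sentence, FILTER_MODES))
```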

Break Level

sentence

If you have elements that contain sentences that don't end with a period, define them as sentence-break elements. This means that the end of the element should always be treated as a sentence end.

token

You might define elements as token breaks if you have words that aren't separated by a space. This setting defines the elements that should cause a token break. Adding a token break is basically adding a boundary between words.

none

Acrolinx doesn't add any break at all.

defaultBreak

Use the Default Break Level defined above.
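
Here's a minimal sketch of how the break levels could affect the text that gets read. It assumes that a break is added at the end of the element, uses a newline to stand in for a sentence break and a space for a token break, and expects input without newlines of its own. The function names and sample elements are made up for illustration.

```python
import xml.etree.ElementTree as ET

SENTENCE_BREAK = "\n"  # illustrative marker for a sentence boundary
TOKEN_BREAK = " "      # illustrative marker for a word boundary

# Hypothetical sample content.
LIST_SAMPLE = "<ul><li>Install the product</li><li>Restart the server</li></ul>"
CODE_SAMPLE = "<p><code>start</code><code>stop</code></p>"


def flatten(element, break_levels, default_break="none"):
    """Concatenate the text and append the break that applies to each element."""
    level = break_levels.get(element.tag, "defaultBreak")
    if level == "defaultBreak":
        level = default_break  # fall back to the Default Break Level
    parts = [element.text or ""]
    for child in element:
        parts.append(flatten(child, break_levels, default_break))
        parts.append(child.tail or "")
    marker = {"sentence": SENTENCE_BREAK, "token": TOKEN_BREAK, "none": ""}[level]
    return "".join(parts) + marker


def sentences(xml_text, break_levels, default_break="none"):
    """Split the flattened text at the sentence-break markers."""
    text = flatten(ET.fromstring(xml_text), break_levels, default_break)
    return [s.strip() for s in text.split(SENTENCE_BREAK) if s.strip()]


if __name__ == "__main__":
    print(sentences(LIST_SAMPLE, {"li": "sentence"}))
    print(sentences(LIST_SAMPLE, {}))
    print(sentences(CODE_SAMPLE, {"code": "token"}))
```

Without the sentence break, the two list items run together into one string, which is the situation that sentence-break elements are meant to prevent.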

Attributes to Extract

Attributes in your elements that you want Acrolinx to read.

For example, suppose you have a paragraph element with the attributes title and subtitle:

<paragraph internal="true" title="Title of this paragraph" 
subtitle="This is the subtitle of this paragraph"> </paragraph>

If you want Acrolinx to read the title and subtitle, then add 'title' and 'subtitle' to your Attributes to Extract.
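
As a quick illustration, this sketch pulls the listed attribute values out of the example element with Python's standard ElementTree module. The function name extract_attributes is hypothetical, and the sketch only shows which text would become available for checking.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<paragraph internal="true" title="Title of this paragraph" '
    'subtitle="This is the subtitle of this paragraph"> </paragraph>'
)


def extract_attributes(element, attributes_to_extract):
    """Return the values of the listed attributes, skipping any that are missing."""
    values = []
    for name in attributes_to_extract:
        value = element.get(name)
        if value is not None:
            values.append(value)
    return values


if __name__ == "__main__":
    paragraph = ET.fromstring(SAMPLE)
    print(extract_attributes(paragraph, ["title", "subtitle"]))
```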

Parenthetic

Treat the element as a separate sentence, even if it's embedded in another sentence.

A typical example is a footnote that's embedded in the middle of a sentence like this:

<p>This is the <footnote>This is a second
 sentence.</footnote> first sentence.</p>

This situation is rare, but it's worth knowing about.
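
The footnote example can be sketched like this. The sketch pulls parenthetic elements out of the surrounding sentence and returns them as separate sentences. It only handles direct children of the element, and the function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = ("<p>This is the <footnote>This is a second\n"
          " sentence.</footnote> first sentence.</p>")


def normalize(text):
    """Collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def split_parenthetic(element, parenthetic_tags):
    """Return the surrounding sentence first, then each parenthetic element."""
    main_parts = [element.text or ""]
    separate = []
    for child in element:
        child_text = "".join(child.itertext())
        if child.tag in parenthetic_tags:
            separate.append(child_text)  # read as its own sentence
        else:
            main_parts.append(child_text)
        main_parts.append(child.tail or "")
    return [normalize("".join(main_parts))] + [normalize(s) for s in separate]


if __name__ == "__main__":
    print(split_parenthetic(ET.fromstring(SAMPLE), {"footnote"}))
```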

Context

Much of Guidance is context-dependent. Map your elements to the contexts in your Guidance.

Select a context from the list or click + to add a new context.

Enter the name of the context. You can also change the name of a context.

The mapping field expects an XPath. List the elements that you want to map to your context. Click + to add a new mapping.
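
To get a feel for which elements a mapping would select, you could test your XPaths against a sample document. The sketch below uses Python's ElementTree, which supports only a subset of XPath and needs relative paths such as ".//title". The context names TITLE and LIST_ITEM, the mappings, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical contexts and their XPath mappings.
CONTEXT_MAPPINGS = {
    "TITLE": [".//title"],
    "LIST_ITEM": [".//li"],
}

SAMPLE = "<topic><title>Setup</title><ul><li>Unpack</li><li>Install</li></ul></topic>"


def contexts_for(root, context_mappings):
    """Return the text of every element selected by each context's mappings."""
    result = {}
    for context, xpaths in context_mappings.items():
        for xpath in xpaths:
            for element in root.findall(xpath):
                text = "".join(element.itertext()).strip()
                result.setdefault(context, []).append(text)
    return result


if __name__ == "__main__":
    print(contexts_for(ET.fromstring(SAMPLE), CONTEXT_MAPPINGS))
```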

Issue Location

Configure issue location to help your users find the issue that they're looking for in the Sidebar, for example, if the issue is in a List or a Table.

Select an issue location from the list or click + to add a new issue location.

Issue location ID.

XPath

Define the location with XPath.

The localization is what appears in the Sidebar that shows the name of the location of the issue.


Document Information

This section lists any custom fields that you've configured to read data from your content. You can then use XPaths to define the precise locations of each data point. This feature only works for structured formats such as XML, HTML, or Markdown. For more information about this feature, see the article Configure Automatic Data Mapping.

Custom Field

The name of the custom document-level field that is configured to read data from your content.

Location

An XPath to the element that contains the data to read.
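
Here's a minimal sketch of the idea behind a Custom Field and its Location. It reads one value per field from a sample XML document with Python's ElementTree, which supports only a subset of XPath and needs paths relative to the root element. The field names, locations, and sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom fields and the XPath location of their data.
FIELD_LOCATIONS = {
    "Author": "./prolog/author",
    "Product": "./prolog/metadata/prodname",
}

SAMPLE = (
    "<topic><prolog><author>Alice Example</author>"
    "<metadata><prodname>Widget</prodname></metadata></prolog>"
    "<body><p>Some content.</p></body></topic>"
)


def read_document_information(xml_text, field_locations):
    """Return a value for each custom field whose location exists in the document."""
    root = ET.fromstring(xml_text)
    values = {}
    for field, xpath in field_locations.items():
        element = root.find(xpath)
        if element is not None and element.text:
            values[field] = element.text.strip()
    return values


if __name__ == "__main__":
    print(read_document_information(SAMPLE, FIELD_LOCATIONS))
```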