Configuring Term Discovery Behavior

Configure Who Can Discover Terms (Classic Only)

The Term Discovery component is configured by the Acrolinx linguistic team, who add Term Discovery guidelines to specific Writing Guides within your guidance package.

A Term Discovery report is always generated when users run checks with a Writing Guide that contains Term Discovery guidelines.

However, you can configure Acrolinx so that you and your users have more control over when Term Discovery reports are generated.

You can configure the following:

    • Allow users to activate or deactivate Term Discovery using the check settings.

      By default, the New terms option is inactive in the check settings.
    • Allow the Acrolinx administrator to control which users can run Term Discovery with the Discover Terms privilege.

      By default, the Discover Terms privilege is ignored by the server.

To allow users to control when they discover terms, follow these steps:

  1. Open your overlay of the relevant language configuration file.

    If you haven’t yet created an overlay of this file, create a new version of the file at the following location:

    %ACROLINX_CONFIGURATION_ROOT%\data\<LANG_ID>\configuration.properties

    If this location doesn’t yet exist, create the required subdirectories first.

  2. If it doesn't already exist, add the following property:

    <WRITING GUIDE>.termharvesting.onlyServerSide=false

  3. Save your changes and reload the language configuration on the relevant language servers.
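
Steps 1 and 2 can also be scripted. The following Python sketch creates the missing subdirectories and appends the property only if it isn't already set. The helper name ensure_property is hypothetical and isn't part of Acrolinx. The sketch assumes the overlay uses plain key=value lines, and it doesn't reload the language configuration for you.

```python
from pathlib import Path


def ensure_property(overlay: Path, key: str, value: str) -> bool:
    """Append key=value to the overlay unless the key is already set.

    Returns True if the property was appended, False if it already existed.
    Simplification: only "key=value" lines are recognized.
    """
    # Create the required subdirectories if they don't exist yet.
    overlay.parent.mkdir(parents=True, exist_ok=True)

    # latin-1 decodes any byte sequence, and only ASCII text is appended,
    # so the existing content is never re-encoded.
    text = overlay.read_text(encoding="latin-1") if overlay.exists() else ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("#", "!")):  # properties-file comments
            continue
        if stripped.split("=", 1)[0].strip() == key:
            return False

    with overlay.open("a", encoding="latin-1") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(f"{key}={value}\n")
    return True
```

For example, with a hypothetical Writing Guide named MyGuide and the language ID en, you would call ensure_property with the path <configuration root>/data/en/configuration.properties, the key MyGuide.termharvesting.onlyServerSide, and the value false.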

After the server configuration has reloaded, users who connect to Acrolinx can select the New terms option in the check settings.

If a user runs a check with the New terms option selected (in combination with a Writing Guide that contains Term Discovery guidelines):

  • Discovered terms are marked in the document and included in the Acrolinx Scorecard.
  • A separate Term Discovery report is generated.

If a user runs a check without the New terms option selected, a Term Discovery report isn’t generated and discovered terms aren’t included in the Scorecard.

If a user doesn’t have a role with the Discover Terms privilege, the New terms option isn’t available.

This property only works in Writing Guides that are configured for Term Discovery. To run Term Discovery, a Writing Guide must also contain the following property:

<WRITING GUIDE>.harvestingRules=rules/<LANGUAGE_ID>-harvesting

Additionally, a Term Discovery guideline file (indicated by the extension *.thrul) must exist in the following directory:

%ACROLINX_CONFIGURATION_ROOT%\data\<LANG_ID>\rules

The Acrolinx linguistic team normally configures Writing Guides for Term Discovery when your guidance package is created.
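
If you want to confirm the guideline file requirement before you change any settings, a quick check like the following Python sketch might help. The function name has_term_discovery_rules is hypothetical. The sketch only looks for a *.thrul file in the rules directory described above and doesn't verify the harvestingRules property.

```python
from pathlib import Path


def has_term_discovery_rules(config_root: Path, lang_id: str) -> bool:
    """Return True if data/<lang_id>/rules contains at least one *.thrul file."""
    rules_dir = Path(config_root) / "data" / lang_id / "rules"
    return rules_dir.is_dir() and any(rules_dir.glob("*.thrul"))
```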

Configure the Appearance of the Term Discovery CSV Reports

When you run a check to discover terms, Acrolinx generates a Term Discovery report that is available in OLIF and CSV formats. You can configure the CSV encoding and delimiter settings to ensure that the CSV files display correctly on your system.

The CSV and OLIF versions of the Term Discovery report are generated in the server output directory:

<INSTALL_DIR>\server\www\output\TH\<LANG_ID>

The OLIF version of the Term Discovery report contains a link to the CSV version.

The following table contains information on the default CSV settings that are applied when no properties are configured:

Property                         Description        Default Value
termHarvestCsv.encoding          File encoding      utf-16
termHarvestCsv.elementDelimiter  Column delimiter   ;
termHarvestCsv.recordDelimiter   Row delimiter      \n (line break)
termHarvestCsv.contextDelimiter  Context delimiter  \n (line break)
termHarvestCsv.textDelimiter     Text delimiter     "

A context is a sentence where the term was found. A cell can contain multiple contexts.

To ensure that a CSV file displays correctly in Excel, it is often advised to use the text import wizard instead of opening the file directly. However, the default encoding, UTF-16, isn't available as an encoding option in the Excel text import wizard. If you keep the default encoding, double-click the file to open it directly in Excel so that it displays correctly.

When you open the CSV version of the Term Discovery report, a byte order mark is displayed as the first character in the first cell of the file. Delete the byte order mark if you intend to import the file into the Terminology Manager or another application.
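
If you process the CSV report with a script instead of Excel, the default settings map directly onto a standard CSV parser. The following Python sketch assumes the default values from the table above (UTF-16, semicolon, double quote). The function name read_term_csv and the sample columns are hypothetical, and the columns in a real report may differ.

```python
import csv
import io


def read_term_csv(raw: bytes, encoding: str = "utf-16",
                  delimiter: str = ";", quotechar: str = '"') -> list[list[str]]:
    """Decode a Term Discovery CSV export and return its rows."""
    text = raw.decode(encoding)
    # The "utf-16" codec consumes the byte order mark; strip a leftover one
    # in case an endianness-specific codec such as "utf-16-le" was passed.
    text = text.lstrip("\ufeff")
    # newline="" lets the csv module handle line breaks itself, so a quoted
    # cell that holds several contexts separated by \n stays in one cell.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=quotechar)
    return list(reader)


if __name__ == "__main__":
    # Hypothetical sample; the real report's columns may differ.
    sample = 'term;pos;contexts\n"server";"NN";"First sentence.\nSecond sentence."\n'
    for row in read_term_csv(sample.encode("utf-16")):
        print(row)
```

Because the byte order mark is removed during decoding, the returned rows can be written out again without it.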

You can change the default delimiter values to any printable character encoded in UTF-8. To ensure compatibility, do not use control characters or backslashes.

To configure the appearance of the Term Discovery CSV report, follow these steps:

  1. Open your overlay of the core server properties file.

    You can find the overlay for the core server properties file in the following location:

    %ACROLINX_CONFIGURATION_ROOT%\server\bin\coreserver.properties

  2. Add one or more of the following properties:

    termHarvestCsv.encoding=<ENCODING_TYPE>
    termHarvestCsv.elementDelimiter=<CHARACTER>
    termHarvestCsv.recordDelimiter=<CHARACTER>
    termHarvestCsv.contextDelimiter=<CHARACTER>
    termHarvestCsv.textDelimiter=<CHARACTER>

    For example, to change the column delimiter to a comma, add the following property:

    termHarvestCsv.elementDelimiter=,

    Restriction: Although the character \n is the default record and context delimiter, you cannot configure \n as a value for the configuration properties. For example, if you have configured an alternative record delimiter and want to return to the default delimiter \n, do not change the value of the record delimiter property to \n. Remove the entire record delimiter property instead.
  3. Save your changes and restart the core server.
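
The delimiter rules above (a printable character, no control characters, no backslashes, and no explicit \n) can be checked before you edit the file. The following Python sketch is one way to do that. The function names are hypothetical. As a cautious assumption that the source doesn't state, the sketch also rejects whitespace, because properties files typically trim leading whitespace from values.

```python
DELIMITER_PROPERTIES = (
    "termHarvestCsv.elementDelimiter",
    "termHarvestCsv.recordDelimiter",
    "termHarvestCsv.contextDelimiter",
    "termHarvestCsv.textDelimiter",
)


def is_valid_delimiter(value: str) -> bool:
    """True for one printable, non-whitespace character other than a backslash."""
    return (
        len(value) == 1
        and value.isprintable()   # rejects control characters such as \n or \t
        and not value.isspace()   # assumption: whitespace would be trimmed
        and value != "\\"
    )


def render_delimiter_properties(settings: dict[str, str]) -> str:
    """Return the property lines for the given delimiter settings.

    To return to a default delimiter, leave the property out of `settings`
    instead of passing "\\n".
    """
    lines = []
    for name, value in settings.items():
        if name not in DELIMITER_PROPERTIES:
            raise KeyError(f"unknown property: {name}")
        if not is_valid_delimiter(value):
            raise ValueError(f"unsupported delimiter for {name}: {value!r}")
        lines.append(f"{name}={value}")
    return "\n".join(lines)
```

The returned lines can be pasted into the overlay of the core server properties file. The encoding property isn't covered by this sketch.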

POS Abbreviations Used in the Term Discovery CSV Report

The part-of-speech tagger uses the Penn Treebank tag set, which is listed in the following table:

Tag   Description                                                Example
CC    Coordinating conjunction                                   and, or, but
CD    Cardinal number                                            28
DT    Determiner                                                 the
EX    Existential there                                          there
FW    Foreign word
IN    Preposition or subordinating conjunction                   by
JJ    Adjective                                                  bright
JJR   Adjective, comparative                                     smaller
JJS   Adjective, superlative                                     smallest
LS    List item marker
MD    Modal                                                      must
NN    Noun, singular, or mass                                    house
NNS   Noun, plural                                               houses
NNP   Proper noun, singular                                      Peter
NNPS  Proper noun, plural
PDT   Predeterminer
POS   Possessive ending                                          's
PRP   Personal pronoun                                           she
PRP$  Possessive pronoun                                         my
RB    Adverb                                                     slowly
RBR   Adverb, comparative
RBS   Adverb, superlative
RP    Particle
SYM   Symbol
TO    to                                                         to
UH    Interjection
VB    Verb, base form                                            be, have, do, specify, write
VBD   Verb, past tense                                           was/were, had, did, specified, wrote
VBG   Verb, gerund, or present participle                        being, having, doing, specifying, writing
VBN   Verb, past participle                                      been, had, done, specified, written
VBP   Verb, non-3rd person singular present                      am/are, have, do, specify, write
VBZ   Verb, 3rd person singular present                          is, has, does, writes
WDT   Wh-determiner
WP    Wh-pronoun                                                 who
WP$   Possessive wh-pronoun                                      whose
WRB   Wh-adverb
$,    Comma                                                      ,
$.    Dot, Exclamation Mark, Question Mark, …                    ?
$:    Colon, Semicolon, Ellipsis                                 ;
$(    Open Parenthesis (, Open Bracket [, Open Curly Brace {     (
$)    Close Parenthesis ), Close Bracket ], Close Curly Brace }  )