Check Websites - Settings and Use Cases

To check web content with the Content Analyzer, you enter the URLs you want to check and adjust a few settings. Here, we describe these settings and walk through some common use cases to show you how to use the Content Analyzer to check your web content.

Settings

Approximate page count

This setting defines approximately how many pages the Content Analyzer imports before stopping the search.

Maximum link depth

The link depth is the number of links the Content Analyzer follows, starting from each URL you entered, to reach a page.

For example, if you enter 0, the Content Analyzer only checks the specific page that you entered. If you enter 1, the Content Analyzer checks the page that you entered, and any pages that are linked from that page.

If a URL redirects to another URL, the redirect counts as one level of link depth. For example, if your starting URL redirects to another URL and Maximum link depth is set to 1, the Content Analyzer uses up that one level on the redirect and doesn't follow any links on the target page.
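The following Python sketch models how link depth could be counted, assuming a simple breadth-first crawl over a made-up site. The example URLs, the `LINKS` and `REDIRECTS` dictionaries, and the `reachable_within_depth` function are hypothetical and are not part of the Content Analyzer. The sketch only mirrors the rules described above: depth 0 reaches only the starting URL, each followed link adds one level, and a redirect also counts as one level.

```python
from collections import deque

# Hypothetical site model (not a real website): each URL either redirects
# to exactly one other URL or links to a list of other URLs.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/deep"],
    "https://example.com/b": [],
    "https://example.com/a/deep": [],
    "https://example.com/new-home": ["https://example.com/a"],
}
REDIRECTS = {"https://example.com/old-home": "https://example.com/new-home"}


def reachable_within_depth(start, max_depth, links=LINKS, redirects=REDIRECTS):
    """Return {url: depth} for every URL reached within max_depth hops.

    Following a link and following a redirect each count as one level.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = depths[url]
        if depth == max_depth:
            continue  # depth limit reached: follow nothing from this URL
        # A redirecting URL has a single outgoing hop: its redirect target.
        next_urls = [redirects[url]] if url in redirects else links.get(url, [])
        for nxt in next_urls:
            if nxt not in depths:  # BFS records each URL at its shortest depth
                depths[nxt] = depth + 1
                queue.append(nxt)
    return depths
```

For example, `reachable_within_depth("https://example.com/old-home", 1)` reaches the redirect target `new-home` at depth 1 but none of the pages that `new-home` links to.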

Allow pages

You can explicitly define which pages of the selected websites you want to include in the search. Enter the keywords without spaces before or after them, and separate multiple keywords with the pipe symbol (|).

For example, if you want to include all contact and events pages, enter the following:

Example
contact|events

“Allow pages” works best if the website allows crawling on all directories. Some websites don’t allow crawling on subdomains, and in that case the Content Analyzer can't load those pages.

Deny pages

You can explicitly define which pages you don’t want to include in the search of the websites you selected.

Deny pages takes precedence over Allow pages: if a page matches both an allowed and a denied keyword, the Content Analyzer excludes it.
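To illustrate how the Allow pages and Deny pages fields could interact, here is a minimal Python sketch. The function names and the matching rules are assumptions, not the Content Analyzer's actual implementation: keywords are split on the pipe symbol and matched as plain substrings of the URL path (not the domain), an empty Allow pages field allows every page, and a Deny pages match always wins.

```python
from urllib.parse import urlsplit


def parse_keywords(field):
    """Split a pipe-separated field such as "contact|events" into keywords."""
    return [kw for kw in field.split("|") if kw]


def is_included(url, allow="", deny=""):
    """Decide whether a crawled URL is checked (hypothetical sketch).

    Assumptions: keywords are matched as plain substrings of the URL path,
    an empty allow field allows everything, and any deny match excludes
    the page even if it also matches an allow keyword.
    """
    path = urlsplit(url).path  # ignore the domain name when matching
    allow_kws = parse_keywords(allow)
    deny_kws = parse_keywords(deny)
    if any(kw in path for kw in deny_kws):
        return False  # deny takes precedence over allow
    if allow_kws and not any(kw in path for kw in allow_kws):
        return False  # allow keywords are set, but none of them match
    return True
```

With `allow="contact|events"`, a page at `/events/recap` is included, but adding `deny="recap"` excludes it. Because the sketch doesn't strip whitespace, a keyword entered as `" events"` would no longer match `/events`, which is one plausible reason to enter keywords without surrounding spaces.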

Use Cases

Check an Approximate Number of Pages in a Website

Let’s say you want to review around 20 pages of a website. In this case, follow these steps:

  1. Enter a URL, for example, "https://www.acrolinx.com/", in the URL list box.
  2. Set the Approximate page count to 20.

You can also set the Maximum link depth to limit how many links away from the starting URL the Content Analyzer searches. With Maximum link depth set to 0, the Content Analyzer ignores all links and crawls only the URL you entered.

Because the Approximate page count is only a rough target, the number of results may differ from the number you entered.

Check a Number of Specific Pages of a Website

Here's what to do if you only want to check specific pages of a website, for example, 5 pages from "https://www.acrolinx.com/":

  1. Enter the 5 URLs in the URL list box.
     
  2. Set the Approximate page count to 1.
    This setting ensures that 1 page per URL is returned.

  3. Set the Maximum link depth to 0.
    This setting prevents the Content Analyzer from following any links on each page.

The Content Analyzer skips duplicate URLs, dead links (404), and unreachable links without showing an error message. As a result, the result list might contain fewer pages than the number of URLs you added.
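The following Python sketch shows why the result list can be shorter than your URL list. For illustration only, it assumes that duplicates are detected by exact string comparison and that each URL's HTTP status is already known; `pages_to_report` and `status_by_url` are hypothetical names, and the sketch treats any status other than 404 as a page that loaded.

```python
def pages_to_report(urls, status_by_url):
    """Return the URLs that would appear in the results (hypothetical sketch).

    status_by_url maps a URL to its HTTP status code; a URL missing from
    the mapping models a link that couldn't be reached at all.
    """
    seen = set()
    results = []
    for url in urls:
        if url in seen:
            continue  # duplicate: silently skipped
        seen.add(url)
        status = status_by_url.get(url)
        if status is None or status == 404:
            continue  # unreachable or dead link: silently skipped
        results.append(url)
    return results
```

For example, five entered URLs that include one duplicate, one dead link, and one unreachable link produce only two results.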

Check Pages from a Website with or without a Specific Expression in the URL

Let's say you only want to check URLs that contain a certain expression. For example, you’re only interested in the Acrolinx blog.

The expression should appear in the part of the URL after the domain name, not in the domain name itself.

To search for these pages, follow these steps:

  1. Enter "https://www.acrolinx.com/" in the URL list box.
  2. Enter "blog" in the Allow pages field.
  3. Set the Approximate page count to (unless you want to restrict the results).

To check only the pages of a website that don't have the word "blog" in the URL, enter "blog" in the Deny pages field. The Content Analyzer then loads all pages whose URLs don’t contain the word "blog".
