Keyword Pattern Matching

Discovery Attender provides options for users to enter expressions which are more complex than a standard keyword search. Patterns can be used to help evaluate keyword, address, file name and folder criteria.

The search engine supports two types of Pattern Matching: Wildcards and Regular Expressions. In addition, Discovery Attender includes functionality that can pinpoint specific data using commonly found patterns, or locate items that do not have file extensions.

Like Patterns (Wildcards)

Whenever you use a wildcard to expand a keyword search term, you are actually using a Like Pattern. Discovery Attender automatically evaluates any word or phrase containing wildcards as a Like Pattern for use in the engine, so you do not need to enter the exact syntax (i.e. the Like("") portion) into the wizard. For example, *day* is valid, as is the equivalent syntax Like("*day*").

The syntax for a Like Pattern is Like("expression") where the expression is the word or phrase, containing wildcards, you wish to evaluate.

NOTE: When a search expression contains a multiple-word phrase with a wildcard, you must use the LIKE keyword. For example, "stock market*" AND steak will not find the term stock markets, because double quotes around a wildcard tell the program to look for the literal wildcard character (in this case, the asterisk). To work correctly, the expression should be LIKE("stock market*") AND steak. Remember to make this adjustment when copying terms from another software tool.

Also keep in mind that wildcards need to be placed directly after the root of the word you seek. If you are looking for Agency or Agencies, place the wildcard after the 'c', e.g. Agenc* not Agency*. If you want either of those options plus Agent, use Agen*.
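The effect of root placement can be tried out with a small sketch. It uses Python's standard fnmatch module as a stand-in for the Like Pattern engine, which is an assumption for illustration only: fnmatch compares a pattern against a whole word, and Discovery Attender's own matching rules may differ. The helper name matching_words is hypothetical.

```python
from fnmatch import fnmatchcase


def matching_words(pattern, words):
    """Return the words that a shell-style wildcard pattern matches in full."""
    return [word for word in words if fnmatchcase(word, pattern)]


words = ["Agency", "Agencies", "Agent"]

# Wildcard placed after the full word: the plural form is missed.
root_too_long = matching_words("Agency*", words)

# Wildcard placed after the shared root 'Agenc': both forms are found.
root_at_c = matching_words("Agenc*", words)

# Shorter root 'Agen': Agent is found as well.
root_shorter = matching_words("Agen*", words)
```

The shorter the root, the broader the match, so it is worth checking that a short root does not also pull in unrelated words.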

Supported Wildcards

Wildcard    Matches
*           None, one or more characters
?           Any single character
#           Any digit
[,]         A range or set of characters or numbers

All the wildcards are reserved and will be translated as a pattern by default. If you wish to use one of the wildcards as a literal match, be sure to put it in double quotes, e.g. "# sign" will find a hit in the phrase Press the # sign for more options.

Examples

Expression      Matches
bicycl*         bicycle, bicycles, bicycling
river?boat*     river boat, river boats, river boating, but NOT riverboat
Version 3.#     Version 3.0, Version 3.1, Version 3.101
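To reason about what a wildcard expression will match, it can help to translate it into an ordinary regular expression. The Python sketch below is an approximation for illustration, not Discovery Attender's implementation. The function names like_to_regex and like_matches are hypothetical, and the sketch assumes that # stands for a single digit, that bracket sets carry over to regular expression syntax unchanged, and that a pattern counts as a match when it is found anywhere in the text.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate *, ?, # and [...] wildcards into a regular expression string."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(".*")      # none, one or more characters
        elif ch == "?":
            parts.append(".")       # any single character
        elif ch == "#":
            parts.append(r"\d")     # assumption: exactly one digit
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                parts.append(re.escape(ch))       # unclosed bracket: literal
            else:
                parts.append(pattern[i:end + 1])  # pass the set through as-is
                i = end
        else:
            parts.append(re.escape(ch))           # everything else is literal
        i += 1
    return "".join(parts)


def like_matches(pattern: str, text: str) -> bool:
    """Assumption: a hit anywhere in the text counts as a match."""
    return re.search(like_to_regex(pattern), text) is not None


found_plural = like_matches("river?boat*", "river boats")   # True
found_joined = like_matches("river?boat*", "riverboat")     # False
```

The riverboat case fails because ? must consume exactly one character between river and boat, which is the behavior shown in the examples above.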

 

Regular Expressions (RegEx)

In addition to the standard wildcard support, Discovery Attender also supports the complex structured pattern language of Regular Expressions, also known as RegEx. Regular Expressions are very helpful when trying to match complex patterns such as account numbers, credit card numbers, and Social Security or National Insurance numbers.

Regular Expressions are complex, so you should have a certain amount of experience with them before using them in a search. Discovery Attender provides testing tools to help you craft your expression. Using the Keyword or Address testers to validate your regular expression before including it in your search is highly recommended.

Keep in mind that the Discovery Attender search engine compares an entire field (body, subject, label, etc.) against the expression, so make sure the expression is flexible enough to find a hit within surrounding text. Many examples in online Regular Expression libraries are written to validate a field that contains only the value in question, not to find that value inside a section of a document.
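The difference between a validation-style expression and a flexible one can be seen in a short sketch. It uses Python's re module rather than Discovery Attender itself, so it only illustrates the general regular expression behavior; the sample text and both expressions are made up for the example.

```python
import re

# A field as a search engine might see it: the number sits inside other text.
body = "Please update the account for SSN 123-45-6789 before Friday."

# Validation style: ^ and $ require the field to contain nothing else.
anchored = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Flexible style: word boundaries allow the number to appear anywhere.
flexible = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

anchored_hit = anchored.search(body)   # no match: the field has extra text
flexible_hit = flexible.search(body)   # matches the number inside the text
```

An expression copied from an online library often needs its leading ^ and trailing $ replaced with word boundaries (or removed) before it is useful for searching document content.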

Discovery Attender uses the .NET flavor of RegEx. The syntax is RegEx("expression"), where RegEx("") tells the search engine to analyze and return items that match the expression within the quotes.

Examples

Expression: RegEx("\b(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b")
Finds: Social Security numbers (just the numbers using allocated limits)

Expression: RegEx("\d\d\d[\- ]\d\d[\- ]\d\d\d\d")
Finds: Social Security numbers (with ### ## #### or ###-##-#### pattern)

Expression: RegEx("(cc|credit(\s{0,3}card)?)[\D]{0,60}(\d{4}([\D]?\d{4}){3}([\D]?\d{3})?|\d{4}[\D]?\d{6}[\D]?\d{5}([\D]?\d{4})?)")
Finds: Credit Card patterns with several allocated numbers.
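Before running expressions like these in a search, it can be useful to try them against sample text. The sketch below does so with Python's re module instead of the built-in Keyword tester. That substitution is an assumption made for illustration: the .NET and Python flavors differ in places, although the constructs used here (lookaheads, a backreference, \b, \d, \D) behave alike in both. The sample numbers are dummy values.

```python
import re

# The three expressions from the examples above, without the RegEx("") wrapper.
SSN_STRICT = re.compile(
    r"\b(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b"
)
SSN_SIMPLE = re.compile(r"\d\d\d[\- ]\d\d[\- ]\d\d\d\d")
CREDIT_CARD = re.compile(
    r"(cc|credit(\s{0,3}card)?)[\D]{0,60}"
    r"(\d{4}([\D]?\d{4}){3}([\D]?\d{3})?|\d{4}[\D]?\d{6}[\D]?\d{5}([\D]?\d{4})?)"
)

# The strict expression rejects an area number of 000 and, through the \3
# backreference, requires the same separator in both positions.
strict_ok = SSN_STRICT.search("SSN on file: 123-45-6789")
strict_zero_area = SSN_STRICT.search("000-12-3456")
strict_mixed_sep = SSN_STRICT.search("123-45 6789")

# The simple expression accepts any mix of dash and space separators.
simple_mixed_sep = SSN_SIMPLE.search("123-45 6789")

# The credit card expression needs a 'cc' or 'credit' cue before the digits.
card_with_cue = CREDIT_CARD.search("credit card: 4111 1111 1111 1111")
card_without_cue = CREDIT_CARD.search("4111 1111 1111 1111")
```

Checking near misses (wrong separators, missing cue words, all-zero groups) alongside the expected hits gives a better picture of how an expression will behave on real data.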

Please view the help document or contact support for more information on how to deploy regular expressions to find Personally Identifiable Information (PII) or Payment Card Industry (PCI) data.

PATTERN Reserved Word

Discovery Attender also contains a method for finding certain predefined formats, such as credit card and social security numbers, in text using the PATTERN reserved word. When used as a keyword, these patterns use a regular expression combined with programmatic testing (including the Luhn algorithm in the case of credit cards) to find matching hits while reducing, though not necessarily eliminating, false positives.

Expression      Finds
PATTERN(SSN)    Social Security Numbers
PATTERN(CC)     Credit Card numbers
PATTERN(SIN)    Canadian Social Insurance Numbers
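The Luhn algorithm mentioned above is a public checksum used by payment card numbers, and a minimal Python sketch shows why it helps reduce false positives: a random run of 16 digits usually fails it. This is the standard algorithm, not Discovery Attender's code; how the product combines it with its regular expression is not documented here, and the function name luhn_valid is hypothetical.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


# A widely used dummy test number passes; changing its last digit fails.
passes = luhn_valid("4111 1111 1111 1111")
fails = luhn_valid("4111 1111 1111 1112")
```

A digit string can pass the checksum without being a real card number, which is consistent with the note that false positives are reduced but not necessarily eliminated.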

EXT(NONE) Reserved Word

Used only with the File Names and Types criteria, the EXT(NONE) reserved word helps locate or exclude files that do not have an extension.
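As a rough illustration of what "no extension" means, the Python sketch below splits a list of file names by whether they carry a suffix. It relies on pathlib's definition of a suffix, which is an assumption: Discovery Attender may treat edge cases, such as names that begin with a dot, differently. The function name has_no_extension and the sample file names are hypothetical.

```python
from pathlib import PurePath


def has_no_extension(file_name: str) -> bool:
    """True when pathlib finds no suffix on the final path component."""
    return PurePath(file_name).suffix == ""


names = ["README", "report.docx", "archive.tar.gz", "LICENSE"]

# Comparable in spirit to including files with EXT(NONE).
without_extension = [name for name in names if has_no_extension(name)]

# Comparable in spirit to excluding files with EXT(NONE).
with_extension = [name for name in names if not has_no_extension(name)]
```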

As always, if you have any questions about this topic or any other, do not hesitate to contact support at support@gimmal.com.

First Published June, 2024
