Use regular expressions in proxy definitions
A regular expression is a group of letters, numbers, and special characters used to match data. You can use Perl-compatible regular expressions (PCRE) in your Firebox configuration to match certain types of traffic in proxy actions. For example, you can use one regular expression to block connections to some web sites and allow connections to other web sites. You can also deny SMTP connections when the recipient is not a valid email address for your company. For example, to block parts of a web site that violate your company's Internet use policy, use a regular expression in the URL Paths category of the HTTP proxy configuration.

General guidelines
How to build a regular expression
The simplest regular expression is made from the text you want to match. Letters, numbers, and other printable characters all match the same letter, number, or character that you type. A regular expression made from letters and numbers can match only a character sequence that includes all of those letters and numbers in order.
Example: fat matches fat, fatuous, and infatuated, as well as many other sequences.

NOTE: Fireware matches any character sequence that includes the regular expression. A regular expression frequently matches more than one sequence. If you use a regular expression as the source for a Deny rule, you can block some network traffic by accident. We recommend that you fully test your regular expressions before you save the configuration to your Firebox.

To match different sequences of characters at the same time, you must use a special character. The most common special character is the period (.), which is similar to a wildcard. When you put a period in a regular expression, it matches any character, space, or tab. The period does not match line breaks (\r\n or \n).
Example: f..t matches foot, feet, f&#t, f -t, and f\t3t.

To match a special character, such as the period, you must add a backslash (\) before the character. If you do not add a backslash to the special character, the rule may not operate correctly. It is not necessary to add a second backslash if the character usually has a backslash, such as \t (tab stop).
Example: \$9\.99 matches $9.99

You must add a backslash to each of these special characters to match the real character:
? . * | + $ \ ^ ( ) [

Hexadecimal characters
To match hexadecimal characters, use \x or %0x%. Hexadecimal characters are not affected by the case-insensitive modifier.
Example: \x66 or %0x66% matches f, but cannot match F.

Repetition
To match a variable number of characters, you must use a repetition modifier.
You can apply the modifier to a single character, or a group of characters. There are four types of repetition modifiers:
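As a rough illustration, the sketch below exercises the four repetition modifiers found in PCRE-style syntax (`*`, `+`, `?`, and `{n,m}`), together with the literal, period, backslash, and `\x` rules described earlier. It uses Python's `re` module as a stand-in for the Firebox's matcher, which is an assumption: the two engines agree on these basics, but the Fireware-specific `%0x66%` form has no Python equivalent and is not shown. The helper names `contains` and `matches_whole` are hypothetical.

```python
import re

def contains(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere in text, as in a substring search."""
    return re.search(pattern, text) is not None

def matches_whole(pattern: str, text: str) -> bool:
    """True only if the pattern accounts for the entire text."""
    return re.fullmatch(pattern, text) is not None

# Repetition modifiers applied to the single character b.
samples = ["ac", "abc", "abbc", "abbbc", "abbbbc"]
zero_or_more = [s for s in samples if matches_whole(r"ab*c", s)]      # *
one_or_more = [s for s in samples if matches_whole(r"ab+c", s)]       # +
zero_or_one = [s for s in samples if matches_whole(r"ab?c", s)]       # ?
two_to_three = [s for s in samples if matches_whole(r"ab{2,3}c", s)]  # {n,m}

# Literal text matches any sequence that contains it.
literal_hits = [w for w in ["fat", "fatuous", "infatuated", "fit"] if contains(r"fat", w)]

# A period matches any single character except a line break.
period_hits = [w for w in ["foot", "feet", "f&#t", "f\nxt"] if contains(r"f..t", w)]

# A backslash turns a special character into a literal one.
price_found = contains(r"\$9\.99", "Sale price: $9.99")
price_missed = contains(r"\$9\.99", "Sale price: $9x99")

# \x66 is the hexadecimal code for lowercase f, so it does not match F.
hex_lower = contains(r"\x66", "f")
hex_upper = contains(r"\x66", "F")

print(zero_or_more, one_or_more, zero_or_one, two_to_three)
print(literal_hits, period_hits, price_found, price_missed, hex_lower, hex_upper)
```

The repetition checks use a whole-string match so that each modifier's effect is visible on its own; a substring search would also accept longer strings that merely contain a match.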
To apply modifiers to many characters at once, you must make a group. To group a sequence of characters, put parentheses around the sequence.
Example: ba(na)* matches ba, bana, banana, and banananananana.

Character classes
To match one character from a group, use square brackets instead of parentheses to create a character class. You can apply repetition modifiers to the character class. The order of the characters inside the class does not matter. The only special characters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). To use a caret in the character class, do not make it the first character. To use a hyphen in the character class, make it the first character.
Example: gr[ae]y matches gray and grey.

A negated character class matches everything but the specified characters. Type a caret (^) at the beginning of any character class to make it a negated character class.
Example: [Qq][^u] matches Qatar, but not question or Iraq.

Ranges
Character classes are often used with character ranges to select any letter or number. A range is two letters or numbers, separated by a hyphen (-), that mark the start and finish of a character group. Any character in the range can match. If you add a repetition modifier to a character class, the preceding class is repeated.

Some ranges that are used frequently have a shorthand notation. You can use shorthand character classes inside or outside other character classes. A negated shorthand character class matches the opposite of what the shorthand character class matches. The table below includes several common shorthand character classes and their negated values.
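A minimal sketch of grouping, character classes, ranges, and shorthand classes, again using Python's `re` module as an assumed stand-in for the Firebox's PCRE matcher. The shorthand classes shown (`\d`, `\s`, and the negated `\D`) are the common PCRE ones, the sample strings are invented for illustration, and the helper name `hits` is hypothetical.

```python
import re

def hits(pattern: str, words: list[str]) -> list[str]:
    """Return the words in which the pattern matches somewhere."""
    return [w for w in words if re.search(pattern, w)]

# Group: the * repeats the whole sequence "na", not just the final "a".
banana = [w for w in ["ba", "bana", "banana", "ban"] if re.fullmatch(r"ba(na)*", w)]

# Character class: exactly one of the listed characters.
grey = hits(r"gr[ae]y", ["gray", "grey", "groy"])

# Negated class: a q that is followed by any character other than u.
no_u = hits(r"[Qq][^u]", ["Qatar", "question", "Iraq"])

# A hyphen placed first in the class is a literal hyphen, not a range.
signs = re.findall(r"[-+]", "a-b+c")

# Ranges: lowercase hexadecimal digits, repeated one or more times.
hex_lower = re.fullmatch(r"[0-9a-f]+", "c0ffee") is not None
hex_upper = re.fullmatch(r"[0-9a-f]+", "C0FFEE") is not None

# Shorthand classes: \d is a digit, \D a non-digit, \s whitespace.
digits = re.findall(r"\d+", "port 8080, vlan 12")
stripped = re.sub(r"\D", "", "555-0100")
fields = re.split(r"\s+", "a  b\tc")

print(banana, grey, no_u, signs)
print(hex_lower, hex_upper, digits, stripped, fields)
```

Iraq does not match `[Qq][^u]` because a negated class still has to consume one character, and nothing follows the final q.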
Anchors
To match the beginning or end of a line, you must use an anchor. The caret (^) matches the beginning of a line, and the dollar sign ($) matches the end of a line.
Example: ^am.*$ matches any line that begins with am, such as a line that contains only the word ampere. It does not match dame.

You can use \b to match a word boundary, or \B to match any position that is not a word boundary. There are three kinds of word boundaries:
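The anchor and word-boundary behavior can be checked with a short script. This is a sketch that assumes Python's `re` module behaves like the Firebox's PCRE matcher for `^`, `$`, `\b`, and `\B`; the sample strings are invented for illustration. It shows a boundary at the start of a string, at the end of a string, and between a word character and a non-word character.

```python
import re

# ^ and $ tie the match to the start and end of the line.
lines = ["ampere", "am", "dame", "ampere hour"]
anchored = [s for s in lines if re.search(r"^am.*$", s)]

text = "cat concat cat's catalog"

# \bcat\b needs a boundary on both sides of cat.
# Offset 0 is the start of the string; offset 11 follows a space
# and is followed by an apostrophe, which is not a word character.
whole_word_starts = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat matches cat only where it does NOT begin a word (inside concat).
inner_starts = [m.start() for m in re.finditer(r"\Bcat", text)]

# The end of the string counts as a boundary after a word character.
ends_with_word = re.search(r"cat\b", "tomcat") is not None

print(anchored, whole_word_starts, inner_starts, ends_with_word)
```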
Alternation
You can use alternation to match a single regular expression out of several possible regular expressions. The alternation operator in a regular expression is the pipe character (|). It is similar to the Boolean operator OR.
Example: m(oo|a|e)n matches the first occurrence of moon, man, or men.

Common regular expressions
Match the PDF content type (MIME type)
^%PDF-

Match any valid IP address
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

Match most email addresses
[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}
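Before using patterns like these in a Deny rule, it can help to try them against sample data. The sketch below does that with Python's `re` module, assumed here to be a reasonable stand-in for the Firebox's PCRE matcher for these patterns. The helper names `is_ipv4` and `is_email` and all sample strings are hypothetical.

```python
import re

# Alternation: the leftmost occurrence of moon, man, or men wins.
first = re.search(r"m(oo|a|e)n", "the men saw a man on the moon").group(0)

# One IPv4 octet (0-255), joined four times with literal periods.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = r"\.".join([OCTET] * 4)
EMAIL = r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"
PDF_HEADER = r"^%PDF-"

def is_ipv4(text: str) -> bool:
    """True only if the whole string is a dotted-quad IPv4 address."""
    return re.fullmatch(IPV4, text) is not None

def is_email(text: str) -> bool:
    """True only if the whole string fits the email pattern."""
    return re.fullmatch(EMAIL, text) is not None

# Unanchored, the IP pattern can still match inside a longer string.
partial = re.search(IPV4, "999.168.1.1").group(0)

# The PDF pattern is anchored, so the marker must start the data.
pdf_start = re.search(PDF_HEADER, "%PDF-1.7") is not None
pdf_later = re.search(PDF_HEADER, "not a %PDF- file") is not None

print(first, is_ipv4("192.168.1.1"), is_ipv4("256.1.1.1"), partial)
print(is_email("user@example.com"), is_email("user@example.museum"))
print(pdf_start, pdf_later)
```

As noted earlier, a regular expression matches any sequence that includes it, so the IP pattern without anchors finds 99.168.1.1 inside 999.168.1.1. The email pattern limits the final label to two to four letters, which is why it matches most, but not all, addresses.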