PCRE Pattern Syntax

Here is a quick reference to the more common patterns you can use in PCRE regular expressions. The most commonly used is ".*", which matches any number of any character (newlines excepted, unless dotall mode is set). It plays roughly the same role as the wildcard "*" in the shell.

QUOTING - To prevent a character from being interpreted as a pattern meta-character, quote it.

  • \x where x is non-alphanumeric, indicates a literal x

  • \Q...\E treat enclosed characters as literal
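As a rough illustration of quoting, the sketch below uses Python's re module as a stand-in. This is an assumption made for the sake of a runnable example: re is not PCRE and is not the Gambas RegExp class, but backslash quoting of a meta-character behaves the same way. Python has no \Q...\E, so re.escape is shown as the nearest equivalent.

```python
import re

# An unquoted dot is a meta-character and matches any character.
any_char = re.findall(r"a.c", "abc a.c")

# A quoted dot matches only a literal dot.
literal_dot = re.findall(r"a\.c", "abc a.c")

# Python-specific stand-in for \Q...\E: escape a whole string at once.
escaped = re.escape("1+1=2")
found = re.search(escaped, "so 1+1=2 holds").group(0)

print(any_char)     # ['abc', 'a.c']
print(literal_dot)  # ['a.c']
print(found)        # 1+1=2
```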

CHARACTERS - How to specify characters that are non-printable or that you want to identify by their code.

  • \a alarm, that is, the BEL character (hex 07)

  • \cx "control-x", where x is any character

  • \e escape (hex 1B)

  • \f formfeed (hex 0C)

  • \n newline (hex 0A)

  • \r carriage return (hex 0D)

  • \t tab (hex 09)

  • \ddd character with octal code ddd, or backreference

  • \xhh character with hex code hh

  • \x{hhh..} character with hex code hhh..
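A few of these escapes can be tried out with the minimal sketch below. It assumes Python's re module as a stand-in for PCRE; only escapes that both engines treat alike are used (\a, \t, \ddd and \xhh), since \e, \cx and \x{hhh..} are not available in Python.

```python
import re

# \xhh: hex 47 is "G" and hex 42 is "B".
hex_match = re.fullmatch(r"\x47\x42", "GB") is not None

# \t: a tab character between two letters.
tab_match = re.search(r"a\tb", "a\tb") is not None

# \ddd: octal 101 is decimal 65, the letter "A".
octal_match = re.fullmatch(r"\101", "A") is not None

# \a: the BEL character (hex 07).
bell_match = re.fullmatch(r"\a", "\x07") is not None

print(hex_match, tab_match, octal_match, bell_match)  # True True True True
```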
    

CHARACTER TYPES - Match based on type of character.

  • . any character except newline;
    in dotall mode, any character whatsoever
    

  • \C one byte, even in UTF-8 mode (best avoided)

  • \d a decimal digit

  • \D a character that is not a decimal digit

  • \h a horizontal whitespace character (e.g. space, tab, but not newline)

  • \H a character that is not a horizontal whitespace character

  • \p{xx} a character with the xx property (see below)

  • \P{xx} a character without the xx property (see below)

  • \R a newline sequence

  • \s a whitespace character

  • \S a character that is not a whitespace character

  • \v a vertical whitespace character (e.g. newline or CR)

  • \V a character that is not a vertical whitespace character

  • \w a "word" character

  • \W a "non-word" character

  • \X an extended Unicode sequence

    In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters by default.
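The most common character types can be sketched as follows. Python's re module is assumed here as a stand-in for PCRE, and the re.ASCII flag is passed to mimic PCRE's ASCII-only default. The sample text is made up for the illustration.

```python
import re

text = "Order 66: 3 apples,\t12 pears"

# \d+ collects runs of decimal digits.
digits = re.findall(r"\d+", text, re.ASCII)

# \w+ collects runs of "word" characters (letters, digits, underscore).
words = re.findall(r"\w+", text, re.ASCII)

# \s+ matches runs of whitespace, here used as a field separator.
pieces = re.split(r"\s+", "a  b\tc\nd")

print(digits)  # ['66', '3', '12']
print(words)   # ['Order', '66', '3', 'apples', '12', 'pears']
print(pieces)  # ['a', 'b', 'c', 'd']
```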
    

GENERAL CATEGORY PROPERTY CODES for use with \p and \P

  • C Other

  • Cc Control

  • Cf Format

  • Cn Unassigned

  • Co Private use

  • Cs Surrogate

  • L Letter

  • Ll Lower case letter

  • Lm Modifier letter

  • Lo Other letter

  • Lt Title case letter

  • Lu Upper case letter

  • L& Ll, Lu, or Lt

  • M Mark

  • Mc Spacing mark

  • Me Enclosing mark

  • Mn Non-spacing mark

  • N Number

  • Nd Decimal number

  • Nl Letter number

  • No Other number

  • P Punctuation

  • Pc Connector punctuation

  • Pd Dash punctuation

  • Pe Close punctuation

  • Pf Final punctuation

  • Pi Initial punctuation

  • Po Other punctuation

  • Ps Open punctuation

  • S Symbol

  • Sc Currency symbol

  • Sk Modifier symbol

  • Sm Mathematical symbol

  • So Other symbol

  • Z Separator

  • Zl Line separator

  • Zp Paragraph separator

  • Zs Space separator

CHARACTER CLASSES - Match a range or set of characters. For example, "[abc]" would match either a, b or c.

  • [...] positive character class

  • [^...] negative character class

  • [x-y] range (can be used for hex characters)

  • [[:xxx:]] positive POSIX named set

  • [[:^xxx:]] negative POSIX named set

POSIX named sets for use in character classes:

  • alnum alphanumeric

  • alpha alphabetic

  • ascii 0-127

  • blank space or tab

  • cntrl control character

  • digit decimal digit

  • graph printing, excluding space

  • lower lower case letter

  • print printing, including space

  • punct printing, excluding alphanumeric

  • space whitespace

  • upper upper case letter

  • word same as \w

  • xdigit hexadecimal digit

    In PCRE, POSIX character set names recognize only ASCII characters. You
    can use \Q...\E inside a character class.
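Positive classes, negative classes and ranges might be exercised like this. The sketch assumes Python's re module as a stand-in for PCRE; POSIX named sets such as [[:alpha:]] are left out because Python does not support them.

```python
import re

text = "cab 42 beef"

# Positive class: runs made only of a, b or c.
positive = re.findall(r"[abc]+", text)

# Negative class: runs containing none of a, b, c or space.
negative = re.findall(r"[^abc ]+", text)

# Ranges: runs of lower case hexadecimal digits.
hex_runs = re.findall(r"[0-9a-f]+", text)

print(positive)  # ['cab', 'b']
print(negative)  # ['42', 'eef']
print(hex_runs)  # ['cab', '42', 'beef']
```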
    

QUANTIFIERS - Control how many times the preceding item may repeat, and whether it matches as much or as little as possible. For example, given the string "The quick brown fox slyly jumped over the lazy dog", the greedy pattern "T.*e" would return "The quick brown fox slyly jumped over the", while the lazy pattern "T.*?e" would simply return "The". A possessive quantifier behaves like a greedy one, except that it never gives back what it has consumed: ".*+" runs all the way to the end of the string, so nothing is left for the rest of the pattern and the match fails. "T.*+e" would not match the above string at all.

  • ? 0 or 1, greedy

  • ?+ 0 or 1, possessive

  • ?? 0 or 1, lazy

  • * 0 or more, greedy

  • *+ 0 or more, possessive

  • *? 0 or more, lazy

  • + 1 or more, greedy

  • ++ 1 or more, possessive

  • +? 1 or more, lazy

  • {n} exactly n

  • {n,m} at least n, no more than m, greedy

  • {n,m}+ at least n, no more than m, possessive

  • {n,m}? at least n, no more than m, lazy

  • {n,} n or more, greedy

  • {n,}+ n or more, possessive

  • {n,}? n or more, lazy
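The greedy and lazy results described above can be reproduced with the short sketch below. It assumes Python's re module as a stand-in for PCRE, since both engines treat greedy and lazy quantifiers the same way. The possessive form is omitted because Python only accepts it from version 3.11 onwards.

```python
import re

subject = "The quick brown fox slyly jumped over the lazy dog"

# Greedy: .* takes as much as it can, then backs off to the last "e".
greedy = re.search(r"T.*e", subject).group(0)

# Lazy: .*? takes as little as it can, stopping at the first "e".
lazy = re.search(r"T.*?e", subject).group(0)

# Bounded repetition: between two and three digits, greedy.
bounded = re.search(r"\d{2,3}", "12345").group(0)

print(greedy)   # The quick brown fox slyly jumped over the
print(lazy)     # The
print(bounded)  # 123
```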

ANCHORS and SIMPLE ASSERTIONS - Match based on the position within the string.

  • \b word boundary

  • \B not a word boundary

  • ^ start of subject
    also after internal newline in multiline mode
    

  • \A start of subject

  • $ end of subject
    also before newline at end of subject
    also before internal newline in multiline mode
    

  • \Z end of subject
    also before newline at end of subject
    

  • \z end of subject

  • \G first matching position in subject
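The effect of multiline mode on ^ and $ and the behaviour of the word boundary assertion can be sketched as follows, again assuming Python's re module as a stand-in for PCRE. \Z, \z and \G are deliberately left out: Python's \Z behaves like PCRE's \z, and \G has no Python equivalent.

```python
import re

text = "first line\nsecond line\n"

# Without multiline mode, ^ matches only at the start of the subject.
start_default = re.findall(r"^\w+", text)

# In multiline mode, ^ also matches after each internal newline.
start_multi = re.findall(r"^\w+", text, re.MULTILINE)

# $ matches before the newline at the end of the subject...
end_default = re.findall(r"line$", text)

# ...and, in multiline mode, before internal newlines as well.
end_multi = re.findall(r"line$", text, re.MULTILINE)

# \b keeps "inline" and "lines" from matching.
whole_words = re.findall(r"\bline\b", "line inline lines line")

print(start_default, start_multi)  # ['first'] ['first', 'second']
print(end_default, end_multi)      # ['line'] ['line', 'line']
print(len(whole_words))            # 2
```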

ALTERNATION - Match any of several possible expressions.

  • expr|expr|expr...
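One point worth illustrating is that alternatives are tried from left to right and the first one that succeeds wins, even if a later one would match more text. The sketch below assumes Python's re module as a stand-in for PCRE, which orders alternatives the same way.

```python
import re

# Any of the three alternatives may match at each position.
animals = re.findall(r"cat|dog|bird", "dog chases cat, bird watches")

# The leftmost alternative that succeeds is chosen, not the longest one.
first_wins = re.search(r"Gam|Gambas", "Gambas").group(0)

# Listing the longer alternative first gives the longer match.
longer_first = re.search(r"Gambas|Gam", "Gambas").group(0)

print(animals)       # ['dog', 'cat', 'bird']
print(first_wins)    # Gam
print(longer_first)  # Gambas
```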

CAPTURING - Return submatches.

  • (...) capturing group

  • (?<name>...) named capturing group (Perl)

  • (?'name'...) named capturing group (Perl)

  • (?P<name>...) named capturing group (Python)

  • (?:...) non-capturing group

  • (?|...) non-capturing group; reset group numbers for
    capturing groups in each alternative
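Numbered, named and non-capturing groups might be combined as in the sketch below. It assumes Python's re module as a stand-in for PCRE and therefore uses the Python-style named group, the only named form Python accepts. The date string is invented for the example.

```python
import re

# One named group, one numbered group, one non-capturing group.
pattern = r"(?P<year>\d{4})-(\d{2})-(?:\d{2})"
match = re.search(pattern, "released 2011-12-31")

whole = match.group(0)       # the entire match
year = match.group("year")   # named group, also reachable as group 1
month = match.group(2)       # second capturing group
captured = match.groups()    # the non-capturing group is not listed

print(whole)     # 2011-12-31
print(year)      # 2011
print(month)     # 12
print(captured)  # ('2011', '12')
```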
    

COMMENT - You shouldn't need this in Gambas, but it can be useful if you are writing regular expressions to be shared between several different languages.

  • (?#....) comment (not nestable)

OPTION SETTING - To keep your code readable, prefer the constants found in the RegExp class, passed as arguments to the Compile or Exec methods. If you're generating regular expressions at run time, the inline options below may be useful.

  • (?i) caseless

  • (?J) allow duplicate names

  • (?m) multiline

  • (?s) single line (dotall)

  • (?U) default ungreedy (lazy)

  • (?x) extended (ignore white space)

  • (?-...) unset option(s)
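The difference between an inline option and an option passed from code can be sketched as follows. Python's re module is assumed as a stand-in; its re.IGNORECASE flag plays the role that the RegExp constants play in Gambas. Only (?i), (?s) and (?x) are shown, placed at the very start of the pattern.

```python
import re

# (?i) inside the pattern and a flag passed from code give the same result.
inline = re.search(r"(?i)gambas", "I use GAMBAS").group(0)
flagged = re.search(r"gambas", "I use GAMBAS", re.IGNORECASE).group(0)

# By default the dot does not match a newline; (?s) turns dotall mode on.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"(?s)a.b", "a\nb")

# (?x) ignores white space in the pattern, so it can be laid out freely.
extended = re.search(r"(?x) \d+ \s* kg", "weighs 12 kg").group(0)

print(inline, flagged)       # GAMBAS GAMBAS
print(dot_default is None)   # True
print(dot_all is not None)   # True
print(extended)              # 12 kg
```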

BACKREFERENCES - Refer to previous submatches in the current match.

  • \n reference by number (can be ambiguous)

  • \gn reference by number

  • \g{n} reference by number

  • \g{-n} relative reference by number
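A classic use of a backreference is finding a doubled word. The sketch below assumes Python's re module as a stand-in for PCRE and uses only the plain numbered form \1, because Python does not accept the \g forms inside a pattern.

```python
import re

text = "this is is a test"

# \1 must repeat exactly what the first group captured.
doubled = re.search(r"\b(\w+) \1\b", text).group(0)

# In the replacement string, \1 reuses the captured word once.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", text)

print(doubled)  # is is
print(fixed)    # this is a test
```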

For more detailed information about the library, see http://www.regular-expressions.info/pcre.html or http://www.pcre.org. The above quick reference was adapted from the "pcresyntax" man page.