Regexp (gb.pcre)

This class represents a regular expression, with which you can perform matches against various strings and retrieve submatches (those parts of the subject string that match parenthesized expressions).

This class is creatable.

This class acts like a read-only array.

Constants

Anchored	If this compilation option is specified, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string").
BadMagic	This PCRE error constant is not used in Gambas (yet).
BadOption	This PCRE error constant is not used in Gambas (yet).
BadUTF8	This PCRE error constant is not used in Gambas (yet).
BadUTF8Offset	This PCRE error constant is not used in Gambas (yet).
Callout	This PCRE error constant is not used in Gambas (yet).
Caseless	Specifies a case-insensitive match.
DollarEndOnly	Specifies that the dollar ($) expression will not match a newline at the end of the subject string, but only the actual end of the string. By default, a dollar will match the end of the string with or without a newline preceding it.
DotAll	Specifies that the dot (.) regular expression will even match line endings, allowing you to treat a multi-line subject text as one long line.
Extended	Specifies extended regular-expression syntax. In this mode, whitespace and comments are allowed in your regular expressions. Comments are Perl-style, that is, they start with a hash (#) and continue until the next newline.
Extra	This compilation option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use.
Greedy	Disables the Ungreedy option that Replace applies by default.
MatchLimit	Specifies the maximum number of matches to return.
MultiLine	Specifies that the subject text is multi-line, so that the caret (^) and dollar ($) modifiers will match the beginnings and endings of lines anywhere within the text.
NoAutoCapture	If this compilation option is specified, it disables the use of numbered capturing parentheses in the pattern.
NoMatch	This PCRE error constant is not used in Gambas (yet).
NoMemory	This PCRE error constant is not used in Gambas (yet).
NoSubstring	This PCRE error constant is not used in Gambas (yet).
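NoUTF8Check	When the UTF8 option is set, the validity of the pattern as a UTF-8 string is automatically checked. Specify NoUTF8Check to skip this check, for example when you already know that the pattern is valid.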
NotBOL	With the MultiLine constant specified, indicates that the caret (^) expression should not match the beginning of the string. Without the MultiLine constant, indicates that the caret should never match anything.
NotEOL	With the MultiLine constant specified, indicates that the dollar ($) expression should not match the end of the string. Without the MultiLine constant, indicates that the dollar should never match anything.
NotEmpty	An empty string is not considered to be a valid match if this option is set. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails.
Null	This PCRE error constant is not used in Gambas (yet).
UTF8	This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte character strings. However, it is available only when PCRE is built to include UTF-8 support. If not, the use of this option provokes an error.
Ungreedy	Inverts the greediness of each quantifier. Every quantifier has a greedy and an ungreedy variant, e.g. the "+" quantifier is greedy and its ungreedy variant is "+?". Setting the Ungreedy flag converts all greedy variants to ungreedy ones and vice versa. It does not make all quantifiers ungreedy.
UnknownNode	This PCRE error constant is not used in Gambas (yet).
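
As a rough illustration of how these constants are used, the sketch below passes two compile options to the static Match method. It assumes that the options are integer flags that can be combined with Or and that the gb.pcre component is enabled; the sample text and the variable name sLog are made up for the example.

```gambas
Public Sub Main()

  ' Hypothetical sample text: two lines separated by a newline
  Dim sLog As String = "Error: disk full\nerror: retry"

  ' No options: the match is case-sensitive, and ^ only matches
  ' at the very start of the subject
  Print Regexp.Match(sLog, "^ERROR: retry$")

  ' Caseless and MultiLine combined (assumed to be bit flags):
  ' case is ignored, and ^ and $ also match at line boundaries
  Print Regexp.Match(sLog, "^ERROR: retry$", Regexp.Caseless Or Regexp.MultiLine)

End
```

The first call should report no match and the second a match, since only the second line of the sample text reads "error: retry".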

Enumerations

RegexpCompileOption
RegexpError
RegexpExecOption

Static methods

FindAll	Since 3.19 Return a string array of all sub-strings matching a pattern.
Match	Since 3.5 This static method takes the same parameters as the Regexp constructor does. However, it does not create a new object; instead, it returns whether the subject matches the pattern.
Replace	Since 3.5 Find all occurrences of the specified Pattern in the Subject, and return a string where they are all replaced by the Replace string.
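
The three static methods can be tried without creating a Regexp object. The following is a minimal sketch, assuming that the gb.pcre component is enabled and that each method takes the subject first and the pattern second (with the replacement string third for Replace); the variable names are arbitrary.

```gambas
Public Sub Main()

  Dim sText As String = "The quick brown fox slyly jumped over the lazy dog"
  Dim sWord As String

  ' Match: tells whether the pattern is found in the subject
  If Regexp.Match(sText, "brown \\S+") Then Print "found"

  ' Replace: every run of whitespace is replaced by an underscore
  Print Regexp.Replace(sText, "\\s+", "_")

  ' FindAll (Gambas 3.19 or later): every five-letter word,
  ' returned as a string array
  For Each sWord In Regexp.FindAll(sText, "\\b\\w{5}\\b")
    Print sWord
  Next

End
```

Note that the backslashes of the pattern are doubled, because a backslash is also the escape character of Gambas string literals.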

Properties

Count	Return the number of submatches produced by the regular expression.
Error	Return the error code raised by the last call to the Compile or Exec method.
Offset	Returns the offset of the beginning of the match, i.e. how many characters precede it in the subject string.
Pattern	Return the regular expression pattern.
SubMatches	A submatch is the part of the subject text matched by a part of your pattern contained in parentheses. For example, given the pattern "brown (\S+)", the first and only element in the SubMatches collection would be the word following "brown" in the subject text. With a subject of "quick brown fox", SubMatches[0] would contain "fox" and SubMatches.Count would be 1.
Subject	Return the regular expression subject.
Text	Contains the text returned by the match. With a subject of "quick brown fox" and a pattern of "brown (\S+)", Text would contain "brown fox".

Methods

Compile	Compile allows you to pre-compile a regular expression for later execution by the Exec method. This is useful when you have one pattern which you want to match against a lot of text, because compiling once and executing many times is much faster than compiling and executing every time.
Exec	Exec lets you execute a previously compiled regular expression against a subject text. This is mainly useful when you have many different subject texts you want to match against, because you can compile a regular expression once and use Exec repeatedly for increased speed.
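
The compile-once, execute-many pattern described above could look like the sketch below. It assumes that the gb.pcre component is enabled and, like the sample program later on this page, that Count tells whether the expected submatches were captured; the "key=value" lines are invented test data.

```gambas
Public Sub Main()

  Dim hRegexp As New Regexp
  Dim sLine As String

  ' Compile the pattern once...
  hRegexp.Compile("^(\\w+)=(.*)$")

  ' ...then execute it against several subjects
  For Each sLine In ["name=gambas", "version=3", "not a setting"]
    hRegexp.Exec(sLine)
    If hRegexp.Count = 2 Then
      ' Index 1 and 2 are the two parenthesized submatches
      Print hRegexp[1].Text; " -> "; hRegexp[2].Text
    Else
      Print "no match: "; sLine
    Endif
  Next

End
```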

SubMatches

A submatch is the part of the subject string matched by a part of your pattern contained in parentheses.

To access the submatches, use the Count property and the array accessor of the Regexp class.

Example

For example, given the regular expression:

brown (\S+)

and subject string:

The quick brown fox slyly jumped over the lazy dog

your Regexp object's Text (or Regexp[0].Text) property would be:

brown fox

and its Regexp[1].Text property would give the text of the first submatch:

fox

The offset of the match in the subject string is given by the Offset property, 10 in this case.
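
The same example can be written as a short Gambas program. This is a minimal sketch that assumes the gb.pcre component is enabled and that the constructor is given the subject first and the pattern second; the variable name hRegexp is arbitrary.

```gambas
Public Sub Main()

  Dim hRegexp As New Regexp("The quick brown fox slyly jumped over the lazy dog", "brown (\\S+)")

  Print hRegexp.Text       ' the whole match: brown fox
  Print hRegexp[0].Text    ' same as the Text property
  Print hRegexp[1].Text    ' the first submatch: fox
  Print hRegexp.Count      ' number of submatches: 1
  Print hRegexp.Offset     ' characters preceding the match: 10

End
```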

This is just a simple example of what regular expressions can do for you when parsing textual input; they are a very powerful tool. For example, the following regular expression matches most common email addresses:

(?i)\b[a-z0-9._%\-]+@[a-z0-9._%\-]+\.[A-Z]{2,4}\b

Sample program to parse the output of vmstat -D:

Dim sDiskIO, sVal As String
Dim cVal As New Collection
Dim rMatch As New RegExp

' get disk I/O stats
Exec ["vmstat", "-D"] To sDiskIO
For Each sVal In ["total reads", "read sectors", "writes", "written sectors"]
  rMatch.Compile("^\\s*(\\d+)\\s+" & sVal, RegExp.MultiLine)
  rMatch.Exec(sDiskIO)
  If rMatch.Count = 1 Then
    cVal[Replace(sVal, " ", "_")] = rMatch[1].Text
  Else
    Error.Raise("Missing '" & sVal & "' in 'vmstat -D' output")
  Endif
Next
Print "total reads: " & cVal!total_reads & " read sectors: " & cVal!read_sectors
Print "writes: " & cVal!writes & " written sectors: " & cVal!written_sectors
