RegExp (gb.pcre)

This class represents a regular expression, with which you can perform matches against various strings and retrieve submatches (those parts of the subject string that match parenthesized expressions).

This class is creatable. The constructor creates a new RegExp object, and optionally compiles a regular expression and matches it against some subject text.

This class acts like a read-only array. The array accessor returns a submatch from its index.

Static methods

Match: This static method takes the same parameters as the RegExp constructor. However, it doesn't create a new object; instead it returns whether the subject matches the pattern.
Replace: Finds all occurrences of the specified Pattern in the Subject, and returns a string in which they are all replaced by the Replace string.

Constants

Anchored: If this compilation option is specified, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string").
BadMagic: This PCRE error constant is not used in Gambas (yet).
BadOption: This PCRE error constant is not used in Gambas (yet).
BadUTF8: This PCRE error constant is not used in Gambas (yet).
BadUTF8Offset: This PCRE error constant is not used in Gambas (yet).
Callout: This PCRE error constant is not used in Gambas (yet).
Caseless: Specifies a case-insensitive match.
DollarEndOnly: Specifies that the dollar ($) expression will not match a newline at the end of the subject string, but only the actual end of the string. By default, a dollar will match the end of the string with or without a newline preceding it.
DotAll: Specifies that the dot (.) regular expression will even match line endings, allowing you to treat a multi-line subject text as one long line.
Extended: Specifies extended regular-expression syntax. In this mode, whitespace and comments are allowed in your regular expressions. Comments are Perl-style, that is, they start with a hash (#) and continue until the next newline.
Extra: This compilation option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use.
Greedy: Turns off the ungreedy matching that Replace uses by default.
MatchLimit: Specifies the maximum number of matches to return.
MultiLine: Specifies that the subject text is multi-line, so that the caret (^) and dollar ($) modifiers will match the beginnings and endings of lines anywhere within the text.
NoAutoCapture: If this compilation option is specified, it disables the use of numbered capturing parentheses in the pattern.
NoMatch: This PCRE error constant is not used in Gambas (yet).
NoMemory: This PCRE error constant is not used in Gambas (yet).
NoSubstring: This PCRE error constant is not used in Gambas (yet).
NoUTF8Check: When the UTF8 option is set, the validity of the pattern as a UTF-8 string is automatically checked. Setting NoUTF8Check skips that check.
NotBOL: With the MultiLine constant specified, indicates that the caret (^) expression should not match the beginning of the string. Without the MultiLine constant, indicates that the caret should never match anything.
NotEOL: With the MultiLine constant specified, indicates that the dollar ($) expression should not match the end of the string. Without the MultiLine constant, indicates that the dollar should never match anything.
NotEmpty: An empty string is not considered to be a valid match if this option is set. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails.
Null: This PCRE error constant is not used in Gambas (yet).
UTF8: This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte character strings. However, it is available only when PCRE is built to include UTF-8 support. If not, the use of this option provokes an error.
Ungreedy: Inverts the greediness of each quantifier. Every quantifier has a greedy and an ungreedy variant, e.g. the "+" quantifier is greedy and its ungreedy variant is "+?". Setting the Ungreedy flag converts all greedy variants to ungreedy ones and vice versa. It does not make all quantifiers ungreedy.
UnknownNode: This PCRE error constant is not used in Gambas (yet).
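
The static methods and option constants can be combined roughly as follows. This is a minimal sketch, not taken from the original page: it assumes the gb.pcre component is enabled, that Match takes its arguments in the order (Subject, Pattern, compile options), that Replace takes (Subject, Pattern, Replace), and that several options are combined with Or. The sample strings are made up for illustration.

```gambas
' Sketch only: the argument order of Match and Replace is an assumption.
Public Sub Main()

  Dim sText As String = "Hello\nWORLD"

  ' Caseless ignores case; MultiLine lets ^ and $ match at line boundaries.
  If RegExp.Match(sText, "^world$", RegExp.Caseless Or RegExp.MultiLine) Then
    Print "one line of the text is 'world', ignoring case"
  Endif

  ' Replace every whole word "cat" with "dog".
  ' Backslashes are doubled because "\\b" in a Gambas string yields \b.
  Print RegExp.Replace("cat, concat, cat", "\\bcat\\b", "dog")

End
```

A literal pattern is used for Replace on purpose: because Replace is ungreedy unless the Greedy constant is given, a pattern such as \d+ may match less text per occurrence than you expect.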

Properties

Count: Returns the number of submatches produced by the regular expression.
Error: Returns the error code raised by the last call to the Compile or Exec method.
Offset: Returns the offset of the beginning of the match, i.e. how many characters precede it in the subject string.
Pattern: Returns the regular expression pattern.
SubMatches: A submatch is a part of your pattern contained in parentheses. For example, given the pattern "brown (\S+)", the first and only element in the SubMatches collection would be the word following "brown" in the subject text. With a subject of "quick brown fox", SubMatches[0] would contain "fox" and SubMatches.Count would be 1.
Subject: Returns the regular expression subject.
Text: Contains the text returned by the match. With a subject of "quick brown fox" and a pattern of "brown (\S+)", Text would contain "brown fox".

Methods

Compile: Compile allows you to pre-compile a regular expression for later execution by the Exec method. This is useful when you have one pattern which you want to match against a lot of text, because compiling once and executing many times is much faster than compiling and executing every time.
Exec: Exec lets you execute a previously compiled regular expression against a subject text. This is mainly useful when you have many different subject texts you want to match against, because you can compile a regular expression once and use Exec repeatedly for increased speed.
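
The compile-once, execute-many pattern described for Compile and Exec might look like the sketch below. The date pattern and the sample lines are hypothetical, and a successful match is detected by comparing Count with the number of parenthesized groups, following the convention of the vmstat sample on this page rather than any documented guarantee.

```gambas
' Sketch only: pattern and input lines are made up for illustration.
Public Sub Main()

  Dim rDate As New RegExp
  Dim sLine As String

  ' Compile the pattern a single time...
  rDate.Compile("(\\d{4})-(\\d{2})-(\\d{2})")

  ' ...then run it against each subject in turn.
  For Each sLine In ["released 2020-01-31", "no date here", "updated 2021-12-05"]
    rDate.Exec(sLine)
    ' Three groups in the pattern, so three submatches are expected on success.
    If rDate.Count = 3 Then
      Print "year " & rDate[1].Text & ", month " & rDate[2].Text & ", day " & rDate[3].Text
    Else
      Print "no date in: " & sLine
    Endif
  Next

End
```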

SubMatches

A submatch is a part of your pattern contained in parentheses.

To access the submatches use the Count property and the array accessor of the RegExp class.

Example

For example, given the regular expression:

brown (\S+)

and subject string:

The quick brown fox slyly jumped over the lazy dog

your RegExp object's Text (or RegExp[0].Text) property would be:

brown fox

and its RegExp[1].Text property would give the text of the first submatch:

fox

The offset of the match in the subject string is given by the Offset property, 10 in this case.
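
In code, this example could be written as the sketch below. It uses Compile and Exec as in the sample program further down; the variable name rMatch is arbitrary, and the values in the comments are the ones stated in the text above, not output captured from a run.

```gambas
' Sketch of the "brown fox" example; assumes gb.pcre is enabled.
Public Sub Main()

  Dim rMatch As New RegExp

  rMatch.Compile("brown (\\S+)")
  rMatch.Exec("The quick brown fox slyly jumped over the lazy dog")

  Print rMatch.Text      ' whole match: "brown fox" according to the text above
  Print rMatch[1].Text   ' first submatch: "fox"
  Print rMatch.Offset    ' offset of the match: 10
  Print rMatch.Count     ' number of submatches: 1

End
```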

This is just a simple example of what regular expressions can do for you when parsing textual input; they are a very powerful tool. For example, the following regular expression will match most common email addresses:

(?i)\b[a-z0-9._%\-]+@[a-z0-9._%\-]+\.[A-Z]{2,4}\b

Sample program to parse the output of vmstat -D:

Dim sDiskIO As String
Dim sVal As String
Dim cVal As New Collection
Dim rMatch As New RegExp

' get disk I/O stats
Exec ["vmstat", "-D"] To sDiskIO
For Each sVal In ["total reads", "read sectors", "writes", "written sectors"]
  ' match a line made of a number followed by the label, e.g. "  1234 total reads"
  rMatch.Compile("^\\s*(\\d+)\\s+" & sVal, RegExp.MultiLine)
  rMatch.Exec(sDiskIO)
  If rMatch.Count = 1 Then
    ' store the captured number under a key such as "total_reads"
    cVal[Replace(sVal, " ", "_")] = rMatch[1].Text
  Else
    Error.Raise("Missing '" & sVal & "' in 'vmstat -D' output")
  Endif
Next
Print "total reads: " & cVal!total_reads & " read sectors: " & cVal!read_sectors
Print "writes: " & cVal!writes & " written sectors: " & cVal!written_sectors

See also