RegExp (gb.pcre)
This class represents a regular expression, with which you can perform matches against various strings and retrieve submatches (those parts of the subject string that match parenthesized expressions).
Constants
Anchored
|
If this compilation option is specified, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string").
|
BadMagic
|
This PCRE error constant is not used in Gambas (yet).
|
BadOption
|
This PCRE error constant is not used in Gambas (yet).
|
BadUTF8
|
This PCRE error constant is not used in Gambas (yet).
|
BadUTF8Offset
|
This PCRE error constant is not used in Gambas (yet).
|
Callout
|
This PCRE error constant is not used in Gambas (yet).
|
Caseless
|
Specifies a case-insensitive match.
|
DollarEndOnly
|
Specifies that the dollar ($) expression matches only at the very end of the subject string. By default, a dollar also matches immediately before a newline that is the last character of the string.
|
DotAll
|
Specifies that the dot (.) regular expression will even match line endings, allowing you to treat a multi-line subject text as one long line.
|
Extended
|
Specifies extended regular-expression syntax. In this mode, whitespace and comments are allowed in your regular expressions. Comments are Perl-style, that is, they start with a hash (#) and continue until the next newline.
|
Extra
|
This compilation option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use.
|
Greedy
|
Removes the ungreedy option that Replace applies by default.
|
MatchLimit
|
Specifies the maximum number of matches to return.
|
MultiLine
|
Specifies that the subject text is multi-line, so that the caret (^) and dollar ($) anchors match at the beginning and end of every line within the text, not only at the start and end of the whole string.
|
NoAutoCapture
|
If this compilation option is specified, it disables the use of numbered capturing parentheses in the pattern.
|
NoMatch
|
This PCRE error constant is not used in Gambas (yet).
|
NoMemory
|
This PCRE error constant is not used in Gambas (yet).
|
NoSubstring
|
This PCRE error constant is not used in Gambas (yet).
|
NoUTF8Check
|
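When the UTF8 option is set, the validity of the pattern as a UTF-8 string is automatically checked. Setting NoUTF8Check skips that check, which is only safe if you already know the pattern is valid UTF-8.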
|
NotBOL
|
With the MultiLine constant specified, indicates that the caret (^) expression should not match the beginning of the string. Without the MultiLine constant, indicates that the caret should never match anything.
|
NotEOL
|
With the MultiLine constant specified, indicates that the dollar ($) expression should not match the end of the string. Without the MultiLine constant, indicates that the dollar should never match anything.
|
NotEmpty
|
An empty string is not considered to be a valid match if this option is set. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails.
|
Null
|
This PCRE error constant is not used in Gambas (yet).
|
UTF8
|
This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte character strings. However, it is available only when PCRE is built to include UTF-8 support. If not, the use of this option provokes an error.
|
Ungreedy
|
Inverts the greediness of each quantifier. Every quantifier has a greedy and an ungreedy variant, e.g. the "+" quantifier is greedy and its ungreedy variant is "+?". Setting the Ungreedy flag converts all greedy variants to ungreedy ones and vice versa. It does not make all quantifiers ungreedy.
|
UnknownNode
|
This PCRE error constant is not used in Gambas (yet).
|
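As a rough illustration of how the compile options above might be used, the sketch below passes them as the third argument of RegExp.Match and combines two of them with Or. It assumes a console project with the gb.pcre component enabled, that the third argument of Match is the set of compile options, and that the constants are bit flags as they are in PCRE itself; the sample text is made up.

```gambas
' Sketch only: assumes gb.pcre is enabled and that options combine with Or.
Public Sub Main()

  Dim sText As String = "first line\nSECOND line"

  ' Caseless alone: "second" may match "SECOND" regardless of case.
  Print RegExp.Match(sText, "second", RegExp.Caseless)

  ' Caseless Or MultiLine: ^ may also match right after the newline.
  Print RegExp.Match(sText, "^second", RegExp.Caseless Or RegExp.MultiLine)

End
```

Note that backslashes and other escapes inside a Gambas string literal are processed by Gambas before the pattern reaches PCRE.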
Static methods
FindAll
|
Return a string array of all sub-strings matching a pattern.
|
Match
|
This static method takes the same parameters as the RegExp constructor does. However, it does not create a new object; instead, it returns whether the subject matches the pattern.
|
Replace
|
Find all occurrences of the specified Pattern in the Subject, and return a string where they are all replaced by the Replace string.
|
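The three static methods can be tried without creating a RegExp object. The following is a minimal sketch, assuming a console project with gb.pcre enabled and assuming the argument order Subject, Pattern (and, for Replace, the replacement string third); the subject text is invented for illustration.

```gambas
' Sketch only: argument order (Subject, Pattern[, Replace]) is an assumption.
Public Sub Main()

  Dim sSubject As String = "quick brown fox, slow brown bear"
  Dim sFound As String

  ' Match: reports whether the pattern is found in the subject.
  If RegExp.Match(sSubject, "brown \\w+\\b") Then
    Print "pattern found"
  Endif

  ' FindAll: collects every substring that matches the pattern.
  For Each sFound In RegExp.FindAll(sSubject, "brown \\w+\\b")
    Print sFound
  Next

  ' Replace: substitutes every occurrence of the pattern.
  Print RegExp.Replace(sSubject, "brown", "red")

End
```

The trailing \b in the pattern is there so that the result does not depend on whether the quantifier is treated as greedy or ungreedy.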
Properties
Count
|
Return the number of submatches produced by the regular expression.
|
Error
|
Return the error code raised by the last call to the Compile or Exec method.
|
Offset
|
Returns the offset of the beginning of the match, i.e. how many characters precede it in the subject string.
|
Pattern
|
Return the regular expression pattern.
|
SubMatches
|
A submatch is the part of the subject text matched by a part of your pattern contained in parentheses. For example, given the pattern "brown (\S+)", the first and only element in the SubMatches collection would be the word following "brown" in the subject text. With a subject of "quick brown fox", SubMatches[0] would contain "fox" and SubMatches.Count would be 1.
|
Subject
|
Return the regular expression subject.
|
Text
|
Contains the text returned by the match. With a subject of "quick brown fox" and a pattern of "brown (\S+)", Text would contain "brown fox".
|
Methods
Compile
|
Compile allows you to pre-compile a regular expression for later execution by the Exec method. This is useful when you have one pattern which you want to match against a lot of text, because compiling once and executing many times is much faster than compiling and executing every time.
|
Exec
|
Exec lets you execute a previously compiled regular expression against a subject text. This is mainly useful when you have many different subject texts you want to match against, because you can compile a regular expression once and use Exec repeatedly for increased speed.
|
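The compile-once, execute-many pattern described for Compile and Exec could look roughly like this. It is a sketch under the assumption that gb.pcre is enabled in a console project; the pattern and the sample subjects are hypothetical and chosen so that every subject matches.

```gambas
' Sketch only: the pattern and the subject strings are made up.
Public Sub Main()

  Dim rItems As New RegExp
  Dim sLine As String

  ' Compile the pattern a single time...
  rItems.Compile("(\\d+) items")

  ' ...then execute it against several different subjects.
  For Each sLine In ["3 items in cart", "12 items shipped", "250 items in stock"]
    rItems.Exec(sLine)
    ' rItems[1] is the first submatch, here the captured number.
    Print sLine & " -> " & rItems[1].Text
  Next

End
```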
SubMatches
A submatch is the part of the subject text matched by a parenthesized part of your pattern.
To access the submatches, use the Count property and the array accessor of the RegExp class.
Example
For example, given the regular expression:
brown (\S+)
and subject string:
The quick brown fox slyly jumped over the lazy dog
your RegExp object's Text (or RegExp[0].Text) property would be:
brown fox
and its RegExp[1].Text property would give the text of the first submatch:
fox
The offset of the match in the string is given by the Offset property, 10 in this case.
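In code, reading the match and its submatches might look like the sketch below. It reuses the pattern brown (\S+) from the SubMatches property description together with the subject string of this example, and assumes a console project with gb.pcre enabled; the variable names are illustrative only.

```gambas
' Sketch only: walks the submatches with Count and the array accessor.
Public Sub Main()

  Dim rFox As New RegExp
  Dim iInd As Integer

  rFox.Compile("brown (\\S+)")
  rFox.Exec("The quick brown fox slyly jumped over the lazy dog")

  Print rFox.Text      ' text of the whole match, same as rFox[0].Text
  Print rFox.Offset    ' number of characters preceding the match

  ' Submatches are numbered from 1 to Count.
  For iInd = 1 To rFox.Count
    Print iInd & ": " & rFox[iInd].Text
  Next

End
```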
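This is just a simple example of what regular expressions can do for you when parsing textual input; they are a very powerful tool. For example, the following regular expression extracts most common email addresses (it is a simple approximation rather than a full validator; for instance, it misses top-level domains longer than four letters):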
(?i)\b[a-z0-9._%\-]+@[a-z0-9._%\-]+\.[A-Z]{2,4}\b
Sample program to parse the output of vmstat -D:
Public Sub Main()

  Dim sDiskIO, sVal As String
  Dim cVal As New Collection
  Dim rMatch As New RegExp

  ' Get disk I/O statistics
  Exec ["vmstat", "-D"] To sDiskIO

  For Each sVal In ["total reads", "read sectors", "writes", "written sectors"]
    ' Each statistic is expected on its own line as "<number> <label>"
    rMatch.Compile("^\\s*(\\d+)\\s+" & sVal, RegExp.MultiLine)
    rMatch.Exec(sDiskIO)
    If rMatch.Count = 1 Then
      ' Store the captured number under a key such as "total_reads"
      cVal[Replace(sVal, " ", "_")] = rMatch[1].Text
    Else
      Error.Raise("Missing '" & sVal & "' in 'vmstat -D' output")
    Endif
  Next

  Print "total reads: " & cVal!total_reads & " read sectors: " & cVal!read_sectors
  Print "writes: " & cVal!writes & " written sectors: " & cVal!written_sectors

End