Gambas Documentation

RegExp (gb.pcre)

This class represents a regular expression, with which you can perform matches against various strings and retrieve submatches (those parts of the subject string that match parenthesized expressions).

This class is creatable.

This class acts like a read-only array.

Constants
Anchored   If this compilation option is specified, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string").
BadMagic   This PCRE error constant is not used in Gambas (yet).
BadOption   This PCRE error constant is not used in Gambas (yet).
BadUTF8   This PCRE error constant is not used in Gambas (yet).
BadUTF8Offset   This PCRE error constant is not used in Gambas (yet).
Callout   This PCRE error constant is not used in Gambas (yet).
Caseless   Specifies a case-insensitive match.
DollarEndOnly   Specifies that the dollar ($) expression matches only at the very end of the subject string. By default, a dollar also matches immediately before a newline that is the last character of the string. This option is ignored if MultiLine is set.
DotAll   Specifies that the dot (.) metacharacter also matches newline characters, allowing you to treat a multi-line subject text as one long line.
Extended   Specifies extended regular-expression syntax. In this mode, unescaped whitespace in the pattern (outside character classes) is ignored, and comments are allowed. Comments are Perl-style: they start with a hash (#) and continue until the next newline.
Extra   This compilation option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use.
Greedy   Turns off the ungreedy matching that Replace uses by default.
MatchLimit   Specifies the maximum number of matches to return.
MultiLine   Specifies that the subject text is multi-line, so that the caret (^) and dollar ($) metacharacters match at the beginning and end of every line in the text, not only at the start and end of the whole string.
NoAutoCapture   If this compilation option is specified, it disables the use of numbered capturing parentheses in the pattern.
NoMatch   This PCRE error constant is not used in Gambas (yet).
NoMemory   This PCRE error constant is not used in Gambas (yet).
NoSubstring   This PCRE error constant is not used in Gambas (yet).
NoUTF8Check   When the UTF8 option is set, PCRE normally checks that the pattern (and the subject) is a valid UTF-8 string. This option skips that check: matching is faster, but the result is undefined if a string is not actually valid UTF-8.
NotBOL   With the MultiLine constant specified, indicates that the caret (^) expression should not match the beginning of the string. Without the MultiLine constant, indicates that the caret should never match anything.
NotEOL   With the MultiLine constant specified, indicates that the dollar ($) expression should not match the end of the string. Without the MultiLine constant, indicates that the dollar should never match anything.
NotEmpty   An empty string is not considered to be a valid match if this option is set. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails.
Null   This PCRE error constant is not used in Gambas (yet).
UTF8   This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte character strings. However, it is available only when PCRE is built to include UTF-8 support. Otherwise, using this option raises an error.
Ungreedy   Inverts the greediness of each quantifier. Every quantifier has a greedy and an ungreedy variant, e.g. the "+" quantifier is greedy and its ungreedy variant is "+?". Setting the Ungreedy flag converts all greedy variants to ungreedy ones and vice versa. It does not make all quantifiers ungreedy.
UnknownNode   This PCRE error constant is not used in Gambas (yet).
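A minimal sketch of combining two of the constants above. It assumes that the option constants are bit flags that can be combined with Or, and the log text is made up for illustration. Caseless lets "error" match "ERROR", and MultiLine lets ^ and $ match at each line rather than only at the ends of the whole string.

```gambas
' Assumption: option constants are bit flags combined with Or.
Public Sub Main()

  Dim rx As New RegExp
  Dim sLog As String = "ok: started\nERROR: disk full\nok: stopped"

  ' Caseless: "error" also matches "ERROR"
  ' MultiLine: ^ and $ match at the start and end of every line
  rx.Compile("^error: (.*)$", RegExp.Caseless Or RegExp.MultiLine)
  rx.Exec(sLog)

  ' Without DotAll, (.*) stops at the newline, so only the rest of that line is captured
  If rx.Count = 1 Then Print rx[1].Text

End
```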

Static methods
FindAll   Return a string array of all sub-strings matching a pattern.
Match   This static method takes the same parameters as the RegExp constructor. However, it does not create a new object; instead, it returns whether the subject matches the pattern.
Replace   Find all occurrences of the specified Pattern in the Subject, and return a string in which they are all replaced by the Replace string.
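A short sketch of the three static methods. The argument order (Subject first, then Pattern, then the replacement string for Replace) is an assumption based on the descriptions above, and the sample strings are made up. The patterns are chosen so that greedy and ungreedy matching give the same result, because Replace is ungreedy by default.

```gambas
' Assumption: argument order is Subject, Pattern[, Replace], as described above.
Public Sub Main()

  Dim sNum As String

  ' Match: True if the subject contains a match for the pattern
  If RegExp.Match("Order #1234 shipped", "#[0-9]+") Then Print "order number found"

  ' FindAll: every whole-word run of digits, returned as a String[]
  For Each sNum In RegExp.FindAll("one 22 three 4444", "\\b[0-9]+\\b")
    Print sNum
  Next

  ' Replace: every "-" in the subject becomes "/"
  Print RegExp.Replace("2024-01-15", "-", "/")

End
```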

Properties
Count   Return the number of submatches produced by the regular expression.
Error   Return the error code raised by the last call to Compile or Exec method.
Offset   Returns the offset of the beginning of the match, i.e. how many characters precede it in the subject string.
Pattern   Return the regular expression pattern.
SubMatches   A submatch is a part of your pattern contained in parentheses. For example, given the pattern "brown (\S+)", the first and only element in the SubMatches collection would be the word following "brown" in the subject text. With a subject of "quick brown fox", SubMatches[0] would contain "fox" and SubMatches.Count would be 1.
Subject   Return the regular expression subject.
Text   Contains the text returned by the match. With a subject of "quick brown fox" and a pattern of "brown (\S+)", Text would contain "brown fox".

Methods
Compile   Compile allows you to pre-compile a regular expression for later execution by the Exec method. This is useful when you have one pattern which you want to match against a lot of text, because compiling once and executing many times is much faster than compiling and executing every time.
Exec   Exec lets you execute a previously compiled regular expression against a subject text. This is mainly useful when you have many different subject texts you want to match against, because you can compile a regular expression once and use Exec repeatedly for increased speed.
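A minimal sketch of the compile-once, execute-many pattern described above. It assumes that the CompileOptions argument of Compile is optional and that Count is reset by each call to Exec; the key=value lines are made up for illustration.

```gambas
' Assumption: the CompileOptions argument of Compile is optional.
Public Sub Main()

  Dim rx As New RegExp
  Dim sLine As String

  ' Compile once: a key, an equals sign, then the value
  rx.Compile("^([A-Za-z_]+)=(.*)$")

  ' Execute many times against different subjects
  For Each sLine In ["name=gambas", "# a comment", "version=3"]
    rx.Exec(sLine)
    ' Two submatches means the line had the key=value shape
    If rx.Count = 2 Then Print rx[1].Text & " -> " & rx[2].Text
  Next

End
```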

SubMatches

A submatch is a part of your pattern contained in parentheses.

To access the submatches use the Count property and the array accessor of the RegExp class.

Example

For example, given the regular expression:

brown (\S+)

and subject string:

The quick brown fox slyly jumped over the lazy dog

your RegExp object's Text property (equivalently RegExp[0].Text) would be:

brown fox

and its RegExp[1].Text property would give the text of the first submatch:

fox

The offset of the match in the subject string is given by the Offset property: 10 in this case.
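The example above can be reproduced with a short program like the following sketch. Note that inside a Gambas string literal the backslash must be doubled, so "\\S" reaches PCRE as \S.

```gambas
Public Sub Main()

  Dim rx As New RegExp

  ' "brown (\S+)": the word "brown" followed by one captured word
  rx.Compile("brown (\\S+)")
  rx.Exec("The quick brown fox slyly jumped over the lazy dog")

  Print rx.Text      ' whole match, same as rx[0].Text
  Print rx[1].Text   ' first submatch
  Print rx.Offset    ' number of characters before the match

End
```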

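This is just a simple example of what regular expressions can do when parsing textual input; they are a very powerful tool. For example, the following regular expression matches most common email addresses (case-insensitively, because of the (?i) prefix). It is a heuristic rather than a complete validator: for instance, it only accepts top-level domains of two to four letters.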

(?i)\b[a-z0-9._%\-]+@[a-z0-9._%\-]+\.[A-Z]{2,4}\b
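A sketch of using this pattern to pull every address out of a piece of text. It assumes RegExp.FindAll takes the subject first and the pattern second; the sample text and addresses are made up. Every backslash of the pattern is doubled because it appears inside a Gambas string literal.

```gambas
' Assumption: FindAll(Subject, Pattern) returns a String[] of all matches.
Public Sub Main()

  Dim sAddr As String
  Dim sText As String = "Contact alice@example.org or Bob.Smith@mail.example.COM for details."

  ' The same email pattern as above, with backslashes doubled for the string literal
  For Each sAddr In RegExp.FindAll(sText, "(?i)\\b[a-z0-9._%\\-]+@[a-z0-9._%\\-]+\\.[A-Z]{2,4}\\b")
    Print sAddr
  Next

End
```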

Sample program to parse the output of vmstat -D:
Public Sub Main()

  Dim sDiskIO, sVal As String
  Dim cVal As New Collection
  Dim rMatch As New RegExp

  ' Get the disk I/O statistics
  Exec ["vmstat", "-D"] To sDiskIO
  For Each sVal In ["total reads", "read sectors", "writes", "written sectors"]
    ' Match a line made of a number followed by the field name
    rMatch.Compile("^\\s*(\\d+)\\s+" & sVal, RegExp.MultiLine)
    rMatch.Exec(sDiskIO)
    If rMatch.Count = 1 Then
      ' Store the number under a key such as "total_reads"
      cVal[Replace(sVal, " ", "_")] = rMatch[1].Text
    Else
      Error.Raise("Missing '" & sVal & "' in 'vmstat -D' output")
    Endif
  Next
  Print "total reads: " & cVal!total_reads & " read sectors: " & cVal!read_sectors
  Print "writes: " & cVal!writes & " written sectors: " & cVal!written_sectors

End
