RegExp (gb.pcre)

This class represents a regular expression, with which you can perform matches against various strings and retrieve submatches (those parts of the subject string that match parenthesized expressions).

This class is creatable.

This class acts like a read-only array.

Constants
Anchored   If this compilation option is specified, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string that is being searched (the "subject string").
BadMagic   This PCRE error constant is not used in Gambas (yet).
BadOption   This PCRE error constant is not used in Gambas (yet).
BadUTF8   This PCRE error constant is not used in Gambas (yet).
BadUTF8Offset   This PCRE error constant is not used in Gambas (yet).
Callout   This PCRE error constant is not used in Gambas (yet).
Caseless   Specifies a case-insensitive match.
DollarEndOnly   Specifies that the dollar ($) expression will not match a newline at the end of the subject string, but only the actual end of the string. By default, a dollar will match the end of the string with or without a newline preceding it.
DotAll   Specifies that the dot (.) expression also matches line endings, allowing you to treat a multi-line subject text as one long line.
Extended   Specifies extended regular-expression syntax. In this mode, whitespace and comments are allowed in your regular expressions. Comments are Perl-style, that is, they start with a hash (#) and continue until the next newline.
Extra   This compilation option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use.
Greedy   Disables the ungreedy behavior that Replace uses by default.
MatchLimit   Specifies the maximum number of matches to return.
MultiLine   Specifies that the subject text is multi-line, so that the caret (^) and dollar ($) modifiers will match the beginnings and endings of lines anywhere within the text.
NoAutoCapture   If this compilation option is specified, it disables the use of numbered capturing parentheses in the pattern.
NoMatch   This PCRE error constant is not used in Gambas (yet).
NoMemory   This PCRE error constant is not used in Gambas (yet).
NoSubstring   This PCRE error constant is not used in Gambas (yet).
NoUTF8Check   When the UTF8 option is set, the validity of the pattern as a UTF-8 string is automatically checked. Setting this option skips that check.
NotBOL   With the MultiLine constant specified, indicates that the caret (^) expression should not match the beginning of the string. Without the MultiLine constant, indicates that the caret should never match anything.
NotEOL   With the MultiLine constant specified, indicates that the dollar ($) expression should not match the end of the string. Without the MultiLine constant, indicates that the dollar should never match anything.
NotEmpty   An empty string is not considered to be a valid match if this option is set. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails.
Null   This PCRE error constant is not used in Gambas (yet).
UTF8   This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte character strings. However, it is available only when PCRE is built to include UTF-8 support. If not, the use of this option provokes an error.
Ungreedy   Inverts the greediness of each quantifier. Every quantifier has a greedy and an ungreedy variant, e.g. the "+" quantifier is greedy and its ungreedy variant is "+?". Setting the Ungreedy flag converts all greedy variants to ungreedy ones and vice versa. It does not make all quantifiers ungreedy.
UnknownNode   This PCRE error constant is not used in Gambas (yet).
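
As a rough illustration of how the compile options are used, the sketch below passes two of them to Compile. It assumes the option constants are integer bit flags that can be combined with Or, as the underlying PCRE options are; the subject text and variable names are made up for the example.

```gambas
' Requires the gb.pcre component.
Public Sub Main()

  Dim hRegExp As New RegExp
  Dim sText As String = "first line\nSECOND line"

  ' Assumption: the option constants are bit flags, so Or combines them.
  hRegExp.Compile("^(second)", RegExp.Caseless Or RegExp.MultiLine)
  hRegExp.Exec(sText)

  ' With both options set, the caret can match after the newline and
  ' "second" can match "SECOND", so one submatch is expected.
  If hRegExp.Count = 1 Then
    Print "Matched: " & hRegExp[1].Text
  Else
    Print "No match"
  Endif

End
```

Dropping either option from the Compile call should make the match fail, which is a quick way to see what each flag contributes.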

Static methods
FindAll   Returns a string array of all substrings that match a pattern.
Match   This static method takes the same parameters as the RegExp constructor. However, it does not create a new object; instead, it returns whether the subject matches the pattern.
Replace   Finds all occurrences of the specified Pattern in the Subject, and returns a string where they are all replaced by the Replace string.
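
A minimal sketch of the three static methods follows. It assumes that each method takes the subject first and the pattern second, with the replacement string as the third argument of Replace; the subject text is invented for illustration. A fixed-width pattern is used so that the result does not depend on greedy or ungreedy matching.

```gambas
' Requires the gb.pcre component.
Public Sub Main()

  Dim sSubject As String = "Released in 2019, updated in 2021"
  Dim sYear As String

  ' Match only reports whether the pattern occurs in the subject.
  If RegExp.Match(sSubject, "\\d{4}") Then
    Print "Contains a four-digit number"
  Endif

  ' Replace returns a new string with every occurrence substituted.
  Print RegExp.Replace(sSubject, "\\d{4}", "YYYY")

  ' FindAll returns every matching substring as a String[].
  For Each sYear In RegExp.FindAll(sSubject, "\\d{4}")
    Print sYear
  Next

End
```

Because these methods are static, no RegExp object has to be created; they suit one-off checks where the pattern is not reused.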

Properties
Count   Return the number of submatches produced by the regular expression.
Error   Return the error code raised by the last call to Compile or Exec method.
Offset   Returns the offset of the beginning of the match, i.e. how many characters precede it in the subject string.
Pattern   Return the regular expression pattern.
SubMatches   A submatch is a part of your pattern contained in parentheses. For example, given the pattern "brown (\S+)", the first and only element in the SubMatches collection would be the word following "brown" in the subject text. With a subject of "quick brown fox", SubMatches[0] would contain "fox" and SubMatches.Count would be 1.
Subject   Return the regular expression subject.
Text   Contains the text returned by the match. With a subject of "quick brown fox" and a pattern of "brown (\S+)", Text would contain "brown fox".

Methods
Compile   Compile allows you to pre-compile a regular expression for later execution by the Exec method. This is useful when you have one pattern which you want to match against a lot of text, because compiling once and executing many times is much faster than compiling and executing every time.
Exec   Exec lets you execute a previously compiled regular expression against a subject text. This is mainly useful when you have many different subject texts you want to match against, because you can compile a regular expression once and use Exec repeatedly for increased speed.
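
The compile-once, execute-many pattern described above could look like the following sketch. The pattern and the subject strings are hypothetical; the success test on Count mirrors the check used in the sample program on this page.

```gambas
' Requires the gb.pcre component.
Public Sub Main()

  Dim hRegExp As New RegExp
  Dim sLine As String

  ' Compile once, outside the loop.
  hRegExp.Compile("^(\\w+)=(\\d+)$")

  ' Execute the compiled pattern against each subject in turn.
  For Each sLine In ["width=80", "height=24", "not a setting"]
    hRegExp.Exec(sLine)
    ' The pattern has two groups, so a successful match yields two submatches.
    If hRegExp.Count = 2 Then
      Print hRegExp[1].Text & " -> " & hRegExp[2].Text
    Else
      Print "No match: " & sLine
    Endif
  Next

End
```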

SubMatches

A submatch is a part of your pattern contained in parentheses.

To access the submatches use the Count property and the array accessor of the RegExp class.

Example

For example, given the regular expression:

brown (\S+)

and subject string:

The quick brown fox slyly jumped over the lazy dog

your RegExp object's Text (or RegExp[0].Text) property would be:

brown fox

and its RegExp[1].Text property would give the text of the first submatch:

fox

The offset of the match in the subject string is given by the Offset property, 10 in this case.
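
The example above can be reproduced with a short program along these lines. This is only a sketch: the variable name is arbitrary, and the values in the comments are the ones stated in the text.

```gambas
' Requires the gb.pcre component.
Public Sub Main()

  Dim hRegExp As New RegExp

  ' Backslashes are doubled because "\" is the escape character in Gambas strings.
  hRegExp.Compile("brown (\\S+)")
  hRegExp.Exec("The quick brown fox slyly jumped over the lazy dog")

  Print hRegExp.Text      ' the whole match: brown fox
  Print hRegExp[1].Text   ' the first submatch: fox
  Print hRegExp.Offset    ' offset of the match in the subject: 10
  Print hRegExp.Count     ' number of submatches: 1

End
```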

This is just a simple example of what regular expressions can do when parsing textual input; they are a very powerful tool. For example, the following regular expression matches most common email addresses:

(?i)\b[a-z0-9._%\-]+@[a-z0-9._%\-]+\.[A-Z]{2,4}\b

Sample program to parse the output of vmstat -D:
Dim sDiskIO, sVal As String
Dim cVal As New Collection
Dim rMatch As New RegExp

' Get disk I/O statistics from vmstat
Exec ["vmstat", "-D"] To sDiskIO
For Each sVal In ["total reads", "read sectors", "writes", "written sectors"]
  ' Match a number at the start of a line, followed by the label
  rMatch.Compile("^\\s*(\\d+)\\s+" & sVal, RegExp.MultiLine)
  rMatch.Exec(sDiskIO)
  ' Exactly one submatch (the number) is expected on success
  If rMatch.Count = 1 Then
    ' Store the value under a key such as "total_reads"
    cVal[Replace(sVal, " ", "_")] = rMatch[1].Text
  Else
    Error.Raise("Missing '" & sVal & "' in 'vmstat -D' output")
  Endif
Next
Print "total reads: " & cVal!total_reads & " read sectors: " & cVal!read_sectors
Print "writes: " & cVal!writes & " written sectors: " & cVal!written_sectors
