Text highlighting definition file syntax

Since 3.19

The gb.highlight component lets you register new syntax highlighters based on definition files.

A definition file contains a list of highlighting states.

Each state is associated with:

  • A style name, which tells how to draw a piece of text according to a highlighting theme. The style name must be made of lowercase letters, dot or underscore characters.

  • A list of commands, which tells how to recognize the piece of text that must be drawn with that style.

The highlighting process takes a piece of text (normally a whole line), an initial state, and returns an array that associates one state to each character of the text, and a final state.

Taking an initial state and returning the final state allows a text editor to run the highlighting process incrementally, i.e. line by line.
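The incremental process described above can be sketched in Python. This is only an illustration under stated assumptions, not the gb.highlight implementation: the function names highlight_line and highlight_document are hypothetical, and the only rule modelled is an HTML comment running from <!-- to -->.

```python
NORMAL, COMMENT = "normal", "comment"

def highlight_line(text, state=NORMAL):
    """Return (states, final_state): one state per character of `text`,
    plus the state the next line must start with."""
    states = []
    i = 0
    while i < len(text):
        if state == NORMAL and text.startswith("<!--", i):
            states += [COMMENT] * 4          # the start pattern is included
            i += 4
            state = COMMENT
        elif state == COMMENT:
            end = text.find("-->", i)
            stop = len(text) if end < 0 else end + 3
            states += [COMMENT] * (stop - i)
            i = stop
            if end >= 0:                     # end pattern found: back to normal
                state = NORMAL
        else:
            states.append(NORMAL)
            i += 1
    return states, state

def highlight_document(lines):
    """Feed each line's final state into the next line (hypothetical helper)."""
    state = NORMAL
    result = []
    for line in lines:
        states, state = highlight_line(line, state)
        result.append(states)
    return result
```

Because each call only needs the final state of the previous line, an editor could re-highlight a modified line and stop as soon as the final state of a line is unchanged.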

Let's take part of the HTML highlighting definition file as an example (it highlights the contents of HTML files):

$(IDENT)=[a-zA-Z0-9-:]+
doctype{Doctype=Preprocessor}:
  from <!DOCTYPE to >
comment:
  from <!-- to -->
entity{Entity=Function}:
  match /&[A-Za-z]+;/
  match /&#[0-9]+;/
markup{Markup=Keyword}:
  from /<$(IDENT)/ to //?>/
  attribute{Attribute=Datatype}:
    match /$(IDENT)/
  equal{Normal}:
    symbol =
  value{Value=String}:
    from " to "
    from ' to '
    string.entity{Entity}:
      match /&[A-Za-z]+;/
      match /&#[0-9]+;/
  value.unquoted{Value}:
    match /[^"'`=<>\s]+/
markup.close{Markup}:
  match /</$(IDENT)\s*>/

State definition

Each line that ends with a colon : character introduces a new state.

The syntax is the following:

state name [ { style name [ = default style name ] } ] :

  • state name is the name of the state.

  • style name is the name of the associated style.

  • default style name is the name of a style that will be used as a default if style name is not defined in the highlighting theme used when actually rendering the text.

There is a list of hard-coded style names that are defined in all highlighting themes, and that you can safely use in any definition file.

These common style names are: Normal, Added, Removed, Error, Comment, Documentation, Keyword, Function, Operator, Symbol, Number, String, Datatype, Preprocessor, Escape, and Constant.

If you want to introduce a new style name, it's a good idea to use a member of that list as its default style name.

Example

doctype{Doctype=Preprocessor}:

introduces a state named doctype, associated with a style whose name is Doctype. As Doctype is a new style name, we tell the highlighter to use the Preprocessor style if the Doctype style is not explicitly defined in the highlighting theme.

If no style name is defined, the state will be associated with a style having the same name.

Example

comment:

introduces the comment state which will be associated with the Comment style (case is not important).
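The state header syntax and the style fallback can be sketched in Python as follows. This is a minimal illustration, assuming that names are made of letters, dots and underscores and that a theme defines exactly the common style names listed above; parse_header and resolve_style are hypothetical names, not part of gb.highlight.

```python
import re

# Common style names from the list above; assumed here to be the theme's styles.
THEME_STYLES = ["Normal", "Added", "Removed", "Error", "Comment", "Documentation",
                "Keyword", "Function", "Operator", "Symbol", "Number", "String",
                "Datatype", "Preprocessor", "Escape", "Constant"]

NAME = r"[A-Za-z._]+"
# state name [ { style name [ = default style name ] } ] :
HEADER = re.compile(
    r"^\s*(%s)\s*(?:\{\s*(%s)\s*(?:=\s*(%s)\s*)?\})?\s*:\s*$" % (NAME, NAME, NAME))

def parse_header(line):
    """Return (state, style, default_style), or None if the line is not a header."""
    m = HEADER.match(line)
    if m is None:
        return None
    state, style, default = m.groups()
    # With no explicit style, the state uses a style having the same name.
    return state, style or state, default

def resolve_style(style, default, theme_styles=THEME_STYLES):
    """Pick the theme style to draw with; case is not important."""
    by_lower = {name.lower(): name for name in theme_styles}
    return by_lower.get(style.lower(), default)
```

For instance, parse_header("doctype{Doctype=Preprocessor}:") yields the doctype state with the Doctype style, which resolve_style then maps to Preprocessor because this assumed theme does not define Doctype.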

Each state is checked independently, but as the definition file is read from top to bottom, the first states have higher priority than the last ones.

Moreover, if no state matches the current character, the normal state applies. In that case, space, tab and newline characters are automatically ignored, i.e. highlighted with the normal state.
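One way to picture this priority rule is the Python sketch below. It is an interpretation under assumptions, not the actual engine: the two rules are hypothetical, Python's re module stands in for PCRE, and "priority" is modelled as trying the states in file order at each character.

```python
import re

# Hypothetical rules in file order: earlier entries have higher priority.
RULES = [
    ("keyword", re.compile(r"\b(?:if|else)\b")),
    ("word", re.compile(r"[A-Za-z]+")),
]

def highlight(text):
    """Return one state name per character of `text`."""
    states = []
    i = 0
    while i < len(text):
        for name, rx in RULES:
            m = rx.match(text, i)
            if m:                          # first matching state wins
                states += [name] * (m.end() - i)
                i = m.end()
                break
        else:
            states.append("normal")        # no state matches this character
            i += 1
    return states
```

Here "if" is highlighted as keyword because that state comes first, even though the word state would also match it; swapping the two rules would make word win everywhere.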

State commands

The definition of a state is followed by one or several commands that define which text must be associated with that state.

These commands must be indented with at least one space.

Here are the possible commands:

  • match pattern: Apply the state to the matching pattern.
  • word word #1 word #2 ... word #n: Apply the state to any of the following words. Matching is case sensitive.
  • keyword word #1 word #2 ... word #n: Apply the state to any of the following words, and add these words to a list of keywords associated with the highlighter. Matching is case sensitive.
  • symbol symbol #1 symbol #2 ... symbol #n: Apply the state to any of the following symbols.
  • from start pattern to end pattern: Apply the state to the text between the start pattern and the end pattern, the patterns included.
  • from start pattern: Apply the state to the text between the start pattern and the end of the line, the start pattern included.
  • from here to end pattern: Apply the state from the current character up to the end pattern, the end pattern included.
  • from here: Apply the state from the current character up to the end of the line.
  • between start pattern and end pattern: Apply the state to the text between the start pattern and the end pattern, the patterns excluded.
  • between start pattern: Apply the state to the text between the start pattern and the end of the line, the start pattern excluded.
  • between here and end pattern: Apply the state from the current character up to the end pattern, the end pattern excluded.
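The difference between the from and between commands listed above is whether the patterns themselves receive the state. The Python sketch below illustrates that on a single line with plain-string patterns; the covered function is hypothetical, and stopping at the end of the line when the end pattern is missing is a simplification.

```python
def covered(text, start, end=None, inclusive=True):
    """Return the slice of `text` that would receive the state.

    inclusive=True models `from start [to end]`,
    inclusive=False models `between start [and end]`.
    """
    s = text.find(start)
    if s < 0:
        return None                        # start pattern not found
    after_start = s + len(start)
    e = text.find(end, after_start) if end is not None else -1
    if inclusive:
        begin = s
        stop = e + len(end) if e >= 0 else len(text)
    else:
        begin = after_start
        stop = e if e >= 0 else len(text)
    return text[begin:stop]
```

With the line x = "abc" y and a double quote as both patterns, the from form covers "abc" with its quotes, while the between form covers only abc.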

A pattern can be:
  • A plain string without spaces, either quoted or unquoted.

    Using quotes allows you to use escaped control characters: "\n" for a newline, "\t" for a tab, "\\" for a backslash...

  • Or a Perl-compatible regular expression between slashes /, handled by the gb.pcre component.

    For more information about regular expressions, see the PCRE pattern syntax page.
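The three pattern forms can be told apart mechanically, as in the Python sketch below. It is only an approximation: compile_pattern is a hypothetical helper, Python's re module is used in place of gb.pcre, only the escapes named above are decoded, and leaving unknown escapes as the bare character is an assumption.

```python
import re

# Escapes named in the text above; anything else is an assumption.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}

def compile_pattern(pattern):
    """Turn a definition-file pattern into a compiled regular expression."""
    if len(pattern) >= 2 and pattern[0] == "/" and pattern[-1] == "/":
        # Regular expression between slashes: use the inner text as is.
        return re.compile(pattern[1:-1])
    if len(pattern) >= 2 and pattern[0] == '"' and pattern[-1] == '"':
        # Quoted string: decode escapes, then match the result literally.
        literal = re.sub(r"\\(.)",
                         lambda m: ESCAPES.get(m.group(1), m.group(1)),
                         pattern[1:-1])
        return re.compile(re.escape(literal))
    # Unquoted plain string: match it literally.
    return re.compile(re.escape(pattern))
```

A plain string such as <!-- is escaped before compilation, so characters that are special in regular expressions are matched literally.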

Example

comment:
  from <!-- to -->

specifies that the comment state will be applied to all the text between the <!-- and --> strings, the strings included.

entity{Entity=Function}:
  match /&[A-Za-z]+;/
  match /&#[0-9]+;/

specifies that the entity state will be applied to any text matching the &[A-Za-z]+; or the &#[0-9]+; regular expression.

Recursive states

It is possible to nest states. The effect of nested states depends on the command.

  • For the from and between commands, nested states are applied for the part of text inside the start and end patterns specified in the command arguments.

  • For the match, word, keyword or symbol commands, the nested states are applied to the text following the matched text.

Example

markup{Markup=Keyword}:
  from /<$(IDENT)/ to //?>/
  attribute{Attribute=Datatype}:
    match /$(IDENT)/
  equal{Normal}:
    symbol =
  value{Value=String}:
    from " to "
    from ' to '
    string.entity{Entity}:
      match /&[A-Za-z]+;/
      match /&#[0-9]+;/
  value.unquoted{Value}:
    match /[^"'`=<>\s]+/

The markup state is applied from the text matching the <$(IDENT) regular expression up to the text matching the /?> regular expression.

Note: $(IDENT) is not actually a regular expression pattern, but a preprocessor variable defined at the top of the definition file. See below.

attribute, equal, value and value.unquoted are states that will apply only inside the markup state, i.e. between the start and end patterns defined by the from command.

In other words, all these nested states allow you to define a specific highlighting process that occurs only inside HTML markups.
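The effect of the nested states in this example can be approximated in Python as below. This is a simplified sketch under several assumptions: it works on a single line, models the quoted value with one regular expression instead of a from command, omits value.unquoted and string.entity, and writes the IDENT character set with the hyphen last. All names are hypothetical.

```python
import re

IDENT = r"[a-zA-Z0-9:-]+"              # same characters as $(IDENT)
MARKUP_START = re.compile("<" + IDENT)
MARKUP_END = re.compile(r"/?>")

# Nested states, in file order; they apply only inside a markup span.
NESTED = [
    ("attribute", re.compile(IDENT)),
    ("equal", re.compile(r"=")),
    ("value", re.compile(r'"[^"]*"|\'[^\']*\'')),
]

def highlight(text):
    """Return one state name per character of `text`."""
    states = ["normal"] * len(text)
    pos = 0
    while True:
        start = MARKUP_START.search(text, pos)
        if start is None:
            break
        end = MARKUP_END.search(text, start.end())
        stop = end.end() if end else len(text)
        inner_stop = end.start() if end else len(text)
        # The outer state covers the start and end patterns included.
        states[start.start():stop] = ["markup"] * (stop - start.start())
        i = start.end()
        while i < inner_stop:              # nested states: inside the patterns only
            for name, rx in NESTED:
                m = rx.match(text, i, inner_stop)
                if m:
                    states[i:m.end()] = [name] * (m.end() - i)
                    i = m.end()
                    break
            else:
                i += 1                     # keeps the outer markup state
        pos = stop
    return states
```

Text outside a markup span, such as the hi in <a href="x">hi</a>, is never offered to the nested states and stays normal.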

Special commands

Special commands match no text, but define some properties associated with the current state.

There is only one special command at the moment:

  • limit: Matching that state sets the "limit" flag, which indicates a new section of the text.

For example, the Gambas text editor uses that flag for delimiting collapsible sections in the edited text.

Variables

Variables are text surrounded by $( and ). They have a value, usually defined at the beginning of the definition file.

The syntax of a variable definition is the following:

$(variable name) = value

Example:
$(IDENT)=[a-zA-Z0-9-:]+

Every occurrence of the variable is replaced by its value.

So, in the example, every occurrence of $(IDENT) in the definition file will be replaced by [a-zA-Z0-9-:]+.

This is a convenient way to centralize the definition of your regular expression patterns.
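The substitution described above amounts to a plain textual replacement, which the Python sketch below illustrates. It assumes that a definition occupies a whole line, that variable names are made of word characters, and that undefined references are left untouched; expand_variables is a hypothetical name.

```python
import re

DEFINITION = re.compile(r"^\$\((\w+)\)\s*=\s*(.*)$")   # $(name) = value
REFERENCE = re.compile(r"\$\((\w+)\)")                 # $(name)

def expand_variables(lines):
    """Drop variable definitions and replace every later reference by its value."""
    variables = {}
    expanded = []
    for line in lines:
        definition = DEFINITION.match(line)
        if definition:
            variables[definition.group(1)] = definition.group(2)
            continue
        expanded.append(REFERENCE.sub(
            lambda m: variables.get(m.group(1), m.group(0)), line))
    return expanded
```

Applied to the example, the line match /</$(IDENT)\s*>/ becomes match /</[a-zA-Z0-9-:]+\s*>/ before any pattern is compiled.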

Preprocessor commands

Definition files support some rudimentary preprocessing commands. These preprocessing commands are lines beginning with the @ character.

  • @include file name: Include another definition file inside the current one. The included file must be located in the same directory as the current file, so only the file name is specified.
  • @define name: Define a preprocessor flag named name.
  • @if name ... @endif: Process the part of the definition file between @if name and @endif only if the name has been defined with the @define command.
  • @word regular expression: Define which regular expression the word and keyword commands will use to match a word. The regular expression must be specified between / characters.

By default, a word is matched by the /[A-Za-z_][A-Za-z0-9_]*/ regular expression.
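The conditional commands @define, @if and @endif can be sketched in Python as follows. This illustrates the described behaviour only: the preprocess function is hypothetical, support for nested @if blocks is an assumption, and @include and @word lines are simply passed through.

```python
def preprocess(lines):
    """Keep only the lines that survive @define / @if / @endif processing."""
    defined = set()
    stack = []                 # one boolean per open @if: is that block kept?
    kept = []
    for line in lines:
        parts = line.split()
        active = all(stack)    # True when every enclosing @if is satisfied
        if len(parts) == 2 and parts[0] == "@if":
            stack.append(parts[1] in defined)
        elif parts and parts[0] == "@endif":
            if stack:
                stack.pop()
        elif not active:
            continue           # inside an @if whose flag is not defined
        elif len(parts) == 2 and parts[0] == "@define":
            defined.add(parts[1])
        else:
            kept.append(line)
    return kept
```

A block guarded by @if XML is dropped entirely unless an earlier active line contains @define XML.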