Tokenize
Tokens = Tokenize ( String [ , Identifiers , Strings , Operators , KeepSpace ] )
Split a string into tokens and return them.
Arguments
-
String : the string to split.
-
Identifiers : a string of extra characters allowed in identifier tokens.
-
Strings : an array of strings, each string describing the limits of a string token.
-
Operators : an array of strings, each string representing an operator token.
-
KeepSpace : whether space tokens are returned.
Return value
The tokens are returned as a string array.
Description
This function is a simple lexical parser that splits a string into tokens and returns them as a string array
made of the following kinds of tokens:
-
Space tokens
A space token is made of successive space or tab characters.
-
Newline tokens
A newline token is made of one newline character.
-
Number tokens
A number token is made of successive digit characters.
-
Identifier tokens
An identifier starts with a letter, followed by any run of letters, digits, or extra characters specified in the Identifiers argument.
If Identifiers is not specified, only letters and digits are allowed.
-
String tokens
Each string of the Strings array describes the delimiters of a string token.
-
If the description is made of one character, then the initial and final delimiters are both that character.
If two successive delimiter characters are encountered inside the string, only one is kept, and it is treated as a literal character rather than as the final delimiter.
-
If the description is made of two characters, then the first one is the initial delimiter, and the second one the final delimiter.
The final delimiter cannot be escaped.
-
If the description is made of three characters, then the first one is the initial delimiter, and the second one the final delimiter.
The final delimiter can be escaped by using the third character.
If Strings is not specified, then no string token is parsed.
For example: ["\"", "''\\", "[]"]
will parse as string tokens everything enclosed by double quotes, single quotes, or square brackets.
Strings enclosed by double quotes allow "double quoting", those enclosed by single quotes allow the ' character to be escaped with a backslash character,
whereas those enclosed by square brackets allow no escape.
-
Operator tokens
The Operators argument is an array of strings, each of which is parsed as a single token.
Any character that is not parsed as part of a space, newline, number, identifier or string token is returned as a single-character token,
so Operators should usually contain only operators made of multiple characters, for example <=, >=, && and so on.
Tokens are recognized in the order of the list above.
So if a token is parsed as an identifier, it cannot be parsed as an operator. In other words, if you specify something like "X->" in the Operators argument, it will never match, because "X" will be recognized as an identifier first.
As all tokens are returned as strings, the function does not tell you the type of each token. This should not matter in practice.
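The parsing order described above can be tried with a short sketch like the following. It assumes the component that provides Tokenize is loaded in the project; the input strings and the Main procedure are only illustrative, and no output is shown because it has not been verified against a real run.

```gambas
Public Sub Main()

  ' Multi-character operators listed in Operators should each come back as one token.
  Print Tokenize("If a <= b && c >= d", , , ["<=", ">=", "&&"]).Join(" _ ")

  ' "X->" starts with a letter, so per the description "X" is taken as an
  ' identifier first and the "X->" operator should never match.
  Print Tokenize("X->y", , , ["X->"]).Join(" _ ")

  ' Listing "->" on its own avoids the conflict with identifier parsing.
  Print Tokenize("X->y", , , ["->"]).Join(" _ ")

End
```

Going by the description, the second call should behave as if no operator had been specified at all, which is why operators are best declared without any leading letter.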
Examples
Print Tokenize("Return Subst((\"&1 MiB\"), FormatNumber(Size / 1048576))").Join(" _ ")
Return _ Subst _ ( _ ( _ " _ & _ 1 _ MiB _ " _ ) _ , _ FormatNumber _ ( _ Size _ / _ 1048576 _ ) _ )
Print Tokenize("Return Subst((\"&1 MiB\"), FormatNumber(Size / 1048576))",, ["\""]).Join(" _ ")
Return _ Subst _ ( _ ( _ "&1 MiB" _ ) _ , _ FormatNumber _ ( _ Size _ / _ 1048576 _ ) _ )
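As a further sketch, the other optional arguments can be exercised in the same way. The input strings below are made up for illustration, the component providing Tokenize is assumed to be loaded, and the output is deliberately not shown since it has not been checked against a real run.

```gambas
Public Sub Main()

  Dim sToken As String

  ' Three delimiter descriptions, as in the Strings example of the description:
  '   "\""   : double quotes, a doubled quote stands for one literal quote
  '   "''\\" : single quotes, the quote can be escaped with a backslash
  '   "[]"   : square brackets, no escape possible
  For Each sToken In Tokenize("Print \"a \"\"b\"\"\", 'it\\'s', [x y]", , ["\"", "''\\", "[]"])
    Print sToken
  Next

  ' Allow "_" and "$" inside identifiers, and ask for space tokens to be returned too.
  Print Tokenize("my_var$ = 1", "_$", , , True).Join("|")

End
```

Printing one token per line in the first call makes it easier to see where each string token starts and ends, since string tokens may themselves contain spaces.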
See also