Legacy:UnrealScript Grammar

From Unreal Wiki, The Unreal Engine Documentation Site

This is an EBNF specification of the UnrealScript grammar.

It can be useful if you are going to write a parser for the UnrealScript language.

Note: this is not the official specification; it was written by visitors of the UnrealWiki.

Important note: the stock UnrealScript compiler doesn't follow strict rules of the kind a grammar like this specifies. It is quite possible that the compiler accepts constructions not documented here. See UnrealScript Language Test for actual examples of various constructions.

Edit guidelines:

  • All non-terminals should be written in uppercase. Keep everything aligned. If you leave something open, use '...' to make that clear.
  • Always use as many brackets as needed; don't optimize, because this can result in confusion.
  • Terminals that are words can be used directly in the production rules; otherwise you must use a terminal rule.

Non-Terminals

PROGRAM                  = CLASSDECL 
                           ( DECLARATIONS )*
                           ( REPLICATIONBLOCK )? 
                           BODY 
                           ( DEFAULTPROPERTIESBLOCK )?
 
CLASSDECL                = class IDENTIFIER ( extends PACKAGEIDENTIFIER )? 
                           ( CLASSPARAMS )* SEMICOLON
 
CLASSPARAMS              = CONSTCLASSPARAMS | within PACKAGEIDENTIFIER | 
                           dependson LBRACK PACKAGEIDENTIFIER RBRACK |
                           config ( LBRACK PACKAGEIDENTIFIER RBRACK )? |
                           hidecategories LBRACK IDENTIFIERLIST RBRACK |
                           showcategories LBRACK IDENTIFIERLIST RBRACK                           
 
IDENTIFIER               = ( ALPHA | UNDERSCORE ) ( ALPHA | UNDERSCORE | DIGIT )*
                           // packagename.classname or classname.structname
PACKAGEIDENTIFIER        = ( IDENTIFIER DOT )? IDENTIFIER
QUALIFIEDIDENTIFIER      = ( ( class SQUOTE PACKAGEIDENTIFIER SQUOTE DOT default DOT IDENTIFIER )
                           | ( ( IDENTIFIER DOT )* IDENTIFIER ) 
                           )
IDENTIFIERLIST           = IDENTIFIER ( COMMA IDENTIFIER )*
 
STRINGVAL                = DQUOTE PRINTABLE DQUOTE
INTVAL                   = ( ( DIGIT )+ | ( '0x' ( HEXDIGIT )+ ) )
FLOATVAL                 = ( DIGIT )+ DOT ( DIGIT )*
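
As a rough illustration of how the CLASSDECL rule could be turned into code, here is a minimal recursive-descent sketch in Python. The function name parse_classdecl, the tiny regex lexer and the layout of the returned dict are all assumptions made for this example and are not part of the grammar or of any existing tool. CLASSPARAMS are only collected as raw tokens, not validated.

```python
import re

# Assumed minimal lexer: identifiers/keywords plus the punctuation CLASSDECL
# needs. Characters it does not recognise are silently skipped.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|[().;,]")
_IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def parse_classdecl(source):
    """Parse `class NAME ( extends PARENT )? ( CLASSPARAMS )* ;` into a dict."""
    tokens = _TOKEN.findall(source)
    pos = 0

    def take(expected=None):
        # Consume one token, optionally checking it case-insensitively.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        if expected is not None and token.lower() != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def identifier():
        token = take()
        if not _IDENT.fullmatch(token):
            raise SyntaxError(f"expected identifier, got {token!r}")
        return token

    def package_identifier():
        # PACKAGEIDENTIFIER = ( IDENTIFIER DOT )? IDENTIFIER
        name = identifier()
        if pos < len(tokens) and tokens[pos] == ".":
            take(".")
            name += "." + identifier()
        return name

    take("class")
    result = {"name": identifier(), "extends": None, "params": []}
    if pos < len(tokens) and tokens[pos].lower() == "extends":
        take("extends")
        result["extends"] = package_identifier()
    # Everything up to the SEMICOLON is kept as raw CLASSPARAMS tokens;
    # a fuller parser would check each alternative of that rule.
    while pos < len(tokens) and tokens[pos] != ";":
        result["params"].append(take())
    take(";")
    return result
```

For example, parse_classdecl("class MyPawn extends Engine.Pawn abstract;") yields the class name, the parent "Engine.Pawn" and a one-element parameter list.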

Declaration parts

DECLARATIONS             = ( CONSTDECL | VARDECL | ENUMDECL | STRUCTDECL ) SEMICOLON
 
CONSTDECL                = const IDENTIFIER = CONSTVALUE
CONSTVALUE               = ( STRINGVAL | INTVAL | FLOATVAL | BOOLVAL )
 
VARDECL                  = var ( CONFIGGROUP )? ( VARPARAMS )* 
                           VARTYPE VARIDENTIFIER ( COMMA VARIDENTIFIER )*
CONFIGGROUP              = LBRACK ( IDENTIFIER )? RBRACK
VARTYPE                  = PACKAGEIDENTIFIER | ENUMDECL | STRUCTDECL | ARRAYDECL | CLASSTYPE | BASICTYPE
VARIDENTIFIER            = IDENTIFIER
 
ARRAYDECL                = IDENTIFIER LSBRACK INTVAL RSBRACK
DYNARRAYDECL             = array LABRACK (PACKAGEIDENTIFIER | CLASSTYPE | BASICTYPE) RABRACK
CLASSTYPE                = class LABRACK PACKAGEIDENTIFIER RABRACK
 
ENUMDECL                 = enum IDENTIFIER LCBRACK ENUMOPTIONS RCBRACK
ENUMOPTIONS              = IDENTIFIER ( COMMA IDENTIFIER )*
 
STRUCTDECL               = struct ( STRUCTPARAMS )* IDENTIFIER ( extends PACKAGEIDENTIFIER )? 
                           LCBRACK STRUCTBODY RCBRACK
STRUCTPARAMS             = ( native | export )
STRUCTBODY               = ( VARDECL SEMICOLON )+

Replication parts

REPLICATIONBLOCK         = replication LCBRACK ( REPLICATIONBODY )* RCBRACK
REPLICATIONBODY          = ( reliable | unreliable ) if LBRACK EXPR RBRACK 
                           IDENTIFIER ( COMMA IDENTIFIER )* SEMICOLON

Body parts

BODY                     = ( STATEDECL | FUNCTIONDECL )*

State parts

STATEDECL                = ( STATEPARAMS )* state IDENTIFIER ( CONFIGGROUP )? ( extends IDENTIFIER )? STATEBODY
STATEBODY                = LCBRACK ( STATEIGNORE )? ( FUNCTIONDECL )* STATELABELS RCBRACK
STATEIGNORE              = ignores IDENTIFIER ( COMMA IDENTIFIER )* SEMICOLON
STATELABELS              = ( IDENTIFIER COLON ( CODELINE )* )*

Function parts

                           // operators require a set amount of arguments
FUNCTIONDECL             = ( NORMALFUNC | OPERATORFUNC )
 
NORMALFUNC               = ( FUNCTIONPARAMS )* FUNCTIONTYPE ( LOCALTYPE )? 
                           IDENTIFIER LBRACK ( FUNCTIONARGS ( COMMA FUNCTIONARGS )* )? RBRACK 
                           FUNCTIONBODY
 
FUNCTIONPARAMS           = CONSTFUNCPARAMS | native ( LBRACK INTVAL RBRACK )?
 
OPERATORFUNC             = ( FUNCTIONPARAMS )* OPERATORTYPE FUNCTIONBODY
OPERATORTYPE             = ( BINARYOPERATOR | UNARYOPERATOR )
                           // requires two arguments
BINARYOPERATOR           = operator LBRACK INTVAL RBRACK PACKAGEIDENTIFIER OPIDENTIFIER 
                           LBRACK FUNCTIONARGS COMMA FUNCTIONARGS RBRACK  
                           // requires one argument
UNARYOPERATOR            = ( preoperator | postoperator ) PACKAGEIDENTIFIER OPIDENTIFIER 
                           LBRACK FUNCTIONARGS RBRACK  
OPIDENTIFIER             = IDENTIFIER | OPERATORNAMES
 
FUNCTIONARGS             = ( optional | out | coerce )? FUNCTIONARGTYPE IDENTIFIER
FUNCTIONARGTYPE          = BASICTYPE | PACKAGEIDENTIFIER
FUNCTIONBODY             = ( SEMICOLON | ( ( LOCALDECL )* ( CODELINE )* ) ( SEMICOLON )? )
LOCALDECL                = local LOCALTYPE IDENTIFIER ( COMMA IDENTIFIER )*
LOCALTYPE                = PACKAGEIDENTIFIER | ARRAYDECL | CLASSTYPE | BASICTYPE

Code parts

CODELINE                 = ( STATEMENT | ASSIGNMENT | IFTHENELSE | WHILELOOP | DOLOOP 
                           | SWITCHCASE | RETURNFUNC | FOREACHLOOP | FORLOOP )
CODEBLOCK                = ( CODELINE | ( LCBRACK ( CODELINE )* RCBRACK ) )
 
STATEMENT                = FUNCCALL SEMICOLON
ASSIGNMENT               = IDENTIFIER EQUALS EXPR SEMICOLON
IFTHENELSE               = if LBRACK EXPR RBRACK CODEBLOCK ( else CODEBLOCK )?
WHILELOOP                = while LBRACK EXPR RBRACK CODEBLOCK
DOLOOP                   = do CODEBLOCK until LBRACK EXPR RBRACK
 
SWITCHCASE               = switch LBRACK EXPR RBRACK LCBRACK ( CASERULE )+ ( DEFAULTRULE )? RCBRACK
CASERULE                 = case INTVAL COLON CODEBLOCK
DEFAULTRULE              = default COLON CODEBLOCK
 
RETURNFUNC               = return ( EXPR )? SEMICOLON
FOREACHLOOP              = foreach FUNCCALL CODEBLOCK
FORLOOP                  = for LBRACK ASSIGNMENT EXPR SEMICOLON EXPR RBRACK CODEBLOCK
 
EXPR                     = OPERAND ( OPIDENTIFIER OPERAND )*
OPERAND                  = ( CONSTVALUE | QUALIFIEDIDENTIFIER | FUNCCALL )
FUNCCALL                 = ( ( class SQUOTE PACKAGEIDENTIFIER SQUOTE DOT static DOT ) 
                           | ( ( IDENTIFIER DOT )+ )
                           )? 
                           IDENTIFIER LBRACK ( EXPR ( COMMA EXPR )* )? RBRACK
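
The EXPR rule is flat: it is an OPERAND followed by any number of OPIDENTIFIER OPERAND pairs, with no precedence encoded in the grammar. The sketch below, in Python, shows one way to check that shape. It assumes the input is already lexed so that every operand (including a whole FUNCCALL) is a single token, and that the caller supplies the set of known operator names. The function name parse_expr is hypothetical.

```python
def parse_expr(tokens, operator_names):
    """Split pre-lexed tokens into (operands, operators) following
    EXPR = OPERAND ( OPIDENTIFIER OPERAND )*."""
    # A valid flat expression always has an odd number of tokens.
    if len(tokens) % 2 == 0:
        raise SyntaxError("EXPR needs OPERAND ( OPIDENTIFIER OPERAND )*")
    operands = list(tokens[0::2])
    operators = list(tokens[1::2])
    for operand in operands:
        if operand in operator_names:
            raise SyntaxError(f"expected OPERAND, got operator {operand!r}")
    for operator in operators:
        if operator not in operator_names:
            raise SyntaxError(f"expected OPIDENTIFIER, got {operator!r}")
    return operands, operators
```

Because OPIDENTIFIER also allows a plain IDENTIFIER, word operators can be passed in operator_names just like symbols. Any precedence handling would have to happen in a later pass over the returned lists.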

Defaultproperties

DEFAULTPROPERTIESBLOCK   = defaultproperties LCBRACK ( DEFPROP )* RCBRACK
DEFPROP                  = DEFPROPIDENTIFIER EQUALS PRINTABLE
DEFPROPIDENTIFIER        = IDENTIFIER ( ( LBRACK INTVAL RBRACK ) | ( LSBRACK INTVAL RSBRACK ) )?
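
A single DEFPROP entry can be recognised with one regular expression. The following Python sketch assumes one property per line and a decimal index only (the INTVAL rule also allows hexadecimal). The name parse_defprop and the returned tuple layout are invented for this example.

```python
import re

# DEFPROPIDENTIFIER with an optional (index) or [index], then EQUALS,
# then the rest of the line as the PRINTABLE value.
_DEFPROP = re.compile(
    r"(?P<name>[a-z_][a-z_0-9]*)"
    r"(?:\((?P<paren>[0-9]+)\)|\[(?P<bracket>[0-9]+)\])?"
    r"\s*=\s*(?P<value>.*)",
    re.IGNORECASE,
)


def parse_defprop(line):
    """Return (name, index or None, value) for one DEFPROP line, or None."""
    match = _DEFPROP.fullmatch(line.strip())
    if match is None:
        return None
    index = match.group("paren") or match.group("bracket")
    return (
        match.group("name"),
        int(index) if index is not None else None,
        match.group("value").strip(),
    )
```

The value is returned as an uninterpreted string, matching the grammar, which treats it as PRINTABLE.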

Terminals

PRINTABLE                = all printable characters
ALPHA                    = 'a' .. 'z'
DIGIT                    = '0' .. '9'
HEXDIGIT                 = DIGIT | 'a' .. 'f'
SEMICOLON                = ';'
COLON                    = ':'
UNDERSCORE               = '_'
LBRACK                   = '('
RBRACK                   = ')'
LABRACK                  = '<'
RABRACK                  = '>'
LCBRACK                  = '{'
RCBRACK                  = '}'
LSBRACK                  = '['
RSBRACK                  = ']'
DOT                      = '.'
COMMA                    = ','
SQUOTE                   = '''
DQUOTE                   = '"'
EQUALS                   = '='
 
CONSTCLASSPARAMS         = abstract | native | nativereplication | safereplace |
                           perobjectconfig | transient | noexport | exportstructs |
                           // available but obsolete:
                           guid LBRACK INTVAL COMMA INTVAL COMMA INTVAL COMMA INTVAL RBRACK |
                           // available from warfare and up:
                           collapsecategories | dontcollapsecategories | placeable |
                           notplaceable | editinlinenew | noteditinlinenew
BOOLVAL                  = true | false
VARPARAMS                = config | const | editconst | export | globalconfig | input |
                           localized | native | private | protected | transient | travel |
                           // available from warfare and up:
                           editinline | deprecated | edfindable | editinlineuse
STATEPARAMS              = auto | simulated
CONSTFUNCPARAMS          = final | iterator | latent | simulated | singular | static |
                           exec | protected | private 
BASICTYPE                = byte | int | float | string | bool | name | class
FUNCTIONTYPE             = function | event | delegate
OPERATORNAMES            = '~' | '!' | '@' | '#' | '$' | '%' | '^' | '&' | '*' | 
                           '-' | '=' | '+' | '|' | '\' | ':' | '<' | '>' | '/' |
                           '?' | '`' |
                           '<<' | '>>' | '!=' | '<=' | '>=' | '++' | '--' | '?-' | '+=' | 
                           '-=' | '*=' | '/=' | '&&' | '||' | '^^' | '==' | '**' |
                           '~=' | '@=' | '>>>'

Notes

Case

UnrealScript is case-insensitive, so all terminals may be written in any letter case. Because of this, the uppercase variants of ALPHA and HEXDIGIT are omitted.
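
In a hand-written lexer, this usually means matching with a case-insensitive flag instead of listing both cases. The Python sketch below transcribes the IDENTIFIER, INTVAL and FLOATVAL rules using only the lowercase ranges from the Terminals section and lets re.IGNORECASE cover the rest. The helper names classify and same_word are assumptions made for this example.

```python
import re

# Patterns transcribed from IDENTIFIER, INTVAL and FLOATVAL using only the
# lowercase ranges listed under Terminals; re.IGNORECASE supplies the rest.
_LITERAL = re.compile(
    r"(?P<FLOATVAL>[0-9]+\.[0-9]*)"
    r"|(?P<INTVAL>0x[0-9a-f]+|[0-9]+)"
    r"|(?P<IDENTIFIER>[a-z_][a-z_0-9]*)",
    re.IGNORECASE,
)


def classify(text):
    """Return the name of the rule matching all of `text`, or None."""
    match = _LITERAL.fullmatch(text)
    return match.lastgroup if match else None


def same_word(left, right):
    """Compare two keywords or identifiers while ignoring letter case."""
    return left.casefold() == right.casefold()
```

With this, classify("0X1F") and classify("0x1f") both report INTVAL, and same_word("DefaultProperties", "defaultproperties") is true.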

Unreal Engine

This grammar applies to Unreal Engine 2. Older versions of the Unreal Engine differ in a few places. The following changes to this grammar apply to older versions:

  • extends can be replaced with expands
  • The ARRAYDECL rule does not apply
  • In the CLASSPARAMS rule the following do not apply:
    • within PACKAGEIDENTIFIER
    • dependson LBRACK PACKAGEIDENTIFIER RBRACK
    • hidecategories LBRACK IDENTIFIERLIST RBRACK
    • showcategories LBRACK IDENTIFIERLIST RBRACK
  • In CONSTCLASSPARAMS nousercreate is allowed to replace notplaceable
  • STRUCTPARAMS does not apply

Discussion

El Muerte TDS: As suggested in UnDox Revisited, so hell, why not :)

Tarquin: Nice :)

Jerome-X: This can be very useful for the parser in the UCEditor plugin. Thanks :)

El Muerte TDS: The only open things are the class, var and function params; the rest should be done. It would help if anyone could verify the stuff I wrote down, as I might have missed some things.

El Muerte TDS: done, no more open rules

CaptainNuss: Greetings, just added the local keyword for variable declarations. Btw, why aren't the basic built-in variable types listed in this specification?

Mychaeel: "local" is covered by LOCALDECL already. In VARDECL it's a bug.

CaptainNuss: Oops, I'm sorry. Didn't see that. :(

El Muerte TDS: You're right about the basic types, added them now. Also, the function return type was incorrect (functions can also return arrays, etc.).

The reason why var and local are different is that inline enum and struct definitions are not allowed in local but are in var.

Wormbo: Is there a (free) program that can check a source code file against an EBNF definition?

El Muerte TDS: Not that I know of. But there are programs that create a parser from an EBNF definition (it needs some changing, though): http://catalog.compilertools.net/lexparse.html and one not in that list, ANTLR.

Tarquin: I've changed:

CONFIGGROUP              = LBRACK IDENTIFIER RBRACK

as the actual IDENTIFIER is optional, right?

El Muerte: Uh, yeah. There are a few other things that might also need changing; I've come across a couple of "hacks" that are apparently legal too :( Not to speak of the things Unreal2 allows. Also, a couple of new UT2004 keywords are missing.

Iainmcgin: I changed FUNCTIONARGS so that the type of the parameter can include the basic types (int, float, etc.). I'm working on a SableCC grammar file for UnrealScript at the moment, so I'll note down other errors in this EBNF as I find them.

sprfreak14: Should comments be included in this EBNF?
Proposal:

COMMENT                    = ( SINGLELINECOMMENT | DELIMITEDCOMMENT )
SINGLELINECOMMENT          = '//' ( NOTNEWLINE )*
NOTNEWLINE                 = Any character except a new line character
DELIMITEDCOMMENT           = '/*' ( DELIMITEDCOMMENTCHARACTERS )* '*/'
DELIMITEDCOMMENTCHARACTERS = ( NOTASTERISK | '*' NOTSLASH )
NOTASTERISK                = Any character except '*'
NOTSLASH                   = Any character except '/'

Sweavo: the usual way (i.e. the way I would do it writing a C parser) to deal with comments is at the Lexing stage, i.e. there is a stage before parsing that recognizes tokens. Comments are reduced to whitespace at that stage. While the comment stuff above looks OK at a glance, the problem is that you then have to put COMMENT all over the place in the grammar to reflect all the valid places for a comment. Pretty much destroys the usefulness of the grammar. But I agree, if this is to be useful to people writing syntax highlighters, comments should be addressed.
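
The lexing-stage approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: comments do not nest (as in the COMMENT proposal above), string literals use backslash escapes, and the function name strip_comments is made up for this example. Comments are replaced by spaces of the same length, with newlines kept, so positions and line numbers in the remaining source stay unchanged.

```python
import re

_COMMENT_OR_STRING = re.compile(
    r'"(?:\\.|[^"\\\n])*"'  # string literal: matched so it can be kept as-is
    r"|//[^\n]*"            # SINGLELINECOMMENT
    r"|/\*.*?\*/",          # DELIMITEDCOMMENT (non-nested, an assumption)
    re.DOTALL,
)


def strip_comments(source):
    """Replace every comment with whitespace, leaving strings untouched."""

    def replace(match):
        text = match.group(0)
        if text.startswith('"'):
            return text  # not a comment, only protected from the other rules
        # Keep newlines so line numbers in later error messages stay correct.
        return "".join(ch if ch == "\n" else " " for ch in text)

    return _COMMENT_OR_STRING.sub(replace, source)
```

A parser built from the grammar on this page could then run on the output of strip_comments without mentioning COMMENT in any production rule.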

sprfreak14: Is DOT default DOT IDENTIFIER in QUALIFIEDIDENTIFIER optional?