The goal of the layout system is to provide programmer convenience by automatically inserting curly braces and semicolons wherever they have not been inserted by the programmer. Left curly braces are conditionally inserted after certain preceding tokens. Semicolons are conditionally inserted based on the indentation level of each line. Right curly braces are conditionally inserted before certain tokens and also based on the indentation level of the curren tline.
The ``trick'' to the layout scheme is in two parts:
There are several sequencing constructs in the language whose general form is a semicolon-separated sequence of things surrounded by semicolons. This lets us use the same layout rules for multiple purposes.
At each point where a left curly brace might be automatically inserted, a preceding token (one of let, do, or ``='') signals unambiguously that a left curly brace must follow:
def f x y = { ...
def x = { 5 }
struct S 'a 'b = { a : int; ...
let { x = { 5 } } in { ...
while (expr) do { ...
do { ... } while (expr) We rely on the fact that (a) knowing the preceding and current token is enough to know whether to insert a curly brace (b) because blocks are values, all binding expressions can safely be wrapped in blocks, (c) because such wrapping is always safe, it can be done in a way that is invisible to the programmer In fact, the parser insists that this be true. While it is nearly universal practice in Bitc to write code without semicolons in many places, most notably:
let x = 5 in bodyWhat the compiler actually sees is:
let { x = { 5 } } in body
The term offset, as used here, is defined as the number of preceding UCS4 code points that occur to the left of the token on the same line. The first code point on a line is deemed to have offset zero. Offsets are determined without regard to comments. That is, in:
a b c /@
@/ x
the token x appears at offset 5 and is the first token on its line. Here, @ is used for illustration instead of * because doxygen has no means to incorporate embedded comments.
The term layout item sequence is a sequence of layout items that are separated by semicolons and surrounded (bracketed) by curly braces. An opening '{' signals the beginning of a layout item sequence.
A left curly brace will be automatically inserted after the keywords let, do, and ``='' (binding) if none appears explicitly in the input. Note that a left curly brace is required by the grammar at each of these positions.
The lexer maintains a record of every left curly brace (whether or not inserted), in a stack of layout contexts. Each entry records the preceding keyword (let, do, or ``=''), whether the left curly brace in question was automatically inserted or not, and the offset of that layout context. The most recent entry on the layout context stack is ``popped'' whenever an implicit or explicit '}' is encountered.
On encountering the in token, the lexer will insert implicit close braces ('}'), popping layout contexts as it goes, until one of the following conditions holds:
The last layout context popped was associated with the let keyword. That is: curly braces inserted before in will only balance up to the nearest preceding let.
The top entry on the context stack was explicit. That is: implicit closing curly braces will only balance implicit open curly braces.
On encountering end-of-file, the lexer will insert implicit close braces ('}'), popping layout contexts as it goes, until the top entry on the context stack is explicit.
Every (implicit or explicit) '{' begins a layout item sequence. After processing the '{', the lexer examines the next token. If it is end-of-file or
If the offset of the token is greater than the current sequence offset, it becomes the current sequence offset and is recorded in the top (most recent) layout context stack entry. Regardless of offset, no implicit semicolon will be inserted before this token.
If the offset of the token is less than or equal to the current sequence offset, and the most recent open brace was implicit, an implicit close brace is immediately inserted. in, processing proceeds as described above. Otherwise:
If the offset of the token is greater than the current sequence offset, the current token is ``,'' or ``)'', or the preceding token is ``,'' or ``('', nothing is inserted. This special-case suppression enables the use of longer string literals as arguments in I/O.
If the offset of the token is less than the current sequence offset, and the most recent open brace was implicit, an implicit close brace is immediately inserted.
If the offset of the token is equal to the current sequence offset, is not a semicolon (';'), and the most recently returned token was not a semicolon, a semicolon is automatically inserted. Provided it does not follow an opening brace, the offset of the first token on every line is used to determine whether a closing curly brace '}' or a semicolon should be conditionally inserted as follows:
An implicit '{' must be matched by an implicit '}'. Similarly, an explicit '{' must be matched by an explicit '}'. If this requirement is violated, an error is signalled.
1.4.7