BitC Language Specification

Version 0.10+

June 17, 2006

Jonathan Shapiro, Ph.D., Swaroop Sridhar, Scott Doerrie

Systems Research Laboratory

Dept. of Computer Science
Johns Hopkins University

Legal Notice

THIS SPECIFICATION IS PROVIDED ``AS IS'' WITHOUT ANY WARRANTIES, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NON-INFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.

Abstract

BitC is a systems programming language that combines the ``low level'' nature of C with the semantic rigor of Scheme or ML. BitC was designed by careful selection and exclusion of language features in order to support proving properties (up to and including total correctness) of critical systems programs.

This document provides an English-language description of the BitC semantics. It will in due course be augmented by a formal specification of the BitC semantics. The immediate purpose of this document is to quickly capture an informal but fairly complete description of the language so that participants in ongoing discussions about verifiable systems programming languages have a common frame of reference on which to base their discussions.

While the current language specification uses a Scheme-like concrete syntax, this choice is a matter of convenience only. It is entirely possible to build a C-like concrete syntax for BitC, and at some point it may become compelling to do so.

Table of Contents

1 Overview

1.1 About the Language

1.2 Conventions Used in This Document

1.3 Type Inference

1.4 Documentation Strings

I The Core Language

2 Input Processing

2.1 Comments

2.2 Identifiers

2.3 Reserved Words

2.4 Literals

2.4.1 Integer Literals

2.4.2 Floating Point Literals

2.4.3 Character Literals

2.4.4 String Literals

2.5 Compilation Units

2.5.1 Definitions and Declarations

3 Types

3.1 Categories of Types

3.2 Primary Types

3.3 Type Variables

3.4 Simple Constructed Types

3.4.1 Reference Types

3.4.2 Function Types

3.5 Sequence Types

3.5.1 Arrays

3.5.2 Vectors

3.6 Named Constructed Types

3.6.1 Structures

3.6.1.1 Storage Layout

3.6.2 Unions

3.6.2.1 Storage Layout

3.6.2.2 Type Tag Size and Alignment

3.6.2.3 Elided Tag Representation

3.6.3 Reprs

3.6.4 Value vs. Reference Types

3.6.5 Forward Declarations

3.6.6 Named Type Conveniences

3.7 Mutability

3.8 Exceptions

3.9 Restrictions

4 Type Classes and Qualified Types

4.1 Definition of Type Classes

4.1.1 Example: Eql

4.1.2 Qualification: Ord

4.1.3 Example: tyfn

4.2 Instantiation of Type Classes

4.3 Qualified Types

5 Binding of Values

5.1 Binding Patterns

5.2 define

5.3 Local Binding Forms

5.3.1 let

5.3.2 letrec

5.3.3 local define

5.4 Value Non-Recursion

5.5 Static Initialization Restriction

6 Declarations

7 Expressions

7.1 Literals

7.2 Identifiers

7.3 Type-Qualified Expressions

7.4 Value Constructors

7.4.1 unit

7.4.2 make-vector

7.4.3 array, vector

7.4.4 Convenience Syntax

7.5 Expression Sequences

7.6 Iteration

7.7 Interface Member Reference

7.8 Structure, Repr Field Reference

7.9 Union, Repr Tag Reference

7.10 Array and Vector Expressions

7.10.1 array-length, vector-length

7.10.2 array-nth, vector-nth

7.11 Procedure Values

7.11.1 Reference Parameters

7.12 Function Application

7.13 Conditional Execution

7.13.1 if

7.13.2 not

7.13.3 and

7.13.4 or

7.13.5 cond

7.14 Mutability

7.15 References

7.15.1 dup

7.15.2 deref

7.16 Value Matching

7.17 Exception Handling

7.17.1 Try/Catch

7.17.2 Throw

8 Locations

8.1 Expressions Involving Locations

8.2 Implicit Value Extraction

8.3 Generalized Accessors

9 Interfaces

9.1 Defining an Interface

9.2 Importing an Interface, Aliasing

9.2.1 Compile-Time Import Resolution

9.2.2 Error Reporting

9.3 Providing an Interface

9.4 The Reserved Interface bitc

10 Storage Model

11 Pragmatics

11.1 Tail Recursion

II Standard Prelude

12 Foundational Types

13 Foundational Type Classes

III Formal Specification

14 Grammar

14.1 Categorical Terminals

14.2 Interfaces, Units of Compilation

14.3 Type Declaration and Definition

14.4 Value Declaration and Definition

14.5 Types

14.6 Expressions

14.7 Miscellaneous

:= StringLit

IV Standard Library

15 BitC Standard Library

15.1 Built-In Operators

15.2 Arithmetic

15.3 Comparison

16 Verification Support

16.1 Axioms

16.2 Proof Obligations: Theorems

16.3 Proof Obligations: Invariants and Suspensions

16.4 Theories

16.5 Suspending and Enabling

17 Acknowledgments

1 Overview

The BitC project is part of the successor work to the EROS system [12]. By 2004, it had become clear that a number of important practical ``systems'' lessons had been learned in the EROS effort. These motivated a re-examination of the architecture. With the decision to craft a revised design and a new implementation came the opportunity to consider methods of achieving greater and more objective confidence in the security of the system. In particular, the question of whether a formally verified implementation of the EROS successor might be feasible with modern theorem proving tools. Following some thought, it appeared that the answer to this question might be ``yes,'' but that there existed no programming language providing an appropriate combination of power, formally founded semantics, and control over low-level representation. BitC was created to fill this gap.

1.1 About the Language

BitC is conceptually derived in various measure from Standard ML, Scheme, and C. Like Standard ML [10], BitC has a formal semantics, static typing, a type inference mechanism, and type variables. Like Scheme [8], BitC uses a surface syntax that is readily represented as BitC data. Like C [1], BitC provides full control over data structure representation, which is necessary for high-performance systems programming. The BitC language is a direct expression of the typed lambda calculus with side effects, extended to be able to reflect the semantics of explicit representation.

In contrast to ML, BitC syntax is designed to discourage currying. Currying encourages the formation of closures that capture non-global state. This requires dynamic storage allocation to instantiate these closures at runtime. Since there are applications of BitC in which dynamic allocation is prohibited, currying is an inappropriate idiom for this language.

In contrast to both Scheme and ML, BitC does not provide or require full tail recursion. Procedure calls must be tail recursive exactly if the called procedure and the calling procedure are bound in the same define, and if the identity of the called procedure is statically resolvable at compile time. This restriction preserves all of the useful cases of tail recursion that we know about, while still permitting a high-performance translation of BitC code to C code.

Building on the features of ACL2 [7], BitC incorporates explicit support for stating theorems and invariants about the program as part of the program's text.

As a consequence of these modifications, BitC is suitable for the expression of verifiable, low-level ``systems'' programs. There exists a well-defined, statically enforceable subset language that is directly translatable to a low-level language such as C. This translation is direct in both the sense that the translation is simple and the result does not violate programmer intuitions about what the program does or the program's data representation. Indeed, this was a key reason for our decision to move our implementation efforts into BitC.

1.2 Conventions Used in This Document

In the description of the language syntax below, certain conventions are used to render the presentation more compact.

Input that is to be typed as shown appears in fixed font.

Syntactic ``placeholders'' are shown in italics, and should generally be self-explanatory in context. Variable names, expressions, patterns, and types appear respectively as italic v, e, p, or T, with an optional disambiguating subscript. For clarity, the defining occurrence of a name will sometimes appear in the abstract syntax as nm.

When a sequence of similar elements is permitted, this is shown using "...". Such a sequence must have at least one element. For example:

(begin e ... e)

indicates that the begin form takes a (non-empty) sequence of expressions. When it is intended that zero elements should be permitted in a sequence, the example will be written:

(begin [e ... e])

Note that the square braces [ and ] have no syntactic significance in the BitC core language after s-expression expansion. When they appear in the specification, they should be read as metasyntax.

1.3 Type Inference

BitC incorporates a polymorphic type inference mechanism. Like SML, BitC imposes the value restriction for polymorphic type generalization. The algorithm for type inference is not yet specified here, and will be added at a future date — we want to be sure that it converges. We currently plan to use a constraint-based type inference system similar to the Hindy-Milner type inference algorithm [10].

The practical consequence of type inference is that explicitly stated types in BitC are rare. Usually, it is necessary to specify types only when the inference engine is unable to resolve them unambiguously, or to specify that two expressions must have the same result type. In this situation, a type may be written by appending a trailing type qualifier to an expression indicating its result type, as in:

(+ a b) : int32

by similarly qualifying a formal parameter, as in:

(define (fact x:int32)
  (cond ((< x 0) (- (fact (- x))))
        ((= x 0) 1)
        (otherwise
          (* x (fact (- x 1))))))

In general, wherever a type is permitted by the grammar, it is also permissible to write a type variable. A type variable is written as an identifier prefixed by a single quote. The scope of a type variable is the scope of its containing definition form. The type inference engine will infer the type associated with the type variable. Within a definition, all appearances of a type variable will be resolved to the same type. This is particularly useful in the specification of recursive types. For historical reasons, 'a, 'b, etc. are often pronounced ``alpha,'' ``beta,'' and so forth.

1.4 Documentation Strings

Certain productions in the grammar (Section 14) incorporate an optional documentation string labeled docstring. Documentation strings have predefined syntactic positions to facilitate automated extraction by documentation tools. If present, the documentation string must be a syntactically well-formed string, but the string is otherwise ignored for compilation purposes. In certain contexts a documentation string may be followed by an expression syntax, which creates a parse ambiguity. The parser should handle these cases by accepting the expression sequence and then checking to see if it has length greater than 1 and its first element is a string. Note that in such cases the string would be semantically irrelevant in any case. The only point of care here is to note that an expression sequence consisting of a single string is a value, not a documentation string.

Part I: The Core Language

2 Input Processing

The BitC surface syntax is an impure s-expression language. Expressions can be augmented with type qualifiers, and the language provides syntactic conveniences for field reference and array indexing. All of these have canonicalizing rewrites into s-expressions.

Input units of compilation are defined to use the Unicode character set as defined in version 4.1.0 of the Unicode standard [13]. Input units must be encoded using the UTF-8 encoding and Normalization Form C. All keywords and syntactically significant punctuation fall within the US-ASCII subset, and the language provides for US-ASCII encodable ``escapes'' that can be used to express the full Unicode character code space in character and string literals.

Tokens are terminated by white space if not otherwise terminated. For purposes of input processing, the characters space (U+0020), tab (U+0009), carriage return (U+000D), and linefeed (U+000A) are considered to be white space.

Input lines are terminated by a linefeed character (U+000A), a carriage return (U+000D) or by the two character sequence consisting of a carriage return followed by a line feed. This is primarily significant for comment processing and diagnostic purposes, as the rest of the language treats linefeeds as white space without further significance.

2.1 Comments

A comment is introduced by a semicolon and extends up to but not including the trailing newline and/or carriage return of the current line (the end of line markers are significant for purposes of line numbering). This implies that the comment syntax cannot be successfully exploited for identifier splicing as in early C preprocessors.

Notwithstanding the preceding, a semicolon appearing within a string or character literal does not begin a comment.

2.2 Identifiers

BitC identifiers are case sensitive. An identifier may start with any ``identifier character'' (Unicode 4.1.0 [13] character class XID_Start), followed by any number of optional ``identifier continue characters'' (Unicode 4.1.0 character class XID_Continue). The following ``extended alphabetic characters'' may also appear in any position of an identifier.

! $ % & * + - / \ < = > ? @ _ ~

Identifiers beginning with two leading underscores are reserved for use by the runtime system.

Reserved words are not identifiers.

2.3 Reserved Words

The following identifiers are syntactic keywords, and may not be rebound:

#f            #t            and           
apply         array         array-length
array-nth     begin         bitc-version  
bitfield      bool          by-ref 
case          catch         char          
cond          declare       defaxiom      
defexception  define        definvariant  
defrepr       defstruct     deftheory     
defthm        defunion      deref         
disable       do            double        
dup           enable        exception     
extends       external      fill          
fixint        float         if            
import        int8          int16         
int32         int64         interface
fn            lambda        let
letrec        member        mutable
not           opaque        or
otherwise     pair          proclaim
provide       quad          ref
set!          string        suspend
switch        tag           the
throw         try           tyfn
uint8         uint16        uint32
uint64        use           val
vector        vector-length vector-nth
word

The following identifiers are reserved for use as future keywords:

assert        check         coindset      
constrain     defequiv      defobject     
defrefine     deftype       defvariant    
do*           import!       indset
inner-ref     int           label         
let*          list          literal       
location      module        namespace     
nth           read-only     require
sensory       sizeof        super
tycon         using         value-at

In addition to the reserved words identified above, all definitions provided in the standard prelude are implicitly imported into the initial top-level environment of every compilation unit.

Note that BitC does not permit redefinition of bound variables in the same scope. This guarantees that top-level forms receive the default bindings of these identifiers in their environment.

For the moment, all identifiers beginning with ``def'' are reserved words. This restriction is a temporary expedient that is not expected to last in the long term.

Finally, the identifiers defined as part of the BitC standard runtime environment (described below) are bound in the top-level environment.

2.4 Literals

The handling of literal input and output is implemented by the standard prelude functions read and show. Source tokenization, requires that foundational literals have a defined canonical form.

2.4.1 Integer Literals

The general form of an integer literal is:

[-][baser]digits

where base is radix of the subsequent digits expressed in decimal form. Legal bases are 2, 8, 10, and 16. In the absence of a base prefix, the digits are interpreted as base ten. The digits are selected from the characters

0 1 2 3 4 5 6 7 8 9 a b c d e f

with the customary hexadecimal valuations. The letters may appear in either lowercase or uppercase. It is an error for a digit to be present whose value as a digit is greater than or equal to the specified base.

Integer literals of a particular fixed-precision type may be written by using a type qualifier. The expression:

564 : uint32

specifies an unsigned 32 bit quantity whose value is 564. It is a compile-time error to qualify an integer literal with a type that is incapable of representing the literal's value. In the absence of explicit qualification, the type assigned to an integer literal will be some subset of:

int8  int16  int32  int64
uint8 uint16 uint32 uint64
word

Any concrete type that cannot represent the literal value will be omitted from the set of types assigned.1

2.4.2 Floating Point Literals

The general form of a floating point literal is:

[-][baser]digits.digits[^exponent]

where base defines the decimal encoded radix of the digits and exponent is an integer literal, possibly including an initial minus sign and a radix specification for the exponent part. Digits are selected as for integer literals, above. Note that the decimal point is not optional, and must have digits on both sides. Thus, 0.0 is a valid floating point literal, but 0. and .0 are not.

As with integer literals, floating point literals of an explicitly stated representation type may be written using a type qualifier. The expression:

0.0 : float

specifies a 32-bit (single precision) IEEE floating point quantity whose value is zero. As with integer literals, it is a compile-time error to specify a value cannot be represented within the representable range of the qualifying type.2 In the absence of explicit qualification, the type of a floating point literal is some subset of:

float double quad

Any concrete type whose representable range cannot express the literal value will be omitted from the assigned set.

Conversion of a floating point literal to internal representation follows the customary IEEE floating point rounding rules when the specified literal cannot be exactly represented.3

2.4.3 Character Literals

BitC uses the Unicode character set as defined in version 4.1.0 of the Unicode standard [13]. Characters are 32 bits wide. Character literals can be expressed in two ways.

A character literal may be written as

#\printable-character

Where printable-character is any character specified in the Unicode 4.1.0 standard except those with general categories "Cc" (control codes) "Cf" (format controls), "Cs" (surrogates), "Cn" (unassigned), or "Z" (separators). That is, any printable character, excluding spaces. Notwithstanding the listed Unicode categories, the characters ``{'' (U+007B, left curly brace) and ``}'' (U+007D, right curly brace) are excluded for use in character literal escaping. Notwithstanding the previously listed Unicode categories, the following characters are considered printable characters as well:

! " # $ % & ' ( ) * + , - . / :
{ } ; < = > ? @ [ \ ] ^ _ ` | ~

A character may also be specified by its unicode code point:

#\U+digits

Where digits are hexidecimal.

Certain commonly used non-printing characters have convenience representations as character literals:

#\space
#\linefeed
#\return
#\tab
#\backspace
#\lbrace
#\rbrace

2.4.4 String Literals

BitC strings are written within double quotes, and may contain the previously listed ``printable characters'' excluding backslash (``\''), but including (``{'') and (``}''). They may also contain spaces (U+0020), left curly brace (U+007B) and right curly brace (U+007D).

Within a string, the backslash character (``\'') is interpreted as beginning an encoding of a specially embedded character. The character following the ``\'' is either a single-character embedding or a curly brace character ``{'' identifying the start of a Unicode character embedding. The legal forms and their meanings are:

\n Linefeed
\r Carriage Return
\t Horizontal Tab
\b Backspace
\s Space
\f Formfeed
\" Double quote
\\ Backslash
\{U+digits} Unicode code point, hexidecimal digits.

2.5 Compilation Units

There are two types of compilation units in BitC: interfaces and source compilation units. An interface compilation unit defines or declares types (and consequently the code of type constructors), defines type classes, defines constants, and declares values. A source compilation unit can define types, type classes, constants, and values.

Every valid BitC compilation unit may optionally begin (ignoring comments) with a bitc-version form. The syntax of the bitc-version form is:

(bitc-version s)

where s is the version string of the BitC version to which this program conforms. For the version of BitC described in this document, the proper version string is "0.10+". It is a compile-time error if the language version accepted by the current compiler is not backwards compatible with the version specified by the bitc-version form.

In the case of an interface compilation unit, the optional bitc-version form is followed by exactly one interface form. In source compilation units, the bitc-version form is followed by an arbitrary sequence of imports, definitions, declarations, and use forms that are not interface forms.

2.5.1 Definitions and Declarations

The top level forms that introduce programmatic definitions and declarations are:

define    definstance  defrepr
defstruct deftypeclass defunion  
proclaim
use

The define, defunion, defrepr, and defstruct forms support simple recursion. That is, the identifier(s) being defined may be used in their definition. However, the identifier(s) being defined are deemed incomplete until the end of the enclosing defining form. Restrictions on the use of incomplete identifiers are described in the sections on types and value binding.

The proclaim form is used to provide opaque value declarations. The identifier declared by a proclaim form is considered incomplete. If a completing definition is later provided within the same compilation unit, the identifier is considered complete in the balance of the defining compilation unit after the the close of its defining form. An incomplete declaration may be used within a procedure, but may not be used as part of a top-level initializer (see define, Section 5.2).

The use form is used to provide an alternative identifier that is equivalent in all respects to some existing top-level identifier.

All definition forms are expressions that return a value of type unit.

3 Types

BitC provides explicit control over data structure representation while preserving a memory-safe and type-safe language design.

3.1 Categories of Types

BitC has two categories of types: value types and reference types.

A value type is one whose value representation is ``embedded'' in the representation of its containing composite type or stack frame. The lifetime of an instance of value type is determined by the lifetime of its container, and it is the responsibility of the container to allocate storage for its contained value types.

A reference type is a type whose value representation resides in the heap. Every instance of a reference type has at least one reference value that denotes it. The reference is a value type; the value denoted by the reference is a reference type.

If T is a value type, then (ref T) is the type of a reference denoting a heap-allocated instance of T. Similarly, if T is a reference type, then (val T) is the corresponding value type. The val type constructor can only be applied to reference types whose target is of statically known size. Storage for a value of value type is allocated from its containing type.

BitC does not provide automatic assignment conversion between value types and reference types.

3.2 Primary Types

The primary types of BitC are:

unit

The unit type, having as its singleton member the unit value, both of which are written as ().4

bool

A boolean value, either #f or #t. The representation of this type is a single byte, aligned at a arbitrary byte boundary.

char

A unicode code point. The representation of this type is a 32-bit unsigned integer, aligned at a 32-bit boundary.

word

The type word is the smallest unsigned integral type whose range of values is sufficient to represent the bit representation of a pointer on the underlying machine. This type is architecture dependent, and is not directly assignment compatible with unsigned integral types of the same size. Values of type word are aligned at a boundary that is a multiple of their size.

bitfield

The bitfield form describes a fixed-precision integer field:

(bitfield basetype size)

Where basetype is one of the primary fixed-precision integral types and size is a literal not exceeding the size in bits of the base type.

The form

(bitfield int32 4)

describes a two's complement four bit field placed within a 32-bit alignment frame.

Bitfields may only be used as types of structure, union, or tag fields. The type of a bitfield is deemed to be assignment and binding compatible with its basetype. A bitfield over a signed base type is sign-extended as needed when copying to its base type. A bitfield over an unsigned base type is zero extended.

float, double, quad

The types float, double, and quad describe, respectively, IEEE floating point values as described in [2][3]. The quad type is an extended precision floating point type with a 15 bit exponent and a 112 bit mantissa.

The type bool and its constructors #t and #f are defined in the standard prelude.

At present, the BitC runtime system assumes that actual character values will be limited to the Unicode characters whose encoding falls between 0 and 127. This is a temporary restriction.

3.3 Type Variables

A type variable is an identifier preceded by a single quote, as in 'a. Type variables may appear in any position where a type can appear. Type variables are most commonly used as type constructor arguments for named constructed types (see below). They can also be used to annotate expressions, for example to require that two expressions must have the same type. The expression:

(myfun x y:'a):'a

says that the type of x is unspecified by the program author (and should therefore be inferred), the return type is also unspecified (and should be inferred), but the program author is stating that the return type and the last argument type are the same. This type of annotation is sometimes useful to assist the inference engine.

The scope of a type variable is its outermost defining form.

3.4 Simple Constructed Types

Constructed types compose existing types into new types. Type equivalence for the simple constructed types is determined by structural equivalence.

3.4.1 Reference Types

If T is a value type, then

(ref T)

is the type of a reference denoting a heap-allocated instance of T.

Storage Layout    The representation of a ref instance is architecture dependent. It is customarily determined by the size of the machine's integer registers, and aligned at any address that is congruent mod 0 to the integer register size.

3.4.2 Function Types

If targ and tresult are types (including type variables), then:

(fn targ tresult)

is the type of a function taking a single argument of type targ and returning a value of type tresult.

Function types are considered reference types that denote an object of statically undefined size. The size and alignment of a value of function type is determined by the underlying processor architecture.

3.5 Sequence Types

BitC provides fixed-length (array) and variable-length (vector) types.

3.5.1 Arrays

An array is a value type whose value is a fixed product type Ti>0, all of whose elements are of common type. The type:

(array T i)

describes the type of fixed-length arrays of element type T and length i, where i is an integer literal of type word that is greater than zero.

Storage Layout    The value representation of a k-element array is laid out in memory as the concatenation of k contiguous element cells whose size and alignment are determined by their respective element types. The elements of the array appear at increasing addresses in order from left to right.

3.5.2 Vectors

A vector is a dynamically sized array whose elements are of type T. Vectors are reference types. Because they are dynamically sized, there is no corresponding value type. The type:

(vector T)

describes vectors of element type T.

3.6 Named Constructed Types

The named constructed types are types whose compatibility rules are determined by name equivalence. Two values of named constructed types are equivalent if (a) they are instances of the same statically appearing type definition, and (b) their corresponding elements are equivalent.

Unless otherwise qualified, a named constructed type declaration declares a reference type.

3.6.1 Structures

The structure declaration defines a named type whose instances are an ordered sequence of named cells. The syntax of a structure declaration is:

(defstruct nm field ...)
(defstruct (nm tv1 ... tvn) field ...)

where each field is one of:

nm:type
(the type nm)
(fill bitfield-type)

All names nm are disjoint identifiers giving the names of the structure fields, and the respective type forms are the types of the respective fields. Given a variable v that is an instance of a structure type having a field named f, the expression v.f unifies with the field f within that structure.

A fill element may be used to support precise specification of alignment. The alignment and storage layout of a fill field follows the alignment and storage layout of its base type, however a fill field has no name or defined value, and cannot be programatically referenced.

An identifier that is bound to a structure type may be used as a procedure to instantiate new values of that structure type. The arguments to this procedure are the initial values of the respective structure fields.

An identifier that is bound to a non-parameterized structure type may be used as a type name. An identifier that is bound to a parameterized structure type may be used in a type constructor application within a type specification. Its arguments are the types over which the newly instantiated structure type should be instantiated. For example, the declarations:

(defstruct ipair a:int32 b:int32)

(defstruct (tree-of 'a):ref
  left : (optional (tree-of 'a))
  right : (optional (tree-of 'a))
  height : int8
  value : 'a)

define (respectively) the type name ipair and the single argument type constructor tree-of.

3.6.1.1 Storage Layout

A structure having k fields is laid out in memory at increasing addresses from left to right as k contiguous cells whose size and alignment are determined by their respective element types. These cells are then packed according to the previously described alignment and layout packing rules. Did we describe them?

3.6.2 Unions

Note

This is a proposed replacement to the current defunion form.

The defunion form defines enumerations, discriminated unions, and mixes of these. The type being defined is in-scope within the definition of the type, but is incompletely defined. The syntax of a union declaration is one of:

(defunion nm C1 ... Cn)
(defunion (nm tv1 ... tvn)
  C1 ... Cn)

where each tvi is a type variable and Ci is a constructor form. A constructor form consists of either a single identifier or a parenthesized identifier followed by a sequence of field or fill declarations (see defstruct). All field names appearing in a defunion, including constructor names, must be disjoint.

An identifier bound to a union constructor having no fields denotes the value of the unique corresponding union instance for that union type.

An identifier bound to a union constructor having associated fields is a procedure that may be used to instantiate new instances of that union type. The arguments to this procedure are the initial values of the union fields associated with that union variant.

An identifier that is bound to a non-parameterized union type is a valid type name. An identifier that is bound to a paramterized union type may be used in a type constructor application within a type specification. Its arguments are the types over which the newly instantiated structure type should be instantiated. For example, the declarations:

(defunion contrived
  (asChar c:char) (asInt i:int32))

(defunion (optional 'a) :val
  none
  (some value:'a))

define (respectively) a reference type holding either a char or an int32, and a value type of optional elements.

The declaration:

(defunion (list 'a):ref
  nil
  (cons car:'a cdr:(list 'a)))

Defines the reference type of homogeneous lists.

3.6.2.1 Storage Layout

Each variant of a union declaration effectively defines a k+1 element structure, where the first element contains the tag and the remaining k elements are the fields of the constructor leg. In the usual case, the representation of the union leg is arranged as though it had actually been this structure, without regard to the layout of other legs.

In the unusual case of a union whose tag representation can be elided (see below), each individual union leg will be arranged as though it had been the corresponding structure declaration.

In the case of a union having no tag, the union representation will match the size and alignment of reference cells. The storage occupied by a union of value type is the maximum of the storage required for each individual case of the discriminated union (including the type tag, if present).

3.6.2.2 Type Tag Size and Alignment

In the absence of declaration, the union type tag will be given an implementation-defined size and alignment selected to maximize performance efficiency. Explicit control over the size and alignment can be achieved using a tag-type declaration. The declaration:

(defunion (list 'a):ref
  (declare (tag-type uint8))
  nil
  (cons car:'a cdr:(list 'a)))

indicates that the tag should be implemented using an unsigned byte. The declared tag type must be an unsigned integral or bitfield type having a sufficient number of distinct values to assign a unique value to each constructor.

3.6.2.3 Elided Tag Representation

In the absence of an explicit type tag declaration, a union having exactly one leg whose first element is of reference type, all of whose other constructors are enumeration values, shall be represented without using additional storage for the type tag. The value zero shall be used as the low-level representation for the tag associated with the union leg whose first element is of reference type.5

If a union type tag is explicitly declared to be of a field type whose size in bits b is such that the machine's natural heap alignment restriction for objects is align>=2b, the total number of distinct legs of the union does not exceed 2b, and there is exactly one union leg whose first element is of reference type, then the tag representation shall be chosen in such a way that leg whose first element is of reference type does not reserve additional storage for the tag value. The value zero shall be used as the low-level representation for the tag associated with the union leg whose first element is of reference type.6

3.6.3 Reprs

There are examples of low-level hardware data structures for which the unions and structures that can be specified using defstruct or defunion are insufficiently expressive. One example is the Pentium GDT data structure, which has nested union discriminators, but simultaneously has an overall bit-level layout requirement. Another example is data structures where the representation of the tag must appear at a specific location that is not adjacent to the fields guarded by the tag. The defrepr form is included to permit the expression of these data structures.

The following scheme for defrepr is based on the bit-data representation proposed by Iavor Diatchki, et. al. [11].

Similar to unions and structures, a defrepr delaration takes the following form:

(defrepr name
  (Ctr1 f11:type f21:type ... fn1:type
    (where (= fp1 v11) (= fq1 v21) ...
           (= fm1 vm1))

  (Ctr2 f12:type f22:type ... fn2:type
    (where (= fp2 v12) (= fq2 v22) ... 
           (= fm2 vm2))

  ... )

The following restrictions apply. For all constructors Ctrx, Ctry, Ctrz, ...:

  • All fields fpx, fqx...fmx appearing in the when clause of a constructor form Ctrx must be described within the body of Ctrx. That is, {fpx, fqx, ... fmx} ⊆ {f1x, ... fnx}.

  • Identically named fields within two different constructor forms must be located at the same bit level offset from the beginning of both the constructor forms. That is, fpx = fpy implies bit-offset(fpx) = bit-offset(fpy).

  • Identically named fields within two different constructor must have the same type. That is, fpx = fpy implies type-of(fpx) = type-of(fpy).

  • The fields within the when clauses of all constructor forms must uniquely distinguish all constructible values of the union. The compiler will not introduce any more tag bits for a defrepr value.

  • Currently, the defrepr form will not accept type arguments over which it can be instantiated. That is, the following definition is not legal.

    (defrepr (name 'a 'b ... ) ... )
  • Currently, the discriminating fields fp1, fp2, etc must have a integer/bitfield type, and the discriminator values v11, v12, etc must be an integer literal.

We can envision a larger language construct DEFUNION, which accepts both type arguments and when clauses. The defunion and defrepr are just specializations of this DEFUNION construct. However, currently the language only supports defunion (which does not accept the when clause) and defrepr (which does not accept type arguments).

3.6.4 Value vs. Reference Types

In the absence of other specification, the defstruct, defrepr, and defunion forms declare reference types. The developer may optionally qualify the declaration to make this intention explicit:

(defstruct nm:ref field ...)
(defstruct (nm tv1 ... tvn):ref
  field ...)
(defunion nm:ref C1 ... Cn)
(defunion (nm tv1 ... tvn):ref
  C1 ... Cn)
(defrepr nm:ref (body))
(defrepr (nm tv1 ... tvn):ref (body))

The qualifier ``:ref'' indicates that the type declared (and consequently the type returned by value constructors) is a reference type. The qualifier ``:val'' indicates that the type declared is a value type. The qualifier ``:opaque'' indicates that the type declared is a value type whose internal structure is not accessable outside of the defining interface and the providers of that interface. An importer of an opaque type may declare fields and variables of that type and can copy instances of that type, but can neither apply the type constructors nor make reference to the contents of instances.

Note that if the type declared is a value type, it cannot be instantiated within the body of the declaration because its size is not statically known. That is, it is legal to have a field that is a reference to a value of the type currently being defined, but not a value of that type.

3.6.5 Forward Declarations

The declarations

(defstruct nm [qual] [external])
(defunion nm [qual] [external])
(defrepr nm [qual] [external])
(defstruct (nm tv1 ... tvn) 
          [qual] [external])
(defunion (nm tv1 ... tvn)
          [qual] [external])
(defrepr (nm tv1 ... tvn)
          [qual] [external])

state (respectively) that nm is a structure (respectively union) reference type of the stated arity whose internal structure is not disclosed. If present, the qualifier qual optionally declares the type to be one of ``:ref'', ``:val'', or ``:opaque''. In the absence of qualification, the default is ``:ref''. If present, the external portion consists of the keyword external followed by an optional identifier (see discussion of external identifiers in proclaim).

For example, the following declaration is include in the library bitc.int interface to declare the bignum type:

(defstruct int :val external bitc_int)

The structure of these types may optionally be disclosed later in the same compilation unit by a type definition for nm. If the declaring form appears within an interface, the corresponding type definition may appear in a providing unit of compilation, in which case the type is opaque to importers of the interface.

Note that a forward declaration of a value type is sufficient to declare references to that type, but not instances of that type. A complete definition of the value type is required to be in scope in order to declare fields and variables of value type.

3.6.6 Named Type Conveniences

The following types are defined in the BitC standard prelude.

(defstruct (pair 'a 'b) :val
   fst:'a snd:'b)

(defunion (list 'a) :ref
   nil
   (cons 'a (list 'a)))

Note that pair is a keyword that is specially recognized in binding patterns.

The pair type is supported by a right-associative infix convenience syntax:

(a, b) => (pair a b)
(a, b, c) => (pair a (pair b c))

This convenience syntax may be used in types, binding patterns, and value construction.

3.7 Mutability

Unless modified by the mutable keyword, the preceding types yield immutable instantiations. If T is a type, the type

(mutable T)

is the type of mutable instances of T. If the type T is a reference type, then the type (mutable T) describes a mutable reference to a memory location in the heap.

3.8 Exceptions

BitC provides declared exceptions. The type exception should be viewed as an ``open'' union reference type whose variant constructors are defined by defexception. The syntax of an exception declaration is:

(defexception nm [field1 ... fieldn])

where each fieldi is a field declaration (see defstruct) whose type is a concrete type.

An identifier bound to an exception name is a procedure that may be used to instantiate new instances of that exception. The arguments to this procedure are the values of the fields associated with the exception.

3.9 Restrictions

BitC imposes a value restriction [4] on polymorphism. A binding is only permitted to be of polymorphic type if its defining expression is a syntactic value.

As is usual in let-polymorphic languages, polymorphic function arguments cannot be used polymorphically within the function. For example, the following function is disallowed:

(define (foo f)
  (pair
    (f (cons 1 (cons 2 nil)))
    (f (cons #t (cons #f nil)))))

4 Type Classes and Qualified Types

A type class defines an n-ary relation on types, and provides a means for specifying ad hoc polymorphism. Every type class is parameterized over n≥1 types, and defines a set of methods over those types. Type classes provide a form of open type-directed operations: a user can add a new member to the relation established by a given type class by providing a new instantiation of the type class.

Closely connected with type classes is the notion of qualified types. For example, consider the following definition of list-max:

(define (list-max x)
  (switch tmp x
    (nil (raise ValueError))
    (cons
     (if (null? tmp.cdr)
         tmp.car
         (let ((m (list-max tmp.cdr)))
           (if (>= tmp.car m)
               tmp.car m))))))

which is typed as:

(forall ((Ord 'a))
        (fn ((list 'a)) 'a))

This type should be read informally as ``list-max is a procedure accepting lists of type 'a and returning a value of type 'a. It is defined over all types 'a such that there is an instiation of the (Ord 'a) type class.''

In this example, (Ord 'a) is the type class that describes types having a total order. That is: types over which the procedure >= is defined. Obviously, it not semantically sensible to request the greatest element of a list whose element type does not have at total ordering.

Contrast this example with the following alternative:

(define (list-max gte x)
  (switch tmp x
    (nil (raise ValueError))
    (cons
     (if (null? tmp.cdr)
         tmp.car
         (let ((m (list-max tmp.cdr)))
           (if (gte tmp.car m)
               tmp.car m))))))

which is typed as:

(fn ((fn ('a 'a) bool)
     (list 'a)) 'a))

In this second example, the comparison operator is provided as an argument, and there is no requirement for additional type constraints. Note, however, that in practice any comparison function that might actually be passed in this position is likely to depend on the <= operator in some fashion, and is therefore likely to end up having a qualified type.

4.1 Definition of Type Classes

A type class is defined by the abstract syntax:

(deftypeclass (nm tv ... tv)
  [tyfn-declarations]
  method-definitions)

where a tyfn-declaration is a statement of functional dependency between types [6]:

(tyfn (tv ... tv) tv)

and each method definition takes the form:

nm : function-type

Each method is an abstract procedure that may be instantiated for some particular type by a later use of definstance. The method may be invoked prior to the point where the instantiation is visible. Each method defined by a type class is introduced into the scope containing the type class definition.

By providing an instantiation of a class over some particular set of types, the programmer simultaneously proves (by example) that the set of types is a member of the class and defines (by example) how the operations of the class are implemented for that type.

Type functions, when present, indicate that there is a dependent relationship between two or more types of the type class relation. For example, the (incomplete) declaration:

(deftypeclass (sample 'a 'b 'c)
  (tyfn ('a 'b) 'c)
  ...)

states that sample is a type relation over three types, but also says that for any pair of types 'a and 'b there there is one valid choice of 'c.

4.1.1 Example: Eql

As a first example, consider the equality comparison operations. The type class Eql defines a single element type relation on types 'a: describing whether the type is admissable under equality. Some types — notably function types — cannot be compared for equality. The definition of this type class is written:

(deftypeclass (Eql 'a)
  == : (fn ('a 'a) bool))
  != : (fn ('a 'a) bool))

which states that Eq. is the single element type relation over all types 'a that can be passed as arguments to == and !=.

4.1.2 Qualification: Ord

A type class can also be introduced in qualified form. The syntax for such a type class definition is:

(deftypeclass 
  (forall (constraint ... constraint)
     (nm tv ... tv))
  [tyfn-declarations]
  method-definitions)

where each constraint takes the form:

(tc-name tv ... tv)

where tc-name is a typeclass name. An example of this use is the Ord type class:

(deftypeclass 
  (forall ((Eql 'a)) (Ord 'a))
  < : (fn ('a 'a) 'a))

This type class states that Ord is the single element type relation over all types 'a that can be passed as arguments to <. It also states that the Ord relation is only defined for types that are also members of the Eql relation (that is: types that admit equality comparison).

Note that in the presence of this definition, the procedures >, <=, and >= can be defined as:

(define (> x y)
  (not (or (< x y) (== x y)))
(define (<= x y)
  (or (< x y) (== x y)))
(define (>= x y)
  (or (> x y) (== x y)))

all of which will be inferred to have the type:

(forall ((Ord 'a)) (fn ('a 'a) bool))

This may seem like a very long-winded way of saying that an orderable type is any type that can be passed to the operators < and ==. However, type classes are statements about relations among types. This may become clearer with the following example.

Note that because (Ord 'a) has (Eql 'a), the types:

(forall ((Ord 'a) (Eql 'a))
        (fn ('a 'a) bool))
(forall ((Ord 'a)) (fn ('a 'a) bool))

are equivalent. The second is stylistically preferred for reasons of brevity. It is also more robust: in the (in this example unlikely) event that the definition of Ord should be modified to depend on some other type class in place of Eql the future, the first definition will mistakenly retain an additional, unncessary type dependency, while the second will continue to type check as intended.

Restriction: Qualified type relationships must be acyclic.

4.1.3 Example: tyfn

Need an example of type functions.

4.2 Instantiation of Type Classes

Whenever a type class method is invoked, the compiler must identify some concrete member of the type class relation that is sufficient to choose an appropriate implementation of that method. This is done by locating an appropriate instantiation.

A type class instantiation is a demonstration by example that some particular set of types satisfies the relation required by the type class. Type class instantiations are defined by the definstance form. The abstract syntax of definstance is:

(definstance tc-instance
  function ... function)
(definstance
  (forall (constraint ... constraint)
          tc-instance)
  function ... function)

where tc-instance takes the form:

(typeclass-name type ... type)

For example, the definition:

(definstance (Ord int32)
  int32-ops.<)

states that int32 is member of the type relation Ord because there is an instance function int32.< that provides an implementation of the ``less than'' operation over arguments of type int32.

In practice, this definition is insufficient, because we must first demonstrate that int32 is a member of the Eq relation (which is a superclass of Ord) In consequence, two separate instantiations are required:

(definstance (Eq int32)
  int32-ops.==
  int32-ops.!=)
(definstance (Ord int32)
  int32-ops.<)

A type class instantiation is deemed to be in scope for purposes of procedure instantiation if it is defined by the end of the outermost unit of compilation.

It is a compile time error to define two type class instances covering the same concrete types unless one instance is ``preferred'' to the other. Preference is determined by comparing the respective type variable instantiations positionally. Given two instances A and B over type variables tv1...tvn, instance A is preferable to instance B if there exists some subset of the respective type variable instantiations such that the instantiation under A is strictly more concrete than the instantiation under B, and the two instantiations are identical (modulo type variable renaming) at all other positions. If this comparison does not (transitively) determine a most preferred instantiation, then no instantiation is preferred and a compile time error is signalled.

4.3 Qualified Types

It is sometimes necessary to qualify the types of instances, type classes, constructed type definitions, or value declarations explicitly. A qualified type takes the general form:

(forall (constraint ... constraint)
        type)

Qualified types may appear only as the types of binding patterns; they may not qualify expressions generally. For example:

(define add1:(forall ((Num 'a))
                     (fn 'a 'a))
  (lambda (x) (+ (the 'a 1) x)))

explicitly states that the add1 procedure takes arguments whose type admit +, and therefore must be members of the Num type class.

If multiple qualifications appear in the same binding pattern, they must unify. The following is legal, if somewhat obscure:

(define
  (v1:(forall ((Eql 'a)) 'a), v2:'b)
  : (forall ((Num 'c))
      ((fn ('c) bool), 'b)) ...)

with the effect that v1 receives the qualified type:

(forall ((Eql (fn ('c) bool))
         (Num 'c))
        (fn ('c) bool))

which will ultimately fail to type check, because functions are not admissable under value equality.

Qualifications may also be applied to structure and union declarations, with the abstract syntax:

(defstruct
  (forall (constraints)
          (struct-name tvars) [:val])
   nm1[:t1] ... nmn[:tn])
(defunion
  (forall (constraints)
          (struct-name tvars) [:val])
   C1 ... Cn)

Qualifications may similarly appear in the binding patterns of structure, union, and value declarations.

5 Binding of Values

5.1 Binding Patterns

Binding patterns are used to bind names to values. They appear in the definition of formal parameters and in binding forms such as define, let, letrec, and do. A binding pattern consists of an identifier that is optionally qualified by a type:

id
(the T id)
id : T

In top-level bindings (those introduced by a top-level define, the id may be qualified by an interface binding name corresponding to some interface that the current unit of compilation provides (Section 9.3). Thus, if my.interface is an interface name, it is legal for a source unit of compilation to contain:

;; State that we are a provider
;; of my.interface:
(provide if my.interface)
;;...
;; Define some variable declared
;; in the interface:
(define (if.varname x)
  (+ x 1))

5.2 define

Variable and procedure bindings are introduced by define:

(define bp e)
(define (id [bp1 ... bpn])
  e ... e)

where each bp is a binding pattern. The second form is a convenience shorthand for:

(define id
  (lambda ([bp1 ... bpn])
    e ... e))

where each bpi is a binding pattern.

The right hand form of a define is evaluated to obtain a value, which is then pattern matched against the pattern being bound. For each identifier x that appears in the binding pattern, then identifier is bound in the currently active environment to the positionally corresponding element within the value resulting from the evaluation of the expression.

Identifiers defined within a single define are deemed ``incomplete'' until the end of the enclosing define form.

Mutually recursive procedure definitions at top level can be achieved either by use of letrec or by declaring the procedures ahead of their definitions.

5.3 Local Binding Forms

5.3.1 let

The let form provides a mechanism for locally binding identifiers to the result of an expression evaluation. Each identifier bound in a let form must appear exactly once among the collection of binding patterns being bound. Evaluation of the initialization expressions occurs in order from e1 to en. The environment in which the expression(s) are evaluated does not contain the identifiers being bound in the current let form.

The syntax of let is:

(let ((bp1 e1)
      ...
      (bpn en))
  ebody-1
  ...
  ebody-n)

One common form of these expressions is the one in which the left hand patterns are simple identifier names, as in:

(let ((x e1)
      ...
      (y e2))
  ; x, y are bound in:
  ebody-1
  ...
  ebody-n)

The value of a let form is the value of the last form executed within the body.

In similar languages, let is often presented as a form derived from lamdba. In BitC, as in other let-polymorphic languages, the value restriction for lambda arguments means that this is not (quite) true.

5.3.2 letrec

The letrec form provides a mechanism for locally binding identifiers to an expression value. Each identifier bound in a letrec form must appear exactly once among the collection of binding patterns being bound. Evaluation of the initialization expressions occurs in order from e1 to en. The environment in which the expression(s) are evaluated contains (via unification) the identifiers being bound in the current letrec form. This allows letrec to bind recursive procedure definitions:

(letrec
  ((odd
     (lambda (x) ; odd
       (cond ((= x 0) #f)
             ((< x 0) (odd (- x)))
             (otherwise
               (not
                 (even (- x 1)))))))
   (even
     (lambda (x) ; even
       (cond ((= x 0) #t)
             ((< x 0) (even (- x)))
             (otherwise
               (not
                 (odd (- x 1))))))))
  body)

The value of a let form is the value of the last form executed within the body.

Within the defining expressions of a letrec form, use of the identifiers being defined is subject to the same restrictions described for define. This ensures that cyclical constant data cannot be introduced.7

Any binding pattern appearing in the bpi position in a letrec must be statically decomposable at compile time. It is not sufficient that the corresponding ei be of compatible type. This restriction allows the binding patterns to be flattened away by the compiler without internally violating the completeness restriction.8

5.3.3 local define

The define form may be used in an expression sequence, provided it is not the last form of the expression sequence. In this context, define is a derived form. The canonical rewriting of the local define form using core language constructs is:

(begin ...
  (define id edef) e2 [...]) =>
(begin ...
  (letrec ((id edef))
    e2 [...]))

This rewrite proceeds left to right. Successive defines are gathered into letrec forms that are progressively more deeply nested. Adjacent define forms are not gathered into a single letrec body.

5.4 Value Non-Recursion

In any recursive binding (introduced by letrec or define) such as:

(define bp e)

if id is an identifier that appears in the binding pattern (and is therefore incomplete), free occurrences of id in e must occur only within a lambda body. This ensures that id will be initialized before it is used.

This restriction intentionally prevents infinitely recursive data constant definitions.

5.5 Static Initialization Restriction

I continue to look for a more rigorous way to express the following requirement.

Statically declared (global) variables must be initialized before the main entry point is entered. This presents a challenge of specification. The language definition must impose a sufficient ordering constraint on initializations to ensure that no initializer can depend (transitively) on any uninitialized variable. To ensure this, we introduce the notions of ``compile-time evaluable'' and ``compile time applicable'' expressions, and the restriction that every initializing expression of a statically declared variable must be compile-time evaluable.9 Informally: it must be possible for the compiler to evaluate the initializing expression at compile time without (conservatively) referencing any uninitialized variable.

Literals are compile-time evaluable.

A locally bound identifier is compile-time evaluable exactly if its initializing expression is compile-time evaluable. It is compile-time applicable exactly if the return value of its defining expression is compile-time applicable.

A globally bound identifier is compile-time evaluable provided its definition is lexically observable and compile-time evaluable. By ``lexically observable,'' we mean that either (a) it appears as a lexically preceding definition in the same unit of compilation, or (b) there exists some chain of interfaces I0...In such that the global identifier is is defined in In, the unit defining out imports I0, I0 imports I1, I1 imports I2 ... and In-1 imports In.

A globally bound identifier is compile-time applicable provided it is of function type, it is lexically observable, and all expressions appearing in its defining lambda form are compile-time evaluable. For purposes of this analysis, it is assumed that any formal parameter of the function is both compile-time evaluable and (if of function type) compile-time applicable.

Any expression other than an application or an assignment is compile-time evaluable provided that all of its free identifiers are compile-time evaluable.

An application is compile-time evaluable provided that (a) the expression in the function position is compile-time evaluable, (b) all of its arguments are compile-time evaluable, and (c) any arguments of function type are compile-time evaluable.

An assignment (as with set!) is compile-time evaluable provided its expression is both compile-time evaluable and (if of function type) compile-time applicable. This prevents later assignments from altering the compile-time evaluability of previously defined identifiers.

Dangling:

The result of an expression evaluation (including application and constructor application) is observably known if (a) the definitions of all identifiers that are free in the expression are observably known, and (b) any procedure that is applied is observably applicable. Requirement (b) is satisfied by definition for all type constructors.

Note that these definitions are conservative with respect to mutability. Because no initializing expression can reference an observably unknown value, nor perform an application that is not observably applicable, it follows that no assignment performed from within an initializing expression can cause an identifier to transition from observably known to observably unknown.

6 Declarations

The proclaim form is used to provide opaque value declarations. The declaration:

(proclaim x:int32)

states that x is the name of a value of type int32 whose definition and initialization is provided by some implementing unit of compilation. This form can legally appear only at top level within a source unit of compilation or within an interface.

The identifier declared by a proclaim form is considered incomplete. If a completing definition is later provided within the same compilation unit, the identifier is considered complete in the balance of the defining compilation unit after the the close of its defining form. An incomplete declaration may be used within a procedure, but may not be used as part of a top-level initializer (see define, Section 5.2).

It is occasionally necessary to make reference to procedures or values that are implemented by an externally provided runtime library. This may be accomplished by an external declaration:

(proclaim proc:(fn (int32) char)
          external)
(proclaim proc:(fn (int32) char)
          external ident)

This has the effect of advising the BitC compiler that no definition of this identifier will be supplied in BitC source code. It is primarily intended to support portions of the BitC runtime library. Use of this mechanism for other purposes is strongly discouraged, and we reserve the right to revise this syntax incompatibly in future revisions of the BitC specification.

If a proclaimed external procedure provides an optional trailing ident, this identifier will be used verbatim in the generated code in place of the normal identifier name generated by BitC. The trailing identifier is permitted only if the external procedure has non-polymorphic type.

7 Expressions

7.1 Literals

Every literal is an expression whose type is the type of the literal (as described above) and whose value is the literal value itself.

7.2 Identifiers

Every lexically valid identifier is an expression whose type is the type of the identifier and whose value is the value to which the identifier is bound.

7.3 Type-Qualified Expressions

Any expression e may be qualified with an explicit result type by writing either of

(the T e)
e : T

where T is a type. This indicates that the result type of the the form is constrained to be of type T. The the form is syntax, its expression argument is not conveyed by application, and is therefore not subject to copying as a consequence of type qualification.

The result value of the expression is not changed by type qualification, except to the extent that a type restriction may lead the inference engine to resolve the types of other expressions and the selection of overloaded primitive arithmetic operators in ways that produce different results.

Syntactic Restriction    The e:T convenience syntax is not permitted in combination with the member selection convenience syntax ``.''. The sequence of grammar expansions:

expr -> expr.Id
expr -> expr:type.Id
expr -> expr:Id.Id.Id
               ^

leads to a shift/reduce conflict at the indicated position. The grammar resolves this by disallowing the helper type-qualification syntax in this context. If required, a type qualification in this context can be obtained using either of the following alternatives:

(the T e).Id
(member e:T Id)

7.4 Value Constructors

7.4.1 unit

The expression:

()

denotes the singleton unit value.

7.4.2 make-vector

The expression:

(make-vector elen einit)

creates a new vector whose length is determined by the value of the expression elen, which must evaluate to a value of type word. The argument einit must be a function from word to some type T, where the vector created will be of type (vector T). The initializer value for each cell will be obtained by invoking the procedure einit a total of elen times, passing as an argument the index of the vector position to be initialized. The procedure einit should return the desired initializer value for the corresponding position.

For example, the procedure list->vector may be written as:

(import ls bitc.list)
(define (list->vector lst)
  (make-vector
    (length lst)
    (lambda (n)
       (ls.list-nth lst n))))

Care should be taken to ensure that the type returned by the initializer function is mutable if the slots of the vector are intended to be mutable.

7.4.3 array, vector

The expressions:

(array e0 ... en)
(vector e0 ... en)

create a new array (respectively, vector) whose length is determined by number of arguments. The first argument expression becomes the first cell of the created array (respectively, vector), the second becomes the second, and so forth. All expressions must be of like type.

7.4.4 Convenience Syntax

Derived forms

The following are right-associative convenience syntax for types defined in the standard prelude:

(a,b) => (pair a b)
(a,b,c) => (pair a (pair b c))

[]    => nil
[a] => (cons a nil)
[a,b] => (cons a (cons b nil))

7.5 Expression Sequences

Derived form

The expression:

(begin e1 ... en)

executes the forms e1 through en in sequence, where each form is an expression. The value of a begin expression is the value produced by the last expression executed in the begin block.

Derivation    The canonical rewriting of the begin form using core language constructs is:

(begin e1 ... e2) =>
((lambda () e1 ... e2))

7.6 Iteration

Derived form

BitC provides the looping construct do as a derived form.

(do ((bp1 einit-1 estep-1)
     ...
     (bpn einit-n estep-n))
    (etest eresult)
   ebody-1
   ...
   ebody-n)

Do is an iteration construct taken from Scheme [8]. It specifies a set of variables to be bound along with an initializer expression and an update expression for each variable. Evaluation of the do form proceeds as follows:

The einit-i expressions are evaluated in order in the lexical context containing the do form. In this context, the variables bound by the loop have not yet been bound. All other expressions are evaluated within an inner lexical context that includes the do-bound variables. After all of the initialization values are computed in order, the do-bound variables are bound to the initial results in parallel, and body processing begins.

At the start of each pass over the body, the expression etest is evaluated. If this expression returns #t, then eresult is evaluated and its result returned. Otherwise, the expresions of the body are evaluated in sequence.

At the end of each execution of the loop body, the estep-i expressions are evaluated in sequence. Once all of the expression values have been evaluated, the do-bound variables are bound to the newly computed results in parallel and a new pass is initiated over the loop body as previously described.

Derivation    The canonical rewriting of the do form using core language constructs is:

(do ((bp1 einit-1 estep-1)
     ...
     (bpn einit-n estep-n))
    (etest eresult)
   ebody-1
   ...
   ebody-n) =>
(letrec
  ((__loop
    (lambda (bp1 ... bpn)
      (if etest eresult
          (begin
            ebody-1
            ...
            ebody-n
            (__loop estep-1
                    ...
                    estep-n))))))
  (__loop einit-1 .. einit-n))

Note the because do is defined in terms of lambda application, value restrictions on argument bindings apply.

Pragmatics:    The do form is guaranteed to be processed tail recursively, as implied by the canonical rewriting.

7.7 Interface Member Reference

If if is an identifier naming an interface binding established through import, and id is an identifier defined in that interface, then either of:

(member if id)
if.id

is an expression that returns the value of that identifier. The returned value is a location, and can be used as an argument to set!.

7.8 Structure, Repr Field Reference

If eloc is a location expression of structure or repr type, and field is an identifier naming some invariant field in that type then either of:

(member eloc field)
eloc.field

is an expression that returns the field value. member is a syntactic form. The returned value is a location, and can be used as an argument to set!.

7.9 Union, Repr Tag Reference

If eloc is a location expression of union or repr type, and tagid is an identifier naming some union discriminator tag in that union or repr type then either of:

(member eloc tagid)
eloc.tagid

is a boolean expression that returns true exactly if the tag value of the corresponding tag is tagid.

7.10 Array and Vector Expressions

7.10.1 array-length, vector-length

If e is an expression of array (respectively: vector) type, then

(array-length e)
(vector-length e)

returns a word whose value is the number of elements in the array.

7.10.2 array-nth, vector-nth

If e is an expression of array (respectively: vector) type, and ei is an expression with result type word, then:

(array-nth eloc ei)
(vector-nth e ei)

are (respectively) expressions that return the ei'th element of the array (respectively: vector). If the value ei is greater than or equal to the length of the array (respectively: vector), then a IndexBoundsError exception is thrown.

array-nth and vector-nth are syntactic forms. The returned value is a location, and can be used as an argument to set!. array-nth requires a location as its argument.

The expression:

e[ei]

is a convenience shorthand for

(vector-nth e ei)

7.11 Procedure Values

Procedure values are introduced by the keyword lambda. In contrast to Scheme, Haskell, and Standard ML, BitC procedures take zero or more arguments. The syntax of a procedure definition is:

(lambda ([bp1 ... bpn]
  e1 ... en)

where each bpi is a binding pattern matching the formal parameters of the procedure and e1...en is the body of the procedure. The return value of the procedure is the value computed by the last expression executed in the body.

Each formal argument binding pattern defines a set of variable bindings that are in scope in the body of the lambda. Each formal argument binding pattern is unified with its corresponding actual parameter. Any identifier that is free in the binding pattern is unified with the structurally corresponding element of its associated actual parameter.

BitC argument and return value passing are ``by value.'' Formal argument and return values must be of value type, which means that references can be passed, but the values denoted by these references cannot. The ``by value'' policy also implies that local variables are copies of their initializing expressions, which may yield surprising results if the initializer is of mutable type. A let binding is not an alias for its initializer. A let binding of a (top level) mutable value cannot simply be substituted by β-reduction into the body of the let form.

7.11.1 Reference Parameters