This quick start tutorial is aimed at bringing the reader up to speed
with Nocaml and give the ability to: write Nocaml
programs, write any necessary builtins in C, set up a Nocaml
runtime environment, and invoke Nocaml code from C.
It is assumed that the reader has at least a rudimentary grasp of the
essentials of OCaml, Ruby and C.
NocamlLet’s start by cracking open Nocaml’s stdlib.lisp, which is, of
course, written in Nocaml. Specifically, let’s take a look at the
not function:
(defun not (x)
(case x
((False) true)
((True) false)))
As you can see, the syntax is vaguely reminiscent of Lisp. In this
example, True and False are data constructors, while the lowercase
variants true and false happen to be “primordial” objects that are
statically allocated outside the Nocaml heap. Note that case is
analogous to OCaml’s match ... with construct, except it isn’t as
powerful - it only supports destructing a single level.
That wasn’t a very exciting function though. Let’s take a look at
reverse-rec:
(defun reverse-rec (lista listb)
(case lista
((ListFini)
listb)
((ListCons head tail)
(reverse-rec tail (ListCons head listb)))))
Now here’s a function that’s actually recursive. ListFini and
ListCons are data constructors. As you can see, destructing
ListCons yields two variables, here given the names head and
tail. In the last line, the constructor ListCons is used to create
a new object. The result of reverse-rec a b should be (concat
(reverse a) b), except it can’t be implemented that way because
concat requires reverse-rec for its implementation.
Types and builtins are declared in “companion” Ruby files like
stdlib.rb. Such files can also define primordials. Such files
consist of invocations of the following functions:
wombat_enable_ocaml enables OCaml code generation for strong
static typechecking. this should almost always be on, and stdlib
indeed turns it on
wombat_register_primordial registers a primordial object,
accepting its name and C definition
wombat_register_builtin registers a builtin, accepting its name
and arity
wombat_register_constructor registers a constructor, accepting its
name and arity
wombat_register_constructor registers a native constructor,
accepting its name and arity. Native constructors cannot be
destructed (cased) in Nocaml, and they are always initialized to
zero when created
wombat_ocaml registers OCaml code to emit during OCaml code
generation. Nocaml has no facility for creating corresponding
OCaml types or builtin function signatures so these must be coded
by hand at present
Some additional notes that will assist the reader in understanding
stdlib.rb:
Blobs will be described in another section. Do not concern yourself
overly with the blob-related statements at present. Also, the
blob-related statements only have to be implemented once – in
stdlib. A typical user of Nocaml will only be interested in
defining custom types, primordials, and builtins
Notice how primordials are plainly defined as static const
uintptr_t arrays in C
There are two sets of OCaml code. The first set (wombat_*) is
for validating everything except the constraint that there be no
partial function application. The second set (wombatx_*) validates
that no partial function application takes place. This requires the
trivial transformation of, for each function, wrapping N arguments
into a vectorN object. Thus every function is converted into a
single-argument form, and partial application becomes
impossible.
In Nocaml, the memory representation of each object begins with a
“low” tag that contains the ID of a wombat or native
constructor. Alternatively, the tag can be a “high” tag whose high bit
is set. In this case, the tag declares a “blob” whose length in bytes
is given by the lower W-1 bits of the tag (where W is the bit
length of a machine word).
When providing a facility to create a Blob, it was desired that
wombat remain agnostic with respect to any defined types. Therefore,
instead of allowing the creation of a Blob out of an Integer
object, wombat allows the creation of a Blob out of any object
whatsoever. The given object is passed to the special wombat_measure
function to be defined by the C code. This function is defined by
stdlib in an implementation that asserts that its argument is
Integer.
In Nocaml, allocation of Nocaml objects always takes place (A) by
driver C code when no Nocaml code is executing or (B) in Nocaml
code. Builtins cannot allocate. Loosening this restriction is
technically possible but would add overhead and complexity.
Thus, if a builtin were otherwise to require allocation, the
allocation is first performed in Nocaml and passed into the
builtin. For example, addition is not:
(let ((c (int-add a b))) ...)
but is rather:
(let ((c (int-add a b (Integer)))) ...)
The policy is to allow a limited form of mutation, making Nocaml not
technically pure. The specific concession made in stdlib is to check
that the object being mutated is freshly allocated at the front of the
heap. This allows small, local, harmless mutations while prohibiting
the types of mutations that can cause confusion in large programs.
Let’s open stdlib.c and check out a builtin:
WOMBAT_BUILTIN static uintptr_t* wombat_builtin_blob_minus_length(WombatExternal* wombat_external WOMBAT_UNUSED, uintptr_t* wombat_context WOMBAT_UNUSED, uintptr_t* cella, uintptr_t* cellb)
{
STDLIB_CHECK_TOP(cellb);
uintptr_t bit = (1UL << ((8*sizeof(uintptr_t))-1));
assert((cella[0] & bit) != 0);
assert((cellb[0] == WOMBAT_NATIVE_CONSTRUCTOR_Integer));
cellb[1] = (cella[0] - bit);
return cellb;
}
This builtin gets the length of a blob. The _minus_ in its name
corresponds to a literal - character. STDLIB_CHECK_TOP is a macro
that checks that an object is at the top/front of the heap. The first
assert checks that the cella is a blob. The second checks that
cellb is an Integer. The next line sets the payload word of the
Integer object to the length of the blob.
WOMBAT_BUILTIN is a macro that expands to an attribute informing
gcc and clang that this function should not be subject to
inlining.
WOMBAT_UNUSED is a macro that expands to an attribute informing
gcc and clang that this parameter is not necessarily used. In this
case, wombat_external is used by STDLIB_CHECK_TOP to determine the
heap pointer.
In no cases would wombat_context apply to a builtin (it is there
because it is required to keep function signatures uniform within
wombat – keep in mind that builtins can be passed around as
variables within Nocaml).
The driver.c for the sample quicksort program illustrates how to set
up a Nocaml environment, allocate Nocaml data structures from C,
and call into Nocaml code from C.
The Nocaml specifics start thus:
#define NELM (1536*1024*1024/8)
static uintptr_t memory[NELM];
static uintptr_t bitmap[((NELM+((8*sizeof(uintptr_t))-1))/(8*sizeof(uintptr_t)))];
static uintptr_t ctrmap[(sizeof(bitmap)/sizeof(uintptr_t))];
#define STKMAPLEN 32768
static uintptr_t stkmap[STKMAPLEN];
In this example, memory allocates 1.5GiB of heap (on 64-bit
systems – this is not scaled for 32-bit because we are primarily
concerned with being able to sort a specific number of machine
words). The collector requires three additional areas of memory. The
first is the “bitmap” which must have one bit per word, rounded up to
an even number of words. The second is the “counter map”, which must
have the same size as the bitmap. The third is the “stack map” which
must be large enough to accomodate all the pointers into the heap that
are stored on the stack. Usually the stack map need not be very large
at all, but in this case it is large enough to allow running debug builds
that lack tail recursion optimization.
Next is the setup of a CollectorExternal object on the stack:
CollectorExternal external;
memset((&external), 0, sizeof(external));
external.heap.bas = memory;
external.heap.end = memory + NELM;
external.heap.ptr = external.heap.end;
external.bmap.bas = bitmap;
external.cmap.bas = ctrmap;
external.smap.bas = stkmap;
external.smap.end = stkmap + STKMAPLEN;
There is nothing remarkable about the above. The next line shows how
to turn a primordial into an ordinary uintptr_t* wombat object,
basically by casting away const:
uintptr_t* rest = ((uintptr_t*)(wombat_primordial_list_minus_fini));
Next there is a loop:
for (uintptr_t i = 0; i < QSELMS; i++) {
uintptr_t v = ((uintptr_t)(rand()));
rest = wombat_constructor_3(&(external.wombat), WOMBAT_CONSTRUCTOR_ListCons, wombat_constructor_2(&(external.wombat), WOMBAT_NATIVE_CONSTRUCTOR_Integer, ((uintptr_t*)(v))), rest);
}
This constructs a list by prepending an element each iteration. There
are two constructor invocations per iteration, one for the list link
and one for the Integer.
Finally, the invocation:
rest = collector_invoke((&external), ((void*)(wombat_defun_main_minus_quicksort_minus_ntimes)), NULL, rest, n, NULL, NULL, NULL, NULL);
In Nocaml, invocations are always made with 6 arguments (any
additional arguments are “don’t cares”). After invocation, Nocaml
will conduct a cycle of compacting garbage collection, keeping only
the result of function invocation. All objects to be kept must be
included in the result.
After the quicksort example, there is an example using blobs. An input
blob containing the string “hello” is replicated 100 times, resulting
in a blob of length 500, that is converted to a string in the C
code.