Type Erasure to int64_t

Turmeric compiles to C. Many high-level types -- closures, ADTs, opaque structs, type variables, recursive types, tagged unions -- share one runtime representation: a 64-bit opaque handle. This guide maps where that collapse happens today, so contributors extending the type system or codegen know which boundary they are crossing.

This is a snapshot. As sized types (SZ*), unboxed structs, and monomorphization land, several of these sites will gain non-erased representations. Treat the file:line citations as a starting point and re-verify before relying on them.


The single choke point

Everything funnels through one function, type_c_name(), which lowers a Turmeric type to the C type name used in generated code.

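If type_c_name() returns "int64_t" for a type other than :int, that type is erased at the C boundary. If it returns a struct name or another concrete C type, it is not. (:int also lowers to int64_t, but there int64_t is the value's real carrier, not an erasure target.)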
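
The rule can be pictured with a minimal sketch, assuming a much simplified type representation. The enum ty_kind_t and the functions type_c_name_sketch() and is_erased_sketch() are hypothetical stand-ins; the real type_c_name() works on the compiler's own type structures, and the defstruct and TY_APP struct cases are left out here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified type kinds; the real compiler's type
   representation is richer than this. */
typedef enum {
    TY_INT, TY_BOOL, TY_FLOAT, TY_CSTR, TY_PTR, TY_SET,
    TY_VAR, TY_CLOSURE, TY_ADT /* kinds without a concrete layout */
} ty_kind_t;

/* Sketch of the choke point: concrete kinds map to concrete C types,
   everything else falls through to the int64_t handle. */
static const char *type_c_name_sketch(ty_kind_t k) {
    switch (k) {
    case TY_INT:   return "int64_t";      /* carrier, not erasure */
    case TY_BOOL:  return "bool";
    case TY_FLOAT: return "double";
    case TY_CSTR:  return "const char *";
    case TY_PTR:   return "void *";
    case TY_SET:   return "tur_set_t *";
    default:       return "int64_t";      /* erased */
    }
}

/* A type is erased when it lowers to int64_t without being :int itself. */
static bool is_erased_sketch(ty_kind_t k) {
    return k != TY_INT && strcmp(type_c_name_sketch(k), "int64_t") == 0;
}
```

The :int exclusion in is_erased_sketch() is the whole point: the string comparison alone cannot tell a carrier from an erasure target.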


The three mechanisms

1. Pointer-as-int

Heap-allocated values are cast through (int64_t)(intptr_t) and stored as raw integers. The runtime keeps a pointer; the type system pretends it's an int.

Call sites that marshal these into generic positions live in src/compiler/elab_call.c:1570-1576.
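
A minimal round trip through the carrier looks like the sketch below. point_t, erase_ptr(), unerase_ptr(), and make_point_handle() are hypothetical names for illustration, not runtime functions; the sketch assumes intptr_t fits in int64_t, which holds on common 32- and 64-bit targets.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap-allocated payload (stands in for any boxed value). */
typedef struct { int64_t x; int64_t y; } point_t;

/* Erase a heap pointer into the int64_t carrier.
   Assumes intptr_t fits in int64_t. */
static int64_t erase_ptr(void *p) {
    return (int64_t)(intptr_t)p;
}

/* Recover the pointer; the caller must already know the pointee type,
   because nothing in the int64_t records it. */
static void *unerase_ptr(int64_t handle) {
    return (void *)(intptr_t)handle;
}

/* Allocate a point and hand back only the erased handle (0 on failure). */
static int64_t make_point_handle(int64_t x, int64_t y) {
    point_t *p = malloc(sizeof *p);
    if (p == NULL) return 0;
    p->x = x;
    p->y = y;
    return erase_ptr(p);
}
```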

2. Opaque-by-default

Any type that type_c_name() cannot lower to a concrete C layout falls through to int64_t. This is how parametric polymorphism is implemented in the absence of monomorphization.
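
Because every type argument arrives as an int64_t, one compiled copy of a generic function serves every instantiation. The sketch below uses a hypothetical fixed-capacity stack (erased_stack_t, stack_push(), stack_pop()); it is an illustration, not a Turmeric runtime type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERASED_STACK_CAP 16

/* Hypothetical generic container: elements of any type T are stored as
   int64_t, so the same compiled code handles every T. */
typedef struct {
    int64_t items[ERASED_STACK_CAP];
    size_t len;
} erased_stack_t;

/* Push an erased element; returns false when the stack is full. */
static bool stack_push(erased_stack_t *s, int64_t v) {
    if (s->len == ERASED_STACK_CAP) return false;
    s->items[s->len++] = v;
    return true;
}

/* Pop an erased element; returns false when the stack is empty. */
static bool stack_pop(erased_stack_t *s, int64_t *out) {
    if (s->len == 0) return false;
    *out = s->items[--s->len];
    return true;
}
```

A stack of :int and a stack of strings share stack_push() and stack_pop(); only the caller knows whether a popped int64_t is an integer or a cast pointer.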

Note the asymmetry in how type_c_name() handles TY_APP: if type_has_concrete_codegen_layout() returns true, the application gets a real struct via register_struct_app(). Otherwise it collapses to int64_t. This is the seam where HKT specialization can hook in.
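
The branch can be sketched as follows. has_concrete_layout_sketch() and register_struct_app_sketch() are hypothetical stand-ins for type_has_concrete_codegen_layout() and register_struct_app(), whose real signatures are not shown in this guide; app_t and the struct names in it are likewise invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical type application, e.g. a Pair applied to two arguments. */
typedef struct {
    const char *c_struct_name;  /* name a registered struct would get */
    bool args_concrete;         /* every argument has a concrete layout */
} app_t;

static int registered_apps = 0; /* count of struct apps registered */

/* Stand-in for type_has_concrete_codegen_layout(). */
static bool has_concrete_layout_sketch(const app_t *app) {
    return app->args_concrete;
}

/* Stand-in for register_struct_app(): real code would record the
   instantiation for emission; here we only count it. */
static const char *register_struct_app_sketch(const app_t *app) {
    registered_apps++;
    return app->c_struct_name;
}

/* The asymmetry: a concrete layout yields a real struct,
   anything else collapses to the erased int64_t handle. */
static const char *lower_ty_app_sketch(const app_t *app) {
    if (has_concrete_layout_sketch(app))
        return register_struct_app_sketch(app);
    return "int64_t";
}
```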

3. Tagged pair

For runtime-discriminated values, both the tag and the payload are int64_t:

typedef struct { int64_t tag; int64_t val; } tur_tagged_t;
#define TUR_TAG(t, v)  ((tur_tagged_t){(int64_t)(t), (int64_t)(v)})

It is used for (A | B) union types and for the top type any.
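
Building and dispatching on a tagged value might look like this sketch. The typedef and macro are copied from the runtime definition; the tag numbering (TAG_INT, TAG_CSTR) and the measure() function are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copied from the runtime definition. */
typedef struct { int64_t tag; int64_t val; } tur_tagged_t;
#define TUR_TAG(t, v)  ((tur_tagged_t){(int64_t)(t), (int64_t)(v)})

/* Hypothetical tag numbering for a (:int | :cstr) union. */
enum { TAG_INT = 0, TAG_CSTR = 1 };

/* Dispatch on the runtime tag: the payload is reinterpreted only after
   the tag says what it is. Returns the integer, or the string's length. */
static int64_t measure(tur_tagged_t v) {
    switch (v.tag) {
    case TAG_INT:  return v.val;
    case TAG_CSTR: return (int64_t)strlen((const char *)(intptr_t)v.val);
    default:       return -1;  /* unknown tag */
    }
}
```

Pointers go into the payload through intptr_t, as elsewhere: TUR_TAG(TAG_CSTR, (intptr_t)"abc").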


Function values

Function values are a special case because they need both a code pointer and an environment, but the type system still wants to treat them uniformly.

The code pointer is erased into int64_t; the environment travels alongside as a separate void *.
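
A sketch of that pairing, assuming a hypothetical uniform signature in which every erased function takes its environment plus one int64_t argument and returns an int64_t. fn_value_t, erased_fn_t, make_fn(), call_fn(), and add_captured() are illustrative names; the runtime's own function-value typedef may differ in layout and naming.

```c
#include <stdint.h>

/* Hypothetical uniform calling convention for erased functions. */
typedef int64_t (*erased_fn_t)(void *env, int64_t arg);

/* Hypothetical function value: code pointer erased to int64_t,
   environment carried alongside as void *. */
typedef struct {
    int64_t code;
    void *env;
} fn_value_t;

static fn_value_t make_fn(erased_fn_t f, void *env) {
    fn_value_t v = { (int64_t)(intptr_t)f, env };
    return v;
}

/* Recover the code pointer and call it with its environment. */
static int64_t call_fn(fn_value_t v, int64_t arg) {
    erased_fn_t f = (erased_fn_t)(intptr_t)v.code;
    return f(v.env, arg);
}

/* Example closure body: adds a captured integer to its argument. */
static int64_t add_captured(void *env, int64_t arg) {
    return *(const int64_t *)env + arg;
}
```

In this sketch the environment is passed by pointer, so the closure sees later changes to the captured variable.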


What stays unboxed

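Not everything collapses. Types with a stable concrete C representation keep it: register-sized scalars pass through unchanged, and concrete structs, applications with a codegen layout, and sets lower to their own C types: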

Turmeric type                          C type
-------------------------------------  ------------------------------
:int                                   int64_t (carrier, not erasure)
:bool                                  bool
:float                                 double
:cstr                                  const char *
:ptr                                   void *
Concrete defstruct                     the struct's C name
Concrete TY_APP with codegen layout    a registered struct app
TY_SET                                 tur_set_t *

The distinction between :int as a carrier and :int as an erasure target matters when reading inline-C: a parameter declared :int may carry a real integer, or it may carry a cast pointer. The Turmeric signature of the declaring defn is the source of truth.
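
Two functions with identical C signatures illustrate the ambiguity. box_t, double_int(), and double_boxed() are hypothetical; in real code, only the declaring defn would say which reading applies.

```c
#include <stdint.h>

/* Hypothetical boxed value reached through an erased handle. */
typedef struct { int64_t value; } box_t;

/* Carrier: the int64_t is the integer itself. */
static int64_t double_int(int64_t n) {
    return n * 2;
}

/* Erasure target: the int64_t is a cast box_t pointer.
   The C type is identical to double_int's; nothing here says so. */
static int64_t double_boxed(int64_t h) {
    const box_t *b = (const box_t *)(intptr_t)h;
    return b->value * 2;
}
```

Passing a plain integer to double_boxed() compiles without complaint and dereferences garbage, which is why the Turmeric-level signature has to be consulted.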


Why this matters

A few practical consequences of the current erasure scheme:

  1. No type-directed dispatch at the C level. Two functions that take an erased generic parameter receive the same int64_t and cannot branch on the runtime type without a tur_tagged_t-style discriminant.
  2. Pointer provenance is invisible to the C compiler. GC and sanitizer tooling that wants to walk heap pointers has to know which int64_t fields are actually casts. The runtime tracks this separately.
  3. The HKT codegen seam is type_has_concrete_codegen_layout(). Any improvement that turns more TY_APP cases into real structs passes through this predicate.
  4. Sized types (SZ*) narrow the carrier, not the erasure. A :int32 argument still occupies a 64-bit int64_t slot at the call boundary; the narrowing happens inside the function body.
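
Point 4 can be sketched as a hypothetical lowering of a defn whose parameter is declared :int32. The name add_one_i32() and the choice to widen the result back into an int64_t return slot are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical lowering of a defn with a :int32 parameter.
   At the call boundary the argument still occupies an int64_t slot;
   the narrowing to int32_t happens inside the body. */
static int64_t add_one_i32(int64_t slot) {
    int32_t x = (int32_t)slot;   /* narrow on entry */
    return (int64_t)x + 1;       /* widen back for the return slot */
}
```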


Re-verifying the map

Line numbers drift. To regenerate the sites this guide cites:

# All sites that emit "int64_t" as a C type from the type lowering
rg -n '"int64_t"' src/compiler/types.c

# Sites that cast pointers through intptr_t into the int64_t carrier
rg -n '\(int64_t\)\(intptr_t\)' src/ stdlib/

# The tagged union and poly function typedefs
rg -n 'tur_tagged_t|tur_poly_fn_t' src/compiler/emit_module.c

The choke point at type_c_name() is stable -- start there and walk outward.