Frame Guide

tur-frame is a pure-Turmeric spice that provides in-memory columnar dataframes modeled on R's data.frame, pandas DataFrame, and Racket's data-frame. Columns are stored in Apache Arrow's in-memory format (C Data Interface compatible), which enables zero-copy hand-off to Python, R, DuckDB, and Polars without serialization.

Installing

tur add https://github.com/rjungemann/turmeric-spices \
  --ref frame-v0.1.0 --subdir spices/frame --name frame
tur fetch

Then in your build.tur:

{:name "myapp"
 :spices ["frame"]}

Importing

(import frame/frame  :refer [frame frame-from-cols frame-from-rows
                               frame-nrows frame-ncols frame-schema
                               frame-column frame-head frame-tail
                               frame-slice frame=?])
(import frame/column :refer [column-int64 column-float64 column-utf8
                               column-type column-length column-get
                               column-int64-at column-float64-at
                               column-utf8-at])
(import frame/type   :refer [type-int64 type-float64 type-utf8 type-name])
(import frame/schema :refer [schema field])
(import frame/select :refer [select-cols drop-cols rename with-col
                               map-col mutate])
(import frame/filter :refer [filter drop-nulls distinct sample])
(import frame/sort   :refer [arrange])
(import frame/group  :refer [group-by agg agg-sum agg-mean agg-count
                               agg-min agg-max summarize])
(import frame/join   :refer [inner-join left-join join])
(import frame/reshape :refer [melt])
(import frame/csv    :refer [read-csv write-csv default-csv-opts])
(import frame/interop :refer [arrow-export arrow-import])
(import frame/print  :refer [print-frame frame-describe])

1. Building a Frame

From parallel columns

;; Build columns from value lists.  Pass 0 (nil) for validity to mean "all valid".
(let [names (column-utf8    (list "Alice" "Bob" "Carol") 0 0)
      ages  (column-int64   (list 30 25 35) 0 0)
      score (column-float64 (list 8.5 7.0 9.1) 0 0)

      s (schema (list (field "name"  (type-utf8)    0)
                      (field "age"   (type-int64)   0)
                      (field "score" (type-float64) 0)))

      df (frame s (list names ages score))]

  ;; quick sanity check
  (print-frame df)
  ;; =>  name   age  score
  ;;    Alice    30    8.5
  ;;      Bob    25    7.0
  ;;    Carol    35    9.1
  )

From an association list

frame-from-cols takes a vector of (name . column) pairs and infers the schema from the column types:

(let [df (frame-from-cols
           (vec-of (cons "name" (column-utf8  (list "Alice" "Bob") 0 0))
                   (cons "age"  (column-int64 (list 30 25) 0 0))))]
  (print-frame df))

From rows

frame-from-rows accepts a schema and a list of rows. Each row is a list of values in schema order. This is slower than column-wise construction and is intended for small literals and tests:

(let [s  (schema (list (field "x" (type-int64)   0)
                       (field "y" (type-float64) 0)))
      df (frame-from-rows s
           (list (list 1 1.5)
                 (list 2 2.5)))]
  (print-frame df))
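Since frame-from-rows receives data row by row, it has to transpose it into columns. The Python sketch below illustrates that step under stated assumptions: it is not tur-frame's implementation, `rows_to_columns` is a hypothetical name, and a schema is modeled as a list of (name, type) pairs. Each row is scattered across every column, which is one plausible reason row-wise construction costs more than passing ready-made columns.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical model: a schema is an ordered list of (column name, Python type).
Schema = Sequence[Tuple[str, type]]

def rows_to_columns(schema: Schema, rows: Sequence[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Transpose row-major literals into one list per schema field."""
    columns: Dict[str, List[Any]] = {name: [] for name, _ in schema}
    for r, row in enumerate(rows):
        if len(row) != len(schema):
            raise ValueError(f"row {r} has {len(row)} values, expected {len(schema)}")
        # One append per cell: each row is scattered across every column.
        for (name, typ), value in zip(schema, row):
            if not isinstance(value, typ):
                raise TypeError(f"row {r}, column {name!r}: expected {typ.__name__}")
            columns[name].append(value)
    return columns

cols = rows_to_columns([("x", int), ("y", float)], [[1, 1.5], [2, 2.5]])
print(cols)  # {'x': [1, 2], 'y': [1.5, 2.5]}
```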

From CSV

read-csv infers column types from the first 100 rows by default. It returns a result, so match on ok and err:

(let [df (read-csv "data/sales.csv" (default-csv-opts))]
  (match df
    [(ok f)  (print-frame f)]
    [(err e) (println "error:" e)]))

The type inference order is int64 → float64 → date32 → timestamp → bool → utf8. Use read-csv-typed to supply an explicit schema and skip inference:

(let [s (schema (list (field "id"    (type-int64)   0)
                      (field "name"  (type-utf8)    0)
                      (field "price" (type-float64) 0)))
      df (read-csv-typed "products.csv" s (default-csv-opts))]
  ...)
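One reading of the inference order is "use the first type that fits". The Python sketch below illustrates that reading only. It assumes (the guide does not say) that a column gets the first candidate that parses every non-empty value in the sample, that empty cells count as nulls, and that dates and timestamps are ISO 8601. The helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Callable, List, Sequence, Tuple

def _parse_bool(s: str) -> bool:
    if s.lower() not in ("true", "false"):
        raise ValueError(f"not a bool: {s!r}")
    return s.lower() == "true"

# Candidates in the order the guide lists them; utf8 is the fallback.
CANDIDATES: List[Tuple[str, Callable[[str], object]]] = [
    ("int64", int),
    ("float64", float),
    ("date32", date.fromisoformat),         # assumes ISO 8601 dates
    ("timestamp", datetime.fromisoformat),  # assumes ISO 8601 date-times
    ("bool", _parse_bool),
]

def _parses(parse: Callable[[str], object], text: str) -> bool:
    try:
        parse(text)
        return True
    except ValueError:
        return False

def infer_type(values: Sequence[str], sample: int = 100) -> str:
    """Pick the first candidate that parses every non-empty sampled cell."""
    sampled = [v for v in values[:sample] if v != ""]  # empty cells treated as nulls
    for name, parse in CANDIDATES:
        if sampled and all(_parses(parse, v) for v in sampled):
            return name
    return "utf8"

print(infer_type(["1", "2", ""]))           # int64
print(infer_type(["1", "2.5"]))             # float64
print(infer_type(["2024-01-05T10:00:00"]))  # timestamp
print(infer_type(["true", "False"]))        # bool
```

Under these assumptions, a value that breaks the inferred type after row 100 is never seen during inference, which is a good reason to reach for read-csv-typed on files you do not control.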

Write a frame back to CSV with write-csv:

(write-csv df "output.csv" (default-csv-opts))

2. Selecting, Filtering, and Mutating

Frames are immutable values. Every transformation returns a new frame; unchanged columns are shared with the original (no copying).
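Column sharing is easiest to see in a toy model. In this Python sketch (hypothetical names; a frame is modeled as a read-only mapping from column name to an immutable tuple), `with_col` copies only the mapping of references, so every unchanged column is the very same object in the old and new frame.

```python
from types import MappingProxyType
from typing import Mapping, Tuple

Column = Tuple[object, ...]   # immutable column values
Frame = Mapping[str, Column]  # hypothetical model: name -> column

def with_col(df: Frame, name: str, values: Column) -> Frame:
    """Return a new frame with `name` added or replaced; df is untouched."""
    nrows = len(next(iter(df.values()), values))
    if len(values) != nrows:
        raise ValueError("column length must match nrows")
    new = dict(df)        # copies references to columns, not column data
    new[name] = values
    return MappingProxyType(new)  # read-only view stands in for immutability

df = MappingProxyType({"age": (30, 25, 35), "score": (8.5, 7.0, 9.1)})
df2 = with_col(df, "bonus", (100.0, 80.0, 120.0))
print(df2["age"] is df["age"])  # True: the unchanged column is shared
```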

select-cols and drop-cols

;; Keep only named columns
(let [small (select-cols df (list "name" "score"))]
  (print-frame small))

;; Drop named columns
(let [no-age (drop-cols df (list "age"))]
  (print-frame no-age))

rename and with-col

;; Rename a column
(let [df2 (rename df "score" "rating")]
  (print-frame df2))

;; Add or replace a column (length must match nrows)
(let [bonus (column-float64 (list 100.0 80.0 120.0) 0 0)
      df2   (with-col df "bonus" bonus)]
  (print-frame df2))

filter

Pass a predicate (fn [frame row-index] :int) -- return non-zero to keep the row:

;; Keep rows where age >= 30
(let [seniors (filter df (fn [f i]
                           (let [age (column-int64-at (frame-column f "age") i)]
                             (>= age 30))))]
  (print-frame seniors))
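A predicate of the form (fn [frame row-index]) lends itself to a two-pass strategy: evaluate it once per row to collect the surviving row indices, then gather those rows from every column. The Python sketch below shows that general technique. `filter_rows` and the dict-of-lists frame are hypothetical stand-ins, not tur-frame's internals.

```python
from typing import Callable, Dict, List, Sequence

Frame = Dict[str, Sequence]  # hypothetical model: name -> column values

def filter_rows(df: Frame, keep: Callable[[Frame, int], bool]) -> Dict[str, List]:
    """Keep the rows for which keep(df, i) is truthy."""
    nrows = len(next(iter(df.values()), ()))
    # Pass 1: run the predicate once per row to build a selection vector.
    selected = [i for i in range(nrows) if keep(df, i)]
    # Pass 2: gather the selected positions from every column.
    return {name: [col[i] for i in selected] for name, col in df.items()}

people = {"name": ["Alice", "Bob", "Carol"], "age": [30, 25, 35]}
seniors = filter_rows(people, lambda f, i: f["age"][i] >= 30)
print(seniors)  # {'name': ['Alice', 'Carol'], 'age': [30, 35]}
```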

drop-nulls and distinct

;; Drop rows with any null in the named columns; pass 0 for all columns
(let [clean (drop-nulls df (list "score"))]
  ...)

;; De-duplicate by named columns; 0 = all columns
(let [uniq (distinct df (list "name"))]
  ...)

mutate

mutate takes a new column name, its type, and a row function (fn [frame row-index]), and builds the column by calling the row function once per row:

;; Add a "grade" :utf8 column derived from score
(let [df2 (mutate df "grade" (type-utf8)
            (fn [f i]
              (let [s (column-float64-at (frame-column f "score") i)]
                (if (>= s 9.0) "A"
                    (if (>= s 7.0) "B" "C")))))]
  (print-frame df2))
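The same row-function pattern can be sketched in Python. `mutate` here is a hypothetical stand-in that models a frame as a dict of lists and uses the declared type only to check each value the row function returns. It reproduces the grade example above without modifying the input frame.

```python
from typing import Any, Callable, Dict, List, Sequence

Frame = Dict[str, Sequence]  # hypothetical model: name -> column values

def mutate(df: Frame, name: str, out_type: type,
           row_fn: Callable[[Frame, int], Any]) -> Dict[str, Sequence]:
    """Derive column `name` by calling row_fn(df, i) once per row."""
    nrows = len(next(iter(df.values()), ()))
    values: List[Any] = []
    for i in range(nrows):
        v = row_fn(df, i)
        # The declared type plays the role of tur-frame's type argument.
        if not isinstance(v, out_type):
            raise TypeError(f"row {i}: expected {out_type.__name__}, got {type(v).__name__}")
        values.append(v)
    new = dict(df)  # the original frame is left unchanged
    new[name] = values
    return new

def grade(f: Frame, i: int) -> str:
    s = f["score"][i]
    return "A" if s >= 9.0 else ("B" if s >= 7.0 else "C")

df = {"name": ["Alice", "Bob", "Carol"], "score": [8.5, 7.0, 9.1]}
print(mutate(df, "grade", str, grade)["grade"])  # ['B', 'B', 'A']
```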

3. Sorting and De-duplicating

arrange sorts by one or more columns. After the frame, it takes two parallel lists: column names and sort directions (0 = ascending, 1 = descending):

;; Sort by score descending, then by name ascending
(let [sorted (arrange df
               (list "score" "name")
               (list 1       0))]
  (print-frame sorted))

arrange-indices returns the row permutation without applying it. Use reorder to apply the same ordering to several frames that have the same number of rows:

(let [idx     (arrange-indices df (list "score") (list 1))
      sorted1 (reorder df   idx)
      sorted2 (reorder df2  idx)]
  ...)
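To make the parallel-list convention concrete, here is a Python sketch of computing and applying a multi-key permutation. It relies on Python's stable sort (sorting by the last key first), which is one standard way to get a lexicographic order with mixed directions. Whether tur-frame sorts stably is not stated in this guide, and the function names only mirror the Turmeric ones.

```python
from typing import Dict, List, Sequence

Frame = Dict[str, Sequence]  # hypothetical model: name -> column values

def arrange_indices(df: Frame, keys: Sequence[str], directions: Sequence[int]) -> List[int]:
    """Row permutation sorting df by keys (0 = ascending, 1 = descending)."""
    if len(keys) != len(directions):
        raise ValueError("keys and directions must be parallel lists")
    idx = list(range(len(next(iter(df.values()), ()))))
    # Python's sort is stable (also with reverse=True), so sorting by the
    # last key first and the first key last yields a lexicographic order.
    for key, direction in reversed(list(zip(keys, directions))):
        col = df[key]
        idx.sort(key=lambda i: col[i], reverse=(direction == 1))
    return idx

def reorder(df: Frame, idx: Sequence[int]) -> Dict[str, List]:
    """Apply a permutation to every column of df."""
    return {name: [col[i] for i in idx] for name, col in df.items()}

df = {"name": ["Alice", "Bob", "Carol", "Dan"], "score": [8.5, 9.1, 9.1, 7.0]}
idx = arrange_indices(df, ["score", "name"], [1, 0])
print(idx)  # [1, 2, 0, 3]
bonus = {"bonus": [10, 20, 30, 40]}  # another frame with the same row count
print(reorder(bonus, idx))  # {'bonus': [20, 30, 10, 40]}
```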

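distinct removes duplicate rows by the named key columns (or all columns if you pass 0), keeping the first occurrence of each key. Sorting with arrange first therefore controls which row survives. This is the typical "keep first occurrence" pattern, here keeping each name's highest score: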

(let [top-per-name (distinct (arrange df (list "score") (list 1))
                             (list "name"))]
  (print-frame top-per-name))
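The pattern depends on distinct keeping the first occurrence of each key. This Python sketch assumes exactly that (plus `keys=None` standing in for passing 0) and shows why sorting by score descending first leaves each name's best row. The dict-of-lists frame is a hypothetical model.

```python
from typing import Dict, List, Optional, Sequence

Frame = Dict[str, Sequence]  # hypothetical model: name -> column values

def distinct(df: Frame, keys: Optional[Sequence[str]] = None) -> Dict[str, List]:
    """Keep the first row for each key; keys=None means all columns."""
    cols = list(df) if keys is None else list(keys)
    seen = set()
    keep: List[int] = []
    for i in range(len(next(iter(df.values()), ()))):
        k = tuple(df[c][i] for c in cols)
        if k not in seen:  # first occurrence wins
            seen.add(k)
            keep.append(i)
    return {name: [col[i] for i in keep] for name, col in df.items()}

df = {"name": ["Alice", "Bob", "Alice", "Bob"], "score": [7.0, 8.0, 9.5, 6.0]}
# Sort by score descending first, so each name's best row comes first.
order = sorted(range(4), key=lambda i: df["score"][i], reverse=True)
by_score = {name: [col[i] for i in order] for name, col in df.items()}
print(distinct(by_score, ["name"]))  # {'name': ['Alice', 'Bob'], 'score': [9.5, 8.0]}
```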

4. Group-By and Aggregation

group-by returns an opaque grouped-frame. Calling agg on it produces one summary row per group, wrapped in a result. After the grouped frame, agg takes three parallel lists: output column names, input column names, and aggregation tags.

;; Sum and count scores, grouped by grade
;; (assumes df already has the "grade" column added with mutate)
(let [grouped (group-by df (vec-of "grade"))
      summary (agg grouped
                (list "total_score" "count")
                (list "score"       "name")
                (list (agg-sum)     (agg-count)))]
  (match summary
    [(ok f)  (print-frame f)]
    [(err e) (println "agg error:" e)]))

Available aggregation functions:

Function                    Meaning
(agg-count)                 Number of rows
(agg-sum)                   Sum of column values
(agg-mean)                  Arithmetic mean
(agg-min) / (agg-max)       Min / max value
(agg-median)                Median
(agg-std) / (agg-var)       Sample std / variance
(agg-first) / (agg-last)    First / last value in group
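Grouped aggregation of this kind is commonly implemented by hashing each row's key to a list of row indices and then reducing each input column per group. The Python sketch below illustrates that technique with string tags standing in for the (agg-...) constructors. `group_agg` is hypothetical, and the group output order (first appearance here) is an assumption, since the guide does not specify it.

```python
from statistics import mean, median, stdev, variance
from typing import Any, Callable, Dict, List, Sequence

Frame = Dict[str, Sequence]  # hypothetical model: name -> column values

# Tag -> reducer. stdev/variance are the sample versions, as in the table.
AGGS: Dict[str, Callable[[List[Any]], Any]] = {
    "count": len, "sum": sum, "mean": mean, "median": median,
    "min": min, "max": max, "std": stdev, "var": variance,
    "first": lambda xs: xs[0], "last": lambda xs: xs[-1],
}

def group_agg(df: Frame, by: Sequence[str], out_names: Sequence[str],
              in_names: Sequence[str], tags: Sequence[str]) -> Dict[str, List[Any]]:
    """One output row per distinct key; by=[] gives a single whole-frame row."""
    if not len(out_names) == len(in_names) == len(tags):
        raise ValueError("output names, input names and tags must be parallel")
    nrows = len(next(iter(df.values()), ()))
    groups: Dict[tuple, List[int]] = {}  # dicts keep first-seen key order
    for i in range(nrows):
        groups.setdefault(tuple(df[c][i] for c in by), []).append(i)
    result: Dict[str, List[Any]] = {c: [] for c in [*by, *out_names]}
    for key, rows in groups.items():
        for c, k in zip(by, key):
            result[c].append(k)
        for out, src, tag in zip(out_names, in_names, tags):
            result[out].append(AGGS[tag]([df[src][i] for i in rows]))
    return result

df = {"name": ["Alice", "Bob", "Carol"], "score": [8.5, 7.0, 9.1],
      "grade": ["B", "B", "A"]}
print(group_agg(df, ["grade"], ["total_score", "count"], ["score", "name"], ["sum", "count"]))
# {'grade': ['B', 'A'], 'total_score': [15.5, 9.1], 'count': [2, 1]}
```

Calling `group_agg` with an empty `by` list puts every row into one group, which mirrors what summarize does.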

summarize applies aggregations without grouping -- useful for whole-frame summary statistics:

;; Overall mean and max score
(let [stats (summarize df
              (list "mean_score" "max_score")
              (list "score"      "score")
              (list (agg-mean)   (agg-max)))]
  (match stats
    [(ok f)  (print-frame f)]
    [(err e) (println "summarize error:" e)]))

frame-describe produces a pandas-style summary (count, mean, std, min, 25th/50th/75th percentiles, and max) for every numeric column:

(print-frame (frame-describe df))

5. Joining Two Frames

All joins return result<frame>. The key arguments are two parallel lists of column names: the left-side key names and the right-side key names.

(let [orders   (read-csv "orders.csv"   (default-csv-opts))
      products (read-csv "products.csv" (default-csv-opts))]
  (match (cons orders products)
    [(cons (ok o) (ok p))
     ;; inner join on orders.product_id = products.id
     (let [result (inner-join o p
                    (list "product_id")
                    (list "id"))]
       (match result
         [(ok df) (print-frame df)]
         [(err e) (println "join error:" e)]))]
    [_ (println "csv read error")]))

The convenience join function takes a how string and a single key list (for when the key column has the same name on both sides):

;; Left join on the shared "category" column
(join orders products "left" (list "category"))

Available join kinds are "inner", "left", "right", "full", "semi", and "anti", plus a cross join, which needs no keys:

;; Semi-join: keep left rows that have a match in right
(semi-join orders products (list "product_id") (list "id"))

;; Cross join: Cartesian product
(cross-join sizes colors)

When both frames have a column with the same name (other than the join key), the right-frame copy gets a _r suffix in the output.
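The following Python sketch illustrates a hash join (build an index on the right frame's keys, then probe it with each left row) together with the _r suffix rule and a semi-join. It illustrates the technique under assumptions: the dict-of-lists frame and the function names are hypothetical, and dropping the right-side key columns from the inner-join output is a guess the guide does not confirm.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Dict[str, Sequence]  # hypothetical model: name -> column values

def _nrows(df: Frame) -> int:
    return len(next(iter(df.values()), ()))

def _key(df: Frame, cols: Sequence[str], i: int) -> Tuple:
    return tuple(df[c][i] for c in cols)

def inner_join(left: Frame, right: Frame, left_keys: Sequence[str],
               right_keys: Sequence[str], suffix: str = "_r") -> Dict[str, List]:
    """Hash join: build on right, probe with left."""
    if len(left_keys) != len(right_keys):
        raise ValueError("key lists must be parallel")
    index: Dict[Tuple, List[int]] = {}
    for j in range(_nrows(right)):  # build phase
        index.setdefault(_key(right, right_keys, j), []).append(j)
    # Right-side non-key columns; a name clash with the left side gets the suffix.
    right_out = {c: (c + suffix if c in left else c)
                 for c in right if c not in right_keys}
    out: Dict[str, List] = {c: [] for c in [*left, *right_out.values()]}
    for i in range(_nrows(left)):  # probe phase
        for j in index.get(_key(left, left_keys, i), []):
            for c in left:
                out[c].append(left[c][i])
            for c, name in right_out.items():
                out[name].append(right[c][j])
    return out

def semi_join(left: Frame, right: Frame, left_keys: Sequence[str],
              right_keys: Sequence[str]) -> Dict[str, List]:
    """Left rows with at least one match; no right columns, no duplication."""
    keys = {_key(right, right_keys, j) for j in range(_nrows(right))}
    keep = [i for i in range(_nrows(left)) if _key(left, left_keys, i) in keys]
    return {c: [col[i] for i in keep] for c, col in left.items()}

orders = {"order_id": [1, 2, 3], "product_id": [10, 20, 30], "name": ["o1", "o2", "o3"]}
products = {"id": [10, 20], "name": ["pen", "ink"]}
print(inner_join(orders, products, ["product_id"], ["id"]))
# {'order_id': [1, 2], 'product_id': [10, 20], 'name': ['o1', 'o2'], 'name_r': ['pen', 'ink']}
print(semi_join(orders, products, ["product_id"], ["id"]))
# {'order_id': [1, 2], 'product_id': [10, 20], 'name': ['o1', 'o2']}
```

An anti-join is the semi-join with the membership test negated.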


6. Reshaping with melt

melt converts a wide frame to long format. The identity columns are repeated unchanged, and each remaining column is stacked into rows: every input row yields one output row per non-identity column, with that column's name stored in var-name and its value in value-name:

;; Wide frame:
;;   name   q1   q2   q3
;;   Alice  10   20   15
;;   Bob    12   18   22
;;
;; After melt with id-cols=["name"]:
;;   name   quarter  sales
;;   Alice  q1       10
;;   Alice  q2       20
;;   ...

(let [df (read-csv "quarterly.csv" (default-csv-opts))]
  (match df
    [(ok wide)
     (let [long (melt wide
                  (list "name")   ; identity columns
                  "quarter"       ; variable column name
                  "sales")]       ; value column name
       ;; melt returns 0 if the non-identity columns differ in type
       (when long (print-frame long)))]
    [(err e) (println "csv read error:" e)]))

All non-identity columns must share the same type. melt returns 0 if they do not.
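The reshaping rule above can be sketched in Python. This `melt` is hypothetical. It checks Python value types as a stand-in for tur-frame's column types and returns None where tur-frame returns 0. Its row order (all variables for one input row before the next row) matches the example output above.

```python
from typing import Dict, List, Optional, Sequence

Frame = Dict[str, Sequence]  # hypothetical model: name -> column values

def melt(df: Frame, id_cols: Sequence[str], var_name: str,
         value_name: str) -> Optional[Dict[str, List]]:
    """Wide -> long: one output row per (input row, non-identity column)."""
    value_cols = [c for c in df if c not in id_cols]
    # Stand-in for the same-type rule: compare Python value types.
    if len({type(v) for c in value_cols for v in df[c]}) > 1:
        return None  # tur-frame returns 0 in this case
    out: Dict[str, List] = {c: [] for c in [*id_cols, var_name, value_name]}
    for i in range(len(next(iter(df.values()), ()))):
        for c in value_cols:
            for idc in id_cols:
                out[idc].append(df[idc][i])
            out[var_name].append(c)
            out[value_name].append(df[c][i])
    return out

wide = {"name": ["Alice", "Bob"], "q1": [10, 12], "q2": [20, 18], "q3": [15, 22]}
long = melt(wide, ["name"], "quarter", "sales")
print(long["quarter"][:3], long["sales"][:3])  # ['q1', 'q2', 'q3'] [10, 20, 15]
```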

Note: pivot (long → wide) and transpose are not included in v0.1.0. For those operations, export to Arrow and use the receiving runtime's native pivot/transpose.


7. Arrow C Data Interface

tur-frame stores data in the Arrow in-memory format. Zero-copy hand-off to other runtimes uses the Arrow C Data Interface -- two small C structs (ArrowSchema and ArrowArray, passed by pointer) that any Arrow-aware library can consume directly.

Exporting to Python (PyArrow)

;; Export returns a cons pair of (schema-ptr . array-ptr)
(let [ptrs      (arrow-export df)
      schema-p  (head ptrs)
      array-p   (tail ptrs)]
  ;; Hand the two raw pointers to Python via ctypes or cffi:
  ;; import pyarrow as pa
  ;; batch = pa.RecordBatch._import_from_c(array_ptr, schema_ptr)
  (println "schema ptr:" schema-p)
  (println "array ptr:"  array-p))

The exporting side transfers ownership: the consumer must call the release callbacks embedded in the structs when done. PyArrow does this automatically.
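The release rule comes from the Arrow C Data Interface specification: a producer's release callback frees its resources and then marks the struct as released by setting its release pointer to NULL, and the consumer calls it exactly once. The Python sketch below mirrors struct ArrowSchema field for field with ctypes and plays both roles in one process to show that protocol. `export_int64_schema` and `consume` are hypothetical helpers, not tur-frame or PyArrow APIs.

```python
import ctypes

class ArrowSchema(ctypes.Structure):
    """Field-for-field mirror of struct ArrowSchema from the Arrow spec."""

# void (*release)(struct ArrowSchema*);
ReleaseSchema = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))

ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ReleaseSchema),
    ("private_data", ctypes.c_void_p),
]

ARROW_FLAG_NULLABLE = 2
released_names = []  # demo bookkeeping: which schemas were released

@ReleaseSchema
def producer_release(ptr):
    # Producer side: free private resources (none in this toy), then mark
    # the struct as released by setting its release pointer to NULL.
    released_names.append(ptr.contents.name)
    ptr.contents.release = ReleaseSchema()

def export_int64_schema(name: bytes) -> ArrowSchema:
    """Hypothetical producer: one nullable int64 ("l") field."""
    return ArrowSchema(format=b"l", name=name, metadata=None,
                       flags=ARROW_FLAG_NULLABLE, n_children=0,
                       children=None, dictionary=None,
                       release=producer_release, private_data=None)

def consume(schema: ArrowSchema) -> str:
    """Hypothetical consumer: read what it needs, then release exactly once."""
    if not schema.release:
        raise ValueError("schema has already been released")
    fmt = schema.format.decode()
    schema.release(ctypes.byref(schema))
    return fmt

s = export_int64_schema(b"score")
print(consume(s))       # l
print(bool(s.release))  # False: the struct is now marked released
```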

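Exporting to R (nanoarrow) and DuckDB

The pair returned by arrow-export is a standard C Data Interface export, so any consumer of that interface, such as R's nanoarrow package or DuckDB, can take it; the ownership rule above applies in the same way.
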
Importing from another runtime

;; Import an (ArrowSchema*, ArrowArray*) pair produced by another runtime.
;; Turmeric takes ownership and calls the release callbacks on GC.
(let [df (arrow-import schema-ptr array-ptr)]
  (match df
    [(ok f)  (print-frame f)]
    [(err e) (println "import error:" e)]))

Column-level export

For single-column hand-off (e.g. passing a vector to a numeric routine):

(let [col  (frame-column df "score")
      ptrs (arrow-export-column col "score")]
  ...)

Column Type Reference

Type tag function   Arrow format   Element accessor             Returns
(type-int32)        "i"            (column-int32-at col i)      :int
(type-int64)        "l"            (column-int64-at col i)      :int
(type-float32)      "f"            (column-float32-at col i)    :float
(type-float64)      "g"            (column-float64-at col i)    :float
(type-bool)         "b"            (column-bool-at col i)       :int (0 or 1)
(type-utf8)         "u"            (column-utf8-at col i)       :cstr
(type-date32)       "tdD"          (column-int32-at col i)      days since epoch
(type-timestamp)    "tsu:"         (column-int64-at col i)      µs since epoch
(type-null)         "n"            (none)                       all rows null
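The date32 and timestamp rows come back as raw integers. Following the Arrow specification's definitions, the Python sketch below converts them. The helper names are hypothetical, and decoding "tsu:" (empty timezone) as a naive datetime reflects Arrow's meaning of a timezone-less timestamp.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH = datetime(1970, 1, 1)  # naive: "tsu:" carries no timezone

def date32_to_date(days: int) -> date:
    """date32 ("tdD"): signed days since 1970-01-01."""
    return EPOCH_DATE + timedelta(days=days)

def timestamp_us_to_datetime(micros: int) -> datetime:
    """timestamp ("tsu:"): microseconds since the epoch, no timezone."""
    return EPOCH + timedelta(microseconds=micros)

print(date32_to_date(19723))                            # 2024-01-01
print(timestamp_us_to_datetime(1_700_000_000_000_000))  # 2023-11-14 22:13:20
```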

Typed fast-path accessors (column-int64-at, etc.) have undefined behavior if called on a column of the wrong type or with an out-of-range index. Check with column-type first, or call column-get for a bounds-checked, option-typed result.
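A bounds-checked get also has to consult the validity bitmap. The Python sketch below follows the Arrow layout (an LSB-first bitmap, where an absent bitmap means every slot is valid) and returns None in place of an empty option. It assumes, without confirmation from this guide, that the 0 passed for validity in section 1 corresponds to an absent bitmap. The function names are hypothetical.

```python
from typing import Optional, Sequence

def is_valid(validity: Optional[bytes], i: int) -> bool:
    """Arrow validity bitmaps are LSB-first: slot i is bit (i % 8) of byte i // 8."""
    if validity is None:  # no bitmap: every slot is valid
        return True
    return ((validity[i >> 3] >> (i & 7)) & 1) == 1

def column_get(values: Sequence[int], validity: Optional[bytes], i: int) -> Optional[int]:
    """Checked access: None if i is out of range or the slot is null."""
    if not 0 <= i < len(values):
        return None
    return values[i] if is_valid(validity, i) else None

ages = [30, 0, 35]       # the value under a null slot is meaningless
bitmap = bytes([0b101])  # slots 0 and 2 valid, slot 1 null
print([column_get(ages, bitmap, i) for i in range(4)])  # [30, None, 35, None]
```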


What's Coming in v0.2