Obelisk is a novel relational database management system and its scripting language, urQL, designed for the experimental Urbit computing platform but adaptable to other systems. Building on E. F. Codd’s relational algebra and unencumbered by backward compatibility constraints, Obelisk is a temporal relational database derived from first principles. Key innovations include time travel for querying historical database states, proper set implementation, elimination of nulls, idempotent queries, and enforced composability. Obelisk ensures all user operations are accessible via both the urQL scripting language and api, reflecting Urbit’s pure functional architecture. Obelisk’s time-travel feature indexes every schema and content state change by time, ensuring idempotent queries and atomic script execution. This paper explores how Obelisk advances relational database design while addressing sql’s deficiencies, offering a modern, principled approach to structured data management.
The Obelisk relational database management system (rdbms) and urQL scripting language1 run on the experimental Urbit computer and network but could just as well be implemented on any other computer—except for certain features dependent on Hoon language features or the Urbit network architecture.2 Like all relational databases, it is built upon E. F. Codd’s pioneering computer science work developing the relational algebra and uniting it with set theory and predicate calculus.3 As relational databases implementing sql came to dominate structured data storage the sql standard evolved, always bearing, and hindered by, the necessary constraint of backward compatibility. As the first and only rdbms for a new computing paradigm, Obelisk has no obligation to the past and so is free to build anew from first principles with all the learnings of the last half century while perpetuating the best features of sql.
Obelisk’s urQL, while clearly a dialect of sql, fixes numerous deficiencies in sql4 , elevates composability to a syntactic requirement (not merely an option), adds one completely new major feature, time travel,5 and leaves no functionality inaccessible to the normal user.
The rdbms evolution that is Obelisk was greatly inspired by the Urbit architecture. Urbit is a computer whose entire state (OS and userspace) is a pure function of its prior state.6 The Urbit computer itself is an acid database. The entire state of the server is one noun7 that advances only through pure functions. urQL’s syntax is more than a “better" scripting language, it is an accurate reflection of the underlying database server’s operations. Anything the user can accomplish through urQL they can also do through the api8 , and vice versa. Together with Obelisk’s native time travel capabilities and idempotence, there are no mysteries confronting the user, no functionality outside the scope of the api/scripting language like performing audits with database logs. There are no logs. (There are system views that emulate logs to support the familiar log analysis pattern.) Obelisk’s state is both its current state and all prior states.9
Obelisk’s major innovations include:
Time travel
Proper sets
No nulls
Composability
The totality of data in a data bank may be viewed as a collection of time-varying relations (Codd, 1970).
In Obelisk time is primary and fundamental. Every change of state, whether to a database’s schema or content, is indexed by time. Thus every query is idempotent because each query is implicitly or explicitly associated with a particular state in the series. And time is a series of steps, like in physical reality. A time step may be appended to the series, but there is no inserting, updating, or deleting time steps.
The rules enforcing time primacy in the Obelisk database
engine are simple. Each database holds a most recent
schema10
time (by user data table) and a most recent content time (by
user data table). Every subsequent state change, whether to
schema or content must be subsequent to the latest of the
two times. Normally the user never needs to concern
themselves with this requirement. When not overridden
by AS OF
the database engine applies the host ship’s
time11
to determine the current state of the database.
Script execution, regardless of the number or sequencing of
commands, potentially alters the state of the database as if
everything happened all at once, even though the commands
execute sequentially. Users only need to be aware of this rule
when applying AS OF
to override NOW. It is possible
for a script to result in different ending state times for
different tables. Violation of time constraints (or any other
error) causes the entire script to fail. (Scripts are always
atomic.)
The second and final rule is once a script introduces a
query—a SELECT command returning data results
(but not
any intermediate SELECT)—all subsequent commands
must
also be queries. Further schema or data changes are not
allowed.
Among the metadata returned by queries are the
schema and content times (labelled schema time
and
data time)
used by the engine to create the query results. A
query executes with a de facto time of the latest of the
two for the state of each underlying table or view. In
order to recreate the same query one needs to specify this
AS OF
time. Thus every query is implicitly or explicitly
idempotent.
Creating a new schema component, DATABASE,
NAMESPACE,12
TABLE, or VIEW, may be backdated (or forward dated) to any
time subsequent to the latest of schema time or content time
(CREATE DATABASE
is of course entirely independent). Users
override schema and data manipulation commands at their
own peril. It is possible to effectively lock all or portions of the
database for update.
CREATE DATABASE db2 AS OF ~2099.1.1 ; %obelisk-result: %results 5 [ %message 'created database %db2' ] [ %server-time ~2024.9.29..21.14.00..7fc8 ] [ %schema-time ~2099.1.1 ]
Getting passing results for all operations on the database now requires a time subsequent to this future creation date.
(crash) CREATE NAMESPACE: %ns1 as-of schema time out of order
The database is not even visible in the current context.
FROM sys.sys.databases AS OF NOW SELECT * ; %obelisk-result: 5 %results [ %message 'SELECT' ] %result-set database sys-agent sys-tmsp data-ship ↳ data-agent data-tmsp 10 db1 /gall/dojo ~2024.9.26 ↳ ~zod /gall/dojo ~2024.9.26 sys /gall/dojo ~2024.9.26 ↳ ~zod /gall/dojo ~2024.9.26 [ %server-time ~2024.9.26..22.03.39..2233 ] 15 [ %message 'sys.sys.databases' ] [ %schema-time ~2024.9.26 ] [ %data-time ~2024.9.26 ] [ %vector-count 2 ]
We can see it in a future time context.
FROM sys.sys.databases AS OF ~2099.1.1 SELECT * ; database sys-agent sys-tmsp data-ship 5 ↳ data-agent data-tmsp animal-shelter /gall/dojo ~2024.10.1 ↳ ~zod /gall/dojo ~2024.10.1 db1 /gall/dojo ~2024.9.26 ↳ ~zod /gall/dojo ~2024.9.26 10db2 /gall/dojo ~2099.1.1 ↳ ~zod /gall/dojo ~2030.1.1 sys /gall/dojo ~2024.9.26 ↳ ~zod /gall/dojo ~2024.9.26
Manipulating AS OF
time for schema components
is rarely useful, but could be used for creating and
populating a database with history. What is much more
useful is setting the content time for query or content
manipulation13
relative to some past state, or mixing times to audit content
changes.
FROM my-table T1 OUTER JOIN my-table AS OF <some past date> T2 WHERE NOT EXISTS T1.key OR NOT EXISTS T2.key 5SELECT * ;
(The example demonstrates urQL’s non-standard join and predicate syntax. A full explanation of these follows.)
The astute reader may have already inferred the missing
piece of the time travel puzzle. When specified in a FROM
clause, the database engine will use the closest content
state less than or equal to the time requested, using the
table schema in a state likewise less than or equal to that
time.
Suppose that my-table has
content mutated at
two times, ~2024.10.2..16.54.41..df15 and
~2024.10.4..17.32.11..a1c2.
FROM db2..my-table-1 AS OF ~2024.10.3 SELECT * ; %obelisk-result: 5 %results [ %message 'SELECT' ] %result-set <results not shown> 10 [ %server-time ~2024.10.30..17.00.48..85c2 ] [ %message 'db1.dbo.my-table-1' ] [ %schema-time ~2023.1.4..14.15.21..ad17 ] [ %data-time ~2024.10.2..16.54.41..df15 ] 15 [ %vector-count 3 ]
As explained above, the database engine returns metadata for
this query which allows us to identify the implicit idempotent
query. Any AS OF
time >= ~2024.10.2..16.54.41..df15
and < ~2024.10.4..17.32.11..a1c2
is the same
idempotent query.
FROM db2..my-table-1 AS OF ~2024.10.2..16.54.41..df15 SELECT * ;
Identifying the idempotent query with the one exactly matching the content state time is important. The current last content state in the table’s series can always be superseded by a content state one tick later.
Idempotent queries have a big technical benefit. The engine can cache query results with never a thought given to cache expiration. Simple systems have fewer bugs and less attack surface. Obelisk does this for views—which in Obelisk are read only. Making a long-running query a view requires only a one-time performance hit...until the backing table data state progresses, in which case there is another one-time performance hit.
So far we have casually referred to tables and views as standard components of relational databases. In this section we introduce Codd’s term, relation.14 (What follows is a hasty summary of Codd’s formal definition, and it is somewhat more restrictive than Codd’s definition in order to make a further point below.) Consider a proper set of data tuples (i.e. every member unique). All the tuples have the same length, and positionally each element across the tuples is of the same type. This is a relation. Arranging the tuples as rows in a 2D matrix (arbitrarily ordered), we can index the columns by unique labels.15 We can refer to each element in the tuples by the column label, yielding a definition of relation for the discussion below.
There is no DISTINCT keyword.
All relations in Obelisk are
proper sets, as Codd intended.
A table in Obelisk is a set of immutable persistently stored relations—indexed by time. Rows are indexed by a unique key referencing one or more column names, which ensures a proper set. Mutating the contents results in a new subsequent relation.
A view is a persistently defined
relation. It is defined by
SELECT commands and set operators over tables
and views.
The relation’s set-like characteristic is maintained by shrinking
the rows to a proper set. Views are indexed by time. A change
to the definition or state of any table referenced in the
definition results in a new time index. As mentioned earlier, if
a user queries a view the system caches its state for that point
in time. We distinguish between persistently defined and
persistently stored because there is no requirement for view
data to persist. Due to query idempotence it can always be
recreated.
The SELECT command is a special
case. It defines a
relation, but it is also a presentation specification. The
resulting presentation may not be a relation (by our restricted
definition), even though the resulting rows always form a
proper set.
FROM my-table SELECT column-1, column-1, column-1
This command defines a proper relation consisting of the column named column-1, but the presentation layer repeats it three times. Thus the resulting presentation is not a (restricted) relation because there are three columns with the same label.
This SELECT does result in a
(restricted) relation,
FROM my-table SELECT column-1, column-1 AS Col-2, column-1 AS Col-3
This quibble is for the sake of emphasizing (our) formal
definition of relation, and to point out when defining a view
the SELECT command must provide for unique
column names.
Informally one may consider the results of SELECT as a
pseudo-relation.
Trivially SELECT without a FROM clause is a one-row
relation because Obelisk always assigns unique labels to
literals if not otherwise aliased.
SELECT 0, ~nomryg-nilref, ~2025.9.24 %obelisk-result: %results 5 [ %message 'SELECT' ] %result-set literal-0 literal-1 literal-2 0 ~nomryg-nilref ~2025.9.24 [ %server-time ~2025.9.24..17.56.02..de98 ] 10 [ %schema-time ~2025.9.22..22.03.28..6f07 ] [ %data-time ~2025.9.22..22.03.28..6f07 ] [ %vector-count 1 ]
(When there is no FROM clause
schema and data times
reference the creation time of the system database.)
The urQL join16
is restricted to the key words JOIN, CROSS JOIN, LEFT JOIN,
RIGHT JOIN,
OUTER JOIN.
JOIN is either a natural join or a
\(\theta \)-join.17
The absence of ON <predicate>
syntactically
determines a natural join, which must adhere to the following
rules.
Natural join is always between two relations, the
subject of the current JOIN clause and any
relation
previously specified in the FROM clause, so order
of relations
matters. The joining characteristic is equality between all
like-named and like-typed columns common between the
two.18
Hence a natural join may result in a relation of more rows
than either of the two input relations. If there are matching
columns with more than one previously listed relation, the join
fails and a \(\theta \)-join is required.
The availability of natural joins encourages transparent and composable schema designs, but bear in mind common metadata columns such as timestamp must have uniquely qualified names across tables and views in order for natural joins to be useful.
Syntactically speaking, a \(\theta \)-join is a join with
ON <predicate>
clause wherein at least one equality or inequality operator
exists with a column from the current relation on one side and
a column from a previously defined relation on the other, and
such an operation must exist on both sides of all OR
conjunctions.
The outer joins LEFT, RIGHT, and OUTER (full outer
join) are substantially the same as for any other sql
implementation and may be natural or \(\theta \)-joins. CROSS JOIN
is
the cartesian join.
To avoid the quagmire of three-valued (or even
four-valued) logic, nulls are entirely banned from
urQL.19
To begin with, columns may not be defined as nullable.
This leaves outer joins as the other way traditional sql
database engines can produce nulls. Once again SELECT
is a special case and does not always produce ad hoc
relations20
in sql or urQL.
The purpose of a data management system is the organization and retrieval of data. The presentation of that data is an ancillary concern. As a reinvention from first principles of relational data management, Obelisk is not obligated to return query results shoehorned into a two-dimensional matrix.
Rather than queries returning column rows of fixed length,
Obelisk returns vectors of unique label, type, and atomic
datum. The labels—usually the column name from the
underlying relation qualified by relation to distinguish
like-named columns—and order of the triples are derived from
the SELECT command. This extends to set
operations,
notably UNION.
Outer joins produce a list of variable length vectors. Let us
call each potential variable length a vector type. All column
name/type labeled values potentially exist in one vector type,
and smaller types have a proper subset of these labels in the
same order. The upper bound for types in a LEFT JOIN
or
RIGHT JOIN
is two and for OUTER JOIN
is three. In the case
of set unions the relation types may share no column types
whatsoever—in which case there is a hypothetical root type
never instantiated.
Presentation software is free to interpret a list of vectors in
any manner. By vertically aligning labels, presentation layers
are free to insert NULL or any other token in
holes.
Returning to a previous query example
FROM my-table T1 OUTER JOIN my-table AS OF <some past date> T2 WHERE NOT EXISTS T1.key OR NOT EXISTS T2.key 5SELECT * ;
returns the following vector list in pseudo-Hoon:
[%vector [p=%t1-key q=[p=~.ud q=1]] [p=%t1-col-2 ...] ↳ ...<remaining columns>...] [%vector [p=%t2-key q=[p=~.ud q=2]] [p=%t2-col-2 ...] ↳ ...<remaining columns>...]
and in the following section we explain the remaining non-standard attribute of this query.
To deal with column type sparseness resulting from outer joins and relation unions, urQL introduces two predicate keywords:
EXISTS This is not the standard sql EXISTS
keyword. Rather it is a unary operator that tests for
existence of a column in a possibly sparse relation
row, or it can test that a cte returns data.
EQUIV This binary
operator is the same as equality,
except comparison of two non-existent columns
and/or ctes results in TRUE.
The standard equality and inequality operators comparing any non-existing column and/or cte results in false.
Relational integrity by means of foreign key indices is planned for a future release, substantially following the sql Standard.
The universality of the data sublanguage lies in its descriptive ability (Codd, 1970).
WITH <ctes> FROM <joins> SCALARS 5WHERE GROUP BY HAVING SELECT ORDER BY 10TOP
The first rule of Composability Club is, “Have a query syntax
that looks compositional.” This ordering is the logical flow of
data and operations defined by the sql standard (with a
notable Obelisk addition discussed below). It is how users and
the database engine agree on constructing proper results.
Actual query execution plans may deviate in some way from
rigidly following this logical path for performance reasons.
This is why, unlike standard sql, urQL syntax sets the
SELECT clause in its logical place, not before
the FROM
clause.
Common Table Expressions (ctes) entered the sql standard21 to make it easier for users to reason about sub-queries. Inlining subqueries makes for bad readability. The sql Standard must maintain inlining backwards compatibility. urQL has no such constraint. It enforces cte composability by banning all subquery inlining.
urQL requires a cte in all cases involving subqueries
through runtime validation. Any usage in a predicate
or scalar function that can accept a column value or
literal also accepts a cte alias provided that the cte
returns a singleton, one row by one column. This is not the
case in the SELECT clause. Selecting a singleton
cte is
functionally equivalent to a CROSS JOIN,
which urQL syntax
prefers in the interest of explicit programming. Scalar
function and predicate operators consuming a list (set)
of values require the cte to return a single column of
data.
WITH Foo AS (...) ... WHERE bar IN Foo
Scalar functions present another opportunity to enforce
composability. The SCALARS keyword introduces a
clause
where all scalar functions must be defined. They are
subsequently referenced by alias. A scalar function operates on
any data in a single data row as well as singleton and list ctes
and returns one consistently typed value.
SCALARS Sc1 IF foo = 1 THEN bar ELSE foobar END-IF Sc2 IF Sc1 = 'what-not' THEN %.y ELSE %.n END-IF Sc3 IF Sc2 THEN dt-col ELSE ~2025.9.6 END-IF
In legacy sql this would all be nested and inlined in a
SELECT clause or predicate. In urQL one inlines
the scalar
alias.
Obelisk supports IF, CASE, COALESCE, and many other
common sql scalar functions.
The Urbit architecture provides for single sign-on (sso) through the immutable identity of all computers on the network. Obelisk security22 allows for grouping identities, much like security on other sql database implementations. In addition parent, child, and sibling are predefined groups providing for subnet security. The Urbit network architecture provides for subnets with two levels of hierarchy—parent, a ship of identity type galaxy, star, or planet, and children, ships of identity type moon.
User permissions are for ships (computers) on the network, for agents (userspace programs), or a combination of the two.23 Permissions can be any combination of whitelists or blacklists affecting databases, namespaces, tables and views, or columns. Permissions may be read-only or for write affecting data. Schema read is allowed, but update can only be performed by the host ship. Whitelisting and blacklisting can be tailored by timeframe, both in regards to the data or schema state permitted or restricted and the effective time range of the permission itself.
Obelisk, like other sql-based relational databases, is susceptible to sql injection attacks, and the mitigation for Obelisk is the same as for other dbmss—implementing stored procedures and adding a level of SP permissions.
Short of dropping a database, any write—even dropping a table—results in a new state. Referencing a prior state and writing to it (with proper planning) can effectively correct any destructive write.
Obelisk security is believed to be robust up to disk and ram hacking.
Key to Codd’s invention of the relational database is his critique of hierarchical databases.24 It took another 20 years to mostly fulfill the demise of accessing data via paths. Yet the concept lingers and is not without merit. Urbit being both a computer fundamentally supporting a hierarchical (tree) architecture and a network of nodes with immutable identities is the perfect substrate for implementing a truly global namespace.25
Obelisk fully supports global namespace access to data and schema. This urQL/Obelisk functionality is not easily duplicated on computers other than Urbit.
SELECT DATABASE animal-shelter
SELECT DATABASE
returns the entire
animal-shelter26
noun.27
Formatted as a path in the global Urbit namespace (beginning with ship identity)
/~nomryg-nilref/obelisk/animal-shelter
any caller on the network with sufficient security permissions can retrieve a copy of the database.
Working our way down the schema hierarchy we have access to namespaces within a database,
SELECT NAMESPACE animal-shelter.reference ~/obelisk/animal-shelter/reference
and pause to point out the path is on the local ship, but still confined by local desk28 permissions. The returned noun is all the user defined tables, views, and indices in the selected namespace, including all their schema and data states.
SELECT my-table ~/obelisk/db1/dbo/my-table
The reader may wonder about the syntax here. Submission of
urQL scripts to the Obelisk parser requires the name of a
default database along with the script, and dbo
is always
the default namespace. The returned noun is the same
metadata and result set whose print format we have seen
above.
We cannot tell from the syntax (or the returned metadata)
whether the query returned the data from a table or view.
Views can shadow tables. To force a table add the TABLE
keyword to the script, or in the namespace,
~/obelisk/db1/dbo/table/my-table
and use AS OF
for a particular state,
~/obelisk/db1/dbo/my-table/~2020.1.26
The final drill down in the namespace is to the column level,
SELECT db1..my-table.column-4 ~/obelisk/db1/dbo/my-table/column-4
Keep in mind Obelisk returns proper sets of data rows. Only a view is capable of deterministic ordering of the result rows when accessing data in this manner.
Shrubbery support provides for hybrid relational/hierarchical data processing. The Hawk Urbit implementation of namespace computing provides for scripting language or api usage of Obelisk as well as path access and post processing. There is already a working adaptation of a sql Studio-like interface to Obelisk available in Hawk.
Bad programmers worry about the code. Good programmers worry about data structures and their relationships (Torvalds, 2016).
To better understand the workings of Obelisk we turn to the data structures. Obelisk is a work in progress approaching a beta release, so some structures still may change over time.
The Obelisk parser parses urQL commands into the structures
in file /sur/ast, which also serves as the
Obelisk api. We
show the most important schema and content manipulating
commands in Figure 1. A qualified-column is one
qualified by a qualified- table.
:: create-database $: %create-database name=@tas as-of=(unit as-of) == :: drop-database $: %drop-database name=@tas force=? == :: create-namespace $: %create-namespace database-name=@tas name=@tas as-of=(unit as-of) == :: create-table $: %create-table table=qualified-table columns=(list column) pri-indx=(list ordered-column) foreign-keys=(list foreign-key) as-of=(unit as-of) == :: predicate (tree predicate-component) :: qualified-table $: %qualified-table ship=(unit @p) database=@tas namespace=@tas name=@tas alias=(unit @t) == :: drop-table $: %drop-table table=qualified-table force=? as-of=(unit as-of) == :: update $: %update ctes=(list cte) table=qualified-table as-of=(unit as-of) $: columns=(list qualified-column) values=(list value-or-default) == =predicate == :: delete $: %delete ctes=(list cte) table=qualified-table as-of=(unit as-of) =predicate == :: truncate-table $: %truncate-table table=qualified-table as-of=(unit as-of) == :: predicate-component $? ops-and-conjs qualified-column unqualified-column dime value-literals cte-alias scalar-alias aggregate == :: column $: %column name=@tas type=@ta ==
Obelisk does not yet support interacting with tables on foreign ships, so qualification is currently only by database and namespace. An alias may be assigned in the urQL. There is no distinction in urQL or the ast between tables and views. A view may shadow a table in the context of queries, but schema and data manipulating commands always operate on the specified table.
We see that DELETE requires a
predicate (unlike standard
sql) and the predicate data structure is tree-shaped.
Predicate nodes are operators, conjunctions, or any number of
representations of data.
Figure 2 presents the selection data structure, also a
tree structure which accommodates either a set operator
or one of three other structures as nodes. insert and
merge are for future implementation using the
%into
operator. The query data structure is central to
reading and
formatting persisted data and can be combined with any of
the regular set operators (%union, %except, %intersect,
%divided-by). If there are ctes, they are processed first to
return one relation each. Prior cte relations are accessible by
subsequent ctes. Internally different kinds of relation data
structure represent data returned by ctes (cte-alias) or
more commonly qualified-table, which includes
(possibly
table-shadowing) views. (query-row is a
placeholder for
future temporary table urQL syntax.)
:: selection $: %selection ctes=(list cte) set-functions=(tree set-function) == :: set-function $?(set-cmd set-op) :: query $: %query from=(unit from) scalars=(list scalar) =predicate group-by=(list grouping-column) having=predicate =select order-by=(list ordering-column) == :: set-op $? %union %except %intersect %divided-by %divide-with-remainder %into == :: set-cmd $%(insert merge query) :: cte $: %cte name=@tas =query == :: from $: %from =relation as-of=(unit as-of) joins=(list joined-object) == :: scalar $: %scalar scalar=scalar-function alias=@t == :: select $: %select top=(unit @ud) columns=(list selected-column) == :: selected-column $% qualified-column unqualified-column selected-aggregate selected-value selected-all selected-all-object == :: insert $: %insert table=qualified-table as-of=(unit as-of) columns=(unit (list @tas)) values=insert-values == :: relation $% qualified-table cte-alias query-row == :: joined-object $: %joined-object join=join-type =relation as-of=(unit as-of) =predicate ==
selected-column deals with all
the different ways
columns and data may be selected in urQL, qualified and
unqualified columns, literals, all columns across all joins, and
all columns of a specified relation. (Aggregate functions are for
a future release.)
A potential enhancement to the urQL syntax and ast in a
future release is to support zero-ary structures in the SELECT
clause of a query. This is an area of research enabled by
Obelisk’s ability to produce result vectors and sub-vectors (see
outer joins above). The model is how the Datalog language
defines zero-ary structures, beginning a sub-structure with
:- colhep and ending it with a . dot, allowing nested
sub-structures.
The example
SELECT column-1, column-2, :- column-3, column-4., ↳ column-5
would return one vector of
column-1, column-2,
column-5, and one vector of column-3, column-4
for every instance of the same column-1, column-2,
column-5. This would open the door to returning
complex
nouns from queries.
At the very top of the Persistence Layer (Figure3) is the server data structure.
+$ server (map @tas database)
This is simply an index into all the databases, including the
%sys database tracking other databases. This is
the noun
which the Obelisk app persists. Alternate implementations
using an app structure other than the one provided with
Obelisk merely need to persist this noun.
:: database $: %database name=@tas created-provenance=path created-tmsp=@da sys=((mop @da schema) gth) content=((mop @da dataa) gth) =view-cache == :: view-cache ((mop ns-obj-key cache) ns-obj-comp) :: ns-obj-key $: ns=@tas obj=@tas time=@da == :: cache $: %cache tmsp=@da content=(unit cache-content) == :: ns-obj-comp |= [p=data-obj-key q=data-obj-key] ^- ? :: cache-content $: rowcount=@ rows=(list (map @tas @)) == :: schema $: %schema provenance=path tmsp=@da =namespaces tables=(map [@tas @tas] table) views=((mop data-obj-key view) ns-obj-comp) == :: data $: %data ship=@p provenance=path tmsp=@da files=(map [@tas @tas] file) == :: table $: %table provenance=path tmsp=@da =column-lookup type-lookup=(map @tas @ta) pri-indx=index columns=(list column) indices=(list index) == :: column-lookup (map @tas [aura @]) :: view $: %view provenance=path tmsp=@da =selection =column-lookup type-lookup=(map @tas @ta) columns=(list column) ordering=(list column-order) ==
:: file $: %file ship=@p provenance=path tmsp=@da =column-addrs rowcount=@ pri-idx=(tree [(list @) (map @tas @)]) indexed-rows=(list indexed-row) =column-catalog == :: indexed-row $: %indexed-row key=(list @) data=(map @tas @) == ::
The database structure reveals
the simplicity of Obelisk
time-travel. The database schema and content are both +mops
(ordered +maps) indexed by time. The magic of
Urbit noun
de-duplication does the rest.
view-cache provides for fast
retrieval of view data after
a view has been requested once. Any alteration of table data
underlying a view will create a new entry keyed on namespace,
view name, and time, but without content. The first
request for that key will cause the entire selection
to execute and save the content. There are currently
only system views. User-defined views will be in a later
release.
table defines the schema used to
persist data for a
persisted relation. 'column-lookup'
returns the column
type (aura) as well as its position in the canonical ordering.
tables require a unique primary index, and for
now there are
no secondary table indices.
+$ index $: %index unique=? key=(list key-column) 5== == +$ key-column $: %key-column name=@tas =aura 10 ascending=? == ==
:: action $% $: %tape default-database=@tas urql=tape == $: %commands cmds=(list command) == == :: cmd-result $: %results (list result) == :: vector $: %vector (lest vector-cell) == :: vector-cell [p=@tas q=dime] :: result $% [%message msg=@t] [%vector-count count=@ud] [%server-time date=@da] [%security-time date=@da] [%schema-time date=@da] [%data-time date=@da] [%result-set (list vector)] == :: command $% alter-index alter-namespace alter-table create-database create-index create-namespace create-table create-view delete drop-database drop-index drop-namespace drop-table drop-view grant revoke selection truncate-table update ==
columns returns the columns
in
canonical order (the order
originally defined).
With the table definition established, the actual data is in
a file structure, also containing relevant
metadata. At the
most basic level, every row of persisted data consists of a
key—a list of atoms which is one or more of the subsequent
columns, and the column data, which maps from column name
to the atomic value. Of all the persistence layer structures,
the one actually persisting data is probably the most
prone to future development. For purposes of alternating
query execution plans, data is persisted in more than
one way. (tree [(list @)
(map @tas @)])
is an
ordered map allowing ordered selection on partial keys.
column-catalog is a currently abandoned
attempt
at
persisting data column-wise. (Inserting large amounts of
data stored column-wise proved impractical, so for now
column-wise storage is deferred.) indexed-rows
is the only
form currently in use, so for the beta release every query plan
execution is a table scan.
Aside from urQL, Obelisk users only ever interact with the
presentation layer (Figure 4). The
Obelisk app allows for two
actions, %tape, which submits a command and
default
database name in a urQL $tape, and %commands, which is a
list of api commands where every table and view is already
fully qualified by database.
This section has covered most of the available commands; several planned commands like alters, drops, and security commands will parse but are not yet supported in the Obelisk engine.
Every urQL command returns a result structure of metadata and vectors, which we have already covered.
An untapped area of investigation is to change the
%result-set from returning (list vector)
to returning
a list of trap cores which emit vector when evaluated. The
lookup of columns row by row in the query relation is
computationally intense and could be offloaded to the client
side.
A database scripting language should provide the most intuitive and complete experience possible for organizing and querying data. Uniting the relational data model with set theory and predicate calculus has proven to be the most effective general means of organizing and retrieving data to date, however the hierarchical model still has its place and should be supported as far as possible. We have seen that even sql databases have at least a three-level hierarchy, four including namespaces, five when implemented as a temporal database.
Obelisk and its urQL scripting language represent a
reimagining of the relational database paradigm, inspired
by and tailored to the unique architecture of the Urbit
computer while remaining adaptable to other computing
environments and free of the constraints of sql’s backward
compatibility, Obelisk leverages E. F. Codd’s foundational
principles to address longstanding deficiencies in traditional
rdbmss. Its innovations—time travel through database states,
idempotent queries, proper set operations, elimination of
nulls, and enforced composability—offer a streamlined,
transparent, and powerful approach to data management. The
integration of Urbit’s pure functional model ensures that all
operations, whether through the urQL scripting language or
the api, are consistent and accessible, with no hidden
functionality. Additionally, through Urbit’s global hierarchical
namespace, data access via namespace path enhances
flexibility, while Urbit’s identity-based security model
provides robust protection. Obelisk not only advances the
state of relational databases but also sets a new standard for
clarity, consistency, and innovation in structured data
systems.
/* The %sys database contains provenance metadata for all the user databases. */ 5FROM sys.sys.databases SELECT database, sys-agent, sys-tmsp, data-ship, data-agent, data-tmsp ; /* Set the default database to animal-shelter for the 10remaining queries. The rest of the schema metadata is in %sys namespace views of each user database. */ FROM sys.tables 15SELECT namespace, name, agent, tmsp, row-count ; FROM sys.columns SELECT namespace, name, col-ordinal, col-name, ↳ col-type ; 20 /* The %sys namespace views also tracks the schema and content history. */ 25FROM sys.sys-log SELECT tmsp, agent, component, name ; FROM sys.data-log SELECT tmsp, ship, agent, namespace, table, row-count ; 30/* Start exploring. Return complete sentences, if you like. */ FROM reference.calendar T1 35JOIN reference.calendar-us-fed-holiday T2 WHERE T1.date BETWEEN ~2025.1.1 AND ~2025.12.31 SELECT 'On this day: ', day-name, ',', T1.date, ↳ ', is the holiday ', us-federal-holiday, '.' ; 40 /* Create a new database, add a table and insert data. The default namespace is %dbo. */ 45CREATE DATABASE db2 AS OF ~2000.1.1 ; CREATE TABLE db2..my-table-1 (col1 @t, col2 @da) PRIMARY KEY (col1) ; 50INSERT INTO db2..my-table-1 (col1, col2) VALUES ('today', ~2000.1.1) ('tomorrow', ~2000.1.2) ('next-day', ~2000.1.3) ('next-today', ~2000.1.1) 55 ('next-tomorrow', ~2000.1.2) ('next-next-day', ~2000.1.3) ;
Chamberlin, Donald D. and Raymond F. Boyce (1974). “sequel: A Structured English Query Language.” In: sigfidet ’74: Proceedings of the 1974 acm sigfidet (now sigmod) Workshop on Data Description, Access and Control, pp. 249–264. url: https://doi.org/10.1145/800296.811515 (visited on ~2025.9.5).
Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Tech. rep. San Jose, California: ibm Research Laboratory. url: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf (visited on ~2025.9.5).
— (1990). The Relational Model for Database Management: Version 2. Addison-Wesley.
Darwen, Hugh and C. J. Date (1998). Databases, Types, and the Relational Model: The Third Manifesto. Revised 2006. Addison-Wesley.
— (2002). Temporal Data and the Relational Model. Morgan Kaufmann.
~hastuc-dibtux, Liam Fitzgerald
(2024)
“Shrub”.
url:
https://github.com/urbit/shrub
(visited
on
~2025.9.27).
Levin, Ami (2020) “Animal Shelter”. url: https://github.com/ami-levin/Animal_Shelter (visited on ~2025.9.13).
~migrev-dolseg, William Hanlen
(2025)
“ways
hawk
is
practically
useful
for
me,
right
now”.
url:
https://willhanlen.com/~~/outbox/25/7/my-hawk-uses/
(visited
on
~2025.9.8).
~rovnys-ricfer, Theodore Blackman
(n. d.,
ca. 2023)
“Young
Shrub”.
url:
https://rovnys.cataphract.us/young-shrub
(visited
on
~2025.9.8).
~sorreg-namtyv, Curtis Yarvin
et al.
(2016).
Urbit:
A
Solid-State
Interpreter.
Whitepaper.
Tlon
Corporation.
url:
https://media.urbit.org/whitepaper.pdf
(visited
on
~2024.1.25).
3Codd (1970); Chamberlin and Boyce (1974) Codd’s seminal paper focuses primarily on studying set relations and does not invent a query language, but sets out the parameters for an effective and efficient language. Predicate calculus was already the obvious way to filter stored data. Chamberlin and Boyce (1974) treat it as a given, discussing their syntax for applying it.⤴
5Darwen and Date (1998); Darwen and Date (2002). While influential in other aspects of Obelisk’s design. We are less familiar with their work on temporal databases, so their research on temporal databases was not especially influential. Without any direct knowledge of his work, we credit the reputation of Rich Hickey.⤴
6~sorreg-namtyv et al. (2016). Is any sufficiently complex computer function completely pure? Urbit and Obelisk both rely on crash to reject invalid commands that parsing and the api type system cannot otherwise validate.⤴
7The noun is an Urbit abstraction that maps beautifully to English. It is either a natural number, or an ordered pair of nouns. Thus the fundamental data structure of Urbit (and Obelisk) is a binary tree of natural numbers.⤴
9With the sole exception of dropping a database from the server, in which case all traces of the database vanish from the server.⤴
10Currently the only schema
altering
commands available are
CREATE and
DROP.
ALTER is on
the
roadmap for an undetermined future
release.⤴
11A ship is the generic term for an Urbit computer on the Urbit network, which has a unique, immutable, cryptographically protected identity. The ship’s time corresponds to the computer time on the Urbit VM’s host computer, whether Linux/Unix, Mac, or Windows.⤴
13Setting the AS OF
time when manipulating data (INSERT,
UPDATE,
DELETE)
results in a new
content state based on the state in
effect at that time.⤴
15Here we deviate from Codd’s formal definition, where he only suggests the columns—he has formally defined them as domains—may be labeled, the domains are not necessarily unique. One must bear in mind Codd’s work is theoretic. He even points out that he is not designing a data query language. He is providing the foundation upon which to design query languages.⤴
16We respect the purity of Codd’s original relational algebra, thus there is no inner join only join. Outer joins are a set union on a joined relation as well as the remaining rows not being joined (Codd, 1990, p. 107). Nonetheless outer joins have great practical benefit and we opt to dispense with superfluous keywords. Codd never published original research on outer joins, but after their introduction by other authors (Chamberlin and Boyce, 1974), he covered them in a book (Codd, 1990). Oracle is widely credited with the first adoption of outer joins in the early 1980s, and they were first included in the ansi sql-92 standard. Outer joins are not included in the first Obelisk beta.⤴
20It is beyond the scope of this paper to invent a new formal definition of relation. We imagine that instead of a relation consisting of one tuple type of ordered atomic values, there would be a root tuple type of ordered unique label, type, value triples. (Type ensures strong typing. An untyped version is more general.) Any number of unique sub-types consisting of any subset of these triples in the same order may also be present. Arranged vertically all the tuples align on label/type matches. Any resulting 2D matrix having more than the root tuple type present has some number of holes in every occurrence of a sub-type, and is thus a sparse relation.⤴
24Codd (1970), “1.1.3 Access Path Dependence”. Codd uses the term data store which is mostly interchangeable with database, even if they conjure up different images in the mind.⤴
25Not in the first beta release. The shrubbery concept of a universal namespace is an organic outgrowth of the Urbit file system (Clay). Within Urbit circles the concept is still amorphous. It could be more than an Urbit network uri, with endpoints functioning as data or software. It is hard to identify a single author or inventor. ~rovnys-ricfer originated the term in conversation with ~littel-wolfur and was the first to write about it (~rovnys-ricfer, personal communication, ~2025.10.18; ~rovnys-ricfer, undated). ~sarpen-laplux developed the earliest working implementation in 2023. ~hastuc-dibtux subsequently carried out extensive prototyping and advocacy (2024). ~migrev-dolseg’s %hawk Gall app is the most readily available and extensive implementation in this tradition to date (2025).⤴
26Levin (2020) Available for several database platforms, this is a readily available non-trivial sample database that demonstrates natural primary table keys.⤴