Obelisk: Reinventing SQL for Modern Computing

Obelisk: Reinventing SQL
for Modern Computing

Jack Fox ~nomryg-nilref
FoxyLabs

Abstract

Obelisk is a novel relational database management system and its scripting language, urQL, designed for the experimental Urbit computing platform but adaptable to other systems. Building on E. F. Codd’s relational algebra and unencumbered by backward compatibility constraints, Obelisk is a temporal relational database derived from ﬁrst principles. Key innovations include time travel for querying historical database states, proper set implementation, elimination of nulls, idempotent queries, and enforced composability. Obelisk ensures all user operations are accessible via both the urQL scripting language and api, reﬂecting Urbit’s pure functional architecture. Obelisk’s time-travel feature indexes every schema and content state change by time, ensuring idempotent queries and atomic script execution. This paper explores how Obelisk advances relational database design while addressing sql’s deﬁciencies, oﬀering a modern, principled approach to structured data management.

1 Introduction
2 Time Travel
3 The Relational Model
3.1 Relations
3.2 Proper sets
3.3 Joins
3.4 No Nulls
3.5 Predicate Calculus
3.6 Relational Integrity
4 Composability
4.1 Common Table Expressions
4.2 Scalar Functions
5 Security
6 The Shrubbery
7 Data Structures
7.1 ast/api
7.2 Persistence Layer
7.3 Presentation Layer
8 Conclusion
Appendix: Examples
References

1 Introduction

The Obelisk relational database management system (rdbms) and urQL scripting language¹ run on the experimental Urbit computer and network but could just as well be implemented on any other computer—except for certain features dependent on Hoon language features or the Urbit network architecture.² Like all relational databases, it is built upon E. F. Codd’s pioneering computer science work developing the relational algebra and uniting it with set theory and predicate calculus.³ As relational databases implementing sql came to dominate structured data storage the sql standard evolved, always bearing, and hindered by, the necessary constraint of backward compatibility. As the ﬁrst and only rdbms for a new computing paradigm, Obelisk has no obligation to the past and so is free to build anew from ﬁrst principles with all the learnings of the last half century while perpetuating the best features of sql.

Obelisk’s urQL, while clearly a dialect of sql, ﬁxes numerous deﬁciencies in sql⁴ , elevates composability to a syntactic requirement (not merely an option), adds one completely new major feature, time travel,⁵ and leaves no functionality inaccessible to the normal user.

The rdbms evolution that is Obelisk was greatly inspired by the Urbit architecture. Urbit is a computer whose entire state (OS and userspace) is a pure function of its prior state.⁶ The Urbit computer itself is an acid database. The entire state of the server is one noun⁷ that advances only through pure functions. urQL’s syntax is more than a “better" scripting language, it is an accurate reﬂection of the underlying database server’s operations. Anything the user can accomplish through urQL they can also do through the api⁸ , and vice versa. Together with Obelisk’s native time travel capabilities and idempotence, there are no mysteries confronting the user, no functionality outside the scope of the api/scripting language like performing audits with database logs. There are no logs. (There are system views that emulate logs to support the familiar log analysis pattern.) Obelisk’s state is both its current state and all prior states.⁹

Obelisk’s major innovations include:

Time travel
Proper sets
No nulls
Composability

2 Time Travel

The totality of data in a data bank may be viewed as a collection of time-varying relations (Codd, 1970).

In Obelisk time is primary and fundamental. Every change of state, whether to a database’s schema or content, is indexed by time. Thus every query is idempotent because each query is implicitly or explicitly associated with a particular state in the series. And time is a series of steps, like in physical reality. A time step may be appended to the series, but there is no inserting, updating, or deleting time steps.

The rules enforcing time primacy in the Obelisk database engine are simple. Each database holds a most recent schema¹⁰ time (by user data table) and a most recent content time (by user data table). Every subsequent state change, whether to schema or content must be subsequent to the latest of the two times. Normally the user never needs to concern themselves with this requirement. When not overridden by AS OF the database engine applies the host ship’s time¹¹ to determine the current state of the database.

Script execution, regardless of the number or sequencing of commands, potentially alters the state of the database as if everything happened all at once, even though the commands execute sequentially. Users only need to be aware of this rule when applying AS OF to override NOW. It is possible for a script to result in diﬀerent ending state times for diﬀerent tables. Violation of time constraints (or any other error) causes the entire script to fail. (Scripts are always atomic.)

The second and ﬁnal rule is once a script introduces a query—a SELECT command returning data results (but not any intermediate SELECT)—all subsequent commands must also be queries. Further schema or data changes are not allowed.

Among the metadata returned by queries are the schema and content times (labelled schema time and data time) used by the engine to create the query results. A query executes with a de facto time of the latest of the two for the state of each underlying table or view. In order to recreate the same query one needs to specify this AS OF time. Thus every query is implicitly or explicitly idempotent.

Creating a new schema component, DATABASE, NAMESPACE,¹² TABLE, or VIEW, may be backdated (or forward dated) to any time subsequent to the latest of schema time or content time (CREATE DATABASE is of course entirely independent). Users override schema and data manipulation commands at their own peril. It is possible to eﬀectively lock all or portions of the database for update.

CREATE DATABASE db2 AS OF ~2099.1.1 ; 
 
%obelisk-result: 
  %results 
5    [ %message 'created database %db2' ] 
    [ %server-time ~2024.9.29..21.14.00..7fc8 ] 
    [ %schema-time ~2099.1.1 ]

Getting passing results for all operations on the database now requires a time subsequent to this future creation date.

(crash) 
CREATE NAMESPACE: %ns1 as-of schema time out of order

The database is not even visible in the current context.

FROM sys.sys.databases AS OF NOW 
SELECT * ; 
 
%obelisk-result: 
5  %results 
    [ %message 'SELECT' ] 
    %result-set 
      database  sys-agent  sys-tmsp  data-ship 
       ↳ data-agent  data-tmsp 
10      db1  /gall/dojo  ~2024.9.26 
       ↳ ~zod  /gall/dojo  ~2024.9.26 
      sys  /gall/dojo  ~2024.9.26 
       ↳ ~zod  /gall/dojo  ~2024.9.26 
    [ %server-time ~2024.9.26..22.03.39..2233 ] 
15    [ %message 'sys.sys.databases' ] 
    [ %schema-time ~2024.9.26 ] 
    [ %data-time ~2024.9.26 ] 
    [ %vector-count 2 ]

We can see it in a future time context.

FROM sys.sys.databases AS OF ~2099.1.1 
SELECT * ; 
 
database  sys-agent  sys-tmsp  data-ship 
5 ↳ data-agent  data-tmsp 
animal-shelter  /gall/dojo  ~2024.10.1 
 ↳ ~zod  /gall/dojo  ~2024.10.1 
db1  /gall/dojo  ~2024.9.26 
 ↳ ~zod  /gall/dojo  ~2024.9.26 
10db2  /gall/dojo  ~2099.1.1 
 ↳ ~zod  /gall/dojo  ~2030.1.1 
sys  /gall/dojo  ~2024.9.26 
 ↳ ~zod  /gall/dojo  ~2024.9.26

Manipulating AS OF time for schema components is rarely useful, but could be used for creating and populating a database with history. What is much more useful is setting the content time for query or content manipulation¹³ relative to some past state, or mixing times to audit content changes.

FROM my-table T1 
OUTER JOIN my-table AS OF <some past date> T2 
WHERE NOT EXISTS T1.key 
   OR NOT EXISTS T2.key 
5SELECT * ;

(The example demonstrates urQL’s non-standard join and predicate syntax. A full explanation of these follows.)

The astute reader may have already inferred the missing piece of the time travel puzzle. When speciﬁed in a FROM clause, the database engine will use the closest content state less than or equal to the time requested, using the table schema in a state likewise less than or equal to that time.

Suppose that my-table has content mutated at two times, ~2024.10.2..16.54.41..df15 and ~2024.10.4..17.32.11..a1c2.

FROM db2..my-table-1 AS OF ~2024.10.3 
SELECT * ; 
 
%obelisk-result: 
5  %results 
    [ %message 'SELECT' ] 
    %result-set 
 
      <results not shown> 
10 
    [ %server-time ~2024.10.30..17.00.48..85c2 ] 
    [ %message 'db1.dbo.my-table-1' ] 
    [ %schema-time ~2023.1.4..14.15.21..ad17 ] 
    [ %data-time ~2024.10.2..16.54.41..df15 ] 
15    [ %vector-count 3 ]

As explained above, the database engine returns metadata for this query which allows us to identify the implicit idempotent query. Any AS OF time >= ~2024.10.2..16.54.41..df15 and < ~2024.10.4..17.32.11..a1c2 is the same idempotent query.

FROM db2..my-table-1 AS OF ~2024.10.2..16.54.41..df15 
SELECT * ;

Identifying the idempotent query with the one exactly matching the content state time is important. The current last content state in the table’s series can always be superseded by a content state one tick later.

Idempotent queries have a big technical beneﬁt. The engine can cache query results with never a thought given to cache expiration. Simple systems have fewer bugs and less attack surface. Obelisk does this for views—which in Obelisk are read only. Making a long-running query a view requires only a one-time performance hit...until the backing table data state progresses, in which case there is another one-time performance hit.

3 The Relational Model

3.1 Relations

So far we have casually referred to tables and views as standard components of relational databases. In this section we introduce Codd’s term, relation.¹⁴ (What follows is a hasty summary of Codd’s formal deﬁnition, and it is somewhat more restrictive than Codd’s deﬁnition in order to make a further point below.) Consider a proper set of data tuples (i.e. every member unique). All the tuples have the same length, and positionally each element across the tuples is of the same type. This is a relation. Arranging the tuples as rows in a 2D matrix (arbitrarily ordered), we can index the columns by unique labels.¹⁵ We can refer to each element in the tuples by the column label, yielding a deﬁnition of relation for the discussion below.

3.2 Proper sets

There is no DISTINCT keyword. All relations in Obelisk are proper sets, as Codd intended.

A table in Obelisk is a set of immutable persistently stored relations—indexed by time. Rows are indexed by a unique key referencing one or more column names, which ensures a proper set. Mutating the contents results in a new subsequent relation.

A view is a persistently deﬁned relation. It is deﬁned by SELECT commands and set operators over tables and views. The relation’s set-like characteristic is maintained by shrinking the rows to a proper set. Views are indexed by time. A change to the deﬁnition or state of any table referenced in the deﬁnition results in a new time index. As mentioned earlier, if a user queries a view the system caches its state for that point in time. We distinguish between persistently deﬁned and persistently stored because there is no requirement for view data to persist. Due to query idempotence it can always be recreated.

The SELECT command is a special case. It deﬁnes a relation, but it is also a presentation speciﬁcation. The resulting presentation may not be a relation (by our restricted deﬁnition), even though the resulting rows always form a proper set.

FROM my-table 
SELECT column-1, column-1, column-1

This command deﬁnes a proper relation consisting of the column named column-1, but the presentation layer repeats it three times. Thus the resulting presentation is not a (restricted) relation because there are three columns with the same label.

This SELECT does result in a (restricted) relation,

FROM my-table 
SELECT column-1, column-1 AS Col-2, column-1 AS Col-3

This quibble is for the sake of emphasizing (our) formal deﬁnition of relation, and to point out when deﬁning a view the SELECT command must provide for unique column names. Informally one may consider the results of SELECT as a pseudo-relation.

Trivially SELECT without a FROM clause is a one-row relation because Obelisk always assigns unique labels to literals if not otherwise aliased.

SELECT 0, ~nomryg-nilref, ~2025.9.24 
 
%obelisk-result: 
  %results 
5    [ %message 'SELECT' ] 
    %result-set 
      literal-0  literal-1  literal-2 
      0  ~nomryg-nilref  ~2025.9.24 
    [ %server-time ~2025.9.24..17.56.02..de98 ] 
10    [ %schema-time ~2025.9.22..22.03.28..6f07 ] 
    [ %data-time ~2025.9.22..22.03.28..6f07 ] 
    [ %vector-count 1 ]

(When there is no FROM clause schema and data times reference the creation time of the system database.)

3.3 Joins

The urQL join¹⁶ is restricted to the key words JOIN, CROSS JOIN, LEFT JOIN, RIGHT JOIN, OUTER JOIN. JOIN is either a natural join or a $\theta $-join.¹⁷

The absence of ON <predicate> syntactically determines a natural join, which must adhere to the following rules.

Natural join is always between two relations, the subject of the current JOIN clause and any relation previously speciﬁed in the FROM clause, so order of relations matters. The joining characteristic is equality between all like-named and like-typed columns common between the two.¹⁸ Hence a natural join may result in a relation of more rows than either of the two input relations. If there are matching columns with more than one previously listed relation, the join fails and a $\theta $-join is required.

The availability of natural joins encourages transparent and composable schema designs, but bear in mind common metadata columns such as timestamp must have uniquely qualiﬁed names across tables and views in order for natural joins to be useful.

Syntactically speaking, a $\theta $-join is a join with ON <predicate> clause wherein at least one equality or inequality operator exists with a column from the current relation on one side and a column from a previously deﬁned relation on the other, and such an operation must exist on both sides of all OR conjunctions.

The outer joins LEFT, RIGHT, and OUTER (full outer join) are substantially the same as for any other sql implementation and may be natural or $\theta $-joins. CROSS JOIN is the cartesian join.

3.4 No Nulls

To avoid the quagmire of three-valued (or even four-valued) logic, nulls are entirely banned from urQL.¹⁹ To begin with, columns may not be deﬁned as nullable. This leaves outer joins as the other way traditional sql database engines can produce nulls. Once again SELECT is a special case and does not always produce ad hoc relations²⁰ in sql or urQL.

The purpose of a data management system is the organization and retrieval of data. The presentation of that data is an ancillary concern. As a reinvention from ﬁrst principles of relational data management, Obelisk is not obligated to return query results shoehorned into a two-dimensional matrix.

Rather than queries returning column rows of ﬁxed length, Obelisk returns vectors of unique label, type, and atomic datum. The labels—usually the column name from the underlying relation qualiﬁed by relation to distinguish like-named columns—and order of the triples are derived from the SELECT command. This extends to set operations, notably UNION.

Outer joins produce a list of variable length vectors. Let us call each potential variable length a vector type. All column name/type labeled values potentially exist in one vector type, and smaller types have a proper subset of these labels in the same order. The upper bound for types in a LEFT JOIN or RIGHT JOIN is two and for OUTER JOIN is three. In the case of set unions the relation types may share no column types whatsoever—in which case there is a hypothetical root type never instantiated.

Presentation software is free to interpret a list of vectors in any manner. By vertically aligning labels, presentation layers are free to insert NULL or any other token in holes.

Returning to a previous query example

FROM my-table T1 
OUTER JOIN my-table AS OF <some past date> T2 
WHERE NOT EXISTS T1.key 
   OR NOT EXISTS T2.key 
5SELECT * ;

returns the following vector list in pseudo-Hoon:

[%vector [p=%t1-key q=[p=~.ud q=1]] [p=%t1-col-2 ...] 
 ↳ ...<remaining columns>...] 
[%vector [p=%t2-key q=[p=~.ud q=2]] [p=%t2-col-2 ...] 
 ↳ ...<remaining columns>...]

and in the following section we explain the remaining non-standard attribute of this query.

3.5 Predicate Calculus

To deal with column type sparseness resulting from outer joins and relation unions, urQL introduces two predicate keywords:

EXISTS This is not the standard sql EXISTS keyword. Rather it is a unary operator that tests for existence of a column in a possibly sparse relation row, or it can test that a cte returns data.
EQUIV This binary operator is the same as equality, except comparison of two non-existent columns and/or ctes results in TRUE.

The standard equality and inequality operators comparing any non-existing column and/or cte results in false.

3.6 Relational Integrity

Relational integrity by means of foreign key indices is planned for a future release, substantially following the sql Standard.

4 Composability

The universality of the data sublanguage lies in its descriptive ability (Codd, 1970).

WITH <ctes> 
FROM 
<joins> 
SCALARS 
5WHERE 
GROUP BY 
HAVING 
SELECT 
ORDER BY 
10TOP

The ﬁrst rule of Composability Club is, “Have a query syntax that looks compositional.” This ordering is the logical ﬂow of data and operations deﬁned by the sql standard (with a notable Obelisk addition discussed below). It is how users and the database engine agree on constructing proper results. Actual query execution plans may deviate in some way from rigidly following this logical path for performance reasons. This is why, unlike standard sql, urQL syntax sets the SELECT clause in its logical place, not before the FROM clause.

4.1 Common Table Expressions

Common Table Expressions (ctes) entered the sql standard²¹ to make it easier for users to reason about sub-queries. Inlining subqueries makes for bad readability. The sql Standard must maintain inlining backwards compatibility. urQL has no such constraint. It enforces cte composability by banning all subquery inlining.

urQL requires a cte in all cases involving subqueries through runtime validation. Any usage in a predicate or scalar function that can accept a column value or literal also accepts a cte alias provided that the cte returns a singleton, one row by one column. This is not the case in the SELECT clause. Selecting a singleton cte is functionally equivalent to a CROSS JOIN, which urQL syntax prefers in the interest of explicit programming. Scalar function and predicate operators consuming a list (set) of values require the cte to return a single column of data.

WITH Foo AS (...) 
... 
WHERE bar IN Foo

4.2 Scalar Functions

Scalar functions present another opportunity to enforce composability. The SCALARS keyword introduces a clause where all scalar functions must be deﬁned. They are subsequently referenced by alias. A scalar function operates on any data in a single data row as well as singleton and list ctes and returns one consistently typed value.

SCALARS 
  Sc1 IF foo = 1 THEN bar ELSE foobar END-IF 
  Sc2 IF Sc1 = 'what-not' THEN %.y ELSE %.n END-IF 
  Sc3 IF Sc2 THEN dt-col ELSE ~2025.9.6 END-IF

In legacy sql this would all be nested and inlined in a SELECT clause or predicate. In urQL one inlines the scalar alias.

Obelisk supports IF, CASE, COALESCE, and many other common sql scalar functions.

5 Security

The Urbit architecture provides for single sign-on (sso) through the immutable identity of all computers on the network. Obelisk security²² allows for grouping identities, much like security on other sql database implementations. In addition parent, child, and sibling are predeﬁned groups providing for subnet security. The Urbit network architecture provides for subnets with two levels of hierarchy—parent, a ship of identity type galaxy, star, or planet, and children, ships of identity type moon.

User permissions are for ships (computers) on the network, for agents (userspace programs), or a combination of the two.²³ Permissions can be any combination of whitelists or blacklists aﬀecting databases, namespaces, tables and views, or columns. Permissions may be read-only or for write aﬀecting data. Schema read is allowed, but update can only be performed by the host ship. Whitelisting and blacklisting can be tailored by timeframe, both in regards to the data or schema state permitted or restricted and the eﬀective time range of the permission itself.

Obelisk, like other sql-based relational databases, is susceptible to sql injection attacks, and the mitigation for Obelisk is the same as for other dbmss—implementing stored procedures and adding a level of SP permissions.

Short of dropping a database, any write—even dropping a table—results in a new state. Referencing a prior state and writing to it (with proper planning) can eﬀectively correct any destructive write.

Obelisk security is believed to be robust up to disk and ram hacking.

6 The Shrubbery

Key to Codd’s invention of the relational database is his critique of hierarchical databases.²⁴ It took another 20 years to mostly fulﬁll the demise of accessing data via paths. Yet the concept lingers and is not without merit. Urbit being both a computer fundamentally supporting a hierarchical (tree) architecture and a network of nodes with immutable identities is the perfect substrate for implementing a truly global namespace.²⁵

Obelisk fully supports global namespace access to data and schema. This urQL/Obelisk functionality is not easily duplicated on computers other than Urbit.

SELECT DATABASE animal-shelter

SELECT DATABASE returns the entire animal-shelter²⁶ noun.²⁷

Formatted as a path in the global Urbit namespace (beginning with ship identity)

/~nomryg-nilref/obelisk/animal-shelter

any caller on the network with suﬃcient security permissions can retrieve a copy of the database.

Working our way down the schema hierarchy we have access to namespaces within a database,

SELECT NAMESPACE animal-shelter.reference 
 
~/obelisk/animal-shelter/reference

and pause to point out the path is on the local ship, but still conﬁned by local desk²⁸ permissions. The returned noun is all the user deﬁned tables, views, and indices in the selected namespace, including all their schema and data states.

SELECT my-table 
 
~/obelisk/db1/dbo/my-table

The reader may wonder about the syntax here. Submission of urQL scripts to the Obelisk parser requires the name of a default database along with the script, and dbo is always the default namespace. The returned noun is the same metadata and result set whose print format we have seen above.

We cannot tell from the syntax (or the returned metadata) whether the query returned the data from a table or view. Views can shadow tables. To force a table add the TABLE keyword to the script, or in the namespace,

~/obelisk/db1/dbo/table/my-table

and use AS OF for a particular state,

~/obelisk/db1/dbo/my-table/~2020.1.26

The ﬁnal drill down in the namespace is to the column level,

SELECT db1..my-table.column-4 
~/obelisk/db1/dbo/my-table/column-4

Keep in mind Obelisk returns proper sets of data rows. Only a view is capable of deterministic ordering of the result rows when accessing data in this manner.

Shrubbery support provides for hybrid relational/hierarchical data processing. The Hawk Urbit implementation of namespace computing provides for scripting language or api usage of Obelisk as well as path access and post processing. There is already a working adaptation of a sql Studio-like interface to Obelisk available in Hawk.

7 Data Structures

Bad programmers worry about the code. Good programmers worry about data structures and their relationships (Torvalds, 2016).

To better understand the workings of Obelisk we turn to the data structures. Obelisk is a work in progress approaching a beta release, so some structures still may change over time.

7.1 AST/API

The Obelisk parser parses urQL commands into the structures in ﬁle /sur/ast, which also serves as the Obelisk api. We show the most important schema and content manipulating commands in Figure 1. A qualified-column is one qualiﬁed by a qualified- table.

:: create-database 
$:  %create-database 
    name=@tas 
    as-of=(unit as-of) 
== 
 
:: drop-database 
$:  %drop-database 
    name=@tas 
    force=? 
== 
 
:: create-namespace 
$:  %create-namespace 
    database-name=@tas 
    name=@tas 
    as-of=(unit as-of) 
== 
 
:: create-table 
$:  %create-table 
    table=qualified-table 
    columns=(list column) 
    pri-indx=(list ordered-column) 
    foreign-keys=(list foreign-key) 
    as-of=(unit as-of) 
== 
 
:: predicate 
(tree predicate-component) 
:: qualified-table 
$:  %qualified-table 
    ship=(unit @p) 
    database=@tas 
    namespace=@tas 
    name=@tas 
    alias=(unit @t) 
== 
 
:: drop-table 
$:  %drop-table 
    table=qualified-table 
    force=? 
    as-of=(unit as-of) 
== 
 
:: update 
$:  %update 
    ctes=(list cte) 
    table=qualified-table 
    as-of=(unit as-of) 
    $:  columns=(list qualified-column) 
        values=(list value-or-default) 
    == 
    =predicate 
== 
 
 
 
 
:: delete 
$:  %delete 
    ctes=(list cte) 
    table=qualified-table 
    as-of=(unit as-of) 
    =predicate 
== 
 
:: truncate-table 
$:  %truncate-table 
    table=qualified-table 
    as-of=(unit as-of) 
== 
 
:: predicate-component 
 $? ops-and-conjs 
   qualified-column 
   unqualified-column 
   dime 
   value-literals 
    cte-alias 
    scalar-alias 
    aggregate 
== 
 
:: column 
$:  %column 
    name=@tas 
    type=@ta 
==

Figure 1: Obelisk schema creation and modiﬁcation ast/api structures, from /sur/ast.

Obelisk does not yet support interacting with tables on foreign ships, so qualiﬁcation is currently only by database and namespace. An alias may be assigned in the urQL. There is no distinction in urQL or the ast between tables and views. A view may shadow a table in the context of queries, but schema and data manipulating commands always operate on the speciﬁed table.

We see that DELETE requires a predicate (unlike standard sql) and the predicate data structure is tree-shaped. Predicate nodes are operators, conjunctions, or any number of representations of data.

Figure 2 presents the selection data structure, also a tree structure which accommodates either a set operator or one of three other structures as nodes. insert and merge are for future implementation using the %into operator. The query data structure is central to reading and formatting persisted data and can be combined with any of the regular set operators (%union, %except, %intersect, %divided-by). If there are ctes, they are processed ﬁrst to return one relation each. Prior cte relations are accessible by subsequent ctes. Internally diﬀerent kinds of relation data structure represent data returned by ctes (cte-alias) or more commonly qualified-table, which includes (possibly table-shadowing) views. (query-row is a placeholder for future temporary table urQL syntax.)

:: selection 
$:  %selection 
    ctes=(list cte) 
    set-functions=(tree set-function) 
== 
 
:: set-function 
$?(set-cmd set-op) 
 
:: query 
$:  %query 
    from=(unit from) 
    scalars=(list scalar) 
    =predicate 
    group-by=(list grouping-column) 
    having=predicate 
    =select 
    order-by=(list ordering-column) 
== 
 
:: set-op 
$?  %union 
    %except 
    %intersect 
    %divided-by 
    %divide-with-remainder 
    %into 
== 
 
 
:: set-cmd 
$%(insert merge query) 
 
:: cte 
$:  %cte 
    name=@tas 
    =query 
== 
 
:: from 
$:  %from 
    =relation 
    as-of=(unit as-of) 
    joins=(list joined-object) 
== 
 
:: scalar 
$:  %scalar 
    scalar=scalar-function 
    alias=@t 
== 
 
:: select 
$:  %select 
    top=(unit @ud) 
    columns=(list selected-column) 
== 
 
 
 
:: selected-column 
$%  qualified-column 
    unqualified-column 
    selected-aggregate 
    selected-value 
    selected-all 
    selected-all-object 
== 
 
:: insert 
$:  %insert 
    table=qualified-table 
    as-of=(unit as-of) 
    columns=(unit (list @tas)) 
    values=insert-values 
== 
 
:: relation 
$%  qualified-table 
    cte-alias 
    query-row 
== 
 
:: joined-object 
$:  %joined-object 
    join=join-type 
    =relation 
    as-of=(unit as-of) 
    =predicate 
==

Figure 2: Obelisk query and selection ast/api structures, from /sur/ast.

selected-column deals with all the diﬀerent ways columns and data may be selected in urQL, qualiﬁed and unqualiﬁed columns, literals, all columns across all joins, and all columns of a speciﬁed relation. (Aggregate functions are for a future release.)

A potential enhancement to the urQL syntax and ast in a future release is to support zero-ary structures in the SELECT clause of a query. This is an area of research enabled by Obelisk’s ability to produce result vectors and sub-vectors (see outer joins above). The model is how the Datalog language deﬁnes zero-ary structures, beginning a sub-structure with :- colhep and ending it with a . dot, allowing nested sub-structures.

The example

SELECT column-1, column-2, :- column-3, column-4., 
 ↳ column-5

would return one vector of column-1, column-2, column-5, and one vector of column-3, column-4 for every instance of the same column-1, column-2, column-5. This would open the door to returning complex nouns from queries.

7.2 Persistence Layer

At the very top of the Persistence Layer (Figure3) is the server data structure.

+$  server  (map @tas database)

This is simply an index into all the databases, including the %sys database tracking other databases. This is the noun which the Obelisk app persists. Alternate implementations using an app structure other than the one provided with Obelisk merely need to persist this noun.

:: database 
$:  %database 
    name=@tas 
    created-provenance=path 
    created-tmsp=@da 
    sys=((mop @da schema) gth) 
    content=((mop @da dataa) gth) 
    =view-cache 
== 
 
:: view-cache 
((mop ns-obj-key cache) ns-obj-comp) 
 
:: ns-obj-key 
$:  ns=@tas 
    obj=@tas 
    time=@da 
== 
 
:: cache 
$:  %cache 
    tmsp=@da 
    content=(unit cache-content) 
== 
 
 
:: ns-obj-comp 
|=  [p=data-obj-key q=data-obj-key] 
^-  ? 
 
:: cache-content 
$:  rowcount=@ 
    rows=(list (map @tas @)) 
== 
 
:: schema 
$:  %schema 
    provenance=path 
    tmsp=@da 
    =namespaces 
    tables=(map [@tas @tas] table) 
    views=((mop data-obj-key view) 
      ns-obj-comp) 
== 
 
:: data 
$:  %data 
    ship=@p 
    provenance=path 
    tmsp=@da 
    files=(map [@tas @tas] file) 
== 
:: table 
$:  %table 
    provenance=path 
    tmsp=@da 
    =column-lookup 
    type-lookup=(map @tas @ta) 
    pri-indx=index 
    columns=(list column) 
    indices=(list index) 
== 
 
:: column-lookup 
(map @tas [aura @]) 
 
:: view 
$:  %view 
    provenance=path 
    tmsp=@da 
    =selection 
    =column-lookup 
    type-lookup=(map @tas @ta) 
    columns=(list column) 
    ordering=(list column-order) 
==

Figure 3: Obelisk persistence layer ast/api structures, from /sur/server-state. (Continues on next page.)

:: file 
$:  %file 
    ship=@p 
    provenance=path 
    tmsp=@da 
    =column-addrs 
    rowcount=@ 
    pri-idx=(tree [(list @) (map @tas @)]) 
    indexed-rows=(list indexed-row) 
    =column-catalog 
== 
:: indexed-row 
$:  %indexed-row 
    key=(list @) 
    data=(map @tas @) 
== 
 
 
 
 
 
::

Figure 3: Obelisk persistence layer ast/api structures, from /sur/server-state. (Continued from previous page.)

The database structure reveals the simplicity of Obelisk time-travel. The database schema and content are both +mops (ordered +maps) indexed by time. The magic of Urbit noun de-duplication does the rest.

view-cache provides for fast retrieval of view data after a view has been requested once. Any alteration of table data underlying a view will create a new entry keyed on namespace, view name, and time, but without content. The ﬁrst request for that key will cause the entire selection to execute and save the content. There are currently only system views. User-deﬁned views will be in a later release.

table deﬁnes the schema used to persist data for a persisted relation. 'column-lookup' returns the column type (aura) as well as its position in the canonical ordering. tables require a unique primary index, and for now there are no secondary table indices.

+$  index 
    $:  %index 
        unique=? 
        key=(list key-column) 
5==  == 
+$  key-column 
    $:  %key-column 
      name=@tas 
      =aura 
10      ascending=? 
==  ==

:: action 
$%  $:  %tape 
        default-database=@tas 
        urql=tape 
    == 
    $:  %commands 
        cmds=(list command) 
==  == 
 
:: cmd-result 
$:  %results 
    (list result) 
== 
 
:: vector 
$:  %vector 
    (lest vector-cell) 
== 
 
:: vector-cell 
[p=@tas q=dime] 
:: result 
$%  [%message msg=@t] 
    [%vector-count count=@ud] 
    [%server-time date=@da] 
    [%security-time date=@da] 
    [%schema-time date=@da] 
    [%data-time date=@da] 
    [%result-set (list vector)] 
== 
 
 
 
 
 
 
 
 
 
 
 
 
:: command 
$%  alter-index 
    alter-namespace 
    alter-table 
    create-database 
    create-index 
    create-namespace 
    create-table 
    create-view 
    delete 
    drop-database 
    drop-index 
    drop-namespace 
    drop-table 
    drop-view 
    grant 
    revoke 
    selection 
    truncate-table 
    update 
==

Figure 4: Obelisk presentation layer ast/api structures, from /sur/obelisk.

columns returns the columns in canonical order (the order originally deﬁned).

With the table deﬁnition established, the actual data is in a file structure, also containing relevant metadata. At the most basic level, every row of persisted data consists of a key—a list of atoms which is one or more of the subsequent columns, and the column data, which maps from column name to the atomic value. Of all the persistence layer structures, the one actually persisting data is probably the most prone to future development. For purposes of alternating query execution plans, data is persisted in more than one way. (tree [(list @) (map @tas @)]) is an ordered map allowing ordered selection on partial keys. column-catalog is a currently abandoned attempt at persisting data column-wise. (Inserting large amounts of data stored column-wise proved impractical, so for now column-wise storage is deferred.) indexed-rows is the only form currently in use, so for the beta release every query plan execution is a table scan.

7.3 Presentation Layer

Aside from urQL, Obelisk users only ever interact with the presentation layer (Figure 4). The Obelisk app allows for two actions, %tape, which submits a command and default database name in a urQL $tape, and %commands, which is a list of api commands where every table and view is already fully qualiﬁed by database.

This section has covered most of the available commands; several planned commands like alters, drops, and security commands will parse but are not yet supported in the Obelisk engine.

Every urQL command returns a result structure of metadata and vectors, which we have already covered.

An untapped area of investigation is to change the %result-set from returning (list vector) to returning a list of trap cores which emit vector when evaluated. The lookup of columns row by row in the query relation is computationally intense and could be oﬄoaded to the client side.

8 Conclusion

A database scripting language should provide the most intuitive and complete experience possible for organizing and querying data. Uniting the relational data model with set theory and predicate calculus has proven to be the most eﬀective general means of organizing and retrieving data to date, however the hierarchical model still has its place and should be supported as far as possible. We have seen that even sql databases have at least a three-level hierarchy, four including namespaces, ﬁve when implemented as a temporal database.

Obelisk and its urQL scripting language represent a reimagining of the relational database paradigm, inspired by and tailored to the unique architecture of the Urbit computer while remaining adaptable to other computing environments and free of the constraints of sql’s backward compatibility, Obelisk leverages E. F. Codd’s foundational principles to address longstanding deﬁciencies in traditional rdbmss. Its innovations—time travel through database states, idempotent queries, proper set operations, elimination of nulls, and enforced composability—oﬀer a streamlined, transparent, and powerful approach to data management. The integration of Urbit’s pure functional model ensures that all operations, whether through the urQL scripting language or the api, are consistent and accessible, with no hidden functionality. Additionally, through Urbit’s global hierarchical namespace, data access via namespace path enhances ﬂexibility, while Urbit’s identity-based security model provides robust protection. Obelisk not only advances the state of relational databases but also sets a new standard for clarity, consistency, and innovation in structured data systems.

Appendix: Examples

/* The %sys database contains provenance metadata for 
all the user databases. 
*/ 
 
5FROM sys.sys.databases 
SELECT database, sys-agent, sys-tmsp, data-ship, 
       data-agent, data-tmsp ; 
 
/* Set the default database to animal-shelter for the 
10remaining queries. The rest of the schema metadata is 
in %sys namespace views of each user database. 
*/ 
 
FROM sys.tables 
15SELECT namespace, name, agent, tmsp, row-count ; 
 
FROM sys.columns 
SELECT namespace, name, col-ordinal, col-name, 
 ↳ col-type ; 
20 
/* The %sys namespace views also tracks the schema and 
content history. 
*/ 
 
25FROM sys.sys-log SELECT tmsp, agent, component, name ; 
 
FROM sys.data-log 
SELECT tmsp, ship, agent, namespace, table, row-count ; 
 
30/* Start exploring. Return complete sentences, 
  if you like. 
*/ 
 
FROM reference.calendar T1 
35JOIN reference.calendar-us-fed-holiday T2 
WHERE T1.date BETWEEN ~2025.1.1 
      AND ~2025.12.31 
SELECT 'On this day: ', day-name, ',', T1.date, 
 ↳     ', is the holiday ', us-federal-holiday, '.' ; 
40 
/* Create a new database, add a table and insert data. 
The default namespace is %dbo. 
*/ 
 
45CREATE DATABASE db2 AS OF ~2000.1.1 ; 
 
CREATE TABLE db2..my-table-1 (col1 @t, col2 @da) 
PRIMARY KEY (col1) ; 
 
50INSERT INTO db2..my-table-1 (col1, col2) 
  VALUES ('today', ~2000.1.1) 
    ('tomorrow', ~2000.1.2) 
    ('next-day', ~2000.1.3) 
    ('next-today', ~2000.1.1) 
55    ('next-tomorrow', ~2000.1.2) 
    ('next-next-day', ~2000.1.3) ;

References

: Chamberlin, Donald D. and Raymond F. Boyce (1974). “sequel: A Structured English Query Language.” In: sigfidet ’74: Proceedings of the 1974 acm sigfidet (now sigmod) Workshop on Data Description, Access and Control, pp. 249–264. url: https://doi.org/10.1145/800296.811515 (visited on ~2025.9.5).
: Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Tech. rep. San Jose, California: ibm Research Laboratory. url: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf (visited on ~2025.9.5).
: — (1990). The Relational Model for Database Management: Version 2. Addison-Wesley.
: Darwen, Hugh and C. J. Date (1998). Databases, Types, and the Relational Model: The Third Manifesto. Revised 2006. Addison-Wesley.
: — (2002). Temporal Data and the Relational Model. Morgan Kaufmann.
: ~hastuc-dibtux, Liam Fitzgerald (2024) “Shrub”. url: https://github.com/urbit/shrub (visited on ~2025.9.27).
: Levin, Ami (2020) “Animal Shelter”. url: https://github.com/ami-levin/Animal_Shelter (visited on ~2025.9.13).
: ~migrev-dolseg, William Hanlen (2025) “ways hawk is practically useful for me, right now”. url: https://willhanlen.com/~~/outbox/25/7/my-hawk-uses/ (visited on ~2025.9.8).
: ~rovnys-ricfer, Theodore Blackman (n. d., ca. 2023) “Young Shrub”. url: https://rovnys.cataphract.us/young-shrub (visited on ~2025.9.8).
: ~sorreg-namtyv, Curtis Yarvin et al. (2016). Urbit: A Solid-State Interpreter. Whitepaper. Tlon Corporation. url: https://media.urbit.org/whitepaper.pdf (visited on ~2024.1.25).

Footnotes

¹We prefer scripting language over query language for semantic reasons. It does more than query.⤴

²Notably subnet security features and shrubbery enabling SELECT statements.⤴

³Codd (1970); Chamberlin and Boyce (1974) Codd’s seminal paper focuses primarily on studying set relations and does not invent a query language, but sets out the parameters for an eﬀective and eﬃcient language. Predicate calculus was already the obvious way to ﬁlter stored data. Chamberlin and Boyce (1974) treat it as a given, discussing their syntax for applying it.⤴

⁴E.g. the sql standard does not truly implement sets by default. Obelisk does.⤴

⁵Darwen and Date (1998); Darwen and Date (2002). While inﬂuential in other aspects of Obelisk’s design. We are less familiar with their work on temporal databases, so their research on temporal databases was not especially inﬂuential. Without any direct knowledge of his work, we credit the reputation of Rich Hickey.⤴

⁶~sorreg-namtyv et al. (2016). Is any suﬃciently complex computer function completely pure? Urbit and Obelisk both rely on crash to reject invalid commands that parsing and the api type system cannot otherwise validate.⤴

⁷The noun is an Urbit abstraction that maps beautifully to English. It is either a natural number, or an ordered pair of nouns. Thus the fundamental data structure of Urbit (and Obelisk) is a binary tree of natural numbers.⤴

⁸Of the sql rdbmss that do have apis, they are uniformly bolted-on afterthoughts.⤴

⁹With the sole exception of dropping a database from the server, in which case all traces of the database vanish from the server.⤴

¹⁰Currently the only schema altering commands available are CREATE and DROP. ALTER is on the roadmap for an undetermined future release.⤴

¹¹A ship is the generic term for an Urbit computer on the Urbit network, which has a unique, immutable, cryptographically protected identity. The ship’s time corresponds to the computer time on the Urbit VM’s host computer, whether Linux/Unix, Mac, or Windows.⤴

¹²A namespace is known as schema in the rest of the sql world, leading to confusion.⤴

¹³Setting the AS OF time when manipulating data (INSERT, UPDATE, DELETE) results in a new content state based on the state in eﬀect at that time.⤴

¹⁴Codd (1970).⤴

¹⁵Here we deviate from Codd’s formal deﬁnition, where he only suggests the columns—he has formally deﬁned them as domains—may be labeled, the domains are not necessarily unique. One must bear in mind Codd’s work is theoretic. He even points out that he is not designing a data query language. He is providing the foundation upon which to design query languages.⤴

¹⁶We respect the purity of Codd’s original relational algebra, thus there is no inner join only join. Outer joins are a set union on a joined relation as well as the remaining rows not being joined (Codd, 1990, p. 107). Nonetheless outer joins have great practical beneﬁt and we opt to dispense with superﬂuous keywords. Codd never published original research on outer joins, but after their introduction by other authors (Chamberlin and Boyce, 1974), he covered them in a book (Codd, 1990). Oracle is widely credited with the ﬁrst adoption of outer joins in the early 1980s, and they were ﬁrst included in the ansi sql-92 standard. Outer joins are not included in the ﬁrst Obelisk beta.⤴

¹⁷Codd (1970).⤴

¹⁸Codd (1970) describes this as a cross join ﬁltered for equality on the operands.⤴

¹⁹Darwen and Date (1998; 2002) inﬂuenced this decision.⤴

²⁰It is beyond the scope of this paper to invent a new formal deﬁnition of relation. We imagine that instead of a relation consisting of one tuple type of ordered atomic values, there would be a root tuple type of ordered unique label, type, value triples. (Type ensures strong typing. An untyped version is more general.) Any number of unique sub-types consisting of any subset of these triples in the same order may also be present. Arranged vertically all the tuples align on label/type matches. Any resulting 2D matrix having more than the root tuple type present has some number of holes in every occurrence of a sub-type, and is thus a sparse relation.⤴

²¹The ansi/iso sql standard deﬁned Common Table Expressions in 1999.⤴

²²Not in the ﬁrst beta release. The ﬁrst beta prevents all access by a foreign ship.⤴

²³Userspace security is still a work in progress in the Urbit architecture.⤴

²⁴Codd (1970), “1.1.3 Access Path Dependence”. Codd uses the term data store which is mostly interchangeable with database, even if they conjure up diﬀerent images in the mind.⤴

²⁵Not in the ﬁrst beta release. The shrubbery concept of a universal namespace is an organic outgrowth of the Urbit ﬁle system (Clay). Within Urbit circles the concept is still amorphous. It could be more than an Urbit network uri, with endpoints functioning as data or software. It is hard to identify a single author or inventor. ~rovnys-ricfer originated the term in conversation with ~littel-wolfur and was the ﬁrst to write about it (~rovnys-ricfer, personal communication, ~2025.10.18; ~rovnys-ricfer, undated). ~sarpen-laplux developed the earliest working implementation in 2023. ~hastuc-dibtux subsequently carried out extensive prototyping and advocacy (2024). ~migrev-dolseg’s %hawk Gall app is the most readily available and extensive implementation in this tradition to date (2025).⤴

²⁶Levin (2020) Available for several database platforms, this is a readily available non-trivial sample database that demonstrates natural primary table keys.⤴

²⁷Backups may be implemented as a copy of the database noun on another ship. The Urbit computer is already an acid database and nouns are de-duplicated on the same ship. So references to the same noun are pointers to the single physical occurrence of that noun.⤴

²⁸A desk can be thought of as an application space inside of userspace.⤴