Ogma v0.5 Release

ogma is a scripting language written in Rust focused on ergonomically and efficiently processing tabular data. Mixing aspects of terminal shells and functional programming, the ogma project lets one interact with data in a refreshing way.

ogma version 0.5 has been released, bringing many bug fixes on top of the type inference update. Some of the major updates were behind the scenes, reworking the compiler to be more robust with type inference and variable resolution, which should reduce the internal compilation errors seen in the wild. This post delves into the introduction of TypesSets and the variable sealing rework.

Find the release binaries and notes at the GitHub repo.

TypesSets


Since the introduction of type inference, the compiler could encounter code that failed to type check even though its types could be trivially reasoned about. Much of the issue lay in how the compiler would trial different types, and in how few deductions could be made without a concrete set of constraints. To solve this issue, the notion of a TypesSet was introduced. The set begins as a master superset of all known types and shrinks as more information is gathered and constraints are placed on each node. The set replaces the payload of the Inferred variant of a node within the type graph, so effectively each node maintains a set of inferred types. The actual change wasn't that large:

enum Knowledge {
    // No longer represented, since the types set will dictate ambiguity
-   Unknown,
    Any,
    Known(Type),
    Obliged(Type),
-   Inferred(Type),
+   Inferred(TypesSet)
}

struct TypesSet(Rc<HashSet<Type>>);

Notice that the Unknown variant is removed; it is now represented by a set with ambiguity (more than one type). The TypesSet structure is simply a reference-counted HashSet. The reference counting reduces the memory burden: since the type graph is initialised with the master superset, there is no need to keep clones of it in memory. When a set is reduced it is lazily cloned (copy-on-write), and not all sets go this way; some are immediately overwritten with known knowledge.

Using sets allows deductions to be made faster. When type flow occurs, sets that flow into one another can use set intersection to reduce each of them. Checking for a valid intersection is simple, as the Rust standard library's HashSet has a great API. The change to types sets also heralded a more constrained way of implementing intrinsic commands. The compiler framework for this already existed; this release leverages it to unleash the power of set deductions.
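To make the copy-on-write intersection concrete, here is a minimal sketch. It is not ogma's actual implementation: the four-variant Type enum and the full, from_types, intersect and only helpers are assumptions made for illustration. Only the TypesSet(Rc<HashSet<Type>>) shape comes from the change shown above.

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical stand-in for the compiler's type universe.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Type {
    Num,
    Str,
    Bool,
    Table,
}

// Same shape as in the post: a reference-counted set of candidate types.
#[derive(Clone, Debug)]
struct TypesSet(Rc<HashSet<Type>>);

impl TypesSet {
    // The master superset. Cloning the result only bumps a reference count.
    fn full() -> Self {
        TypesSet(Rc::new(HashSet::from([
            Type::Num,
            Type::Str,
            Type::Bool,
            Type::Table,
        ])))
    }

    // Hypothetical helper to build a constraint set.
    fn from_types(types: &[Type]) -> Self {
        TypesSet(Rc::new(types.iter().cloned().collect()))
    }

    // Reduce `self` to the types also present in `other`.
    // Returns false when nothing is left, which would be a type error.
    fn intersect(&mut self, other: &TypesSet) -> bool {
        // Only touch the set if the intersection actually removes something.
        if self.0.iter().any(|t| !other.0.contains(t)) {
            // `Rc::make_mut` clones the inner set only if it is shared (cow).
            Rc::make_mut(&mut self.0).retain(|t| other.0.contains(t));
        }
        !self.0.is_empty()
    }

    // The single remaining type, if the set is no longer ambiguous.
    fn only(&self) -> Option<&Type> {
        if self.0.len() == 1 {
            self.0.iter().next()
        } else {
            None
        }
    }
}
```

Two nodes cloned from TypesSet::full() share one allocation until one of them is narrowed; the narrowed node then gets its own copy while the other still points at the master set. A set of length one plays the role that a single inferred type used to, and an empty set signals a conflict.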

Moving to types sets for inference fixed many of the outstanding type inference bugs, and provides a robust foundation for getting polymorphism out of your ogma code.

Variable shadowing


Whilst working through the bug reports, a subtle variable shadowing bug arose in uncommon cases where the shadowed variable would be the one referenced at runtime.

ls | let $x | grp type | map value 
    {:Table \ $x | let $row.key:Str $x | fold {Table $x} append-row $row.size:Num }
                                     |                ^ $x should be a string but was compiling as a table
                                     ^ $x gets reassigned here

To solve this issue, the assumptions the compiler made about sealing nodes against variable introduction had to be reworked. The new method leverages the locals graph to strictly define which lexical parents can introduce variables, and which of them must be sealed before any concrete question about a variable's existence can be answered. This was especially tricky because most commands do not introduce variables; yet treating that as the default assumption led the compiler to eagerly compile blocks with stale variables. The new system makes the compiler more pessimistic, but allows fine-grained control to tell the compiler when a command will not introduce any more variables and can be sealed.
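The pessimistic behaviour can be pictured with a toy model. This is only a sketch of the idea and not how ogma's locals graph is actually built: Scope, Locals, Lookup and all of their methods are hypothetical names, and types are plain strings to keep it short. The point it illustrates is that a lookup refuses to answer while an unsealed scope could still introduce a shadowing variable.

```rust
use std::collections::HashMap;

// Result of asking "what is this variable?" from a given scope.
#[derive(Debug, PartialEq, Eq)]
enum Lookup {
    Found(&'static str), // the variable's type name
    Missing,             // every scope on the path is sealed and none defines it
    Pending,             // an unsealed scope may still introduce (shadow) it
}

// Hypothetical lexical scope: a parent link, its variables, and a sealed flag.
struct Scope {
    parent: Option<usize>,
    vars: HashMap<String, &'static str>,
    sealed: bool,
}

// Hypothetical stand-in for the locals graph.
struct Locals {
    scopes: Vec<Scope>,
}

impl Locals {
    fn new() -> Self {
        Locals { scopes: Vec::new() }
    }

    // Add a new, unsealed scope and return its index.
    fn push(&mut self, parent: Option<usize>) -> usize {
        self.scopes.push(Scope { parent, vars: HashMap::new(), sealed: false });
        self.scopes.len() - 1
    }

    fn define(&mut self, scope: usize, name: &str, ty: &'static str) {
        self.scopes[scope].vars.insert(name.to_string(), ty);
    }

    // Declare that this scope will not introduce any more variables.
    fn seal(&mut self, scope: usize) {
        self.scopes[scope].sealed = true;
    }

    // Walk outwards from `scope`; stop with Pending at the first unsealed scope.
    fn lookup(&self, mut scope: usize, name: &str) -> Lookup {
        loop {
            let s = &self.scopes[scope];
            if !s.sealed {
                return Lookup::Pending;
            }
            if let Some(ty) = s.vars.get(name) {
                return Lookup::Found(*ty);
            }
            match s.parent {
                Some(parent) => scope = parent,
                None => return Lookup::Missing,
            }
        }
    }
}
```

In terms of the example above: with an outer $x of type Table already sealed, a lookup of $x from the inner block yields Pending rather than the stale Table, and only resolves to Str once the inner let has defined $x and its scope is sealed.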

Looking forward


ogma's type system is now powerful enough to cover most polymorphic needs. Feature-wise, there are lots of commands still to implement, along with the milestone of partitions, which is key to creating more modular code bases. Open source support through code contributions, sponsorship, adoption and sharing is much appreciated!
