Designing a Data Flow Language - PIPE

Renato Pereira
Apr 28, 2024

PIPE is an experimental language that I’m creating mostly to learn the concepts of programming language design. I’m experimenting with the limited knowledge I have about the field before heading into compiler theory. This reason by itself already imposes some conditions on what PIPE became.

PIPE is a dynamic interpreted language, mainly because I still lack the knowledge about how to handle a more strict type system and how to generate proper bytecodes. To extract the most from what I can do, I decided to create a strongly typed language without null values, which, impacted further decisions.

Following the discussion from the previous posts, the focus of the languages is to provide a special system for chaining functions calls, which, in my opnion, increases readability and makes sequential algorithms easier to follow. I discoverd with SHT that I could transform this sequence of statements into lazy expressions by using generators:

-- Euler problem 2: Find the sum of the even-valued terms in the Fibonacci sequence whose values do not exceed four million
fn fib(x) {
  return match x {
    0: 0;
    1: 1
    x: fib(x-1) + fib(x-2)
  }
}


range()
| map       fib
| takeWhile x: x <= 4_000_000
| filter    x: x.IsEven()
| sum

In this example, fib(x) is just a function that receives an number and return a number, defined by the user. range is a generator functions, similar to python. Without parameters, it generates number from 0 to inifinite. Since it is a generator, it will only “execute” when the caller query it for the next value. The called in this case would be the map function, that will query the range generator, and call the fib function with the range value. The map itself returns a generator. Similarly, takeWhile and filter also return a generator. sum on the other hand, will query a generator and sum the value returned into a final number.

Breaking it down:

range(value ...Number) -> Stream<Number>: range my receive multiple arguments and it will return a Stream (a.k.a generator) of numbers. Without arguments, it will generate numbers from 0 to infinite.
map(Stream<X>, function(X) -> Y) -> Stream<Y>: map receive any value as first argument, which internally will be converted to a stream. It also receives a function, which arguments should be the same as the first map argument. Map returns a stream of the same elements from the mapped function.
takeWhile(Stream<X>, function(X) -> bool) Stream<X>: takeWhile receives any value, that will be converted into a Stream, and a function that will check if takeWhile should query the incoming stream or stop it.
filter(Stream<X>, function(X) -> bool) -> Stream<X>: filter is very similar,however it won’t stop the streaming, but will filter values off if the incoming value does not check with the function.
sum(Stream<Number>) -> Number: sum receives a number (or a stream of number), and returns a number.

Notice that sum, as the last chained function, forces the whole sequence of streams to be exahusted. Without it, this expression would only return a Stream.

To make this whole mechanism work, I had to go with some constraints:

Type Conversion Rules

Stream(X) work with any type, if X implements an iterator itnerface, it will used, otherwise it will stream the value itself. For example:

Stream(3): generates the number 3 and stop.
Stream([1, 2]): generates the number 1 and 2, and stop.
Stream('abc'): generates the strings ‘a’, ‘b’, and ‘c’, and stop.
Stream(<stream>): just return the internal stream.

Most types can be created from a stream:

Number(<stream>): will query the stream once and try to convert its value to a number.
String(<stream>): will query the stream until its stop, converting every value into a string and concating everything together.
List(<stream>): will query the stream until its stop, each value of the stream will become an element in the list.

With this mechanism, we can write expressions like 'string' | List, [1,2,3] | String, or even range(10) | filter x: x.IsEven() | List.

Tuples and Assignemnt Rules

Tuples can be used for returning and assigning values, for example:

a, b := 1, 2			-- a=1; b=2
a    := 1, 2      -- a=1
a, b := 1					-- Error!

Notice that the receiver can ignore additional values from a tuple if the left-side contains less elements than the right side. The inverse is invalid, however. The same rules are applied to function arguments.

fn add(a, b) { a  + b } 

add(1, 2)     -- =3
add(1, 2, 3)  -- =3
add(1)        -- Error!

This may sound as a strange decision at first, but it helps to handle lambdas in the pipe expressions:

[1, 2, 3]
| filter x: x.IsEven()
| sum

The Stream([1, 2, 3]) is not Stream<Number> but Stream<Number, Number>, with (value, index). A Dict works in a similar fashion, returning a pair of (value, key). Simply ignoring additional values, helps keeping the code clean, but with the cost of possible confusions about the return type from incoming Streams.

People comming from a JavaScript background and working with lodash library, such as me, may feel familiar with this :), and to be honest, it is a lot more consistent than go, for example:

list 		:= []int{1, 2, 3}
el1 		:= list[1] // Consistent.
el2, ok := list[1] // There is no rule in the language that can generate
                   // expressions like this?!

func foo() (int, bool) { return 1, false }
bar1, ok := foo()   // Consistent.
bar2     := foo()   // Error!

As you can see, Go uses the same mechanism for SOME expressions, but there is no rule in the language that can produce the same result.

Lambdas and Dicts

Lambda notation where created for pipe expressions. You can create a function simply using the : character.

a := :0          -- fn a() { 0 }
b := x: x*2      -- fn b(x) { x*2 }
c := (a,b): a+b  -- fn c(a, b) { a + b }

This notation is very similar to arrow functions in JavaScript, but I judged it would be more attractive in pipes:

range()
| map       x: fib(x)
| takeWhile x: x <= 4_000_000
| filter    x: x.IsEven()
| sumBy     x: x

range()
| map       x => fib(x)
| takeWhile x => x <= 4_000_000
| filter    x => x.IsEven()
| sumBy     x => x

In order to avoid any possible ambiguity, short dictionary declarations use = as key-value separator:

myDict := { a=1, b=2 }

Which sucks a bit, but it is not so uncommon and is still readable.

Generator Functions

Generators works pretty much the same as Python:

fn OneTwoThree {
	yield 1
	yield 2
	yield 3
}

a := OneTwoThree()	-- a = Stream<Number>
a.Next()            -- = Maybe<Number>

Notice that a generator does not return the raw value, but instead it returns Maybe (Monads), which encapsulated the yielded value. Python is different, the next function returns the value immediatly and, to notify the exauhstion of the iterator, it raises an exception:

def OneTwoThree():
  yield 1
  yield 2
  yield 3

a = OneTwoThree()
a.__next__() 			# =1
a.__next__() 			# =2
a.__next__() 			# =3
a.__next__() 			# StopIteration error!

To avoid interruption, I decided to return a maybe, and treat the iterator result consistentily with the rest of the language.

Exceptions and Maybes

Founding no other ergonomic (or aesthetic) way to handle errors, I decided to go with interruptions. Multiple returns such as in Go and Odin wouldn’t fit well due to the lack of type checking, and neither is forcing a Maybe, such as in Rust.

However, I created a simple expression to keep it simpler than try-catch blocks. With the wrap operator ?, you can convert any expression to a Maybe, and stop any possible raised interruptions within:

f := file.Open('x.txt')?     -- f = Maybe<File>

if !f.Ok() {
	println('Error:', f.Error())
	return
}

f.Value()

I’ve been thinking a lot about interruption vs returned value, I’m still unsure about these approaches. I don’t have experience in Rust or Odin, but in Go, we have a nice way to handle errors:

type Config struct {
	Path string
}

func Load(config *Config) (string, error) {
	f, err := os.ReadAll(config.Path)
	if err != nil {
		println("Couldn't read config file:", err.Error())
		return '', err
	}

	return f, nil
}

Clean and I love it. What I don’t like is that this code still may produce interruptions, for example Load(nil) would generate a Panic and stop my whole program. You may argue that this would be a skill issue, since I modeled Load function in a way that permits this kind of situation. That’s fair, but if it happens inside a dependency? If it happens by a bug in Go? The thing is, there is no guarantee that an expression won’t interrupt the program in Go, and there is only one way to capute it, and it is an ugly solution.

Nil values

Nil values sucks, but not having null values in dynamic languages looks equally as bad. The monad pattern is great for most of things, but suppose you want to model a Graph Node which can reference a parent of the same type.

data Node {
	parent = empty(Node) 	-- Returns a Maybe<Node>(error)
}

Notice that, in order to keep null out of the language, I had to put a default initializer in struct definitions, which works great until you have self-reference. In this case, I had to use a maybe with the default value as an error (Empty Error). I believe that this would be avoidable by using union types, like functional languages. But what to do with null values in JSON files? I still have no answer to that, and this is a topic of future studying.

Next Steps

If you are interested and want to know more about the language, check it out in pipe.r2p.dev. From here I want to learn more about types and compilation. I believe I can transform PIPE into a simple useful typed language.