Wednesday, 23 May 2018

egg Chronology of Computer Languages 2

Another visualisation I tried with the COCL data-set was a chord diagram (revamped from the cocktail version I did a couple of years ago). Apart for realising how bad my JavaScript was back then, I discovered that chord diagrams do not make the "direction" of relationships obvious. I've added a bit of animation to try to remedy that when you hover over a language name.
I find this visualisation much more aesthetically pleasing that the rather workman-like "waterfall". The fact that the chronology is clock-like is also rather agreeable.

I turns out that drawing "marching ant" lines is quite expensive, even in modern browsers; so I re-vamped the "waterfall" diagram from yesterday to cache frames. It means that the animation is a bit jittery for the first few seconds after landing on the web page, but quickly becomes very smooth.

Click here for the interactive versions.

Tuesday, 22 May 2018

egg Chronology of Computer Languages 1

I was thinking about "curly brace languages" in the context of egg at the weekend, and remembered a graphic I'd seen of the "family tree" of computer programming languages. I googled around and found the images I'd seen at the turn of the millennium (e.g. here and here), but nothing from the last ten years. Strange, I thought; I'll have a go at either updating or redrawing a tree.

It quickly became apparent why I couldn't find such a chart.

I started by manually scraping the data from the Wikipedia pages and producing a spreadsheet of dependencies with:

  • Language name
  • Year it first appeared
  • Ancestors (prior languages that are said to have influenced it)
  • Link to Wikipedia page
That dataset (COCL: A Chronology of Computer Languages) is stored as JavaScript here. There are over sixty data points ranging from FORTRAN (1957) to Swift (2014). The choice of points is fairly subjective, the data is direct from Wikipedia.

Then I tried to draw the recombining tree (DAG) by hand. I got this far and gave up:


Even just concentrating on a subset (the "curlies") didn't help much:


The problem is that I'm trying to get the ordering of the chronology fixed in one of the axes (the y-axis in this case: oldest at the top, newest at the bottom). I've tried curved lines and even duplicated nodes to try to minimise the number of crossing edges. I've tried putting the oldest languages in the centre of the canvas and working outwards, like a bulls-eye. But nothing seems to work. I'm sure there are technique and/or software out there that can solve this problem, but the linear ordering seems to be beyond the ones I briefly looked at.

So I decided to throw a bit of JavaScript at it. It turns out it's easy to get ugly results from the dataset; the trick is to make it pretty.

My first attempt is an animated waterfall chart:


It's not very elegant, but does give some useful insights into popular ancestors. Everyone seems to want Lisp in their language's DNA these days.

Click here for the interactive version.

Tuesday, 1 May 2018

egg Strings

Currently, strings are a fundamental type in egg. In many languages they are not. I'm still undecided how much compiler and language support to provide. In particular, because egg is supposed to be a stable language, if strings are a fundamental type, any decisions I make now will need to be "baked" into the specification. So I've been looking into what strings actually are.

In languages where strings are not fundamental, such as C++, there may be many representations of text:

  char*
  char[N]
  std::string
  std::wstring
  std::u16string
  std::u32string

This has the advantage that the different representations and implementations can be tailored for different criteria, but it can be a bit overwhelming for learners.

Strings in egg are simply sequences of Unicode characters. They are immutable (like Java) and do not directly expose their internal representation. They are also independent of your locale.

So what can you do with strings like these? The Wikipedia pages leave a bit to be desired on this subject, so I had to start from first principles.

Assignment

Strings can be assigned from string literals or from other strings:

  string greeting = "hello";
  string word = greeting;

Concatenation

Strings can be created by concatenating other strings using the type-name constructor:

  string greeting = string("hello"); // "hello"
  greeting = string(greeting," world"); // "hello world"

Property: int length

Strings have a single property, "length", which is the count of the Unicode code points (not code units) that make up the string:

  string greeting = "hello";
  print(greeting.length); // prints '5'

Operator: string[int]

Strings have an indexing operator to extract single-character substrings at the relevant integer code point index:

  string greeting = "hello";
  string letter = greeting[0]; // sets letter to "h"

If the index is negative, or beyond the end of the string, an exception is thrown.

Operators: == and !=

Strings have equality/inequality operators:

  print("hello" == "hello"); // prints 'true'
  print("hello" == "goodbye"); // prints 'false'
  print("hello" != "hello"); // prints 'false'
  print("hello" != "goodbye"); // prints 'true'

Equality is case-sensitive and based on the equality of the constituent code point sequence. Strings are never equal to non-string values.

Iteration

The code points that make up a string can be iterated:

  string greeting = "hello";
  for (string i : greeting) print(i); // prints 'h' 'e' 'l' 'l' 'o'


Note that the iteration returns single-character strings.

Member: int hash()

Computes a 64-bit signed hash integer for use in containers and algorithms.

Member: int compare(string other)

Compares the string to another in an invariant manner (code point by code point). Returns negative/zero/position depending on whether the string is less/equal.greater than the other.

  "hello".compare("goodbye") // returns a positive integer

Case-insensitive and locale-specific comparisons are outside the remit of the basic string methods.

To prevent accident assumptions about collation ordering, the relational operators "<", "<=", ">=" and ">" are not directly supported for strings.

Member: bool contains(string needle)

Searches for a substring within the string.

  "beggar".contains("egg") // returns true

This is an optimisation of

  haystack.indexOf(needle) != null

Member: bool startsWith(string needle)

Determines if a string starts with substring.

  "beggar".startsWith("beg") // returns true

This is an optimisation of

  haystack.indexOf(needle) == 0

Member: bool endsWith(string needle)

Determines if a string starts with substring.

  "beggar".endsWith("beg") // returns false

This is an optimisation of

  haystack.indexOf(needle) == (haystack.length - needle.length)

Member: int? indexOf(string needle)

Returns the zero-based index of the first occurrence of the substring, or "null" if the substring is not present.

  "beggar".indexOf("egg") // returns 1

The full signature is

  int? indexOf(string needle,
               int? start_index = null,
               int? limit = null,
               bool? negate = null)

Member: int? lastIndexOf(string needle)

Returns the zero-based index of the last occurrence of the substring or "null" if the substring is not present.

  "beggar".lastIndexOf("g") // returns 3

The full signature is

  int? lastIndexOf(string needle,
                   int? start_index = null,
                   int? limit = null,
                   bool? negate = null)

Member: string replace(string needle, string replacement)

Replaces occurrences of a substring with another string.

  "beggar".replace("egg", "e") // returns "bear"

Remember, the result is returned from this function; the original string is unaltered. The full signature is

  string replace(string needle,
                 string replacement,
                 int? max_occurrences = null)

If "max_occurrences" is negative, replacements are made for the last occurrences of the substring.

Member: string slice(int? begin = null, int? end = null)

Extracts a substring from the string. The parameters are both optional indices. Negative indices count back from the end of the string:

  "beggar".slice(1) // returns "eggar"
  "beggar".slice(1, 4) // returns "egg"
  "beggar".slice(1, -2) // returns "egg"

Member: string[] split(string separator, int? limit = null)

Splits a string into an array of substrings:

  "beggar".split("g") // returns ["be","","ar"]

If "limit" is negative, splits are limited to the last occurrences of the separator.

Member: string join(any[] ...params)

Joins strings together:

  "a".join("b", "n", "n", "") // returns "banana"

Member: string repeat(int count)

Creates a new string by concatenating repetitions of a source string:

  "beggar".repeat(0) // returns ""


  "beggar".repeat(3) // returns "beggarbeggarbeggar"

And that's pretty much it. There are some obvious (and deliberate) omissions:

MISSING Member: string format(any[] ...params)

The functionality for formatting general text will live in separate modules. This will allow for more than one formatting scheme and also allow the algorithms to evolve over time.

MISSING Member: int codePointAt(int index)

This is really a function of the (Unicode) encoding specification, so should be defined there:

  import unicode = /*...*/;
  print(unicode.getCodePointAt("hello", 1)); // prints 101

MISSING Members: string lowercase() and string uppercase()

These are omitted because they rely on the Unicode character database, which changes over time. See here. The functionality will probably be made available via a versioned module:

  import unicode = /*...*/;
  print(unicode.toUpper("hello")); // prints "HELLO"

MISSING Members: string trim(), string trimLeft() and string trimRight()

Removing white-space (or any class of character) from the ends of strings is a common operation. The Unicode standards committee designated twenty-odd code points as "white-space". This includes one of my favourite characters: U+1680: OGHAM SPACE MARK. Alas, the categorisation is somewhat fluid, so one requires access to the Unicode character database to get it right.

MISSING Member: string reverse()

Although characters in egg strings are Unicode, reversing a Unicode string may not produce the results you expect. To do it properly again requires access to the Unicode character database, so the reverse function should belong to a versioned module.

Also, there some members that I'm still dithering about:

POSSIBLE Members: string padLeft() and string padRight()

These members can be synthesised using the existing members. However, they are surprisingly easy to get wrong, so maybe they're a useful addition:

  string padLeft(int min_length, string? padding = null)
  string padRight(int min_length, string? padding = null)

Computers Don't Lie T-Shirt

Computer Don't Lie - They're Just Misinformed

Tuesday, 17 April 2018

egg Functions

My first test script for functions in egg was everyone's favourite chestnut: the Fibonacci series.

  int fibonacci(int n) {
    if (n < 2) {
      return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
  }
  print(fibonacci(10));

It does indeed print out "55"!

In egg, functions defined like this are actually special cases of callable objects. The script declares an identifier "fibonacci" which is initialised with an instance of an object that supports the "call" operation: in this case, taking a single integer parameter and returning an integer.

At present, there is no notion of read-only variables in egg, so it's possible to subsequently assign a different function to "fibonacci":

  int fibonacci(int n) {
    if (n < 2) {
      return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
  }
  int zero(int n) {
    return 0;
  }
  fibonacci = zero;
  print(fibonacci(10));

This prints out "0". It's a good demonstration that functions in egg are considered first-class entities, but it might violate the principle of "least surprise" for newcomers.

egg "For" Statements

Recently, I got my first egg script running. Here it is:

  var s = 0;
  for (var i = 0; i < 100; ++i) {
    s += i;
  }
  print(s);

You'll be unsurprised to hear that this printed out "4950" (it was a pleasant surprise to me when it first happened, though, I can tell you!)

There are three things worth mentioning in this script.

Firstly, the "var" keyword initiates type inference for variables "s" and "i". In both cases, "int" is inferred (there are no unsigned integers in egg).

Secondly, the "print" function is a built-in method that will probably be removed, but is useful for testing, at the moment.

Finally, the syntax of "for" statements turns out to be non-trivial. Like C++, I adopted two forms:

  for (before ; condition ; step) { ... }
  for (iterator : collection) { ... }

The latter form is for iterating around collections, which we won't discuss here.

My main concern with the former form of the "for" loop is:
Which syntactic elements are valid for "before", "condition" and "step"?
The easiest of the three clauses is "condition". It's optional, but I only allow expressions that evaluate to Boolean values. This is similar to all the main curly-brace languages (C/C++/Java/JavaScript).

The "before" statement is also optional, but it must be a statement. This includes variable declarations: the scope of such variables is limited to the "for" statement, including the "condition" and "step" clauses. Again this is similar to the other languages (if we ignore JavaScript's problematic scoping rules).

The "step" statement is optional too, but cannot be a declaration. As mentioned previously, the increment and decrement operators are supported in egg purely to allow the classic for-loop idiom seen in this example.

But then I got a bit confused with which egg statements should be valid for "before" and "step". Can you use "break" and "continue"? If so, what do they do?

So I asked my C++ compiler, and it says that I cannot use "break" or "continue", but I can "throw" exceptions. The reason you cannot "break" or "continue" in "before" and "step" clauses is because those statements are just that: statements. The "before" and "step" clauses in C++ expect C++ expressions.

But why can you "throw" exceptions in those clauses in C++?

  for (auto i = 0; i < 100; throw "Bang!") {} // Valid C++

Well, it turns out that what I think of as throw statements are actually throw expressions (of type "void") in the formal syntax. It's a cul-de-sac that others have found themselves in too!

As egg is meant to be an easy-to-learn language, with few surprises, I decided to classify "throw" as a statement and explicitly forbid it in expressions and "before" and "step" clauses of "for" statements.

Tuesday, 3 April 2018

ZX Spectrum Flags of the European Union

Someone, somewhere, out there on the Internet, is looking for the flags of the twenty-eight (current) member countries of the European Union  ... in the original Sinclair ZX Spectrum SCREEN$ format. Surely...