Elixir Structs Explained

What Are Elixir Structs & How Do They Relate to Other Key-Value Data Structures

Background on Maps

Elixir has a nice feature in the standard library called the struct, which provides a really pleasant experience for handling key, value data. If you aren’t familiar with what I mean by key, value data, it’s simply a data structure where names/symbols are assigned to pieces of data, which can later be accessed by referring to that name. They work like tables with labeled rows. Now, this is an overly simplistic explanation, but it’s useful in introducing the concept. If you’re interested in a bit of CS background, these data structures are referred to more formally as associative arrays, or alternatively as maps (like in Elixir), symbol tables, or dictionaries. SNOBOL, first developed in 1962, was the first programming language to include a native associative array, called a table, which arrived with SNOBOL4 [1]. Elixir has its own native map implementation, and the struct is built directly on top of that structure, which is itself built on top of Erlang’s map. This is all to say that structs are an abstraction over associative arrays (maps).
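To make that concrete, here is a minimal Elixir sketch (the capitals data is just an illustration): a map associates keys with values, and a struct, such as the standard library’s URI, turns out to be a map with one extra key, __struct__, recording the module that defined it.

```elixir
# A map associates each key with a value.
capitals = %{"France" => "Paris", "Japan" => "Tokyo"}
paris = capitals["France"]

# Map.get/3 returns a default when the key is missing.
unknown = Map.get(capitals, "Peru", :unknown)

# A struct (here the standard library's URI) is a map underneath:
# the extra :__struct__ key records which module defined it.
uri = URI.parse("https://elixir-lang.org")
uri_is_map = is_map(uri)
uri_module = uri.__struct__
```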

Let’s take a quick look at maps in Elixir and a few other languages so that we have a frame of reference to build upon. Similar to Elixir, Ruby has hashes, which allow any object to be used as a key. Generally you would use strings, symbols, or numbers as keys, but there’s no such limitation. However, as a best practice one should not use a mutable object as a key: if the key is mutated after it has been stored, its hash changes and the hash can no longer reliably find the entry. Python has dictionaries, which function similarly but require hashable keys, which rules out built-in mutable types such as lists and dicts (in contrast to Ruby). Both languages are distinct from Elixir in that their maps are mutable. Clojure offers a map implementation that’s immutable like Elixir’s map. If you’re more familiar with JavaScript, it has “objects” that are essentially mutable associative arrays and structs at the same time. There’s a reason JSON has become the de facto serialization format of the web.
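A quick Elixir sketch of that difference (the variable names are illustrative): Map.put/3 never changes the map it is given; it returns a new map, so the original binding still sees the old contents. The equivalent Ruby or Python assignment would modify the hash or dict in place.

```elixir
original = %{one: 1}

# Map.put/3 returns a new map; `original` is left untouched.
updated = Map.put(original, :two, 2)

IO.inspect(original, label: "original")
IO.inspect(updated, label: "updated")
```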

Maps in Comparison

Ruby

First, let’s have a look at Ruby’s hashes, which, as you can see below, can be instantiated with curly braces ({}), a common theme among associative arrays.

  pry(main)> {}
  => {}

As mentioned previously, Ruby will happily accept any object as key, even another hash.

  pry(main)> h = Hash.new
  => {}
  pry(main)> {one: 1}
  => {:one=>1}
  pry(main)> h
  => {}
  pry(main)> test_hash = {h => "ruby hashes are interesting"}
  => {
      {} => "ruby hashes are interesting"
  }
  pry(main)> test_hash[{}]
  => "ruby hashes are interesting"
  pry(main)> test_hash[:color] = "red"
  => "red"
  pry(main)> test_hash[String] = "this is odd"
  => "this is odd"
  pry(main)> test_hash
  => {
                   {} => "ruby hashes are interesting",
               :color => "red",
      String < Object => "this is odd"
  }

As you can see, Ruby is very permissive, and it compares hash keys by value (using eql? and hash), which is why a brand-new {} finds the entry stored under h. You can achieve some cool results if you’re willing to play around with some of the features of Ruby hashes, but I will leave that as an exercise for the reader who is inclined to dig into Ruby.

Python

If you are a Python developer, the Ruby example above will probably look both familiar and a bit alien at the same time. Python is not nearly as open to the whims of the developer as Ruby may be. Let’s have a look at a similar set of inputs.

  >>> {}
  {}
  >>> d = {}
  >>> d
  {}
  >>> d = {'one': 1, 'two': 2}
  >>> d
  {'one': 1, 'two': 2}
  >>> d['three'] = 3
  >>> d
  {'one': 1, 'two': 2, 'three': 3}
  >>> d[{}] = 'dict'
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unhashable type: 'dict'

As you can see, since dict is mutable, it isn’t a “hashable” type and cannot be used as a key, which makes sense. This will save you the pain of finding out that some mutable key has been changed at runtime and you can no longer access the value you want. In Ruby, the land of mutable hash keys is inhabited by dragons.

Clojure

Clojure, if you are not familiar, is a Lisp built on top of the JVM. It is in many ways very different from the previous languages, but it brings us into the domain of functional programming languages, where we also find Elixir. Clojure, much like Elixir, allows for the appearance of mutability. Specifically, it has immutable values which are associated with an identity, and while values do not change, associations may. You can see this below, where redefining h points the name at a new map rather than changing the old one.

  > (hash-map)
  {}
  > {}
  {}
  > (def h (hash-map :one 1, :two 2))
  #'sandbox11187/h
  > h
  {:one 1, :two 2}
  > (def h (hash-map :one 1, (hash-map) 2))
  #'sandbox11187/h
  > h
  {{} 2, :one 1}
  > (def h (apply hash-map [:one 1 :two 2]))
  #'sandbox11187/h
  > h
  {:one 1, :two 2}
  > (def a (get h :one))
  #'sandbox16576/a
  > a
  1
  > (def h (hash-map [1 2 3] 1, :two 2))
  #'sandbox16576/h
  > h
  {[1 2 3] 1, :two 2}

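As you can see, Clojure happily lets you use a variety of values, such as keywords, maps, and vectors, as hash-map keys. That provides more flexibility than Python, which rejects dicts and lists as keys, while avoiding Ruby’s problem of mutable hash keys, because Clojure’s collections are immutable.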
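Elixir sits in the same camp as Clojure here. In this sketch (the keys are chosen purely for illustration), a map, a list, and a tuple are all used as keys, and because every Elixir term is immutable and compared by value, a freshly built equal term finds the entry:

```elixir
weird = %{
  %{} => "an empty map as a key",
  [1, 2, 3] => "a list as a key",
  {:point, 0, 0} => "a tuple as a key",
  :color => "red"
}

# Keys are compared by value, so newly built equal terms find the entries.
from_map = weird[%{}]
from_list = weird[[1, 2, 3]]
from_tuple = Map.fetch!(weird, {:point, 0, 0})
```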

OCaml

OCaml is an interesting case, because it is both functional and, in places, mutable. It is also statically and strongly typed, but it infers types, so you write far fewer type declarations than in Java, for example. OCaml’s Hashtbl module provides hash tables which are mutable and allow the developer to bind multiple values to the same key: Hashtbl.add does not remove an earlier binding for a key, it hides it (Hashtbl.replace is what overwrites). This means that it isn’t exactly an associative array; it’s an abstraction that goes in a slightly different direction than the struct we’re building up to. In short, OCaml hash tables use a hashing function to place bindings in buckets, a key may hold multiple bindings, and the collection of buckets is resized as the table grows. You can read up on hash tables here [2].

  # let my_table = Hashtbl.create 34517;;
  val my_table : ('_a, '_b) Hashtbl.t = <abstr>
  # Hashtbl.add my_table "one" "1";
  Hashtbl.add my_table "two" "2";
  Hashtbl.add my_table "two" "two";;
  - : unit = ()
  # Hashtbl.find my_table "two";;
  - : string = "two"
  # Hashtbl.find_all my_table "two";;
  - : string list = ["two"; "2"]

In practice you can use the hash table like a map or dictionary if you think of a key with several bindings as a key whose value is a list. Since Hashtbl.find_all returns exactly that list (most recent binding first), this view happens to be convenient. OCaml also has a built-in Map module, which is immutable and strongly typed (every key shares one type, as does every value), which is a little strict for my taste.
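For comparison, an Elixir map keeps exactly one value per key, so putting an existing key replaces its value. One way to mimic OCaml’s Hashtbl.add and find_all is sketched below; the MultiMap.add_binding/3 helper is hypothetical, not a library function. It stores a list per key and prepends new values, so the newest value comes first, as with find_all:

```elixir
defmodule MultiMap do
  # Hypothetical helper: prepend `value` to the list stored under `key`,
  # starting a one-element list when the key is absent.
  def add_binding(map, key, value) do
    Map.update(map, key, [value], fn existing -> [value | existing] end)
  end
end

table =
  %{}
  |> MultiMap.add_binding("one", "1")
  |> MultiMap.add_binding("two", "2")
  |> MultiMap.add_binding("two", "two")

# Newest binding first, mirroring Hashtbl.find / Hashtbl.find_all.
[latest | _] = table["two"]
all_twos = table["two"]
```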

Elixir

Maps in Elixir look a lot like hashes in Ruby, and that is no accident, as José Valim, the creator of Elixir, borrowed heavily from some of the best (arguably) parts of Ruby’s syntax. You can see this in the following example:

  iex(5)> map = %{a: 1, b: 2, c: 3}
  %{a: 1, b: 2, c: 3}

As you can see, Elixir has a similarly terse and, in my opinion, readable syntax. Maps are also fairly simple, in the Rich Hickey sense of being the opposite of complex. At their core, Elixir maps sit directly on top of the map data structure introduced in Erlang/OTP 17. The Elixir implementation of the Map.new/1 function, as it looked at the time of writing, is shown below.

  @spec new(Enumerable.t) :: map
  def new(enumerable)
  def new(%{__struct__: _} = struct), do: new_from_enum(struct)
  def new(%{} = map), do: map
  def new(enum), do: new_from_enum(enum)

  defp new_from_enum(enumerable) do
    enumerable
    |> Enum.to_list
    |> :maps.from_list
  end

You may pass any Enumerable into the new function: a map, a list of {key, value} tuples (including keyword lists), or a struct that implements the Enumerable protocol, such as a Range or a MapSet. Tuples are not enumerable, and most structs aren’t either. This implementation makes use of Elixir’s pattern matching to handle different inputs with different function clauses. A plain map is returned as-is; anything else, including an enumerable struct that shouldn’t be mistaken for a plain map, is converted to a list, which is then passed to Erlang’s :maps.from_list/1. That function takes a list of {key, value} tuples and builds the underlying data structure. Small maps (up to 32 keys) are stored as two flat arrays, a sorted array of keys and an array of values, where keys[i] corresponds to values[i]. Since Erlang/OTP 18, larger maps are stored as a HAMT (Hash Array Mapped Trie), a hash-based tree; like OCaml’s hash table it places entries by the hash of their key, but unlike it, it is immutable.
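A few calls illustrate those input shapes. This is a sketch with made-up data, using the standard Map.new/1, Map.new/2, and :maps.from_list/1:

```elixir
# A list of {key, value} tuples, the shape :maps.from_list/1 expects.
from_tuples = Map.new([{:a, 1}, {:b, 2}])

# A keyword list is also a list of {atom, value} tuples.
from_keywords = Map.new(a: 1, b: 2)

# Map.new/2 first transforms each element into a {key, value} tuple.
squares = Map.new(1..3, fn n -> {n, n * n} end)

# Calling the Erlang function directly builds the same map.
same = :maps.from_list([{:a, 1}, {:b, 2}])
```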

The Struct

Now that we have established what associative arrays are, and looked at how they work similarly but with their own flavors in different languages, we are better positioned to understand what structs offer in addition to the functionality of maps.

Compile-time checks, dot access, and easily defined default values all make structs an ideal place to store data in Elixir applications. Structs in Elixir allow the developer to define the shape of map-like data, its fields and their default values, ahead of time; the fields’ types can be documented with typespecs, although they aren’t enforced at runtime. If you are coming from an object-oriented language, structs will give you functionality similar to classes that have attributes, with simple validation at compile time, and, like the rest of Elixir, they are immutable. This is useful because in functional programming, code passes data structures around and transforms them, and structs allow developers to think about these structures at a higher level of abstraction.
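To illustrate the immutability point, here is a minimal sketch using the same Document fields as the definition that follows: the %{doc | field: value} update syntax returns a new struct and leaves the original untouched.

```elixir
defmodule Document do
  defstruct [:title, :body, :author, :notes]
end

draft = %Document{title: "Draft"}

# The update syntax builds a new struct; `draft` itself is unchanged.
final = %{draft | title: "Elixir Structs, For Fun & Profit", body: "..."}
```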

Below is a simple struct definition that we’ll use to demonstrate the basic functionality, and extend to show additional features as we progress.

  defmodule Document do
    defstruct [:title, :body, :author, :notes]
  end 

If we compile this code, or run it in IEx, we can create a blank Document struct.

  iex(1)> %Document{}
  %Document{author: nil, body: nil, notes: nil, title: nil}

As you can see, each field is represented in the associative array/map and is defaulted to nil. Out of the box we also have dot access for each of the fields.

  iex(1)> d = %Document{title: "Elixir Structs, For Fun & Profit"}
  %Document{author: nil, body: nil, notes: nil,
  title: "Elixir Structs, For Fun & Profit"}
  iex(2)> d.title
  "Elixir Structs, For Fun & Profit"

If you are coming from Ruby, Python, JavaScript, or a number of other languages, dot access for fields on the struct will feel familiar. However, if you work in languages with getters and setters, this might be more impressive. This method of accessing fields works on maps as well, but only if the keys are atoms. Structs enforce atoms as keys and limit the possible keys to those found in the struct’s definition. This allows for safer access to the fields, as you know which keys are safe to call. This can be particularly useful when data comes from users or other systems. To that end, enforcing the presence of some or all of the keys may be useful.
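A sketch of how that plays out with external data (the incoming map stands in for hypothetical data from a user or another system): dot access works on any map with atom keys, while bracket access returns nil for a missing key. When turning such data into a struct, struct/2 quietly discards keys the struct doesn’t define, and struct!/2 raises a KeyError instead.

```elixir
defmodule Document do
  defstruct [:title, :body, :author, :notes]
end

# Hypothetical external data: one known key and one the struct doesn't define.
incoming = %{title: "White Noise", rating: 5}

# Dot access works on plain maps with atom keys...
title = incoming.title
# ...while bracket access returns nil for a missing key instead of raising.
missing = incoming[:body]

# struct/2 keeps only the keys Document defines and drops :rating.
lenient = struct(Document, incoming)

# struct!/2 raises KeyError on the unknown :rating key.
strict =
  try do
    struct!(Document, incoming)
  rescue
    KeyError -> :unknown_key_rejected
  end
```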

Enforcing keys can be accomplished with the @enforce_keys module attribute.

  defmodule Document do
    @enforce_keys [:title, :body]
    defstruct [:title, :body, :author, :notes]
  end 
  iex(1)> d = %Document{}
  ** (ArgumentError) the following keys must also be given when building struct Document: [:title, :body]
      (ex_scratch) expanding struct: Document.__struct__/1
      iex:1: (file)

Now Elixir will raise a helpful error if these keys are missing when the struct is built.
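The %Document{} literal is checked when it is expanded, and struct!/2 performs the same check at runtime, which matters when fields arrive as data; struct/2, on the other hand, does not check enforced keys. A minimal sketch reusing the enforced Document definition above:

```elixir
defmodule Document do
  @enforce_keys [:title, :body]
  defstruct [:title, :body, :author, :notes]
end

# struct!/2 honours @enforce_keys at runtime and raises ArgumentError.
missing_keys =
  try do
    struct!(Document, title: "Only a title")
  rescue
    ArgumentError -> :missing_enforced_key
  end

# With every enforced key present, construction succeeds.
ok = struct!(Document, title: "T", body: "B")

# struct/2 skips the @enforce_keys check, so :body is left as nil.
loose = struct(Document, title: "Only a title")
```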

As mentioned previously, the default value for struct fields is nil, but you aren’t stuck with that. Elixir makes it simple to define default values for any field. This can be useful when you know ahead of time that a particular field will have a certain value or needs to be a specific data type.

  defmodule Document do
    @enforce_keys [:title]
    defstruct title: nil, body: "", author: %{}, notes: []
  end 
  iex(1)> d = %Document{title: "Elixir Structs, For Fun & Profit"}
  %Document{author: %{}, body: "", notes: [],
  title: "Elixir Structs, For Fun & Profit"}

As you can see in this example, author, body, and notes have default values that suggest the type of data they will hold. This is helpful because a partially filled-in document struct could be passed to a function that tries to deal with an author or a list of notes, and it is easier to pattern match on an empty author than on an empty author or nil. In general this is a nice way to avoid having to worry about the nil case with respect to your own structs. You may also set a struct’s default values to other custom structs, which is a common practice in Elixir.
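To illustrate the point about nil, here is a sketch with a hypothetical DocumentReport.note_summary/1 function. Because notes defaults to [], two clauses (empty and non-empty list) cover every Document, and no nil clause is needed:

```elixir
defmodule Document do
  @enforce_keys [:title]
  defstruct title: nil, body: "", author: %{}, notes: []
end

defmodule DocumentReport do
  # Hypothetical helper: `notes` is always a list thanks to the default,
  # so an empty-list clause and a general clause cover every Document.
  def note_summary(%Document{notes: []}), do: "no notes yet"
  def note_summary(%Document{notes: notes}), do: "#{length(notes)} note(s)"
end

fresh = %Document{title: "Elixir Structs, For Fun & Profit"}
annotated = %{fresh | notes: ["tighten the intro"]}
```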

  defmodule Author do
    defstruct name: "", dob: nil
    @type t :: %Author{name: String.t(), dob: any()}
  end

  defmodule Note do
    defstruct text: "", line_number: 0
    @type t :: %Note{text: String.t(), line_number: non_neg_integer()}
  end

  defmodule Document do
    @enforce_keys [:title]
    defstruct title: nil, body: "", author: %Author{}, notes: []
    @type t :: %Document{title: String.t(), body: String.t(), author: Author.t(), notes: list(Note.t())}
  end
  iex(1)> d = %Document{title: "Elixir Structs, For Fun & Profit"}
  %Document{author: %Author{dob: nil, name: ""}, body: "", notes: [],
  title: "Elixir Structs, For Fun & Profit"}
  iex(2)> d.author
  %Author{dob: nil, name: ""}

The above code defines Author and Note structs, creating a blank author for a document if none is supplied. It also defines some simple type annotations which can be used by a static analysis tool like Dialyzer (for example, through dialyxir). Since this is a post about structs, I won’t cover typespecs, but you may read about them here. Generally speaking, data internal to Elixir applications should be constructed from custom structs when possible. This makes your application easier to reason about, makes your data easier to consume for other Elixir applications, simplifies testing and documentation, and makes Elixir’s already excellent pattern matching even better.
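Nested defaults also compose with Elixir’s update helpers. As a minimal sketch built on the Author and Document definitions above (simplified, without the typespecs), put_in/2 with a dot path rebuilds the nested Author and returns a new Document, leaving the original untouched:

```elixir
defmodule Author do
  defstruct name: "", dob: nil
end

defmodule Document do
  @enforce_keys [:title]
  defstruct title: nil, body: "", author: %Author{}, notes: []
end

doc = %Document{title: "White Noise"}

# put_in/2 follows the dot path and rebuilds each struct along the way.
updated = put_in(doc.author.name, "Don DeLillo")
```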

Assuming the structs listed above, the following code will match on an author’s name.

  defmodule Books do
    def name_that_book(%Document{author: %Author{name: "Don DeLillo"}}) do
      IO.puts "White Noise"
    end

    def name_that_book(%Document{author: %Author{name: ""}}) do
      IO.puts "nope"
    end
  end
  iex(1)> d = %Document{title: "White Noise", author: %Author{name: "Don DeLillo"}}
  %Document{author: %Author{dob: nil, name: "Don DeLillo"}, body: "", notes: [],
  title: "White Noise"}
  iex(2)> Books.name_that_book(d)
  White Noise
  :ok

As you can see, documents can be matched on their author’s name, which is quite nice. You can achieve something similar with maps, but structs give additional context to function signatures and can help ensure that all data is accounted for with at least some sane defaults. This in turn helps with testing, because you can narrow your input space to just the structs you have defined and their allowed keys. If you’re new to Elixir, or lean heavily on maps, give structs a try; they have some great benefits and require minimal extra work to use.
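One caveat about the Books example: a document by any other author matches neither clause and raises a FunctionClauseError. A sketch of a variant (returning strings instead of printing, so it is easy to test) binds the name in the pattern and adds a catch-all clause:

```elixir
defmodule Author do
  defstruct name: "", dob: nil
end

defmodule Document do
  @enforce_keys [:title]
  defstruct title: nil, body: "", author: %Author{}, notes: []
end

defmodule Books do
  # Specific match on a literal author name.
  def name_that_book(%Document{author: %Author{name: "Don DeLillo"}}), do: "White Noise"
  # The blank (default) author.
  def name_that_book(%Document{author: %Author{name: ""}}), do: "nope"
  # Bind the name for any other author instead of raising FunctionClauseError.
  def name_that_book(%Document{author: %Author{name: name}}), do: "a book by #{name}"
end
```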

[1] I’ll cover SNOBOL in future posts about Natural Language Processing (NLP).

[2] My good friend Vaidehi wrote this excellent article.


If you enjoyed this post, follow me on Twitter @ChaseGilliam; sometimes I'm funny. You can also find me on GitHub.