Meta Scala (1)

Having read Niklaus Wirth’s Algorithms + Data Structures = Programs some 12 years ago I realized, multiple times during my career, that Wirth’s title equation is the ultimate truth of software engineering and that most of the things we consider new are simply implementation of old ideas, which, for different reasons, can only be realized now. Scala is one such implementation – an effective combination of object oriented and functional programming.

In this series I’ll try to introduce Scala topics based on some basic (the you-need-to-know-that) software engineering knowledge to bring Scala to wider audience and shed some light on that ‘new’ functional paradigm as I go along. Of course I have a hidden agenda and that is simply to play around and get deeper understanding of the tool I use daily.

I assume you have knowledge of programming in theory and practice. I don’t aspire for my posts to be a tutorial, there are other sources that do it a lot better (like this or that). What I really try to achieve is to answer some of the question I see being asked often by my peers or myself and/or try out wild ideas. So feedback is more than welcome!

Part 1 – Functional combinators

Recently I’ve been busy a lot with Scala and one thing I simply love about it is all the functional combinators defined on collections – map, filter, drop, zip, dropWhile, foldLeft, foldRight… Google Guava library tried to fill in the empty space in the heart of Java to make collection bearable for use, but still Java’s conservative ways of everything being an object (but not quite everything) get in a way of writing concise and readable code. With Scala it’s different. And, as you will hopefully learn from these and other tutorials, a big part of Scala is written in…Scala! Ok, that’s not true, but it looks like it. A lot of things you consider part of the language are simply libraries written in the language. Maybe I’ll touch that topic in some other part of these series. For now let’s tackle the statement I stumbled across in Twitter’s Scala School – ‘every functional combinator shown above can be written on top of fold’ and try to explain some typical misconceptions I noticed when talking to my fellow programmers. This is how we do it!


folding (left and right)

Fold a list into a single element of new type. foldLeft iterates from left and (surprise! surprise!) foldRight iterates from right.

How I see folding from left

How I see folding from right

Example

Let’s assume we have a list of numbers and function isTheOne, which returns Option (more on this later) of a first encounter of number 1 together with its index. The function takes y, which is previous outcome of execution on n-1th element and the next nth element x.

test("foldLeft and foldRight")  {

    val numbers = List(3, 1, -2, 5, 1)
    val initialLState:(Option[Int], Int) = (None, -1)
    val initialRState:(Option[Int], Int) = (None, candidates.length)

    ... //isTheOne impl goes here, for now it's not important    

    numbers.foldLeft(initialLState){ (y, x) => isTheOne(y, x, 1) } should be (Some(1), 1)
    numbers.foldRight(initialRState){ (x, y) => isTheOne(y, x, -1) } should be (Some(1), 4)
}

So what exactly happened there?

There are multiple Scala concepts here, which I will explain later. First let’s focus on what fold family can do for you. As you can see, fold takes two arguments – initial state (I call it the-value-that-you-drag-behind), which in each subsequent state can be substituted with new state and a function of the previous state (yn-1) and current item (xn), which returns new state for the next iteration or the final state if there are no iterations left (yn). In the example, if I fold from left, I will simply iterate over all items holding my initial state, thus starting from index -1 I will find number 1 on position 1. If I fold from right however, I will start from index candidates.length, which is currently 5, and go through the list from right, thus finding number 1 at position 4. In functional terms what happens here is y4 = f( f( f( f(y0, x1), x2), x3), x4) for n=4 when folding left and y0 = f( f( f( f(y5, x4), x3), x2), x1) for n=4 when folding right.
One thing to notice is that fold passes the arguments in order reflecting the direction of folding, so in case of foldLeft the dragged value y goes in first followed by subsequent item x, while in foldRight that order is turned around.
The great thing about fold family is you can use them to implement all other iteration-based functions of any collection. You can think of Scala’s collection, foldLeft and foldRight (with some additional appending and prepending operators), as a sort of functionally complete system – i.e. you can implement any operation on the list using only the folds and operators. Well, let’s try!


map

Map each element of a collection into a new element in a new list under given function f.

How I see the map

Test first (!)

I expect to map elements of MyList to a string representation with # (number) prepended.

test("map") {
    def f(x:Int) = "#%d" format x
    MyList(1, 2, 3).map(f _) should be (MyList("#1", "#2", "#3"))
}

Implementation second

I created a small MyList class and companion object so it looks ‘natural’ to new Scala users. To keep it simple I will not repeat those definitions in further combinator examples. The class type parametrization allows for a full type coverage, so you can map from any type to any type, just like in standard implementation (and it’s fun).

case class MyList[A](items: List[A]) {
    def map[B](f: A => B) = MyList(
        items.foldLeft(List[B]()){ (l, x) => l :+ f(x) }
    )
}

object MyList {
    def apply[A](items: A*) = new MyList(List(items: _*))
}

So what exactly happened there?

I defined a method of class MyList which takes as an argument a function which transforms value of type A into value of type B – just like standard map does. My map returns a new MyList constructed with all the items folded from left. Now, the folding starts with empty list, to which I append (:+ operator) value of f(x), thus creating a new list with each item transformed by function f.


filter

Create a new list containing only those items of the input list, for which predicate p holds (is true).

How I see filtering

Test first (!)

I expect only even numbers to go through my filter.

test("filter") {
    def p(x:Int) = x % 2 == 0
    MyList(1, 2, 3, 4, 5).filter(p _) should be (MyList(2, 4))
}

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(p:  A => Boolean) = MyList(
     items.foldLeft(List[A]()){ (l, x) => if (p(x)) l :+ x else l}
  )
}

So what exactly happened there?

It’s fairly easy this one – while folding from left I only append x for which p(x) holds true (function p in this context is called predicate) and return a new list. If the predicate does not hold I simply return the list I created so far.


zip

Create a new list holding pairs of items on corresponding positions from both lists.

How I see zipping

Test first (!)

I expect each item x from the first list to be paired up with item y on corresponding position in the second list, thus creating a list of (x,y) tuples. Following that logic, the resulting list should be as long as the shortest of two input lists.

test("zip")  {
    MyList(1, 2, 3).zip(MyList("#1", "#2")) should be (MyList((1, "#1"), (2, "#2")))
    MyList(1, 2).zip(MyList("#1", "#2", "#3")) should be (MyList((1, "#1"), (2, "#2")))
    MyList(1, 2, 3).zip(MyList("#1", "#2", "#3")) should be (MyList((1, "#1"), (2, "#2"), (3, "#3")))
}

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(fn:  A => Boolean) = ...

  def zip[B](ys: MyList[B]) = MyList(
     items.foldLeft(List[(A, B)]()){ (l, x) =>
       if (ys.length > l.length )
         l :+ (x, ys(l.length))
       else
         l
     }
  )

  def apply(i:Int) = items(i)

  val length = items.length
}

So what exactly happened there?

This one is not that straightforward. I had to extend MyList with length indication and implicit get to support the trick. So as you can see I start exactly like in the previous examples with an empty List of tuples and, as long as the ys list still has at least one element that can be paired with current x, I simply ad that pair to the new list. Otherwise I return the list I got so far. This is also a more clear example of closure.

Ok, so what is closure?

Simply put – if you use a term within your function which is not defined, the compiler will reach out to upper levels of your definitions to bound such term. In my example ys is so called free variable and is not bound within the scope of function defined as input of fold. Function containing at least one free variable is called open term. For each open term compiler will reach out to term‘s lexical environment (env, where function is defined) and try to bound all free variables, thus making the open term … (wait for it) … a closed term. In the end, the compilers goes out and out until it finds a match or simply fails with compilation problem. The real closure however happens at runtime when a snapshot of all the values of bound terms is taken. By popular demand I will dedicate a separate post to go into details of closures but until then – can you spot any other closures in my examples?


partition

Create a pair of lists where first one contains all of the items, for which predicate p holds and a second one containing all the items for which predicate p does not hold.

How I see partitioning

Test first (!)

I expect two lists, where first contains all the items gt 3 and the other one with all the items le 3.

test("partition") {
    def p(x:Int) = x > 3
    MyList(1, 4, 3, 5, 2).partition(p _) should be (MyList(4, 5),  MyList(1, 3, 2))
}

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(p:  A => Boolean) = ...

  def zip[B](ys: MyList[B]) = ...

  def partition(p: A => Boolean) = MyList(
     items.foldLeft((List[A](), List[A]())) { (ls, x) =>
       val (l1, l2) = ls
       if (p(x))
         (l1 :+ x, l2)
       else
         (l1, l2 :+ x)
     }
  )
  ...
}
object MyList {
  def apply[A](items: A*) = ...
  def apply[A](doubleItems:(List[A], List[A])) = (MyList(doubleItems._1), MyList(doubleItems._2))
}

So what exactly happened there?

I needed to extend MyList companion object to be able to produce a tuple of MyList instances to give a feeling of a standard List. Other than that the folding simply checks if given predicate p holds for current x and if so – puts it in the winners list l1, otherwise it lands in the loosers list l2. You can notice a nice feature of Scala at line 11 – multiple assignment of elements of tuple, which is nothing more than edge case of pattern matching. If I didn’t do that I would have to access the lists as ls._1 and ls._2 respectively, which is nice/not nice – depending on what you like and phases of the moon. I only used it to show it off, otherwise, in this particular case, imho it does not bring any value added (actually two of them).


find

Find the first element for which the predicate p holds.

How I see finding

Test first (!)

I expect the first even element to be found or None otherwise.

test("find") {
    def p(x:Int) = x % 2 == 0

    MyList(1, 2, 3, 4, 5).find(p _) should be (Some(2))
    MyList(1, 3, 5).find(p _) should be (None)
}

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(p:  A => Boolean) = ...

  def zip[B](ys: MyList[B]) = ...

  def partition(p: A => Boolean) = ...

  def find(p: A => Boolean) = {
    val x0:Option[A] = None
    items.foldLeft(x0){ (o, x) =>
      o match {
        case Some(_) => o
        case None => if (p(x)) Some(x) else None
      }
    }
  }
  ...
}

So what exactly happened there?

Well, first thing to notice is that find‘s contract expects Option. Easiest explained – Option is your savior for all the null battles you fought over the years. It’s an easy way to express that you may get a value or you may get squat (otherwise known as None in academic circles), so at least you know what to expect. Also – it goes great with pattern matching as you can see in the example above – it rolls off Scala’s tongue. Coming back to the implementation – I start with None and then I check if given predicate p holds for current x. If it does, then I represent it as Some and keep on returning it for subsequent iterations. Otherwise I keep on returning None. Oh, and the x0 trick is to make sure that the compilator infers the correct return type for the fold (which it always does based on initial folding value). If I would simply put literal None in place of x0 then it would not compile thinking that I want to return None.type. Try it and see if you can find better way to do this.


drop

Return a list containing all items but the first n.

How I see dropping

Test first (!)

I expect items on position 1(0) through 3(2) to be omitted in a result list.

test("drop") {
    MyList(5, 2, 1, 4, 3).drop(3) should be (MyList(4, 3))
}

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(p:  A => Boolean) = ...

  def zip[B](ys: MyList[B]) = ...

  def partition(p: A => Boolean) = ...

  def find(p: A => Boolean) = ...

  def drop(n:Int) = MyList(
    items.foldRight(List[A]()) { (x, l) =>
      if (items.length - l.length > n ) x +: l else l
    }
  )
  ...
}

So what exactly happened there?

This is the first time that foldRight makes a difference. For each subsequent iteration I check the how many items I transfered from the input list to my new list – I want to miss n of them. So if the difference is gt n I prepend (+: operator) the current x to the new list. Otherwise I skip it. Well, try using foldLeft ;).

Note on colon-ending operators

Any operator ending in : is called right-associative operator. The easiest way to think about it is that the operator is defined by the right operand. It certainly helps in distingushing metaphors like prepending (elem +: list) and appending (list :+ elem).


dropWhile

Return a list containing all items but the first few, for which the predicate p holds.
(At first I thought it simply dropped any item, for which the predicate did not hold, but what it does it drops until the predicate does not hold – checkout the test below. Thank you Samuel!)

How I see dropping-while

Test first (!)

I expect dropping items until an item is encountered, which is not divisible by zero.

test("dropWhile") {
    def p(x:Int) = x % 2 == 0

    MyList(2, 4, 3, 4, 5).dropWhile(p _) should be (MyList(3, 4, 5))
    MyList(1, 2, 3, 4, 5).dropWhile(p _) should be (MyList(1, 2, 3, 4, 5))
}

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(p:  A => Boolean) = MyList(
     items.foldLeft(List[A]()){ (l, x) => if (p(x)) l :+ x else l}
  )

  def zip[B](ys: MyList[B]) = ...

  def partition(p: A => Boolean) = ...

  def find(p: A => Boolean) = ...

  def drop(n:Int) = ...

  def dropWhile(p:  A => Boolean) = MyList(
    items.foldLeft(List[A]()){ (l, x) => if (l.isEmpty && p(x)) l else l :+ x}
  )
  ...
}

So what exactly happened there?

Quite similar to filter (actually more to filterNot), you simply add items, for which the predicate p does not hold. The trick here is to mark the moment when the predicate p does not hold for the first time and only start adding after that point – for this you simply check if the list so far is empty and if the predicate holds – keep on returning empty list. You can play around with De Morgan’s laws to suit the logical expressions, but one thing that is actually not that easy, is to implement the same on top of foldRight – think about it…

flatten

Returns a list in which all the first level collections are dismantled into single items.

How I see flattening

Test first (!)

I expect all first level lists (lists that are immediate items of my list) to be dismantled into separate items and added as immediate items of my list in situ.

test("flatten") {
    MyList(MyList(1, 2 ,3), MyList(4, 5, 6)).flatten should be (MyList(1, 2, 3, 4, 5, 6))
    MyList(MyList(MyList(1, 2), MyList(3, 4)), MyList(5, 6)).flatten should be (MyList(MyList(1, 2), MyList(3, 4), 5, 6))
}

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(p:  A => Boolean) = ...

  def zip[B](ys: MyList[B]) = ...

  def partition(p: A => Boolean) = ...

  def find(p: A => Boolean) = ...

  def drop(n:Int) = ...

  def dropWhile(p:  A => Boolean) = ...

  def flatten = MyList(
    items.foldLeft(List[Any]()) { (l, x) =>
      x match {
        case xl:MyList[_] => l ++ xl()
        case _ => l :+ x
      }
    }
  )
  ...
  def apply() = items
}

So what exactly happened there?

So what I did is fairly simple and showcases another nice usage of pattern matching, one that you can also use with try-catch in Scala, i.e. type matching. For each subsequent item I check whether its of type MyList and if so I simply add all its items to the new list (++ concatenation operator). Otherwise I simply add the current item to the list using already known appending operator :+. The only trick here is the xl() statement, for which I had to devise another implicit apply function. This time it returns the inner items of MyList. This is only to retain the whole metaphor of standard List without a need to implement all its behavior and simply because I can ;).


foreach

Invokes provided function for each of the elements.

Test first (!)

For each element I expect it to be printed out to standard out and added to a list. Because there is no return type, the list extension with the item should have no effect (ugly, ain’t it?).

  test("foreach") {
    val list = List()
    def f(x:Int) { println(x); list :+ x }

    MyList(1, 2, 3).foreach(f _)
    list should be (List())
  }

Implementation second

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = ...

  def filter(p:  A => Boolean) = ...

  def zip[B](ys: MyList[B]) = ...

  def partition(p: A => Boolean) = ...

  def find(p: A => Boolean) = ...

  def drop(n:Int) = ...

  def dropWhile(p:  A => Boolean) = ...

  def flatten = ...

  def foreach(f: A => Unit) {
    items.foldLeft(){ (_, x) => f(x) }
  }
  ...
}

So what exactly happened there?

First thing you’ll notice is the strange return type for this fold – (). It’s called Unit and means I expect no value to be returned. That’s also why I don’t bother assigning the current fold value y to any variable by specifying _ (partial application). Then I take the x and apply provided function f to it.

Note on Unit

foreach is a great example on how hard it is to prove (test) any non-functional code. It is what is considered to be a procedure in a programming sense, so how does this work with Scala’s functional side? The Unit return type is very similar to void in java – it means no return value, but you still return it. In Scala any function definition which is not followed by = sign will by default return Unit even if your last line returns a value.


It’s your turn now…

I think I gave you enough ammunition to experiment further with the Scala collections. You can try to implement more functions from the List api like take, unzip, etc. You can also try to implement some of the functions in this article using foldRight iso foldLeft and vice versa. Finally you can try to implement similar functionality for other Collections – Sets, Maps, etc. I say – go wild!. Send any new implementations and I’ll include them in this post with your name in bold (oh, the fame!).
Feel free to extend, play and experiment with the final listing below. Feedback is more than welcome. Till next time!

package mutablenotions

import org.scalatest.junit.JUnitRunner
import org.junit.runner.RunWith

import org.scalatest.{BeforeAndAfter, FunSuite}
import org.scalatest.matchers.ShouldMatchers

@RunWith(classOf[JUnitRunner])
class BlogTest extends FunSuite with BeforeAndAfter with ShouldMatchers {

  test("foldLeft and foldRight")  {

    val candidates = List(3, 1, -2, 5, 1)
    val initialLeftState:(Option[Int], Int) = (None, -1)
    val initialRightState:(Option[Int], Int) = (None, candidates.length)

    def isTheOne(y:(Option[Int], Int), x:Int, i: Int) = {
      def next(index:Int) = index + i
      y match {
        case (None,  index) => if (x == 1) (Some(x), next(index)) else (None, next(index))
        case foundItem => foundItem
      }
    }

    candidates.foldLeft(initialLeftState){ (y, x) => isTheOne(y, x, 1) } should be (Some(1), 1)
    candidates.foldRight(initialRightState){ (y, c) => isTheOne(y, x, -1) } should be (Some(1), 4)
  }

  test("zip")  {
    MyList(1, 2, 3).zip(MyList("#1", "#2")) should be (MyList((1, "#1"), (2, "#2")))
    MyList(1, 2).zip(MyList("#1", "#2", "#3")) should be (MyList((1, "#1"), (2, "#2")))
    MyList(1, 2, 3).zip(MyList("#1", "#2", "#3")) should be (MyList((1, "#1"), (2, "#2"), (3, "#3")))
  }

  test("map") {
    def f(x:Int) = "#%d" format x

    MyList(1, 2, 3).map(f _) should be (MyList("#1", "#2", "#3"))
  }

  test("drop") {
    MyList(5, 2, 1, 4, 3).drop(3) should be (MyList(4, 3))
  }

  test("filter") {
    def p(x:Int) = x % 2 == 0

    MyList(1, 2, 3, 4, 5).filter(p _) should be (MyList(2, 4))
  }

  test("dropWhile") {
    def p(x:Int) = x % 2 == 0

    MyList(2, 4, 3, 4, 5).dropWhile(p _) should be (MyList(3, 4, 5))
    MyList(1, 2, 3, 4, 5).dropWhile(p _) should be (MyList(1, 2, 3, 4, 5))
  }

  test("find") {
    def p(x:Int) = x % 2 == 0

    MyList(1, 2, 3, 4, 5).find(p _) should be (Some(2))
    MyList(1, 3, 5).find(p _) should be (None)
  }

  test("partition") {
    def p(x:Int) = x > 3
    MyList(1, 4, 3, 5, 2).partition(p _) should be (MyList(4, 5),  MyList(1, 3, 2))
  }

  test("foreach") {
    val list = List()
    def f(x:Int) { println(x); list :+ x }

    MyList(1, 2, 3).foreach(f _)
    list should be (List())
  }

  test("flatten") {
    MyList(MyList(1, 2 ,3), MyList(4, 5, 6)).flatten should be (MyList(1, 2, 3, 4, 5, 6))
    MyList(MyList(MyList(1, 2), MyList(3, 4)), MyList(5, 6)).flatten should be (MyList(MyList(1, 2), MyList(3, 4), 5, 6))
  }
}

case class MyList[A](items: List[A]) {

  def map[B](f: A => B) = MyList(
     items.foldLeft(List[B]()){ (l, x) =>  l :+ f(x) }
  )

  def filter(p:  A => Boolean) = MyList(
     items.foldLeft(List[A]()){ (l, x) => if (p(x)) l :+ x else l}
  )

  def zip[B](ys: MyList[B]) = MyList(
     items.foldLeft(List[(A, B)]()){ (l, x) =>
       if (ys.length > l.length )
         l :+ (x, ys(l.length))
       else
         l
     }
  )

  def partition(p: A => Boolean) = MyList(
     items.foldLeft((List[A](), List[A]())) { (ls, x) =>
       val (l1, l2) = ls
       if (p(x))
         (l1 :+ x, l2)
       else
         (l1, l2 :+ x)
     }
  )

  def find(p: A => Boolean) = {
    val x0:Option[A] = None
    items.foldLeft(x0){ (o, x) =>
      o match {
        case None => if (p(x)) Some(x) else None
        case Some(_) => o
      }
    }
  }

  def drop(n:Int) = MyList(
    items.foldRight(List[A]()) { (x, l) =>
      if (items.length - l.length > n ) x +: l else l
    }
  )

  def dropWhile(p:  A => Boolean) = MyList(
    items.foldLeft(List[A]()){ (l, x) => if (l.isEmpty && p(x)) l else l :+ x}
  )

  def flatten = MyList(
    items.foldLeft(List[Any]()) { (l, x) =>
      x match {
        case xl:MyList[_] => println("adding list %s to %s" format (xl, l)); l ++ xl()
        case _ => println("adding single %s to %s" format (x, l)); l :+ x
      }
    }
  )

  def foreach(f: A => Unit) {
    items.foldLeft(){ (_, x) => f(x) }
  }

  def apply(i:Int) = items(i)
  def apply() = items

  val length = items.length
}

object MyList {
  def apply[A](items: A*) = new MyList(List(items: _*))
  def apply[A](doubleItems:(List[A], List[A])) = (new MyList(doubleItems._1), new MyList(doubleItems._2))
}
Advertisements

Leave a reply after the beep...

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s