Wednesday, February 20, 2008

Scala class at Stanford

I am pleased to announce that Ben Newman, Julien Wetterwald, and I will be teaching CS94SI: Cross-Paradigm Programming with Scala, a student-initiated course at Stanford University this coming Spring. As the name suggests, the course will focus on how Scala can be used to program with several programming paradigms (functional, object-oriented, concurrent, domain-specific, etc.).

If you have any experience teaching Scala, in an academic setting or otherwise, I'd love to hear about your experiences at jorge.ortiz@gmail.com.

<Lecturer Details>
Ben is pursuing a Masters degree in Computer Science at Stanford with a specialization in Software Theory. He has a passion for programming languages and has extensive teaching experience, having been a teaching assistant for CS106A, CS107, and CS156 at Stanford.

Julien is also pursuing a Masters degree in Computer Science at Stanford with a specialization in Software Theory. His main areas of interest are programming language design and compilers. As an undergraduate at EPFL, he was a teaching assistant for Martin Odersky's class Programmation IV, taught in Scala. He has used Scala to write a compiler and a domain-specific language for graphical user interfaces.

Jorge graduated from Stanford with a Bachelors in Symbolic Systems in June 2007. During his time there he helped teach CS106A, CS106X, and CS107. He is a contributor for http://scala-blogs.org/ and the lift web framework, written in Scala. He uses lift and Scala every day at work.
</Lecturer Details>

<Course Details>
No previous exposure to Scala is required, though we do ask that students have studied at least two programming languages in an academic setting (CS107 is recommended; CS 242 may increase your appreciation for the class). Students will be expected to complete a final project of their choosing. The one-unit class will meet Wednesdays from 4:15-5:45pm in Gates 260. Stanford students can register for CS94SI on Axess.
</Course Details>

Tuesday, February 5, 2008

lift 0.5, now with Blueprint CSS beauty, jQuery excellence, and JSON goodness

Folks,

I am pleased to announce lift version 0.5.

lift is an expressive and elegant framework for writing web applications. lift stresses the importance of security, maintainability, scalability and performance, while allowing for high levels of developer productivity.

Demo: http://demo.liftweb.net

New in version 0.5:


  • jlift JavaScript utilities and associated binder helpers

  • Blueprint CSS built in (version 0.6 of Blueprint CSS) so it can be served via /classpath

  • XML -> JavaScript bindings so that JSON -> XHTML rendering can be defined using Scala XML literals, but the resulting JavaScript code can be sent to the browser to do mass rendering of JSON objects

  • JSON support including: - S.buildJSONFunc - JSONHandler class - Lots of fancy JavaScript expression building

  • Parser Combinator Helper routines

  • ByList query term

  • Support for serving certain common files (e.g., jQuery, Blueprint CSS) right from the JAR/WAR file

  • MappedText and MapperFakeClob classes for storing very long text fields in the database




Changes include:

  • Revised the example site to use Blueprint CSS for style

  • When lift is running in test mode, the default form generators will insert lift:field_name attribute into the form fields to aid in testing

  • Added json2.js (and minified version) Enhanced the resource serving stuff so that: - The path is defined in LiftServlet and changable (defaults to "classpath") - Moved the resource location from the root to "toserve" (can be changed) to avoid any possible way of serving up classes - The white list is a Partial Function for more flexibility - The white list by default contains jquery.js and json.js - There's a re-writer to re-write the request into an actual file (so jquery.js gets re-written to jquery-1.2.2-min.js)

  • MappedField.toForm now returns a Can[NodeSeq ] so a field can be omitted from a form

  • Added LocInfo and LocInfoVal to SiteMap for storing CSS and other tidbits about each menu item

  • Fixed the 'Welcome to the your project' type-o

  • Further abstracted the whole logging thing. There's a default log4j setup that can be override The LiftLogger trait is made concrete in Log4JLogger, but the generic LiftLogger creation is done by function that can be modified in LogBoot: var loggerByName: String => LiftLogger = _logger var loggerByClass: Class => LiftLogger = _loggerCls One can insert a new logging system into lift by replacing the loggerByName and loggerByClass functions with appropriate functions that return a LiftLogger.

  • Extensive updates to the <lift:xxx /> tag mechanism that uses tag labels as snippets and support for adding arbitrary tag handling to lift.


The lift community has grown to over 400 people. There are a number of lift-based project in development including http://much4.us which makes use of lift's Comet, AJAX, and JSON support.

Thanks to the excellent, talented team of lift committers including Jorge Ortiz who was build-master for this release, Steve Jenson who keeps adding practical things to lift, and the rest of the lift team.

Please take a look at lift 0.5, join the lift community, and give us some feedback!

Thanks,

David

Sunday, February 3, 2008

From Regexps to Regexp Patterns, still functional

Here now is the long overdue follow-up to "RegExp but no state machine". Remember, we were talking about regular expressions: state machines come to mind, and we hurry to the cabinet with the imperative chainsaw, but it turns out that some scalpel-like pure functions can be used to cut the problem as well.

Binding Variables


Quite often, we are not just interested in the boolean answer to the question "did this input match the regexp?". We also want to know the parts that matched parts of the regexp, as in "if the file ends in .txt, put its name without the extension into variable x". In other words, we want to bind variables to parts of the input, depending on how the input matched. Regexps that contain variables deserve to get a beautiful and original name: let us call them regexp patterns.

You want to see an example that is not buried in natural language? We can do that... I will use Scala syntax, but keep in mind that regexp patterns in Scala are
undergoing a redesign - these example are just here to explain regexp patterns.

 
def just_the_name(name:Seq[Char]) = name match {
case Seq(filename @ _*, '.', 't', 'x', 't') => filename
}


Matching regexp patterns is a bit more involved than matching regexps. That is not apparent from the above, tame example, but what about something like this



def is_backup(name:Seq[Char]) = name match {
case Seq(_*, bak @ (() | ('.','b','a','k'))) => !bak.isEmpty
}


There are two ways that the input name can match the regexp patterns, a fact that is cleverly used to return a boolean to the caller. When there are many or-patterns and "wildcard_star" patterns, the number of possibilities can grow very large.

Back to Business


Maybe regexp patterns will be resurrected soon in Scala, for now, let us focus on the problem and its functional solution. We will write a Scala program that matches regular expression patterns.

Here is code to represent regexp patterns. It uses the RegExp classes defined before (a complete listing that you can compile in one go is provided at the end).

abstract class Pat[A]
// v is the variable (represented as Int), w the prefix that was
// already seen, r the remaining part we yet have to match
case class PVar[A](v:Int, word:RegExp.Word[A], re:RegExp[A]) extends Pat[A] {
override def toString = "x"+v+"@"+re.toString
}
case class PPair[A](left:Pat[A], right:Pat[A]) extends Pat[A]
case class PChoice[A](left:Pat[A], right:Pat[A]) extends Pat[A]
case class PatNil[A] extends Pat[A]
case class PatCons[B](hd:B, tl:List[B]) extends Pat[B]


As the comment says, these pattern representation is interesting: it does not only represent regexp patterns, but also partial matches of inputs. When we write down a regexp pattern, the "word" field of PVar will be empty. During the matching, as input becomes available, the different possibilities of going through the input will correspond to different instances of Pat.

Let us try how that would look for the is_backup example. To keep things short, we turn the pattern into x@'A', y@(()|'B')). We match the input A against the pattern, and see how we would go about this. We freely add the empty sequence () where it is appropriate.


Is the first letter of A consumed by x@'A', y@(()|'B'))? Yes, and it is bound to x.
Is () consumed by y@(()|'B'))? Yes, and nothing is bound y and we are done.



Now the same with AB


Is the first letter of AB consumed by x@'A', y@(()|'B'))? Yes, and it is bound to x.
Is the first letter of B consumed by y@(()|'B'))? Yes, and B is bound y.
Is () consumed by ()? yes and we are done.


So we want to step character by character through the input, and
keep track of the bindings by rewriting our patterns -- this is were the partial derivatives come into play. A function pdPat will rewrite a pattern with a partial derivative (the partDeriv function was explained before, in "RegExp but no state machine".)


def pdPat[A](pat:Pat[A],c:A):Pat[A] =
pat match {
case PVar(x, w, r) => PVar(x, w:::List(c), partDeriv(r, c))
case PPair(p1, p2) =>
val pn = PPair(pdPat(p1, c), p2)
if (acceptsEmpty(strip(p1)))
PChoice(pn, PPair(mkEmpPat(p1), pdPat(p2, c)))
else
pn
case PChoice(p1, p2) => PChoice(pdPat(p1, c), pdPat(p2, c))
}


In short, this function takes a regexp pattern and a character. It goes through the regexp pattern and constructs the partial derivative of its underlying regexp. When it needs to keep track of a binding (case PVar), it does so by appending the character at the end. It uses the functions mkEmpPat (which means matching against empty input and makes the regexps disappear or fail in a regexp pattern, but keeps the bindings that were computed before) and strip, which just gives us the underlying regexp and ignores all variable binding stuff that might be going on.

Exercise: write a function that computes the result of accepts_empty(strip(p)) directly, without constructing a regexp first.


The Full Monty


Here now, without further ado, the full example (including stuff from the first part). As always, it might be instructive to run to try out with your own examples, execute step-wise in your IDE debugger (Eclipse and Netbeans), or, the classic, insert lots of println statements.

Note that different possibilities of feeding an input character to a choice are tracked by return a list of environments. The "longest match" and the "shortest match" policies can be recovered by selecting a special element of the list of environments that is the end result of running the patMatch function.


object RegExpPat {

// from part 1

type Word[A] = List[A]

def matchRegExp[A](re:RegExp[A], w:Word[A]): Boolean =
w match {
case List() => acceptsEmpty(re)
case c::w1 => matchRegExp(partDeriv(re, c), w1)
}

def acceptsEmpty[A](re:RegExp[A]): Boolean =
re match {
case Phi() => false
case Empty() => true
case Choice(r1,r2) => acceptsEmpty(r1) || acceptsEmpty(r2)
case Seq(r1,r2) => acceptsEmpty(r1) && acceptsEmpty(r2)
case Star(r) => true
case L(_) => false
}

def partDeriv[A](re:RegExp[A], c:A): RegExp[A] = {
(re, c) match {
case (Phi(), _) => Phi()
case (Empty(), _) => Phi()
case (L(c1), c2) if (c1 == c2) => Empty()
case (L(c1), _) => Phi()
case (Choice(r1,r2), c) => Choice(partDeriv(r1,c),partDeriv(r2,c))
case (Seq(r1,r2), c) =>
val rn = Seq(partDeriv(r1,c),r2)
if (acceptsEmpty(r1)) Choice(rn, partDeriv(r2,c)) else rn
case (rs @ Star(r), c) => Seq(partDeriv(r,c),rs)
}
}

abstract class RegExp[A]

case class Phi[A]() extends RegExp[A]
case class Empty[A]() extends RegExp[A]
case class L[A](letter:A) extends RegExp[A]
case class Choice[A](left:RegExp[A], right:RegExp[A]) extends RegExp[A]
case class Seq[A](left:RegExp[A], right:RegExp[A]) extends RegExp[A]
case class Star[A](exp:RegExp[A]) extends RegExp[A]

// part 2

abstract class Pat[A]
// vx is the variable (represented as Int), w the prefix that was
// already seen, r the remaining part we yet have to match
case class PVar[A](v:Int, word:Word[A], re:RegExp[A]) extends Pat[A] {
override def toString = "x"+v+"@"+re.toString
}
case class PPair[A](left:Pat[A], right:Pat[A]) extends Pat[A]
case class PChoice[A](left:Pat[A], right:Pat[A]) extends Pat[A]
case class PatNil[A] extends Pat[A]
case class PatCons[B](hd:B, tl:List[B]) extends Pat[B]

// combinator
def pat_var[A](n: Int, r:RegExp[A]) = PVar(n,List(),r)

// binding
type Env[a] = List[(Int,Word[a])]

def patMatch[A](pat:Pat[A], word:Word[A]):List[Env[A]] = {
(pat,word) match {
case (p, c::w) =>
patMatch(pdPat(pat, c), w)
case (PVar(x,w,r),List()) =>
if (acceptsEmpty(r)) List(List((x,w))) else List()
case (PChoice(p1,p2), List()) =>
patMatch(p1, List()):::patMatch(p2, List())
case (PPair(p1,p2), List()) =>
for(xs <- patMatch(p1, List()); ys <- patMatch(p2, List())) yield xs:::ys
}
}

def longPatMatch[A](pat:Pat[A], word:Word[A]):Option[Env[A]] =
patMatch(pat,word) match {
case env::_ => Some(env)
case _ => None
}

def shortPatMatch[A](pat:Pat[A], word:Word[A]):Option[Env[A]] =
patMatch(pat,word) match {
case List() => None
case xs => Some(xs.last)
}

def pdPat[A](pat:Pat[A],c:A):Pat[A] =
pat match {
case PVar(x, w, r) => PVar(x, w:::List(c), partDeriv(r, c))
case PPair(p1, p2) =>
val pn = PPair(pdPat(p1, c), p2)
if (acceptsEmpty(strip(p1)))
PChoice(pn, PPair(mkEmpPat(p1), pdPat(p2, c)))
else
pn
case PChoice(p1, p2) => PChoice(pdPat(p1, c), pdPat(p2, c))
}

def strip[A](pat:Pat[A]):RegExp[A] =
pat match {
case PVar(_,w,r) => r
case PPair(p1,p2) => Seq(strip(p1),strip(p2))
case PChoice(p1,p2) => Choice (strip(p1),strip(p2))
}

def mkEmpPat[A](pat:Pat[A]):Pat[A] =
pat match {
case PVar(x,w,r) =>
if (acceptsEmpty(r))
PVar(x, w, Empty[A]())
else
PVar(x, w, Phi[A]())
case PPair(p1,p2) => PPair(mkEmpPat(p1), mkEmpPat(p2))
case PChoice(p1,p2) => PChoice (mkEmpPat(p1), mkEmpPat(p2))
}

// example for testing
val the_choice = Choice(Empty(), L('B'))

val ex = PPair(PVar(1, List(), Star(L('A'))), PVar(2, List(), the_choice))

def main(args:Array[String]) {
println(longPatMatch(ex, List('A')))
println(longPatMatch(ex, List('A','B')))
}
}


Oh, did I mention that there is no mutable state involved?

Credit: The functional implementation described here follows Kenny Zhuo Ming Lu and Martin Sulzmann. A toast to them! Several people, including myself, found similar solutions, but they are the heroic ones that implemented and proved it all.