Archive for May, 2008

Some concepts behind hvac

Having written the hvac and HStringTemplate libraries, I need to get around to documenting them better, not to mention polishing hvac up for a hackage release. But in the meantime, by way of introduction to (hopefully) a series of tutorials, I figured I’d go through some of the basic principles behind what I’ve been doing. Which is sort of an arrogant thing, so apologies therefore for how dull or obvious the following may be.

Web programming is an exercise in managing scope

Even a reasonably well-architected web framework in Java or Python or what-have-you will at some point throw scope and typing to the wind. You have page-level scope, request-level scope, cookie-level scope, session-level scope, database-level persistent scope, application-level scope, server-level scope, and on top of that maybe conversation-level, or a few others depending on how fancy things get. (Oh, and GET and POST params fall into different scopes too… and then there are rest-style url params too).

In php you generally get this stuff (application and db aside) shoved into what are called “superglobal” variables, which are maps from strings to… stuff. Rails does the same thing with hashes, and sometimes with symbols rather than strings. Java generally tends to shove stuff into hash-like things as well (e.g. request, pageContext [which is, horrors, indexed by scope *and* name]). And with java frameworks using beans, you tend to get a stack in the page scope as well, so tags have something implicit to operate over (although this tends to be hidden from the end-user).

So scope then gives us a number of problems — anything can be anywhere, named anything, and of any type. And most webapps are built on, at some level, just trusting that some component somewhere up the chain got the right thing and set it to the right type, and that therefore when you look up the name you get it, and when you try to use it as the type you expect nothing goes terribly wrong. And don’t even get me started on xml configs (injection-style or otherwise).

Thus to make web programming sane, we want some guarantees that would be elementary in any language since, I guess, Fortran. We want to have a guarantee that the things you expect to exist do exist,and furthermore, that they are what you expect them to be. We’re not talking Coq or even Haskell level strong static typing here, but more like Java, or C for that matter.

So wipe the slate clean and pretend we have no scope at all outside of the request itself — i.e. that our server is a pure function from Request -> String. Now, our request gives us a url, some get and post params, and some other possible stuff (user-agent, language, cookies, what-have-you). We want to map that request to a function that handles it. The normal way to do this is to do some silly regexp or write an ugly xml file, or both, or just to write a naive switch expression, or finally, to simply have a php model where each request maps to a file on the directory hierarchy (Oh, and of course, throw in a smattering of mod_rewrite rules as well!).

The directory model actually turns out to be the most interesting. This is because, each step into the tree we take, we narrow the possibilities of each further step. So we’re actually traversing a very tiny grammar. And if we match a file, then the path is a well-formed statement in that grammar. This brings us to a core principle of hvac — do you see it?

Controllers/dispatchers are parsers

We can explicitly write our controller function now as something which parses a request, and whose result is an http response. Or, to generalize back out, whose result is a function on the execution environment as a whole, yielding an http response.

Naturally, we want a backtracking parser, because we want to allow the end-user to construct an arbitrary grammar rather than a tightly restricted one. But not everything needs to backtrack. We “tokenize” our url into /-separated strings and consume and backtrack on those. But, GET params, etc? We don’t need to consume them, just check their existence.

And the really nice thing about Haskell here is that it comes (in the form of the Alternative class in Control.Applicative) with a built-in almost-dsl for writing neat little monadic parsers. Rather than write plain old functions, we can write combinator functions and so wrap the parser functionality up in something that reads like what it does. So a small snippet of hvac code from a controller might look like this:

 h |/ "board" |\ \(boardId::Int) ->
     (h |/ "new" >>
         h |// "POST" *> ... {- POST stuff -}
	  <|> ... {- GET stuff -} )
      <|> ... {-OTHER stuff-}

The hs are null parsers necessary to set off a chain of combinators. This code says: if the next part of the path is “board” and the part after that can be parsed to an Int, then if the part after that is “new” and its a POST request, do one set of things. But if its a GET request, do another. And if the part after the boardId *isn’t* “new” then go do some other stuff instead.

So we see already how treating our controller as a parser both lets us cut down on code duplication (we’re only getting boardId once, for all paths that have it), lets us embed fancy logic very compactly, and most importantly, guarantees that, in this case, we’ll have something named boardId in scope, and of the right type.

There are a few more dsl combinators in hvac, but they all operate essentially the same. So now, this gets us partway there. However, it hasn’t addressed get/post params, cookies, sessions, databases, etc. at all yet (although there are combinators to match simple equality on get/post params too). Let’s add just get/post params for the next level of complexity. Now get/post params are used slightly differently, because they’re based generally on user input. So our criteria for them need to be different — there are optional fields that may not be filled in, “valid” values might not meet other criteria (too short, invalid dates, not a genuine email address etc.) and soforth. Furthermore we don’t generally want bad values to prevent matching a page, rather we want an error page that highlights *all* the errors. So we know we need another layer for this.

Generally, web frameworks will hold another validation layer that purely performs validation. ASP bakes this in rather obscurely. PHP generally provides no particular solution. Python, as I understand it uses newForms and bakes validation into a forms layer in general. Ruby does it through activeRecord (which begs the question, how to validate data that isn’t destined for an activeRecord?). Java frameworks, unsurprisingly, will tend to use XML (Or nowadays, some annotation monstrosity too, I imagine, as tied to, e.g., Hibernate).

These styles of validation can lead to lost information at times — e.g., even once a validator passes you still may need subsequent manual casting or parsing. Furthermore, they tend to mix layers in that “valid” for the database (i.e. well-typed) should well be different than “valid” in a more narrow sense (i.e. a proper email address). And then an “invalid” userid depends on whether that userid is in the database as well as if that userid has enough of only the allowed characters, etc. You can guess where I’m going, right?

Validation is parsing (in context)

We can place all our constraints on incoming data, as well as instructions on how to parse it into the data type we want (which might even mean, e.g., turning a user id into a full-fledged hunk of user data as built from a database query). Here’s an example validation function that does just that (positing that mkUser takes the returned map and shoves it into a user object).

validUser :: ValidationFunc s String User
validUser s = do
  u <- selectRow "* from users where name=?"
  if M.null u
    then fail $ "No such user: " ++ s
    else return mkUser u

The type signature here says that this function operates on any session type (which we’ll ignore for now) and takes a String and returns a User (or, of course, an error). One neat thing worth mentioning here, even though I haven’t touched on databases and application state yet, is that the type of ValidationFunc allows us to read things such as databases, but *not* to write to them, since it would lead to all sorts of trouble if validators started preforming unrestricted side-effects.

We might invoke the validator (and a few others, just to show how nifty it is to chain them) like this:

withValidation
  (("user", trim >=> lengthAtLeast 2 >=> userValid),
   ("pass", lengthAtLeast 3))
  (\ (user, pass) -> ... )

Here, we see that the action inside of the ellipsis is performed only if the validation and parsing succeeds, and again, when it is performed, it is guaranteed that the values it recieves are in the data type we expect. (If the validation fails, the enclosing “parse” fails the controller takes over again, but first the specific validation failures are recorded elsewhere for handy reference).

That’s enough for one post, I think, and I haven’t even dealt with templates yet, much less scope and mutable state.

Comments (7)

Simple Extensible Records — Quick Generic Tricks, Pt. 1

There have been a few discussions lately about how to do quick and easy typesafe extensible records in Haskell. And there have been a number of discussions lately about extensions to do it more cleanly (see, e.g. this one on haskell-cafe). This came up on the channel the other day and somebody immediately suggested HList, which seems too heavyweight for this sort of thing to me. HaskellDB, for example, also rolls its own fairly heavyweight new records system.

Anyway, in discussion, we cooked up an idea which Kamina, who originally asked the question, later implemented very cleanly. The gist is that one defines a map of properties, and implements a typeclass for accessing them. Subsequently, one defines a typeclass for each property, whose default methods know how to query them from the properties map. Then, providing any data type with a new property is just a quick instance declaration away. There’s a slight messiness in that one needs to directly index properties by strings, or by a Property enumeration that’s set in stone. (One can actually index them by TypeReps using Typeable, but that’s sort of messy.)

However, there’s another issue as well — we want records extensible in two ways. The typical way is just being able to reuse an accessor and setter function over multiple data types with the same property. But we also want to be able to actually extend records with something like inheritance — i.e. to declare some records strict supersets of others. The following solution, with generics, both solves that issue, and is significantly more lightweight, although it still requires mild boilerplate. Additionally, while the former solution lets one “break” it by circumventing the typeclass-bound “smart accessor” functions and operating on the properties map directly, this solution can also be “broken” by giving it a bad instance declaration.

Both solutions, one should note, can easily be augmented with Template Haskell to reduce the boilerplate still further.

We start by realizing that rather than creating a properties map, it would be nice if we could walk directly over the data in our record and select something of the right type. This, of course, is something that the Scrap Your Boilerplate generics library excels at. In fact, it turns out we have the function we need right at our fingertips: gfindtype, which is of type (Data x, Typeable y) => x -> Maybe y . We ask it for a specific type, and if that type is an immediate subterm of x, then it returns it, otherwise it fails.

Now we need a modifier. There isn’t one built in, but its simple enough to add one. The type signature is immediately evident: greplace :: (Data a, Typeable b) => a -> b -> Maybe a . However, what to do with this type signature is a bit trickier. Data.Generics gives us gmapMo of type (Data a, MonadPlus m) => (a -> m a) -> a -> m a. It traverses the immediate subterms of a type, and applies a transformation. We build our traverser using `extM` which composes a chain of actions to try, matching from right to left the first with an appropriate type. Our default action when nothing matches is to const Nothing (i.e., to leave the term unchanged). When the term does match, we return just the term. This gives us:

greplace x y = gmapMo (const Nothing `extM` (const (Just y))) x

And indeed, this does what we want. However, we don’t have any type safety constraints yet. Every query and update gives us a maybe, depending if it matches. But we can combine this system with the typeclass constraints from the other system and get a very clean interface. Assume we have properties of type Color and Size:

data Color = Red | Yellow | Green | Blue | White | Black deriving (Show, Data, Typeable)
data Size = Size Int deriving (Show, Data, Typeable)

Now we can just write:

class Data a => HasColor a where
    color :: a -> Color
    color = fromJust . gfindtype
    setColor :: a -> Color -> a
    setColor = (fromJust .) . greplace

And for anything with a Color, e.g., a car, we just can write instance HasColor Car. Not too shabby!

However, as we’ve written it, this only works for immediate subterms. So, just like our old solution with an explicit properties map, we still can’t extend our records just by wrapping them as part of a larger record.

Luckily, the generics libraries make it trivial to do deep searches as well. We can replace gfindnew with gfind = something (const Nothing `extQ` Just) . This function is built much the same as our greplace was — yielding Just a value of the appropriate type, or Nothing otherwise. The something combinator just maps our query function deeply over every term, from top to bottom, until it hits a match. And as for greplace, we can just replace gmapMo with “somewhere” for a similar effect. Now, if we have data DrivenCar = DrivenCar Person Car we can give the appropriate instance declarations, and get and set the color of the car directly here as well.

One more problem though: the somewhere combinator will apply our transformation at least once — which means that it might apply it many multiple times, and even if it doesn’t, it’ll keep working ever after it succeeds. This too is easy enough to fix. Look at the source and we see that somewhere is just defined as f x `mplus` gmapMq (somewhere f) x. So its trying the transform, and if the transform fails, its mapping the transform over all the immediate subterms, recursively. All we have to do is write a function “once” that looks exactly the same, but with gmapMo instead of gmapMq, and we have what we really need.

A complete working example follows below:

{-# LANGUAGE FlexibleInstances, FlexibleContexts, DeriveDataTypeable  #-}

module Records where

import Data.Generics.Basics
import Data.Generics.Aliases
import Data.Generics.Schemes
import Control.Monad
import Data.Maybe

greplace :: (Data a, Typeable b) => a -> b -> Maybe a
greplace x y = once (const Nothing `extM` (const (Just y))) x

once :: MonadPlus m => GenericM m -> GenericM m
once f x = f x `mplus` gmapMo (once f) x

gfind :: (Data a, Typeable b) => a -> Maybe b
gfind = something (const Nothing `extQ` Just)

data Color = Red | Yellow | Green | Blue | White | Black deriving (Show, Data, Typeable)
newtype Size = Size Int deriving (Show, Data, Typeable)
unSize (Size a) = a

data Car = Car Color Size deriving (Data, Typeable, Show)
data Cube = Cube Color deriving (Data, Typeable, Show)

data Two = Two Int Car Car deriving (Data, Typeable, Show)

class Data a => HasColor a where
    getColor :: a -> Color
    getColor = fromJust . gfind
    setColor :: a -> Color -> a
    setColor = (fromJust .) . greplace

class Data a => HasSize a where
    getSize :: a -> Int
    getSize = unSize . fromJust . gfind
    setSize :: a -> Int -> a
    setSize = (fromJust .) . (. Size) . greplace

instance HasColor Car
instance HasColor Cube
instance HasSize Car
instance HasColor Two
instance HasSize Two

{-
*Records> getColor (Two 12 (Car Red (Size 34)) (Car Blue (Size 93)))
Red

*Records> setSize (Two 12 (Car Red (Size 34)) (Car Blue (Size 93))) 232
Two 12 (Car Red (Size 232)) (Car Blue (Size 93))
-}

Comments (1)