Some concepts behind hvac

Having written the hvac and HStringTemplate libraries, I need to get around to documenting them better, not to mention polishing hvac up for a Hackage release. But in the meantime, by way of introduction to (hopefully) a series of tutorials, I figured I’d go through some of the basic principles behind what I’ve been doing. That is sort of an arrogant thing to do, so apologies in advance for how dull or obvious the following may be.

Web programming is an exercise in managing scope

Even a reasonably well-architected web framework in Java or Python or what-have-you will at some point throw scope and typing to the wind. You have page-level scope, request-level scope, cookie-level scope, session-level scope, database-level persistent scope, application-level scope, server-level scope, and on top of that maybe conversation-level, or a few others depending on how fancy things get. (Oh, and GET and POST params fall into different scopes too… and then there are REST-style URL params as well.)

In PHP you generally get this stuff (application and db aside) shoved into what are called “superglobal” variables, which are maps from strings to… stuff. Rails does the same thing with hashes, and sometimes with symbols rather than strings. Java generally tends to shove stuff into hash-like things as well (e.g. request, pageContext [which is, horrors, indexed by scope *and* name]). And with Java frameworks using beans, you tend to get a stack in the page scope as well, so tags have something implicit to operate over (although this tends to be hidden from the end-user).

So scope then gives us a number of problems — anything can be anywhere, named anything, and of any type. And most webapps are built on, at some level, just trusting that some component somewhere up the chain got the right thing and set it to the right type, and that therefore when you look up the name you get it, and when you try to use it as the type you expect nothing goes terribly wrong. And don’t even get me started on xml configs (injection-style or otherwise).

Thus to make web programming sane, we want some guarantees that would be elementary in any language since, I guess, Fortran. We want a guarantee that the things you expect to exist do exist, and furthermore, that they are what you expect them to be. We’re not talking Coq or even Haskell level strong static typing here, but more like Java, or C for that matter.

So wipe the slate clean and pretend we have no scope at all outside of the request itself — i.e. that our server is a pure function from Request -> String. Now, our request gives us a url, some get and post params, and some other possible stuff (user-agent, language, cookies, what-have-you). We want to map that request to a function that handles it. The normal way to do this is to do some silly regexp or write an ugly xml file, or both, or just to write a naive switch expression, or finally, to simply have a php model where each request maps to a file on the directory hierarchy (Oh, and of course, throw in a smattering of mod_rewrite rules as well!).

The directory model actually turns out to be the most interesting. This is because with each step we take into the tree, we narrow the possibilities for every further step. So we’re actually traversing a very tiny grammar. And if we match a file, then the path is a well-formed statement in that grammar. This brings us to a core principle of hvac. Do you see it?
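The directory model can be pictured as a small tree walk. The following is only an illustrative sketch, not anything from hvac: the names Node, resolve, and site, and the example file names, are made up here.

```haskell
import qualified Data.Map as M

-- A site laid out like a directory tree: a directory maps names to
-- subtrees, and a file is a leaf holding the page it serves.
data Node = File String | Dir (M.Map String Node)

-- Walk the tree one path segment at a time. Each step narrows the
-- remaining choices to the children of the current directory.
resolve :: Node -> [String] -> Maybe String
resolve (File page) []       = Just page
resolve (Dir kids)  (s : ss) = M.lookup s kids >>= \n -> resolve n ss
resolve _           _        = Nothing

-- A hypothetical two-level site.
site :: Node
site = Dir (M.fromList
  [ ("index.php", File "home page")
  , ("board", Dir (M.fromList [("new.php", File "new post form")]))
  ])
```

A path that ends on a file is a well-formed statement of this tiny grammar; a path that stops at a directory, or names something that is not there, is not.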

Controllers/dispatchers are parsers

We can explicitly write our controller function now as something which parses a request, and whose result is an http response. Or, to generalize back out, whose result is a function on the execution environment as a whole, yielding an http response.

Naturally, we want a backtracking parser, because we want to allow the end-user to construct an arbitrary grammar rather than a tightly restricted one. But not everything needs to backtrack. We “tokenize” our url into /-separated strings and consume and backtrack on those. But, GET params, etc? We don’t need to consume them, just check their existence.

And the really nice thing about Haskell here is that it comes (in the form of the Alternative class in Control.Applicative) with a built-in almost-dsl for writing neat little monadic parsers. Rather than write plain old functions, we can write combinator functions and so wrap the parser functionality up in something that reads like what it does. So a small snippet of hvac code from a controller might look like this:

 h |/ "board" |\ \(boardId::Int) ->
     (h |/ "new" >>
         h |// "POST" *> ... {- POST stuff -}
          <|> ... {- GET stuff -} )
      <|> ... {- OTHER stuff -}

The hs are null parsers necessary to set off a chain of combinators. This code says: if the next part of the path is “board” and the part after that can be parsed to an Int, then if the part after that is “new” and it’s a POST request, do one set of things. But if it’s a GET request, do another. And if the part after the boardId *isn’t* “new” then go do some other stuff instead.

So we see already how treating our controller as a parser lets us cut down on code duplication (we’re only getting boardId once, for all paths that have it), lets us embed fancy logic very compactly, and most importantly, guarantees that, in this case, we’ll have something named boardId in scope, and of the right type.
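To make the parser reading concrete, here is a minimal sketch of a backtracking controller parser over “/”-separated path segments. It is not hvac’s implementation: the names Request, Ctrl, segment, readSegment, method, board, and dispatch are all invented for illustration, and the handlers just return strings.

```haskell
import Control.Applicative (Alternative (..))
import Text.Read (readMaybe)

-- A request reduced to what this sketch needs: method and path segments.
data Request = Request { reqMethod :: String, reqPath :: [String] }

-- A controller parser sees the request and the segments not yet consumed;
-- on success it yields a result plus the remaining segments.
newtype Ctrl a = Ctrl { runCtrl :: Request -> [String] -> Maybe (a, [String]) }

instance Functor Ctrl where
  fmap f (Ctrl p) = Ctrl $ \rq segs -> case p rq segs of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Ctrl where
  pure a = Ctrl $ \_ segs -> Just (a, segs)
  Ctrl pf <*> Ctrl pa = Ctrl $ \rq segs -> case pf rq segs of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rq rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Monad Ctrl where
  Ctrl p >>= k = Ctrl $ \rq segs -> case p rq segs of
    Nothing        -> Nothing
    Just (a, rest) -> runCtrl (k a) rq rest

instance Alternative Ctrl where
  empty = Ctrl $ \_ _ -> Nothing
  Ctrl p <|> Ctrl q = Ctrl $ \rq segs -> case p rq segs of
    Nothing -> q rq segs   -- backtrack: retry from the same segments
    ok      -> ok

-- Consume one segment if it equals the given string.
segment :: String -> Ctrl ()
segment s = Ctrl $ \_ segs -> case segs of
  (x : rest) | x == s -> Just ((), rest)
  _                   -> Nothing

-- Consume one segment if it parses at the requested type.
readSegment :: Read a => Ctrl a
readSegment = Ctrl $ \_ segs -> case segs of
  (x : rest) -> fmap (\a -> (a, rest)) (readMaybe x)
  []         -> Nothing

-- Check the HTTP method without consuming anything.
method :: String -> Ctrl ()
method m = Ctrl $ \rq segs ->
  if reqMethod rq == m then Just ((), segs) else Nothing

-- Same shape as the snippet above; boardId is bound once, as an Int.
board :: Ctrl String
board = do
  segment "board"
  boardId <- readSegment :: Ctrl Int
  (segment "new" *> (method "POST" *> pure ("create post on " ++ show boardId)
                      <|> pure ("new-post form for " ++ show boardId)))
    <|> pure ("show board " ++ show boardId)

dispatch :: Request -> Maybe String
dispatch rq = fst <$> runCtrl board rq (reqPath rq)
```

Note that this sketch never checks that the whole path was consumed, so trailing segments are silently ignored; a real dispatcher would want an explicit end-of-path check.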

There are a few more dsl combinators in hvac, but they all operate essentially the same way. So now, this gets us partway there. However, it hasn’t addressed get/post params, cookies, sessions, databases, etc. at all yet (although there are combinators to match simple equality on get/post params too). Let’s add just get/post params for the next level of complexity. Now get/post params are used slightly differently, because they’re generally based on user input. So our criteria for them need to be different: there are optional fields that may not be filled in, “valid” values might not meet other criteria (too short, invalid dates, not a genuine email address etc.) and so forth. Furthermore we don’t generally want bad values to prevent matching a page; rather, we want an error page that highlights *all* the errors. So we know we need another layer for this.

Generally, web frameworks will hold another layer that purely performs validation. ASP bakes this in rather obscurely. PHP generally provides no particular solution. Python (Django, as I understand it) uses newforms and bakes validation into a forms layer in general. Ruby (Rails) does it through ActiveRecord (which begs the question, how to validate data that isn’t destined for an ActiveRecord?). Java frameworks, unsurprisingly, will tend to use XML (or nowadays, some annotation monstrosity too, I imagine, as tied to, e.g., Hibernate).

These styles of validation can lead to lost information at times: even once a validator passes, you may still need subsequent manual casting or parsing. Furthermore, they tend to mix layers, in that “valid” for the database (i.e. well-typed) may well be different from “valid” in a narrower sense (i.e. a proper email address). And then whether a userid is “invalid” depends on whether that userid is in the database as well as on whether it is long enough and uses only the allowed characters, etc. You can guess where I’m going, right?

Validation is parsing (in context)

We can place all our constraints on incoming data, as well as instructions on how to parse it into the data type we want (which might even mean, e.g., turning a user id into a full-fledged hunk of user data as built from a database query). Here’s an example validation function that does just that (positing that mkUser takes the returned map and shoves it into a user object).

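validUser :: ValidationFunc s String User
validUser s = do
  -- Look up the row for this user name (binding s to the ? placeholder is not shown).
  u <- selectRow "* from users where name=?"
  if M.null u
    then fail $ "No such user: " ++ s
    else return (mkUser u)  -- parenthesized so mkUser is applied before return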

The type signature here says that this function operates on any session type (which we’ll ignore for now) and takes a String and returns a User (or, of course, an error). One neat thing worth mentioning here, even though I haven’t touched on databases and application state yet, is that the type of ValidationFunc allows us to read things such as databases, but *not* to write to them, since it would lead to all sorts of trouble if validators started performing unrestricted side-effects.
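One way to get a “read but not write” guarantee out of a type is to hand the validator its environment as a plain argument and give it nowhere to return a modified one. The following is a sketch of that idea only, not hvac’s actual ValidationFunc: Validator, Db, and this stand-in User type are assumptions made for illustration, with a Map playing the part of the database.

```haskell
import qualified Data.Map as M

-- Stand-in "database": user name -> row (column name -> value).
type Db = M.Map String (M.Map String String)

newtype User = User { userName :: String } deriving (Eq, Show)

-- A validator may inspect env, but its result has no slot for an updated
-- env and it does not run in IO, so it cannot write anything back.
newtype Validator env a b = Validator
  { runValidator :: env -> a -> Either String b }

-- Parse a user name into a User by checking it against the database.
validUser :: Validator Db String User
validUser = Validator $ \db name ->
  case M.lookup name db of
    Nothing -> Left ("No such user: " ++ name)
    Just _  -> Right (User name)
```

Running it is just function application, e.g. runValidator validUser db "alice" for some db of type Db.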

We might invoke the validator (and a few others, just to show how nifty it is to chain them) like this:

withValidation
  (("user", trim >=> lengthAtLeast 2 >=> validUser),
   ("pass", lengthAtLeast 3))
  (\ (user, pass) -> ... )

Here, we see that the action in place of the ellipsis is performed only if the validation and parsing succeed, and again, when it is performed, it is guaranteed that the values it receives are of the data types we expect. (If the validation fails, the enclosing “parse” fails and the controller takes over again, but first the specific validation failures are recorded elsewhere for handy reference.)
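The chaining and the “report every failure” behaviour can be sketched in a few lines. This is a simplified model under stated assumptions, not hvac’s code: validators here are plain functions into Either, the bodies of trim and lengthAtLeast are guesses at what such validators might do, and field, errorsOf, and withValidation2 (a fixed two-field version) are hypothetical helpers.

```haskell
import Control.Monad ((>=>))
import Data.Char (isSpace)
import qualified Data.Map as M

type Params = M.Map String String

-- A validator parses an a into a b, or fails with a message.
type Validator a b = a -> Either String b

trim :: Validator String String
trim = Right . f . f
  where f = reverse . dropWhile isSpace

lengthAtLeast :: Int -> Validator String String
lengthAtLeast n s
  | length s >= n = Right s
  | otherwise     = Left ("must be at least " ++ show n ++ " characters")

-- Run one named field, tagging any failure with the field name.
field :: Params -> (String, Validator String b) -> Either [(String, String)] b
field ps (name, v) = case M.lookup name ps of
  Nothing  -> Left [(name, "missing")]
  Just raw -> either (\e -> Left [(name, e)]) Right (v raw)

errorsOf :: Either [e] a -> [e]
errorsOf = either id (const [])

-- Run both fields regardless of individual failures, so that every error
-- is reported; call the continuation only if both succeed.
withValidation2
  :: Params
  -> (String, Validator String a)
  -> (String, Validator String b)
  -> ((a, b) -> r)
  -> Either [(String, String)] r
withValidation2 ps fa fb k = case (field ps fa, field ps fb) of
  (Right a, Right b) -> Right (k (a, b))
  (ra, rb)           -> Left (errorsOf ra ++ errorsOf rb)
```

Within one field, >=> stops at the first failing validator, which is what you want when later steps depend on earlier ones; across fields, both are always run, so the error list covers the whole form.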

That’s enough for one post, I think, and I haven’t even dealt with templates yet, much less scope and mutable state.

7 Comments »

  1. Justin Bailey said

    I like the idea of parsing controlling what code gets run. Slick. The discussion on scopes was also intriguing and I’d like to see where you go with that. What are some drawbacks you’ve found when trying to work this way (that is, where current scope is very well defined)?

    Just a comment on readability. I see |\, |/ and |//. Do you have equivalents that are named? I don’t find your operators terribly readable. They probably make a lot of sense once you’ve used the framework awhile, but consider adding some aliases to them for us newbs.

    Finally, how do you deal with validation failures?

  2. sclv said

    The withValidation function stashes errors away elsewhere in the request scope. Then it “falls through” to the next possible action, just as a failed parse would. So a pattern that I’ve found works pretty well for certain types of forms is to manage them as postbacks, where on failure of a POST they fall through to the GET handler, which in turn renders the errors as part of the form page. On success you either display a new page, or redirect.

    There’s also a handleValidation function that rather than falling through takes an explicit “failure” continuation that’s a function of the errors.

    I don’t have named variants of the operators at the moment, but rather than an alternative more traditional “parser-like” api, it might be easier just to provide those. That’s a nice idea.

    i.e. |/ = `andPath`, |// = `withMethod`, |\ = `withUrlParam`.

    On the other hand, I tried to give things names that were obvious mnemonics, but maybe the associations just work for me. The forward slash is as with a path, the two slashes are as with those following a protocol, the backslash is as with a lambda, etc.

  3. gwern said

    sclv, I’ve got to agree with Justin. The code may be elegant, but my eyes are hurting. I can’t immediately figure out what’s being escaped (if at all), what’s adding a lambda binding, what’s combinator A, what’s combinator B…

    I mean, just ow in general, y’know?

  4. sclv said

    sigh. dbpatterson encouraged me not to, but on to Network.Frameworks.HVAC.AltController it is…

  5. sclv said

    And now checked in at the darcs repo (http://code.haskell.org/~sclv/hvac/) we have such an api. Pleasantly for me, since it’s equivalent and simpler, it cleans up the code a bit. Import Network.Frameworks.HVAC.AltController and you get: path, meth, param, takePath, readPath, and endPath. The above code that gets a “board” from the path then a boardId, then “new” then checks that it is a post can be written (in a do block):

      path "board"
      boardId <- readPath
      newPost boardId <|> showBoard boardId

    newPost boardId = do
      path "new"
      (meth "POST" >> {- POST stuff -}) <|> {- GET stuff -}

    showBoard boardId = do {- other stuff -}

    Wordier and more irritating in my opinion, although it makes the parser connection clear.

    This can be condensed into e.g.:

    path "board" >> readPath >>= \boardId ->
      (path "new" >>
        (meth "POST" >> {- POST stuff -}) <|> {- GET stuff -})
       <|>
        {- other stuff -}

    which then looks more obviously like a wordier version of the code above, so the nature of how it’s condensed becomes more clear. All the ugly parens also make it clear why one wants to mix and match *> as well, which is just >> with a precedence that’s higher than <|>.

  6. After some fiddling with hvac + lighttpd, I came to the conclusion that the lighttpd-packages in Debian testing are *broken*. (Maybe also in Ubuntu). The latest version on 1.4.20 works, however. My config, for the example, is as follows:

    server.document-root = "/var/www"
    server.modules = ( "mod_fastcgi" )
    server.port = 3000

    fastcgi.debug = 1
    fastcgi.server = ( "/test" =>
      (
        (
          "socket" => "/tmp/test.sock",
          "bin-path" => "/path/to/hvac-board",
          "min-procs" => 1,
          "max-procs" => 1,
          "check-local" => "disable"
        )
      )
    )

  7. Art Silver said

    I’m looking forward to more about templates.
