Archive for February, 2012

Monoidal instances for pipes

February 4, 2012

In this post, I’m going to introduce a new class of combinators for pipes, with an interesting categorical interpretation. I will be using the pipe implementation of my previous post.

> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> module Blog.Pipes.MonoidalInstances where
> 
> import Blog.Pipes.Guarded hiding (groupBy)
> import qualified Control.Arrow as A
> import Control.Category
> import Control.Categorical.Bifunctor
> import Control.Category.Associative
> import Control.Category.Braided
> import Control.Category.Monoidal
> import Control.Monad (forever)
> import Control.Monad.Free
> import Data.Maybe
> import Data.Void
> import Prelude hiding ((.), id, filter, until)

When pipes were first released, some people noticed the lack of an Arrow instance. In fact, it is not hard to show that, even identifying pipes modulo some sort of observational equality, there is no Arrow instance that satisfies the arrow laws.

The problem, of course, is with first, because we already have a simple implementation of arr. If we try to implement first we immediately discover that there’s a problem with the Yield case:

first (Yield x c) = yield (x, ???) >> first c

Since ??? can be of any type, the only possible value is bottom, which of course we don’t want to introduce. Alternative definitions of first that alter the structure of a yielding pipe are not possible if we want to satisfy the law:

first p >+> pipe fst == pipe fst >+> p

Concretely, the problem is that the cartesian product in the type of first forces a sort of "synchronization point" that doesn’t necessarily exist. This is better understood if we look at the type of (***), of which first can be thought of as a special case:

(***) :: Arrow k => k a b -> k a' b' -> k (a, a') (b, b')

first = (*** id)

If the two input pipes yield at different times, there is no way to faithfully match their yielded values into a pair. There are hacks around that, but they don’t behave well compositionally, and exhibit either arbitrarily large space leaks or data loss.

This has been addressed before: stream processors, like those of the Fudgets library, being very similar to Pipes, have the same problem, and some resolutions have been proposed, although not entirely satisfactory.

Arrows as monoidal categories

It is well known within the Haskell community that Arrows correspond to so called Freyd categories, i.e. premonoidal categories with some extra structures.

Using the Monoidal class by Edward Kmett (now in the categories package on Hackage), we can try to make this idea precise.

Unfortunately, we have to use a newtype to avoid overlapping instances in the case of the Hask category:

> newtype ACat a b c = ACat { unACat :: a b c }
>   deriving (Category, A.Arrow)

First, cartesian products are a bifunctor in the category determined by an Arrow.

> instance A.Arrow a => PFunctor (,) (ACat a) (ACat a) where
>   first = ACat . A.first . unACat
> instance A.Arrow a => QFunctor (,) (ACat a) (ACat a) where
>   second = ACat . A.second . unACat
> instance A.Arrow a
>       => Bifunctor (,) (ACat a) (ACat a) (ACat a) where
>   bimap (ACat f) (ACat g) = ACat $ f A.*** g

Now we can say that products are associative, using the associativity of products in Hask:

> instance A.Arrow a => Associative (ACat a) (,) where
>   associate = ACat $ A.arr associate
> instance A.Arrow a => Disassociative (ACat a) (,) where
>   disassociate = ACat $ A.arr disassociate

Where we use the Disassociative instance to express the inverse of the associator. And finally, the Monoidal instance:

> type instance Id (ACat a) (,) = ()
> instance A.Arrow a => Monoidal (ACat a) (,) where
>   idl = ACat $ A.arr idl
>   idr = ACat $ A.arr idr
> instance A.Arrow a => Comonoidal (ACat a) (,) where
>   coidl = ACat $ A.arr coidl
>   coidr = ACat $ A.arr coidr

Where, again, the duals are actually inverses. Also, products are symmetric:

> instance A.Arrow a => Braided (ACat a) (,) where
>   braid = ACat $ A.arr braid
> instance A.Arrow a => Symmetric (ACat a) (,)

As you see, everything is trivially induced by the cartesian structure on Hask, since A.arr gives us an identity-on-objects functor. Note, however, that the Bifunctor instance is legitimate only if we assume a strong commutativity law for arrows:

first f >>> second g == second g >>> first f

which we will, for the sake of simplicity.

Replacing products with arbitrary monoidal structures

Once we express the Arrow concept in terms of monoidal categories, it is easy to generalize it to arbitrary monoidal structures on Hask.

In particular, coproducts work particularly well in the category of pipes:

> instance Monad m
>       => PFunctor Either (PipeC m r) (PipeC m r) where
>   first = PipeC . firstP . unPipeC
> 
> firstP :: Monad m => Pipe a b m r
>        -> Pipe (Either a c) (Either b c) m r
> firstP (Pure r) = return r
> firstP (Free (M m)) = lift m >>= firstP

Yielding a sum is now easy: just yield on the left component.

> firstP (Free (Yield x c)) = yield (Left x) >> firstP c

Awaiting is a little bit more involved, but still easy enough: receive left and null values normally, and act like an identity on the right.

> firstP (Free (Await k)) = go
>         where
>           go = tryAwait
>            >>= maybe (firstP $ k Nothing)
>                      (either (firstP . k . Just)
>                              (\x -> yield (Right x) >> go))

And of course we have an analogous instance on the right:

> instance Monad m
>       => QFunctor Either (PipeC m r) (PipeC m r) where
>   second = PipeC . secondP . unPipeC
> 
> secondP :: Monad m => Pipe a b m r
>         -> Pipe (Either c a) (Either c b) m r
> secondP (Pure r) = return r
> secondP (Free (M m)) = lift m >>= secondP
> secondP (Free (Yield x c)) = yield (Right x) >> secondP c
> secondP (Free (Await k)) = go
>         where
>           go = tryAwait
>            >>= maybe (secondP $ k Nothing)
>                      (either (\x -> yield (Left x) >> go)
>                              (secondP . k . Just))

And a bifunctor instance obtained by composing first and second in arbitrary order:

> instance Monad m
>       => Bifunctor Either (PipeC m r)
>                    (PipeC m r) (PipeC m r) where
>   bimap f g = first f >>> second g

At this point we can go ahead and define the remaining instances in terms of the identity-on-objects functor given by pipe:

> instance Monad m => Associative (PipeC m r) Either where
>   associate = PipeC $ pipe associate
> instance Monad m => Disassociative (PipeC m r) Either where
>   disassociate = PipeC $ pipe disassociate
> 
> type instance Id (PipeC m r) Either = Void
> instance Monad m => Monoidal (PipeC m r) Either where
>   idl = PipeC $ pipe idl
>   idr = PipeC $ pipe idr
> instance Monad m => Comonoidal (PipeC m r) Either where
>   coidl = PipeC $ pipe coidl
>   coidr = PipeC $ pipe coidr
> 
> instance Monad m => Braided (PipeC m r) Either where
>   braid = PipeC $ pipe braid
> instance Monad m => Symmetric (PipeC m r) Either

Multiplicative structures

There is still a little bit of extra structure that we might want to exploit. Since PipeC m r is a monoidal category, it induces a (pointwise) monoidal structure on its endofunctor category, so we can speak of monoid objects there. In particular, if the identity functor is a monoid, it means that we can define a "uniform" monoid structure for all the objects of our category, given in terms of natural transformations (i.e. polymorphic functions).

We can represent this specialized monoid structure with a type class (using kind polymorphism and appropriately generalized category-related type classes, it should be possible to unify this class with Monoid and even Monad, similarly to how it’s done here):

> class Monoidal k p => Multiplicative k p where
>   unit :: k (Id k p) a
>   mult :: k (p a a) a

Dually, we can have a sort of uniform coalgebra:

> class Comonoidal k p => Comultiplicative k p where
>   counit :: k a (Id k p)
>   comult :: k a (p a a)

The laws for those type classes are just the usual laws for a monoid in a (not necessarily strict) monoidal category:

first unit . mult == idl
second unit . mult == idr
mult . first mult == mult . second mult . associate

first counit . comult == coidl
second counit . comult == coidr
first diag . diag == disassociate . second diag . diag

Now, products have a comultiplicative structure on Hask (as in every category with finite products), given by the terminal object and diagonal natural transformation:

> instance Comultiplicative (->) (,) where
>   counit = const ()
>   comult x = (x, x)

while coproducts have a multiplicative structure:

> instance Multiplicative (->) Either where
>   unit = absurd
>   mult = either id id

that we can readily transport to PipeC m r using pipe:

> instance Monad m => Multiplicative (PipeC m r) Either where
>   unit = PipeC $ pipe absurd
>   mult = PipeC $ pipe mult

Somewhat surprisingly, pipes also have a comultiplicative structure of their own:

> instance Monad m => Comultiplicative (PipeC m r) Either where
>   counit = PipeC discard
>   comult = PipeC . forever $ do
>     x <- await
>     yield (Left x)
>     yield (Right x)

Heterogeneous metaprogramming

All the combinators we defined can actually be used in practice, and the division in type classes certainly sheds some light on their structure and properties, but there’s actually something deeper going on here.

The fact that the standard Arrow class uses (,) as monoidal structure is not coincidental: Hask is a cartesian closed category, so to embed Haskell’s simply typed λ-calculus into some other category structure, we need at the very least a way to transport cartesian products, i.e. a premonoidal functor [1].

However, as long as our monoidal structure is comultiplicative and symmetric, we can always recover a first-order fragment of λ-calculus inside the "guest" category, and we don’t even need an identity-on-objects functor [2].

The idea is that we can use the monoidal structure of the guest category to represent contexts, where weakening is given by counit, contraction by comult, and exchange by swap.

There is an experimental GHC branch with a preprocessor which is able to translate expressions written in an arbitrary guest language into Haskell, given instances of appropriate type classes , which correspond exactly to the ones we have defined above.

Examples

This exposition was pretty abstract, so we end with some examples.

We first need to define a few wrappers for our monoidal combinators, so we don’t have to deal with the PipeC newtype:

> split :: Monad m => Pipe a (Either a a) m r
> split = unPipeC comult
> 
> join :: Monad m => Pipe (Either a a) a m r
> join = unPipeC mult
> 
> (*+*) :: Monad m => Pipe a b m r -> Pipe a' b' m r
>       -> Pipe (Either a a') (Either b b') m r
> f *+* g = unPipeC $ bimap (PipeC f) (PipeC g)
> 
> discardL :: Monad m => Pipe (Either Void a) a m r
> discardL = unPipeC idl
> 
> discardR :: Monad m => Pipe (Either a Void) a m r
> discardR = unPipeC idr

Now let’s write a tee combinator, similar to the tee command for shell pipes:

> tee :: Monad m => Pipe a Void m r -> Pipe a a m r
> tee p = split >+> firstP p >+> discardL
> 
> printer :: Show a => Pipe a Void IO r
> printer = forever $ await >>= lift . print
> 
> ex6 :: IO ()
> ex6 = do
>   (sourceList [1..5] >+>
>     tee printer >+>
>     (fold (+) 0 >>= yield) $$
>     printer)
>   return ()
> {- ex6 == mapM_ print [1,2,3,4,5,15] -}

Another interesting exercise is reimplementing the groupBy combinator of the previous post:

> groupBy :: Monad m => (a -> a -> Bool) -> Pipe a [a] m r
> groupBy p =
>    -- split the stream in two
>    split >+>
> 
>    -- yield Nothing whenever (not (p x y))
>    -- for consecutive x y
>   ((consec >+>
>     filter (not . uncurry p) >+>
>     pipe (const Nothing)) *+*
>   
>   -- at the same time, let everything pass through
>   pipe Just) >+>
> 
>   -- now rejoin the two streams
>   join >+>
>   
>   -- then accumulate results until a Nothing is hit
>   forever (until isNothing >+>
>            pipe fromJust >+>
>            (consume >>= yield))
> 
> -- yield consecutive pairs of values
> consec :: Monad m => Pipe a (a, a) m r
> consec = await >>= go
>   where
>     go x = await >>= \y -> yield (x, y) >> go y
> 
> ex7 :: IO ()
> ex7 = do (sourceList [1,1,2,2,2,3,4,4]
>           >+> groupBy (==)
>           >+> pipe head
>            $$ printer)
>          return ()
> {- ex7 == mapM_ print [1,2,3,4] -}

References

[1] J. Power and E. Robinson, “Premonoidal categories and notions of computation,” Mathematical. Structures in Comp. Sci., vols. 7, pp. 453-468, 1997.

[2] A. Megacz, “Multi-Level Languages are Generalized Arrows,” arXiv:1007.2885, 2010.

An introduction to guarded pipes

February 2, 2012

Pipes are a very simple but powerful abstraction which can be used to implement stream-based IO, in a very similar fashion to iteratees and friends, or conduits. In this post, I introduce guarded pipes: a slight generalization of pipes which makes it possible to implement a larger class of combinators.

> {-# LANGUAGE NoMonomorphismRestriction #-}
> module Blog.Pipes.Guarded where
> 
> import Control.Category
> import Control.Monad.Free
> import Control.Monad.Identity
> import Data.Maybe
> import Data.Void
> import Prelude hiding (id, (.), until, filter)

The idea behind pipes is straightfoward: fix a base monad m, then construct the free monad over a specific PipeF functor:

> data PipeF a b m x = M (m x)
>                    | Yield b x
>                    | Await (Maybe a -> x)
> 
> instance Monad m => Functor (PipeF a b m) where
>   fmap f (M m) = M $ liftM f m
>   fmap f (Yield x c) = Yield x (f c)
>   fmap f (Await k) = Await (f . k)
> 
> type Pipe a b m r = Free (PipeF a b m) r

Generally speaking, a free monad can be thought of as an embedded language in CPS style: every summand of the base functor (PipeF in this case), is a primitive operation, while the x parameter represents the continuation at each step.

In the case of pipes, M corresponds to an effect in the base monad, Yield produces an output value, and Await blocks until it receives an input value, then passes it to its continuation. You can see that the Await continuation takes a Maybe a type: this is the only thing that distinguishes guarded pipes from regular pipes (as implemented in the pipes package on Hackage). The idea is that Await will receive Nothing whenever the pipe runs out of input values. That will give it a chance to do some cleanup or yield extra outputs. Any additional Await after that point will terminate the pipe immediately.

We can write a simplistic list-based (strict) interpreter formalizing the semantics I just described:

> evalPipe :: Monad m => Pipe a b m r -> [a] -> m [b]
> evalPipe p xs = go False xs [] p

The boolean parameter is going to be set to True as soon as we execute an Await with an empty input list.

A Pure value means that the pipe has terminated spontaneously, so we return the accumulated output list:

>   where
>     go _ _ ys (Pure r) = return (reverse ys)

Execute inner monadic effects:

>     go t xs ys (Free (M m)) = m >>= go t xs ys

Save yielded values into the accumulator:

>     go t xs ys (Free (Yield y c)) = go t xs (y : ys) c

If we still have values in the input list, feed one to the continuation of an Await statement.

>     go t (x:xs) ys (Free (Await k)) = go t xs ys $ k (Just x)

If we run out of inputs, pass Nothing to the Await continuation…

>     go False [] ys (Free (Await k)) = go True [] ys (k Nothing)

… but only the first time. If the pipe awaits again, terminate it.

>     go True [] ys (Free (Await _)) = return (reverse ys)

To simplify the implementation of actual pipes, we define the following basic combinators:

> tryAwait :: Monad m => Pipe a b m (Maybe a)
> tryAwait = wrap $ Await return
> 
> yield :: Monad m => b -> Pipe a b m ()
> yield x = wrap $ Yield x (return ())
> 
> lift :: Monad m => m r -> Pipe a b m r
> lift = wrap . M . liftM return

and a couple of secondary combinators, very useful in practice. First, a pipe that consumes all input and never produces output:

> discard :: Monad m => Pipe a b m r
> discard = forever tryAwait

then a simplified await primitive, that dies as soon as we stop feeding values to it.

> await :: Monad m => Pipe a b m a
> await = tryAwait >>= maybe discard return

now we can write a very simple pipe that sums consecutive pairs of numbers:

> sumPairs :: (Monad m, Num a) => Pipe a a m ()
> sumPairs = forever $ do
>   x <- await
>   y <- await
>   yield $ x + y

we get:

> ex1 :: [Int]
> ex1 = runIdentity $ evalPipe sumPairs [1,2,3,4]
> {- ex1 == [3, 7] -}

Composing pipes

The usefulness of pipes, however, is not limited to being able to express list transformations as monadic computations using the await and yield primitives. In fact, it turns out that two pipes can be composed sequentially to create a new pipe.

> infixl 9 >+>
> (>+>) :: Monad m => Pipe a b m r -> Pipe b c m r -> Pipe a c m r
> (>+>) = go False False
>   where

When implementing evalPipe, we needed a boolean parameter to signal upstream input exhaustion. This time, we need two boolean parameters, one for the input of the upstream pipe, and one for its output, i.e. the input of the downstream pipe. First, if downstream does anything other than waiting, we just let the composite pipe execute the same action:

>     go _ _ p1 (Pure r) = return r
>     go t1 t2 p1 (Free (Yield x c)) = yield x >> go t1 t2 p1 c
>     go t1 t2 p1 (Free (M m)) = lift m >>= \p2 -> go t1 t2 p1 p2

then, if upstream is yielding and downstream is waiting, we can feed the yielded value to the downstream pipe and continue from there:

>     go t1 t2 (Free (Yield x c)) (Free (Await k)) =
>       go t1 t2 c $ k (Just x)

if downstream is waiting and upstream is running a monadic computation, just let upstream run and keep downstream waiting:

>     go t1 t2 (Free (M m)) p2@(Free (Await _)) =
>       lift m >>= \p1 -> go t1 t2 p1 p2

if upstream terminates while downstream is waiting, finalize downstream:

>     go t1 False p1@(Pure _) (Free (Await k)) =
>       go t1 True p1 (k Nothing)

but if downstream awaits again, terminate the whole composite pipe:

>     go _ True (Pure r) (Free (Await _)) = return r

now, if both pipes are waiting, we keep the second pipe waiting and we feed whatever input we get to the first pipe. If the input is Nothing, we set the first boolean flag, so that next time the first pipe awaits, we can finalize the downstream pipe.

>     go False t2 (Free (Await k)) p2@(Free (Await _)) =
>       tryAwait >>= \x -> go (isNothing x) t2 (k x) p2
>     go True False p1@(Free (Await _)) (Free (Await k)) =
>       go True True p1 (k Nothing)
>     go True True p1@(Free (Await _)) p2@(Free (Await _)) =
>       tryAwait >>= \_ -> {- unreachable -} go True True p1 p2

This composition can be shown to be associative (in a rather strong sense), with identity given by:

> idP :: Monad m => Pipe a a m r
> idP = forever $ await >>= yield

So we can define a Category instance:

> newtype PipeC m r a b = PipeC { unPipeC :: Pipe a b m r }
> 
> instance Monad m => Category (PipeC m r) where
>   id = PipeC idP
>   (PipeC p2) . (PipeC p1) = PipeC $ p1 >+> p2

Running pipes

A runnable pipe, also called Pipeline, is a pipe that doesn’t yield any value and doesn’t wait for any input. We can formalize this in the types as follows:

> type Pipeline m r = Pipe () Void m r

Disregarding bottom, calling await on such a pipe does not return any useful value, and yielding is impossible. Another way to think of Pipeline is as an arrow (in PipeC) from the terminal object to the initial object of Hask1.

Running a pipeline is straightforward:

> runPipe :: Monad m => Pipeline m r -> m r
> runPipe (Pure r) = return r
> runPipe (Free (M m)) = m >>= runPipe
> runPipe (Free (Await k)) = runPipe $ k (Just ())
> runPipe (Free (Yield x c)) = absurd x

where the impossibility of the last case is guaranteed by the types, unless of course the pipe introduced a bottom value at some point.

The three primitive operations tryAwait, yield and lift, together with pipe composition and the runPipe function above, are basically all we need to define most pipes and pipe combinators. For example, the simple pipe interpreter evalPipe can be easily rewritten in terms of these primitives:

> evalPipe' :: Monad m => Pipe a b m r -> [a] -> m [b]
> evalPipe' p xs = runPipe $
>   (mapM_ yield xs >> return []) >+>
>   (p >> discard) >+>
>   collect id
>   where
>     collect xs =
>       tryAwait >>= maybe (return $ xs [])
>                          (\x -> collect (xs . (x:)))

Note that we use the discard pipe to turn the original pipe into an infinite one, so that the final return value will be taken from the final pipe.

Extra combinators

The rich structure on pipes (category and monad) makes it really easy to define new higher-level combinators. For example, here are implementations of some of the combinators in Data.Conduit.List, translated to pipes:

> sourceList = mapM_ yield
> sourceNull = return ()
> fold f z = go z
>   where
>     go x = tryAwait >>= maybe (return x) (go . f x)
> consume = fold (\xs x -> xs . (x:)) id >>= \xs -> return (xs [])
> sinkNull = discard
> take n = (isolate n >> return []) >+> consume
> drop n = replicateM n await >> idP
> pipe f = forever $ await >>= yield . f -- called map in conduit
> concatMap f = forever $ await >>= mapM_ yield . f
> until p = go
>   where
>     go = await >>= \x -> if p x then return () else yield x >> go
> groupBy (~=) = p >+>
>   forever (until isNothing >+>
>            pipe fromJust >+>
>            (consume >>= yield))
>   where 
>     -- the pipe p yields Nothing whenever the current item y
>     -- and the previous one x do not satisfy x ~= y, and behaves
>     -- like idP otherwise
>     p = await >>= \x -> yield (Just x) >> go x
>     go x = do
>       y <- await
>       unless (x ~= y) $ yield Nothing
>       yield $ Just y
>       go y
> isolate n = replicateM_ n $ await >>= yield
> filter p = forever $ until (not . p)

To work with the equivalent of sinks, it is useful to define a source to sink composition operator:

> infixr 2 $$
> ($$) :: Monad m => Pipe () a m r' -> Pipe a Void m r -> m (Maybe r)
> p1 $$ p2 = runPipe $ (p1 >> return Nothing) >+> liftM Just p2

which ignores the source return type, and just returns the sink return value, or Nothing if the source happens to terminate first. So we have, for example:

> ex2 :: Maybe [Int]
> ex2 = runIdentity $ sourceList [1..10] >+> isolate 4 $$ consume
> {- ex2 == Just [1,2,3,4] -}
> 
> ex3 :: Maybe [Int]
> ex3 = runIdentity $ sourceList [1..10] $$ discard
> {- ex3 == Nothing -}
> 
> ex4 :: Maybe Int
> ex4 = runIdentity $ sourceList [1,1,2,2,2,3,4,4]
>                 >+> groupBy (==)
>                 >+> pipe head
>                  $$ fold (+) 0
> {- ex4 == Just 10 -}
> 
> ex5 :: Maybe [Int]
> ex5 = runIdentity $ sourceList [1..10]
>                 >+> filter (\x -> x `mod` 3 == 0)
>                  $$ consume
> {- ex5 == Just [3, 6, 9] -}

Pipes in practice

You can find an implementation of guarded pipes in my fork of pipes. There is also a pipes-extra repository where you can find some pipes to deal with chunked ByteStreams and utilities to convert conduits to pipes.

I hope to be able to merge this into the original pipes package once the guarded pipe concept has proven its worth. Without the tryAwait primitive, combinators like fold and consume cannot be implemented, nor even a simple stateful pipe like one to split a chunked input into lines. So I think there are enough benefits to justify a little extra complexity in the definition of composition.


  1. In reality, Hask doesn’t have an initial object, and the terminal object is actually Void, because of non-strict semantics.