Archive for the ‘c++’ Category

From event-driven programming to FRP

January 10, 2012

The problem

Most modern programming is based on events. Event-driven frameworks are the tried-and-true abstraction for expressing all kinds of asynchronous and interactive behavior, as in GUIs or client-server architectures.

The core idea is inversion of control: the main loop is run by the framework, users only have to register some form of “callbacks”, and the framework will take care of calling them at the appropriate times.

This solves many issues that a straightforward imperative/procedural approach would present, eliminates the need for any kind of polling, and creates all sorts of opportunities for general-purpose optimizations inside the framework, with no impact on the complexity of user code. All of this without introducing any concurrency.

There are drawbacks, however. Event-driven code is hideous to write in most languages, especially those lacking support for first-class closures. More importantly, event-driven code is extremely hard to reason about. The very nature of this callback-based approach makes it impossible to use a functional style, and even the simplest of interactions requires some form of mutable state to be maintained across callback calls.

As a very simple example, suppose we want to perform a GET request, retrieve some data, and handle any HTTP error that might occur. In a generic event-driven framework, we would need to implement a simple state machine whose graph looks something like this:

A simple state machine

Each state (except the initial one) corresponds to a callback. The transitions are determined by the framework. To avoid starting more than one request at a time, we will need to explicitly keep track of the current state.

Now let’s try to make a simple change to our program: suppose we want to retry requests when they fail, but not more than once. Now the state machine becomes more complicated, since we need to add extra nodes for the non-fatal error condition.

A slightly more complicated state machine

In our hypothetical event-driven code, we need to keep track of whether we have already encountered an error, check this flag in each callback to perform the right action, and update it appropriately. Moreover, this time the code isn’t even shaped like the state machine, because we reuse the same callback for multiple nodes. To test our code exhaustively, we need to trace every possible path through the graph and reproduce it.

Now assume we want to allow simultaneous requests… you get the idea. The code gets unwieldy pretty fast. Small changes in the requirements have devastating consequences on the state graph. In practice, what happens most of the time is that the state graph is kept implicit, which makes the code impossible to test reliably, and consequently impossible to modify.

Towards a solution

A very simple but effective solution can be found by observing that state graphs like those of the previous examples have a very clear operational interpretation in the context of the equivalent synchronous code.

A single forward transition from A to B can be simply modelled as the sequence A;B, i.e. execute A, then execute B. Extra outward transitions from a single node can be mapped to exceptions, while backward arrows can be thought of as looping constructs.

Our second state machine can then be translated to the following pseudopython:

while True:
    for i in xrange(2):
        error = None
        try:
            reply = start_request()
            data = get_data(reply)
            break
        except Exception as e:
            error = get_error(e)
    if error:
        handle_error(error)
    else:
        display_data(data)

This code is straightforward. It could be made cleaner by splitting it up into a couple of extra functions and removing local state, but that’s beside the point. Note how easy it is now to generalize to an arbitrary number of retries.

So the key observation is that we can transform asynchronous code into synchronous-looking code, provided that we attach the correct semantics to sequencing of operations, exceptions and loops.

Now the question becomes: is it possible to do so?

We could turn functions like start_request and get_data into blocking operations that can throw. This will work locally, but it will break asynchronicity, so it’s not an option.

One way to salvage this transformation is to run the code in its own thread. Asynchronous operations will block, but won’t hang the main loop, and the rest of the program will continue execution.

However, we need to be careful with the kind of threads that we use. Since we don’t need (and don’t want!) to run multiple threads simultaneously, but we do need to spawn a thread for each asynchronous operation, we have to make sure that the overhead is minimal, context switching is fast, and we’re not paying the cost of scheduling and synchronization.

Here you can find a sketched solution along these lines that I wrote in python. It’s based on the greenlet library, which provides cooperative multithreading.

In the next post I will talk about alternative solutions, as well as how to extend the idea further, and make event-driven code more declarative and less procedural.


Dynamic types and virtual inner types

April 5, 2007

The concept-based polymorphism I tried to explain in my previous post can be expressed more conveniently with another syntax which avoids concepts altogether.

The key observation is that runtime polymorphism and F-bounded polymorphism are very similar in nature, and, to some extent, the latter can be implemented using the former. If I have understood correctly, that is essentially how Java implements generics, via type erasure.

For example, suppose you have an abstract class Shape with a virtual function draw, and several concrete classes like Square, Circle, Triangle, which override draw.
Client code using runtime polymorphism would look like:

void render(Shape* shape)
{
  shape->draw();
}

while using generic programming one could write:

template <typename S>
where Inherits<S, Shape>
void render(S* s)
{
  s->draw();
}

assuming that the concept Inherits<T, U> is verified when T inherits from U (possibly indirectly). An approximation for Inherits is

auto concept Inherits<typename T, typename U>
{
  // T can be implicitly converted to U
  T::operator U();
}

Using the generic code in such situations should probably be considered a mistake: if one wants to use template-based polymorphism, defining the abstract class Shape is pointless. It would be better to directly define a ShapeConcept with a member function draw.

However, the example suggests that a syntax resembling templates could be used to express runtime polymorphism:

template <Shape! S>
void render(S* s)
{
  s->draw();
}

the idea being that S is the runtime type of s. The compiler could instantiate the template immediately and implement the body just as if a Shape pointer were used instead of S. Here S is not really a type, since for example expressions like

new S

should be rejected; let’s call things like S dynamic types. They behave much like incomplete types (they cannot be instantiated directly, and sizeof cannot be applied to them), but are a different beast: for the purpose of typechecking they are just subtypes of their parent concept (Shape, in this example), but for code generation they are considered equivalent to it.

Why could all of this syntax possibly be useful? Well, when using ordinary abstract classes, it does not really add anything to the language, but it shows its power when another (more than syntactical) extension is introduced: virtual inner types.

Just like the name suggests, virtual inner types are inner types which behave polymorphically. The typical (and motivating) example here is still that of the abstract factory. So suppose you have the following abstract factory:

class AbstractAnimal;
class AbstractFood;

class AbstractFactory
{
public:
  virtual AbstractAnimal* createAnimal() = 0;
  virtual AbstractFood* createFood() = 0;
};

and the following abstract products:

class AbstractAnimal
{
public:
  virtual void eat(AbstractFood*) = 0;
};

class AbstractFood
{
public:
  virtual int quality() const = 0;
};

As explained in my previous posts [1] [2], there’s no way to explain to the compiler that the Food that an animal is able to eat is only that created by the same factory which created that very animal.
Virtual inner types could be used to solve the problem:

class AbstractAnimal;
class AbstractFood;

class AbstractFactory
{
public:
  virtual typedef AbstractAnimal Animal;
  virtual typedef AbstractFood Food;
  virtual Animal* createAnimal() = 0;
  virtual Food* createFood() = 0;
};

class AbstractAnimal
{
public:
  virtual typedef AbstractFactory Factory;
  virtual void eat(Factory::Food* food) = 0;
};

class AbstractFood
{
public:
  virtual typedef AbstractFactory Factory; // unused, only here for completeness
  virtual int quality() const = 0;
};

The idea is that virtual typedef Base X forces derived classes to define an inner type X inheriting from Base. Furthermore, this type behaves polymorphically (the type itself, not only its instances!). That means that if A is a dynamic type with parent AbstractAnimal, then A::Factory is the corresponding dynamic type with parent AbstractFactory.

To explain this fact in more detail, let’s continue the example:

class Cow;
class Grass;

class CowFactory : public AbstractFactory
{
public:
  typedef Cow Animal;
  typedef Grass Food;
  Cow* createAnimal();
  Grass* createFood();
};

class Cow : public AbstractAnimal
{
public:
  typedef CowFactory Factory;
  void eat(Grass* grass);
};

class Grass : public AbstractFood
{
public:
  typedef CowFactory Factory;
  int quality() const;
};

notice that we have defined inner types corresponding to the virtual typedefs of parent classes. Besides, the argument to eat is Grass and not AbstractFood.
Client code, in fact, cannot simply pass an AbstractFood object to the eat member function. The following function should not compile:

void feed(AbstractAnimal* animal, AbstractFood* food)
{
  animal->eat(food);
}

but can be fixed with the use of dynamic types

template <AbstractFactory! Factory>
void feed(Factory::Animal* animal, Factory::Food* food)
{
  animal->eat(food);
}

expressing the fact that both animal and food come from the same factory of dynamic type Factory. The food parameter (actually stored as an AbstractFood pointer) is statically downcast to Grass before being passed to the eat member function. The static typechecking via dynamic types ensures that such a downcast is always safe, so there is no need to perform a dynamic cast.

These ideas for a language extension are essentially equivalent to the ones expressed in the previous post, but seem to require less drastic changes in compiler implementations. Furthermore, they don’t use concepts, so they could even be implemented on top of the existing gcc.

Covariance versus substitutivity

April 3, 2007

The covariant argument problem which I tried to explain in a previous post has its roots in the incompatibility between covariance and substitutivity in a statically typed world.

Substitutivity is considered the most important feature in the design of an object oriented language, though it is often underrated or misunderstood by many users of the language. Basically, it states that subtyping is an “is-a” relation, i.e. any object of a derived class can be used wherever an object of the base class is required, to express it in C++ terminology. In languages that follow this rule, derived classes may only widen the interface of their parent, since narrowing it could cause conflicts which are unpredictable at compile time. The interface could be narrowed in many ways, all of which are prohibited in languages like C++, C# or Java:

  • removing a virtual function of the base class;
  • contravariantly changing the return type of a virtual function (i.e. moving it up in the inheritance hierarchy);
  • covariantly changing the type of any argument of a virtual function (i.e. moving it down in the inheritance hierarchy).

The problem is that if any of these possibilities were allowed, a piece of code able to operate on an object of the base class could “use a feature” that the derived class has removed (or incompatibly modified) and cause runtime failures.

For example, suppose that class Cow inherits from Animal, where the latter has a virtual function eat which takes a Food object as an argument. It seems natural to override eat in Cow so that it takes a Grass (cows don’t eat just any food), but if we do so, the following function

void f(Animal* a)
{
  Food* f = new Food;
  a->eat(f);
}

which is perfectly valid when a is of class Animal, fails when called with an instance of Cow, and this violates substitutivity.

The Wikipedia article about overriding is a perfect example of how much confusion the substitutivity rule can cause. The example in the article uses runtime exceptions to narrow the interface of the base class (note that the language used there is dynamically typed, so there would be no reason to do so). Using exceptions works (even in statically typed languages, of course) but has several drawbacks:

  • it moves part of the typechecking to runtime, which goes against the idea of having static typing in the first place;
  • it clutters the code with typechecking statements;
  • it feels a bit like a hack: it is essentially a workaround for a limitation of the programming language.

Using exceptions for narrowing the interface of derived classes, the previous example could be worked out as follows:

class Cow : public Animal
{
  virtual void eat(Food* food)
  {
    if (Grass* grass = dynamic_cast<Grass*>(food)) {
      // method body
    }
    else {
      throw typechecking_exception(/* ... */);
    }
  }
};

I guess at this point some of you may still feel a bit unsure about why anyone would need to narrow the interface of derived classes.
Indeed, the urge to do so often turns out to be a direct consequence of a flawed object oriented design. Look at the Person / Baby example from the Wikipedia article, for instance: the error there is that Baby should not be modelled as a subclass of Person. It is probably better to have a (likely abstract) class HumanBeing, with Person and Baby both as derived classes. If Baby needs to reuse part of Person’s implementation, it is probably better to consider using encapsulation (after all, inheritance is not for code reuse).

However, there are situations (like the one I tried to describe here) where the need to narrow interfaces while inheriting is justifiable. These situations can be recognized by the fact that one is willing to sacrifice substitutivity: sometimes it even gets in the way, and it is no longer a valuable feature but an obstacle to a clean design.
In fact, it was clear from the previous examples that interface narrowing (called a CAT in Eiffel terminology) is not an evil thing per se; it’s just that it can cause runtime failures when the substitutivity principle is applied.
For an example of a situation where a CAT is necessary, take the Abstract Factory pattern. For those unfamiliar with this technique, it basically consists of an abstract factory class, which is able to create abstract products. Each abstract product is represented by a different abstract class that is the return type of a specialized (abstract) factory method of the abstract factory class. Concrete products are then defined by inheritance; they are returned by the overridden factory methods of a concrete factory class. Several factory classes associated with different concrete products can be defined. The client only uses the abstract products and the abstract factory, which allows for great generality: the client can deal with a number of different product families just by changing the particular concrete factory it uses (and this can even be done at runtime).

The fact that the client only accesses products through the particular concrete factory it happens to be using implies that it won’t be mixing concrete products belonging to different families (i.e. created with different concrete factories). There are some facts worth noticing here:

  • Substitutivity does not apply. Or better, it should not be applied. Applying it would mean that incompatible products could be mixed, probably resulting in undefined behaviour.
  • It is client code’s responsibility to use the correct factory. If more than one factory is used, respective products should not be intermixed.
  • If the abstract products (hence the concrete ones) are interdependent, an interface narrowing could be needed. To stick with the previous example, suppose that both Animal and Food are abstract products, Cow and Grass being corresponding concrete implementations. If an Animal takes a Food object as an argument to one of its methods, how can this method be overridden in the concrete class? The dilemma here is that (assuming a well behaved client) we know that the method is going to be called with a Grass argument, but we are forced to declare it as a general Food because we need to preserve the parent signature.

Mainstream statically typed languages do not offer a clean solution to the problems outlined here. The template wrapper workaround used in KBoard is nice, but has several drawbacks as well (I’ll probably talk about it in more detail in another post). Obviously, such a solution could be simplified by direct language support, but the fact that typechecking has been partially moved to runtime makes it suboptimal anyway, so it is probably not worth including in a language that claims to be statically typed.
The main question here is: “is it possible to give up substitutivity in a hierarchy, to achieve greater power over the customization of derived classes, while having all type checking done at compile time?”.

To see why the answer has to be yes, let us find a solution to the problem using generic polymorphism instead of virtual functions.
It turns out that, using generic programming facilities, the Abstract Factory pattern can be implemented without using an abstract factory class at all! Here is a sample realization of the previous scenario using C++ templates:

class Cow;
class Grass;

class CowFactory
{
public:
  typedef Cow Animal;
  typedef Grass Food;

  // factory methods are not needed! one can simply use "new CowFactory::Food"
};

class Cow
{
public:
  void eat(Grass* g) { /* ... */ }
};

class Grass
{
  // ...
};

As you can see, there are no abstract classes at all. We directly implement concrete products following an implicit interface. Client code could be structured like this:

template <typename Factory>
class Client
{
public:
  void f()
  {
    typename Factory::Animal animal;
    typename Factory::Food food;

    animal.eat(&food);
  }
};

This may look like a perfect solution: elegant, clean, efficient (notice that there are no virtual functions, and even dynamic allocations are avoided) and just as general and powerful as the one based on runtime polymorphism. Unfortunately, it has a fundamental drawback which makes it unfeasible in most practical circumstances: it requires the client code to have a Factory template argument.
This has a series of unfortunate consequences:

  • Client code has to be completely recompiled whenever you change the factory or the products.
  • There is a copy of the client code for each product family. If there are a lot of families, this leads to code bloat.
  • There’s no way to dynamically load a product family at runtime. A family cannot reside in a dynamic library. This is because genericity here depends on template instantiation, which can only happen at compile time.

Nevertheless, it is possible to elaborate on that idea and define a language extension which almost perfectly solves the problem. My idea is based on concepts, a new C++ feature which will be introduced with the upcoming language standard, C++0x.
To begin with, I’ll show how the ConceptC++ language can be extended so that abstract classes can be modelled by concepts. Let’s stick to the Java terminology of calling interfaces those abstract classes with no data members and all pure virtual functions.
There is an obvious natural map from interfaces to concepts, mapping for example a class like

class A
{
  virtual void f(int x) = 0;
  virtual Foo* g() const = 0;
};

to the concept

concept A<typename T>
{
  void T::f(int x);
  Foo* T::g() const;
};

Code using the interface A could be rewritten using a template parameter T constrained by the concept A, and the semantics would be preserved, as well as the typechecking, with all pros and cons of the generic abstract factory implementation outlined above. Indeed, that was a particular case of this transformation, where we have simply omitted the concepts involved.

Let’s now consider an implementation of the A interface

class X : public A
{
public:
  void f(int x) { cout << x << endl; }
  Foo* g() const { return new Foo; }
};

Is it possible to extend the language so that the above class compiles even when A is a concept? If the concept A has a single argument, no constructor or free function constraints, and no associated types, the answer is yes. The compiler needs to create an abstract class corresponding to the concept and have X inherit from it.

In this case we did not gain anything, but let’s broaden the set of admissible concepts a little. Call a concept dynamic if it has a single argument, no constructor or free function constraints, and all its associated types are constrained by dynamic concepts and never used by value (probably other restrictions are needed, but you get the idea). For example:

concept AbstractFoo<typename T>
{
  int T::foo() const;
};

concept A<typename T>
{
  AbstractFoo Foo = T::Foo;
  void T::test(Foo* x) const;
  Foo* T::createFoo() const;
};

class MyFoo : public AbstractFoo
{
public:
  int value;
  MyFoo(int value) : value(value) { }
  int foo() const { return value; }
};

class X : public A
{
public:
  typedef MyFoo Foo;
  void test(MyFoo* f) const { f->value = 37; }
  MyFoo* createFoo() const { return new MyFoo(42); }
};

Since AbstractFoo and A are dynamic concepts by the above definition, the compiler lets you treat them as abstract classes, and inherit from them.
Under the hood, the compiler creates the corresponding abstract classes and virtual tables, and behaves just as if the concepts were abstract classes all along. If you are wondering what all this complication is for, just look at the argument of the test member function: it is an associated type, and as such it is covariantly modified in the derived class. This would not be possible with normal inheritance.

To see how all of this can possibly work (and be statically checked), let’s write down some usage examples:

void evil()
{
  A* a = new X;
  AbstractFoo* foo = new MyFoo(15);
  a->test(foo);
}

void safe()
{
  A* a = new X;
  a->test(a->createFoo());
}

The compiler has no way to understand that the evil function doesn’t actually do anything evil, so it has to give a type checking error on the test call there.
However, the safe function does something interesting: it calls test with a parameter which is of the Foo inner type of a’s actual type. Therefore, the compiler can safely assume that no type mismatch is taking place, and accept the call.

But what if one wants to store the result of createFoo in some member or temporary variable? What is the type of this variable? Here we need to invent a bit of new syntax. I like the idea of being able to write something like the following:

A* a = new X;
conceptof(a)::Foo* foo = a->createFoo();
a->test(foo);

Everything can be checked at compile time, since we know from the concept specification that createFoo returns the Foo inner type of the actual type of a, so we can typecheck the second and third lines without knowing this type exactly. The conceptof keyword extracts the dynamic concept associated with an object and returns a dynamic type.
Dynamic types can be used just like ordinary types: they can be typedef’d like this:

A* a = new X;
typedef A! conceptof(a) T;
T::Foo* foo = a->createFoo();
a->test(foo);

(note the exclamation mark), and used as template parameters:

template <A! T>
void f(T* a)
{
  T::Foo* foo = a->createFoo();

}

in which case the template is immediately instantiated.

For completeness’ sake, I’ll translate the previous example about cows and grass to dynamic concepts:

concept GenericFactory<typename Factory>
{
  GenericAnimal Animal = Factory::Animal;
  GenericFood Food = Factory::Food;

  Animal* Factory::createAnimal();
  Food* Factory::createFood();
};

concept GenericFood<typename Food>
{
  GenericFactory Factory = Food::Factory;
  int Food::quality();
};

concept GenericAnimal<typename Animal>
{
  GenericFactory Factory = Animal::Factory;
  void Animal::eat(Factory::Food* f);
};

class CowFactory;

class Grass : public GenericFood!
{
public:
  typedef CowFactory Factory;
  int quality() { return 0; }
};

class Cow : public GenericAnimal!
{
public:
  typedef CowFactory Factory;
  void eat(Grass* grass) { }
};

class CowFactory : public GenericFactory!
{
public:
  typedef Cow Animal;
  typedef Grass Food;

  Cow* createAnimal() { return new Cow; }
  Grass* createFood() { return new Grass; }
};

class Client
{
public:
  void f(GenericFactory* factory) {
    typedef GenericFactory! conceptof(factory) Factory;
    Factory::Animal* animal = factory->createAnimal();
    Factory::Food* food = factory->createFood();
    animal->eat(food);
  }

  template <GenericFactory! Factory>
  void feed(Factory::Animal* animal,
            Factory::Food* food)
  {
    animal->eat(food);
  }
};

Of course, these are only a bunch of ideas, and I cannot be sure that this approach is feasible. I would love to hear the opinion of some C++ expert about this stuff.

A smart logger

March 11, 2007

At work I’ve been asked to design a C++ logger class to be used like this:

Logger logger;
logger << "The universe will collapse in " << remaining_time << " seconds";

This seems like a very easy task, but it has some hidden difficulties:

  1. A header should be automatically added when logging (with a timestamp, for example), but only at the first occurrence of the insertion operator.
  2. There’s no termination character or manipulator to inform the logger object that a line is complete.
  3. The whole thing should be thread-safe at statement level, that is, you don’t want multiple concurrent logging statements to overlap in the logfile.

At first I thought it was impossible to achieve all of this without altering the API, but after a bit of experimentation with the lifetime of temporary objects and the ownership transfer semantics of auto_ptr, I came up with a solution based on a technique similar to expression templates.

The solution revolves around a template helper object like the following:

template <typename Helper, typename T>
class LoggerHelper { /* ... */ };

which is returned by value by each call to operator<<. The helper encapsulates a pointer to the previous helper (of type Helper) and to the logged value (of type T), plus a boolean flag, active. The helper constructor takes a non-const reference to the previous helper, deactivates it, and sets itself as the active helper: it will be the one to actually perform the logging action, unless another helper steps in and takes control.
The actual logging is performed in the active helper’s destructor, which is called at the end of the logging statement. The deactivation of all the encapsulated helpers ensures that only one active helper exists at any time. The logging procedure involves locking a global mutex, writing to a log stream and then unlocking it, so that multiple logging statements do not overlap.
This solution works for logging temporary objects as well, because temporaries are guaranteed to be destroyed in the reverse order of their creation: since the helper objects are always created later, their destructors are called before the destruction of the logged temporaries, which are therefore still valid when forwarded to the actual stream.

Here is the code, simplified to avoid dependencies.

The “covariant argument problem”

February 25, 2007

Working on KBoard I’ve been forced to face a great number of design problems, which I’d say have all been solved quite neatly, with one big exception: the covariant argument problem, hereafter referred to as CAP.

This dreaded problem manifests itself when trying to abstract a great amount of functionality from a system using a set of interfaces (i.e. abstract classes with only pure virtual functions) so that client code can add custom behaviour to the system by implementing all these interfaces.
Nothing new, you’d say, that’s exactly the purpose of abstract classes and interfaces, but the difference here is that we have a set of (possibly interdependent) interfaces that we would like to implement at once.
To better explain the situation, I need to set up some names: suppose you have three interfaces IA, IB, IC, and two different and completely incompatible incarnations, A, B, C and U, V, W. Suppose further that IA has a method foo taking an IB as a parameter. When implementing the ABC incarnation in client code, you need to preserve the signature of foo, so you’re forced to accept any IB as its parameter, while of course only Bs have a meaning there. This is the fundamental dilemma of the CAP.

The story isn’t new: it’s been studied in great detail by type theorists and even addressed in real programming languages like Eiffel, but the subject is still somewhat controversial and I’m not sure a clean solution really exists, even accepting a language switch (not that I would second it :)).

Our solution consists of a layer of ugly (and by ugly I mean very ugly) template code which does the work of type conversion and type checking that the compiler is unable to do. The strong point of this solution is that the client never needs to delve inside this messy kludge, and can simply implement the required functionality by creating classes complying to implicit interfaces.

If you are curious to understand how this really works, you can find a short description here, and obviously the code, which is only one header file (but ignore the comments, they’re ancient… yes, I know they should be kept up to date…).

XML configuration files for KDE

February 24, 2007

Since KBoard needed a powerful hierarchical configuration system and QSettings was too limited for our needs, we decided to implement an XML-based configuration framework from scratch.
Hence, Settings was born.

Today, I’ve decided to write a bit of documentation for Settings, and to eliminate all references in it to the rest of KBoard, so that other KDE (or Qt) developers not satisfied with QSettings/KConfig can benefit.

You can find the code and a sketched documentation on the kboard wiki.