Make the Type System Do the Work

I wrote this post in February 2012 and somehow never hit publish. So here it goes, two years later, to kick off February 2014.

Declaring types and being restricted by the type system is often cited as a negative aspect of C++. I think this is an unfair assessment: a type system can make a programmer’s life considerably easier if it’s embraced instead of fought, as we’re seeing with the rise in popularity of Haskell. But C++, despite all its warts, has a pretty formidable type system of its own.

The object-oriented paradigm is commonly taught with the “Dog is-a Mammal” architectural mentality where your classes are supposed to mirror real life objects and act accordingly. Make no mistake, this approach is an over-simplification of software architecture and should be treated as such, but the principles behind it are actually fairly sound. Classes should aim to be a self-contained representation of some concept or thing that has state and actions. Here, we’re going to focus on how to make the type system work for you instead of against you.

Specifically, we’re going to focus on the conversion of data from one form to another. Many seem to think of conversions as being functions, taking one piece of data and returning another. But in doing so, we callously throw away dimensional analysis, a skill that appears to have been lost in translation from the natural sciences to computing.

A simple example that demonstrates the importance of dimensional consistency is temperature conversion. All too often, functions that convert between equivalent units look something like this:

	double celsiusToFahrenheit(double deg_celsius)
	{
		return deg_celsius * 9 / 5 + 32;
	}

	double temperature_fahrenheit = celsiusToFahrenheit(20);

OK, it works. It compiles, runs, gives the right answer, passes all tests. The only problem is that you end up with a variable that fails to describe itself better than “I’m a number”. We end up using a Hungarian-like system (apps Hungarian, specifically) to indicate the true units of the variable (Fahrenheit or Celsius). We recognize the importance of maintaining unit analysis, but we fail to enforce this convention; as with all Hungarian systems, the onus falls on the developer (and future developers) to maintain the accuracy of the system.

Instead, we should rely on the type system of the language to enforce this.

	struct Degrees
	{
		double val;
		Degrees(double _val) : val(_val) {}
	};
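	struct DegFahrenheit; // forward declaration; defined below
	struct DegCelsius : public Degrees
	{
		DegCelsius(double deg) : Degrees(deg) {}
		// defined after DegFahrenheit, once that type is complete
		DegCelsius(const DegFahrenheit &deg);
	};
	struct DegFahrenheit : public Degrees
	{
		DegFahrenheit(double deg) : Degrees(deg) {}
		DegFahrenheit(const DegCelsius &deg)
			: Degrees(deg.val * 9 / 5 + 32) {}
	};
	inline DegCelsius::DegCelsius(const DegFahrenheit &deg)
		: Degrees((deg.val - 32) * 5 / 9) {}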

	DegFahrenheit input(68);
	DegCelsius temperature = input;

Now it’s obvious to any developer what type of degrees the temperature variable is holding, and the units are carried and enforced by the compiler; you’re physically unable to assign a Celsius degree to a Fahrenheit degree without it converting it properly for you.

The overhead of setting up a coherent type system may seem burdensome, but in an application or library that handles many conversions in ways that should be transparent to the developer, the time investment will pay for itself. All units coming from math and science would benefit from being set up this way. Just think how much easier it would be if sin took Radians instead of a double, and Radians had a constructor that took Degrees: you could write sin(Degrees(180)) and get the correct result.
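To make the sin(Degrees(180)) idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the angle namespace, the kPi constant, and this angle-flavoured Degrees type (which has nothing to do with the temperature Degrees struct above). The standard library offers no such overload; this one simply wraps std::sin.

```cpp
#include <cassert>
#include <cmath>

namespace angle
{
	const double kPi = 3.14159265358979323846;

	// Hypothetical angle types, named after the prose above.
	struct Degrees
	{
		double val;
		explicit Degrees(double _val) : val(_val) {}
	};

	struct Radians
	{
		double val;
		explicit Radians(double _val) : val(_val) {}
		// Deliberately implicit: Degrees converts wherever Radians is expected.
		Radians(const Degrees &deg) : val(deg.val * kPi / 180.0) {}
	};

	// Accepts Radians (or something convertible to it), never a bare double.
	inline double sin(const Radians &rad)
	{
		return std::sin(rad.val);
	}

	inline void example()
	{
		// sin(180 degrees) is zero, up to floating-point rounding
		assert(std::fabs(sin(Degrees(180))) < 1e-12);
	}
}
```

Because both double constructors are explicit, angle::sin(3.14) does not compile; the caller has to say which unit the number is in.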

Coordinates

Let’s say you’re plotting points on a graph (one of the many widgets in your application). You want the user to be able to click on a point in the graph and have it draw the point and log the graph coordinates.

Since we’re just dealing with x and y, we could get away with just passing int32_ts around. But this often gets confusing because the graph widget’s mouse click event gives you the coordinates relative to itself, whereas the graph coordinates have the origin at the center of the graph widget, and y grows as you go up instead of down. (And to make things more confusing, sometimes you have absolute coordinates relative to your graph widget’s parent, too.)

As before, we may have a function with the signature Point pointCoordToGraphCoord(const Point &coord);, but this requires the programmer to remember what type of coordinates they have when handling the data, and creating a developer-enforced naming convention to help convey this meaning is error-prone and tedious. Instead, the type system will not only enforce this convention, it will convert between the coordinate systems as well.

	// just holds an (x,y), oblivious to its purpose in life
	struct Point
	{
		int32_t x, y;
		Point(int32_t _x, int32_t _y) : x(_x), y(_y) {}
		Point() : x(0), y(0) {}
	};
	// represents a point where (0,0) is the top-left of the widget
	struct RealPoint : public Point
	{
		RealPoint(int32_t x, int32_t y) : Point(x, y) {}
		RealPoint() : Point() {}
	};
	// represents a point where (0,0) is in the center, and y grows up
	struct GraphPoint : public Point
	{
		GraphPoint(int32_t x, int32_t y) : Point(x, y) {}
		GraphPoint() : Point() {}
	};

Our mouse handler event, being invoked by the system, probably still gives us a raw x and y, with which we can immediately construct a RealPoint for further use. Now our conversion function can be called GraphPoint realToGraphCoords(const RealPoint &point);, and it’s clear what type of coordinate system any given variable is using.

Naturally, this conversion function should be part of GraphPoint, such as static GraphPoint GraphPoint::FromRealCoords(const RealPoint &coords);. Once the problem has been reduced to just converting real coordinates to graph coordinates, though, it makes the most sense to just create a constructor in the GraphPoint to handle the conversion for us.

	// represents a point where (0,0) is in the center, and y grows up
	struct GraphPoint : public Point
	{
		GraphPoint(int32_t x, int32_t y) : Point(x, y) {}
		GraphPoint() : Point() {}
		GraphPoint(const RealPoint &coords) {
			x = coords.x - GraphWidget::width / 2;
			y = GraphWidget::height - coords.y - GraphWidget::height / 2;
		}
	};

Now, as developers, we don’t even have to think about which coordinates we have on hand.

	bool GraphWidget::clickHandler(int32_t x, int32_t y)
	{
		RealPoint coords(x, y);
		
		drawPoint(coords);
		logPoint(coords, "user click");

		return true;
	}

	void GraphWidget::drawPoint(const RealPoint &coords)
	{
		DrawingLibrary::Circle(coords, 2); // etc.
	}

	void GraphWidget::logPoint(const GraphPoint &coords,
		const string &action)
	{
		logfile << action << " at (" << coords.x << ", " << coords.y << ")"
			<< endl;
	}

The type system does all the work for us. The click handler (i.e. the user of our system) does not need to know that drawing and logging require different coordinate systems, and perhaps even better, the drawPoint and logPoint functions don’t need to worry about what’s being passed in. Nobody needs to make assumptions, which means fewer human errors and more reliable code.
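For anyone who wants to check the arithmetic, here is a self-contained sketch of the conversion. The fixed 400x300 GraphWidget is a hypothetical stand-in (I'm assuming width and height are static members), and height / 2 - y is just a simplified form of the expression used above. The static_asserts spell out what the compiler enforces: a RealPoint silently becomes a GraphPoint, but not the other way around.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

using std::int32_t;

// Hypothetical fixed-size widget standing in for the real GraphWidget.
struct GraphWidget
{
	static constexpr int32_t width = 400;
	static constexpr int32_t height = 300;
};

struct Point
{
	int32_t x, y;
	Point(int32_t _x, int32_t _y) : x(_x), y(_y) {}
};

// (0,0) is the top-left of the widget, y grows down
struct RealPoint : public Point
{
	RealPoint(int32_t x, int32_t y) : Point(x, y) {}
};

// (0,0) is the center of the widget, y grows up
struct GraphPoint : public Point
{
	GraphPoint(int32_t x, int32_t y) : Point(x, y) {}
	GraphPoint(const RealPoint &coords)
		: Point(coords.x - GraphWidget::width / 2,
		        GraphWidget::height / 2 - coords.y) {}
};

// Real -> Graph converts implicitly; Graph -> Real is rejected.
static_assert(std::is_convertible<RealPoint, GraphPoint>::value,
	"RealPoint converts to GraphPoint");
static_assert(!std::is_convertible<GraphPoint, RealPoint>::value,
	"GraphPoint must not silently become a RealPoint");

inline void example()
{
	GraphPoint center = RealPoint(200, 150); // middle of the widget
	assert(center.x == 0 && center.y == 0);

	GraphPoint corner = RealPoint(0, 0);     // top-left of the widget
	assert(corner.x == -200 && corner.y == 150);
}
```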

Further Reading

The type system affords developers an opportunity to save time and reduce bugs. Writing maintainable code should be a first priority, and embracing the power of static typing can make code easier to work with down the road. Wrong code should look wrong, and failing to compile is even better. There are numerous everyday examples of how types can help. One such example is handling safe and unsafe strings to prevent XSS attacks by having the type system treat every string as unsafe by default, so that only explicitly marked strings are emitted unescaped: print(NoEscapeString("<b>Note:</b>")); print(usermsg); is easy to reason about.
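A rough sketch of that string example might look like the following. The names are hypothetical: NoEscapeString comes from the snippet above, while escapeHtml and render are made up here (render returns the text rather than printing it, so the result is easy to inspect). It is meant to show the shape of the idea, not to be a complete XSS defense.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper: marks markup the developer has vetted as safe to emit raw.
struct NoEscapeString
{
	std::string val;
	explicit NoEscapeString(const std::string &_val) : val(_val) {}
};

// Replaces the characters that are significant in HTML text and attributes.
inline std::string escapeHtml(const std::string &in)
{
	std::string out;
	for (char c : in)
	{
		switch (c)
		{
			case '&':  out += "&amp;";  break;
			case '<':  out += "&lt;";   break;
			case '>':  out += "&gt;";   break;
			case '"':  out += "&quot;"; break;
			case '\'': out += "&#39;";  break;
			default:   out += c;        break;
		}
	}
	return out;
}

// Plain strings are escaped by default...
inline std::string render(const std::string &s) { return escapeHtml(s); }
// ...and only an explicit NoEscapeString is passed through untouched.
inline std::string render(const NoEscapeString &s) { return s.val; }
```

Since the NoEscapeString constructor is explicit, a user-supplied string can never take the unescaped path by accident; someone has to write NoEscapeString(...) on purpose, and that is easy to spot in review.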

Since first writing this article in January, I’ve been exposed to Bjarne Stroustrup’s C++11 Style talk, which inspired me to finally edit and post it. Stroustrup’s talk includes a great demonstration of how to implement a unit system using C++11’s new user-defined literals, and makes a great argument for type-rich programming.
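This is not the code from the talk, but as a small sketch of what user-defined literals could look like for the temperature types from earlier: the _degC and _degF suffixes are hypothetical names of my own, and the structs are simplified (no shared base class, explicit double constructors) so that they can be constexpr.

```cpp
#include <cassert>

struct DegFahrenheit; // forward declaration

struct DegCelsius
{
	double val;
	explicit constexpr DegCelsius(double _val) : val(_val) {}
	constexpr DegCelsius(const DegFahrenheit &deg); // defined below
};

struct DegFahrenheit
{
	double val;
	explicit constexpr DegFahrenheit(double _val) : val(_val) {}
	constexpr DegFahrenheit(const DegCelsius &deg)
		: val(deg.val * 9.0 / 5.0 + 32.0) {}
};

constexpr DegCelsius::DegCelsius(const DegFahrenheit &deg)
	: val((deg.val - 32.0) * 5.0 / 9.0) {}

// Hypothetical suffixes; user-defined suffixes must begin with an underscore.
constexpr DegCelsius operator""_degC(long double v)
{
	return DegCelsius(static_cast<double>(v));
}
constexpr DegFahrenheit operator""_degF(long double v)
{
	return DegFahrenheit(static_cast<double>(v));
}

inline void example()
{
	DegFahrenheit f = 20.0_degC; // converted by the constructor
	assert(f.val == 68.0);
	DegCelsius c = 68.0_degF;
	assert(c.val == 20.0);
}
```

Only floating-point literals are handled here, so it has to be 20.0_degC rather than 20_degC; an overload taking unsigned long long would cover the integer form.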

It’s time to start embracing type systems instead of using nondescript number types and to ask ourselves: how else can I take advantage of the type system to make my life easier?

18 Responses to Make the Type System Do the Work
  1. Henri Tuhola

    Better alternative is to just consistently use kelvins to represent temperatures, and radians to represent angles.

    That way you won’t need any of that.

    • Nathan Wong

      If only, right? :)

  2. Paul McJones

    Byarne => Bjarne.

    • Nathan Wong

      Oops, thanks! Fixed.

  3. Jani Kajala

    problem is that coordinate conversion is context (i.e. frame of reference) dependent and GraphPoint & RealPoint need to be aware of the context. to overcome/hide this limitation you are forced to introduce global (!) GraphWidget::width which is used for conversions. after that, you are in trouble once you start doing hierarchical conversions, etc., which require multiple frames of reference in nested contexts. then, to overcome that, you are forced to add OpenGL style “global state” (for GraphWidget::width) which gets switched, and that results in obscure bugs as the values change depending on which context is active, and that context is far from explicit/understandable from the code you’re looking at (it might be set anywhere!). so, as summary, better keep point as point and handle conversions in higher context-aware level. :) for this reason all window toolkits have coordinate conversion functions as members of the window, not point

    • Nathan Wong

      That’s true. I envisioned (in my contrived mini-example) that GraphPoint would be part of the GraphWidget (maybe even as a subclass), since, as you suggest, the conversion needs to be owned by the widget. You’re definitely right about having to switch the numbers out though, which isn’t ideal.

      One of the motivations of this example was actually writing a similar graph widget in JavaScript, and pulling my hair out over keeping track of the coordinate systems with the Hungarian-like style described here. So while the types aren’t a panacea, it’s a lot better than not having them at all. :)

  4. Cody Jackson

    Just thought I’d make a note. You may want to make your constructors (at the very least constructors with single parameters) explicit constructors to help better enforce type safety.

    An explanation can be found here: http://stackoverflow.com/a/121163

    • Nathan Wong

      Cool, thanks for sharing!

  5. Anton

    Really nice and clean code. Great work!

    • Nathan Wong

      Thanks!

  6. Dmitry

    I had similar thoughts and asked for a review of them on codereview stackexchange. No one really liked this approach.

    • Nathan Wong

      Interesting, thanks for sharing. I’m not much into game engines, but I know Ogre uses a Degree and Radian class with operator overloads to handle conversions and it seems to make examples cleaner than just forcing Radians: http://www.ogre3d.org/docs/api/1.9/OgreMath_8h_source.html

  7. Lyndon

    Nice post.

    In that last example, don’t you mean void GraphWidget::drawPoint() ?

    • Nathan Wong

      Thanks, nice catch! Fixed.

  8. Semi Essessi

    Just a small niggle:

    deg_celsius * 9 / 5 + 32

    brackets and types would help a lot here.

    ( deg_celsius * ( 9.0 / 5.0 ) ) + 32.0

    otherwise (i think) by precedence and implicit conversions 9 / 5 will become the integral value 1.

    the brackets i suggest merely because not thinking about precedence is easier

    • Nathan Wong

      Thanks for the suggestion. In this case, because multiplication/division are left-associative, it’s as if it’s written ((deg_celsius * 9) / 5) + 32. So since deg_celsius is a double, it carries through. I would tend to agree that writing 9.0 would be preferable though.

      • Semi Essessi

        ah yes. i always forget that division has the same precedence as multiplication in C/C++ rather than higher as in standard math notation.

        in any case brackets make it clear

        :)

  9. Phil Tornquist

    I agree that the type system should be used to convey interpretation of data. However it is quite difficult/annoying to define data types in most languages which was my motivation for creating a new programming language. I’d appreciate if you’d take a look at it on github. It’s called .