The Worst Mistake in Computer Science

Let’s stop for a moment to think about the worst mistake that has ever been made in computer science. There are a lot of contenders for this title, but I want to tell you about a mistake that has been with us for fifty years. It has become so familiar that we believe it to be essential.

Do you know what I am talking about?

NULL.

A value that is not a value.

It’s a paradox that has caused, and is still causing, significant damage in computer systems. NullReferenceException and NullPointerException are among the most common exceptions in our applications.

Let’s see what the creator of NULL says about it.

I call it my billion-dollar mistake…At that time, I was designing the first comprehensive type system for references in an object-oriented language. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

– Tony Hoare

What exactly is NULL?

In the majority of object-oriented languages, NULL exists as a default value of reference variables. It indicates that a given variable doesn’t have a specific value. NULL itself also has to have its own representation in a program’s memory.

If we dig deeper, we can see that NULL is usually implemented as a pointer to address zero in the virtual memory of a given process. More specifically, it leads to the first page of virtual memory, which the system reserves for the NULL representation.

On Windows systems, this reserved region covers the first 64 KB of memory, which a program is not authorised to access. Because of this, dereferencing a NULL reference causes an exception.

What is wrong with NULL?

Overwriting NULL

As I have mentioned before, NULL represents the first page in the virtual memory of a given process, and the operating system prevents programs from accessing that memory fragment. But is it true, without a doubt, that there is no way to modify this part of the virtual memory? In one of my presentations, I show how to overwrite NULL.

It turns out that Windows has a low-level NtAllocateVirtualMemory function, which can allocate this memory block if given properly prepared parameters. Next, we can write anything we want there, e.g. a fragment of a procedure that will execute itself in the event of a NULL reference.

This gap may be even more dangerous because the virtual address space of a process also contains a part reserved for code running in kernel mode. That code uses the same virtual memory and the same value of NULL.

The skillful exploitation of these facts may result in a privilege escalation attack. If you are interested in this topic, check my GitHub (https://github.com/bytebarian/NullDereference). I put the code fragments there, tested on 32-bit Windows 7.

Pointers

First, let’s focus on C and C++, because these languages provide plenty of evidence that NULL is one of the biggest mistakes. Let’s start with pointers. They are complex in themselves, but adding NULL to them makes them even weirder.

Take a look at the code fragment below:
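A minimal sketch of such a fragment, assuming marker is a plain char variable (the names marker and cptr come from the text, while the function name read_through_pointer is only an illustration):

```cpp
// Returns the value read back through a pointer to a local variable.
char read_through_pointer() {
    char marker = 'x';      // an ordinary variable living on the stack
    char *cptr = &marker;   // cptr holds marker's address; the types match
    return *cptr;           // dereferencing a valid pointer is safe
}
```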

The cptr variable is a pointer to the marker variable: it holds marker’s memory address, and the compiler checks that the assignment is type-correct. Initialising cptr directly with a plain integer such as 404, however, is incorrect.

In this case, we have no guarantee that the address in memory with a value of 404 is correct or that it points to the marker variable. Because of this, the compiler doesn’t allow for such a construction, but if we change this value to 0, then the compiler will allow it.

It turns out that 0 is acceptable to the compiler because, in C++, the literal 0 is treated as NULL. Of course, dereferencing cptr will then typically cause a segmentation fault during program execution.
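The two variants discussed above can be sketched as follows. This is only an illustration: the rejected line is left as a comment so the fragment still compiles, and the function name zero_literal_is_null is an assumption of this sketch.

```cpp
// Shows that the literal 0 is accepted as a pointer value and equals nullptr.
bool zero_literal_is_null() {
    // char *bad = 404;           // rejected by a C++ compiler: int is not char*
    char *from_zero = 0;          // accepted: the literal 0 is a null pointer constant
    char *from_nullptr = nullptr; // the explicit spelling available since C++11
    // *from_zero = 'x';          // would compile, but is undefined behaviour at run time
    return from_zero == from_nullptr;
}
```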

C-strings

The creators of C decided to go against the current: instead of storing a string as an address plus a length, they stored it as an address plus a magic symbol, the ‘\0’ (NUL) character, which ends the sequence of characters.

I don’t know whether it was for optimisation purposes or if there was another reason for this. What I know is that, from today’s perspective, this decision had terrible consequences. Some people describe it as “the most expensive one-byte mistake”.

Because of the address + magic_marker construction, getting the length of a string became a linear-time operation, which hurts cache behaviour and makes memory handling harder to optimise.

Because the NUL byte is reserved as the terminator, null-terminated strings cannot hold every ASCII or extended ASCII character; they can only be used in an atypical format called “ASCIIZ”.

However, the biggest problem with this solution appeared at the level of computer systems security. The best example of that is the gets() function. If we use it, we can’t limit how many characters will be read, and this gap is commonly used to conduct buffer overflow attacks.
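To make the difference concrete, here is a small sketch that compares the two representations. It uses std::string as a stand-in for the address + length layout; the helper names are assumptions of this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length of a NUL-terminated string: strlen must scan until it finds '\0'.
std::size_t c_length(const char *s) { return std::strlen(s); }

// Length of a counted string: the size is stored, so no scan is needed.
std::size_t counted_length(const std::string &s) { return s.size(); }

// A five-byte string with a NUL in the middle, built with an explicit length.
std::string with_embedded_nul() { return std::string("ab\0cd", 5); }
```

The counted string reports all five bytes, while the C view of the same bytes stops at the embedded terminator and sees only two.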

NULL ambiguity

Now let’s take a look at a problem that occurs in many programming languages and is connected with key-value structures. Suppose we have a dictionary object with two methods: set(key, value), which stores a value under a given key, and get(key), which returns the value assigned to a key.

Check out the code below, which – in similar form – could be found in many applications.

However, let’s assume that some of our users don’t have a phone number. How, in that case, can we add a user? For example, it could look like this:

Now we have a situation in which the get() method returns NULL in two cases: if a particular user doesn’t exist in the data set, or if the user exists but doesn’t have a phone number. Because of this, when we receive a NULL value from the get() method, we can’t tell which situation we are facing.
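The ambiguity can be reproduced with a small sketch. The PhoneBook class below is hypothetical; only the set() and get() method names come from the text, and a null pointer plays the role of NULL.

```cpp
#include <map>
#include <string>

// Hypothetical dictionary of user names to phone numbers.
class PhoneBook {
public:
    // Stores a phone number under a name; nullptr means "no phone number".
    void set(const std::string &name, const char *phone) { entries_[name] = phone; }

    // Returns the stored number, or nullptr when the name is unknown.
    const char *get(const std::string &name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, const char *> entries_;
};
```

After book.set("bob", nullptr), both book.get("bob") and book.get("carol") return a null pointer, so the caller cannot tell a user without a phone from a user who was never added.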

NULL conversion

It’s time to laugh at Java. Java silently converts between primitive types and reference types. It gets weird when NULL enters the stage.

First, consider a fragment of code that assigns NULL directly to a variable of type int.

Of course, the program won’t compile, because int is primitive and we can’t assign a NULL value to it. But if the NULL value first goes through a variable of type Integer, which is then assigned to the int, the code will compile correctly.

Integer is a reference type, and its value may be set to NULL. And because Java converts Integer to int automatically, the program compiles. Of course, during execution, the application will throw a NullPointerException.

Difficulties with debugging

Let’s get back to C++, which will once again prove how problematic NULL is. Specifically, we are talking about calling a member function on a NULL object. Such a call may or may not end our program with an error.

Suppose a class Foo has a field x and two methods: bar(), which doesn’t use any field, and baz(), which reads x. If foo is NULL, calling the bar() method will usually end successfully, but calling baz() will cause an error. Why? It turns out that, because bar() is not virtual, the function to call is already known during compilation, and the call is converted to a call of a static function Foo_bar(foo).

Then, even if the foo variable is NULL, the method doesn’t refer to it during its execution. The situation is different in the case of the baz() method, which refers to field x, defined in an object that was set to NULL. This causes a classic segmentation fault.
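The lowering described above can be modelled with ordinary free functions. This is only a sketch of what a compiler typically does, not a guarantee: the names Foo_bar and Foo_baz follow the text, and the return values are assumptions of this example. Calling a real member function through a null pointer remains undefined behaviour, whereas passing a null pointer to a free function that never dereferences it is legal.

```cpp
struct Foo {
    int x;
};

// Model of a non-virtual bar(): `this` becomes an explicit parameter
// that the body never touches, so a null pointer does no harm here.
int Foo_bar(Foo *self) {
    (void)self;
    return 7;
}

// Model of baz(): the body reads a field through the pointer,
// so a null pointer would be dereferenced and the program would crash.
int Foo_baz(Foo *self) {
    return self->x;
}
```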

To complicate this situation a little bit, let’s go back to the bar() method and let’s declare it as virtual.

Will this small modification change anything? It turns out that yes, it will. When it comes to virtual methods, a program must check the real type of the foo variable during execution. That check reads from the object itself, which is NULL, so the program ends with an error.

As you can see, C++ programmers don’t have it easy when debugging similar code. To complicate their life even further, it should be mentioned that, in the C++ standard, dereferencing NULL is undefined behaviour, so at a technical level, we can expect anything.

JavaScript to the rescue

JavaScript is a dynamically typed programming language, and in this case, it makes things even more complicated. Why is that? Because what if we have an object and we try to refer to one of its properties, which hasn’t been declared?

Some dynamically typed programming languages throw an exception or return NULL in such a situation. However, JavaScript creators wanted to distinguish a case in which a field in an object exists but isn’t set to any value (NULL) from a situation where the field has never been declared (undefined).

And it was at this moment that the creativity of JavaScript creators ran out. They probably didn’t notice that it’s possible to declare a field in an object and set its value to “undefined”. Unfortunately, we can’t choose “superundefined” or “uberundefined” values. In the end, JavaScript ended up with two problems instead of one.

We’ve got a winner!

JavaScript creators decided that one NULL is not enough, and at the same time, Objective-C creators wanted to set some record. In this programming language, there are four versions of NULL.

NULL – the classical value inherited from the C language, used for plain C pointers, with a value set to 0, as in C
nil – the empty value for pointers to objects
Nil – the empty value for pointers to classes
NSNull – a singleton object indicating that there is no value.

I am not an Objective-C programmer, and I don’t know what they had in mind, but I can’t explain why anyone would need as many as four representations of “nothing” in one programming language.

Slow failing vs failing fast

There are two main approaches to the problem of error handling in computer applications. Some may choose to do everything it takes so their app won’t stop working despite numerous errors.

And some may allow their system to stop working just after an error occurs in order to know that something wrong happened.

NULL overuse causes our system to die in great agony. It is also torture for programmers who are looking for causes of exceptions and errors.

One of the most frequent errors made by programmers is returning a NULL value when the documentation does not say how the system should behave in a specific situation.

If we don’t want to face errors like NullReferenceException in this situation, we have to use defensive programming. Our code loses a lot of its readability, and it becomes more and more complicated because, before every object reference, we have to check whether it’s NULL or not.
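A short sketch shows how quickly such checks pile up. The Order, Customer and Address types are hypothetical and exist only for this illustration.

```cpp
#include <string>

struct Address  { std::string city; };
struct Customer { const Address *address; };   // may be null
struct Order    { const Customer *customer; }; // may be null

// Every link in the chain has to be checked before it can be followed.
std::string shipping_city(const Order *order) {
    if (order == nullptr) return "unknown";
    if (order->customer == nullptr) return "unknown";
    if (order->customer->address == nullptr) return "unknown";
    return order->customer->address->city;
}
```

Three of the four lines in the function body exist only to guard against NULL, and the real logic is the last one.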

How to fix a billion dollar mistake

If it is such a severe error, what chance do we have to defend against it? Can we at least limit its harmful effects? Of course, because there are languages that don’t define such a thing as NULL, e.g. Haskell or Erlang.

The Crystal language took another path: every type is non-nullable by default, but it is still possible to declare a variable as nullable. Then the compiler enters the stage, because it can track such references and inform the programmer through errors and warnings.

What are the possibilities available to us if we are dealing with a language that has a problem with a NULL reference? For example, we can use exceptions in methods instead of returning NULL.

Of course, we should use exceptions only in exceptional cases, when a program’s behaviour is invalid.
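As a minimal sketch of this idea, the hypothetical lookup below throws instead of returning NULL when the user is unknown. The function name phone_of and the choice of std::out_of_range are assumptions of this example.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Returns the phone number of a user, or throws if the user does not exist.
// The caller never receives a null value that must be checked later.
const std::string &phone_of(const std::map<std::string, std::string> &book,
                            const std::string &name) {
    auto it = book.find(name);
    if (it == book.end()) {
        throw std::out_of_range("unknown user: " + name);
    }
    return it->second;
}
```

The failure now happens at the point of the lookup, with a message naming the missing user, rather than somewhere later when a NULL is finally dereferenced.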

Null Object Pattern

Do you want another solution? Here you are: you can use a design pattern called the Null Object Pattern. It involves creating a special class which represents the empty value. Let’s take an employee class as an example, and let’s see how to declare an empty class in this case.

As you can see, we declare a dedicated class, NullEmployee, which represents the NULL value for the employee class. It overrides virtual methods in different ways, and it can throw exceptions or implement logic that doesn’t change the application’s state. Using this pattern could look like this:
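A minimal sketch of the declaration and its use, under stated assumptions: the NullEmployee name comes from the text, while the members name() and is_null(), the find_employee() lookup and the sample data are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>

class Employee {
public:
    explicit Employee(std::string name) : name_(std::move(name)) {}
    virtual ~Employee() = default;
    virtual std::string name() const { return name_; }
    virtual bool is_null() const { return false; }

private:
    std::string name_;
};

// Stands in for "no employee"; its methods are safe to call and change nothing.
class NullEmployee : public Employee {
public:
    NullEmployee() : Employee("") {}
    std::string name() const override { return "(no employee)"; }
    bool is_null() const override { return true; }
};

// Hypothetical lookup: always returns a usable object, never a null pointer.
std::unique_ptr<Employee> find_employee(int id) {
    if (id == 1) return std::make_unique<Employee>("Ada");
    return std::make_unique<NullEmployee>();
}
```

A caller can write find_employee(2)->name() without a NULL check, because the lookup hands back a NullEmployee when nothing is found.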

Maybe Object Pattern

Most of you may recognise this approach to dealing with NULL from different functional languages. It’s also a solution very similar to nullable types in the C# language, although those apply only to value types.

In the beginning, we declare an interface containing two methods – one of them allows for checking if an object has any value and the second allows for getting this value.

Then, we create two implementations of that interface – one specifically for objects that have value, and one for empty objects representing NULL.

We can use such a solution in this way.
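The three steps above can be sketched as follows. All names here (Maybe, Some, None, has_value, value, lookup_phone) are assumptions of this example, and throwing from None::value() is just one possible design.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// The interface: one method checks for a value, the other returns it.
template <typename T>
class Maybe {
public:
    virtual ~Maybe() = default;
    virtual bool has_value() const = 0;
    virtual const T &value() const = 0;
};

// Implementation for objects that have a value.
template <typename T>
class Some : public Maybe<T> {
public:
    explicit Some(T v) : value_(std::move(v)) {}
    bool has_value() const override { return true; }
    const T &value() const override { return value_; }

private:
    T value_;
};

// Implementation for empty objects representing NULL.
template <typename T>
class None : public Maybe<T> {
public:
    bool has_value() const override { return false; }
    const T &value() const override { throw std::logic_error("no value"); }
};

// Hypothetical lookup that returns a Maybe instead of a nullable pointer.
std::unique_ptr<Maybe<std::string>> lookup_phone(const std::string &user) {
    if (user == "alice") return std::make_unique<Some<std::string>>("555-0100");
    return std::make_unique<None<std::string>>();
}
```

The return type tells the caller that the value may be missing, so has_value() is checked before value() is used. In C++17 and later, std::optional offers the same idea in the standard library.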