What is wrong with NULL (lucidchart.com)
150 points by mjswensen on Aug 31, 2015 | 142 comments


It's worth noting that having a NULL, somewhere, is not so much a problem as forcing types to have a NULL. In this respect, e.g. Python and Rust both get it right: while Python has a None, and Rust has std::ptr::null(), an object that is a string cannot be any of those, because those are different types (NoneType and raw pointer, respectively). C's problem is that a string could be null, and there's no syntax for a non-null string.

Python's problem, meanwhile, is that any parameter could be any type. In C, you can pass NULL to a function expecting a non-null string. In Python, you can also pass 42, [], or the sqlalchemy module. None isn't special here. :)

Also, to quibble a bit: Rust's std::ptr::null() just produces a raw pointer, which cannot be dereferenced in safe Rust. Actual safe pointer types are guaranteed non-null, and you'd use Option<&T> to express that a reference may be absent. It is true that std::ptr::null() is in Rust's standard library, but Foreign.Ptr.nullPtr is in Haskell's, and the purpose is the same (a special raw-pointer value used only for FFI), so Rust isn't any worse than Haskell here.
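For concreteness, a rough sketch of what Option<&T> buys you: it is guaranteed to be pointer-sized (the compiler reuses the bit pattern a reference can never have to encode None), and the None case has to be handled before the reference can be touched. Nothing here is specific to any particular program; the values are made up.

    fn main() {
        // Option<&T> costs nothing over a bare reference: the "null pointer
        // optimization" packs None into the one bit pattern a reference can't be.
        assert_eq!(
            std::mem::size_of::<Option<&u64>>(),
            std::mem::size_of::<&u64>()
        );

        let x = 5u64;
        let maybe_ref: Option<&u64> = Some(&x);

        // The compiler forces the None case to be handled; there is no way to
        // "just dereference" an Option<&T> and hope for the best.
        match maybe_ref {
            Some(r) => println!("got {}", r),
            None => println!("nothing here"),
        }
    }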


"In Python, you can also pass 42, [], or the sqlalchemy module. None isn't special here..."

...the interpreter's environment, my left elbow, the magic unicorn that lives under my steps,...


Sorry, no. You can only pass a stable, properly typed Python representation of your left elbow, or the magic unicorn, not the elbow or unicorn itself.


If it quacks like a unicorn then it's still a duck


Unicorn? Une corne?

Ceci n'est pas une corne.

Sometimes a horn is just a cigar.


You can get a None type when you are expecting a String type... that is exactly the bug they are talking about. The fact that Python makes this problem BIGGER by allowing other types as well doesn't make it less bug prone.


Except the language is designed around this - for example, indexing into a dictionary in Python doesn't return `None` if the key doesn't exist, it raises a KeyError (you only get `None` back if you opt in via `.get()`).

I'm not saying static typing has no value, but there are benefits to dynamic typing, and Python embraces its dynamic typing and is designed around it, so none of the problems given in the article really apply.


You make a good point. I removed the mention of std::ptr::null, since it isn't really a NULL like in other languages. Like you said, it's for FFI.


Could you add a strikethrough and an addendum noting that it is like Haskell's nullPtr? It can be educational for others to see why you removed it.


I also think that Go did pretty well to fix NULL. It didn't get rid of it completely, but it (nil) is only valid for pointers and interfaces, so it's not possible to use it in place of a string, integer, etc. It's not as bulletproof as Rust or Haskell, but something as simple as disallowing nil strings can make a world of difference.


This is what C does, so I don't think Go deserves much credit for innovation on this point...


Go didn't really fix anything when it comes to NULL. The defense the Go authors gave to keep it around was laughable.


Uglier than a Windows backslash, odder than ===, more common than PHP, more unfortunate than CORS, more disappointing than Java generics, more inconsistent than XMLHttpRequest, more confusing than a C preprocessor, flakier than MongoDB, and more regrettable than UTF-16, the worst mistake in computer science was introduced in 1965.

That could be the greatest intro sentence ever seen on Hacker News.


I also don't think the C preprocessor is at all confusing; it's quite a simple program, both to implement and to program with, including less-used features such as concatenation or stringification.


This is what you have to do to concatenate a token with the expansion of another macro:

    #define VARIABLE 3
    #define NAME2(fun,suffix) fun ## _ ## suffix
    #define NAME1(fun,suffix) NAME2(fun,suffix)
    #define NAME(fun) NAME1(fun,VARIABLE)

    int NAME(some_function)(int a);
From: http://stackoverflow.com/questions/1489932/how-to-concatenat...

The C preprocessor can be confusing at times.


I think it's more that it can be confusing for the reader.


Pretty good, but I've never had a problem with an XMLHttpRequest being inconsistent. Everything else seems spot-on though.


XML is uppercase but Http is not. That seems to be inconsistent, though in some coding standards that's the correct naming scheme: acronyms shorter than 4 letters are written in uppercase (XML, URL) and acronyms of 4 or more letters are written in mixed case, like Http. Another example is the HttpURLConnection class from the standard Java library.


Clearly you've only started using it in the past few years... It was horrific and totally inconsistent across browsers and between versions about 8 years ago and earlier.


I think they're joking about the capitalization of the name.


Yeah. That's what I was going for ;)


I get it now! Duh.


You've probably started using it more recently, but if you want to see what it used to be like, check out this XMLHttpRequest wrapper and see how many browser bugs it fixes:

https://github.com/ilinsky/xmlhttprequest/blob/master/XMLHtt...


Disagree. I really dislike that sentence stylistically. By the time the author got to the point, there was no way it could live up to that many words (that I mostly scanned, since they don't really add anything to the main point of the article).

The article is great, don't get me wrong, and I wouldn't have mentioned anything if you hadn't brought up this point, but I thought I would just provide a counter-point.


One of the problems Maybe still has lies in the deeper question of "why are you expecting None to be here?". Don't get me wrong, there are valid cases for this, and Maybe is certainly preferable to null across the board, but I think the movement to Maybe in the greater programming space will in many cases practically result in trading one set of explicit errors (crashes) for a (perhaps more subtle) set of behavioral errors. In particular, this style of programming will become frequent (from the Swift documentation):

   if let roomCount = john.residence?.numberOfRooms {
      println("John's residence has \(roomCount) room(s).")
   } else {
      println("Unable to retrieve the number of rooms.")
   }
Or from this blog post's own example:

   option.ifPresent(x -> System.out.println(x));
In other words, I think the core problem still hasn't been attacked and we may end up in the same situation we were in with exceptions originally: programmers will just throw up their hands and wrap everything in a Maybe/Optional and/or just maybe-protect until the compiler stops bugging them. At the end of the day, if you ? all your values then you end up with something equivalent to having everything be nullable and correctly null-checking them.

Obj-C had a form of this with nil calling of methods silently doing nothing (so you end up with methods that "conveniently" don't happen, and don't crash, when the receiver is nil). However, despite having lots of legitimate uses (don't bother checking your delegate exists), it can still very silently sneak into other parts of the code.


This is a concern, but the advantage of Maybe is that it makes this concern very explicit. SML/NJ code will not compile if you don't have a binding to handle the None case of an 'a option type. The programmer does of course then have the option of doing

None => raise Error

but practically speaking, when deciding it's time to make your code more robust, it's a lot easier to text search for instances of "None => raise" than to search for the absence of proper handling of the possibility of a null arg or return value.


Exactly. You can still get errors due to things not being there. But your assumptions are explicitly stated.

You see None => raise Error, and a little alarm goes off in your head, and you look around for a reassurance that we really want to do that.

In contrast, you don't give x.toUpperCase() a second glance.


You seem to be saying that developers can make logic errors despite expressive types. Of course that’s true—type systems are a first pass to absolve us from the trivial errors that otherwise waste our time.

I don’t agree that lazy programmers will add optionality to please compilers rather than fix their program logic. In languages where optional types are ubiquitous, you usually see people avoiding needless optionality, because it’s simply easier.


> Obj-C had a form of this with nil calling of methods silently doing nothing (so you end up with methods that "conveniently" don't happen, and don't crash! when the receiver is nil).

For me it is the worst part of Objective-C. Instead of getting a crash at nil, you will get a weird program behavior later because something wasn't called.

And I never use this feature because it complicates code flow and brings no benefits.


The mistake is an unsafe null: making every object type carry a value in its domain which says "Oops, though my type says I'm a Foobar, I'm not actually an object, la la la! Have a free exception on me, in your face!"

Lisp's NIL is brilliant. It's in its own type, the null type! And it's the only instance of that type. No other type has a null instance.

Null references in the language simply reflect the tension between the static type system which is supposed to assure us that if static checks pass, then values are not misused, and the need to have sometimes (at the very least) a "Maybe" type: maybe we have an object, or maybe not. Null references are the "poor hacker's Maybe": let's make every type support a null value, so that Maybe shows up everywhere.

Maybe, or something like the Lisp NIL, is itself not a mistake by any stretch of the imagination. Sometimes you really need the code to express the idea that an object isn't there: directly, not with hacks like some representative instance, and some boolean flag which says "that's just a decoy indicating the object isn't there". You want the exception if the decoy is used as if it were an object; that's a bug. Something is trying to operate on an object in a situation when there is no object; safely operating on a decoy just sweeps this under the rug. (Yet, not always: sometimes operating on the decoy is fine. Lisp's NIL lets us specialize a method parameter to the NULL class and deal with it that way in one place.)


nil really isn't any better than null-pointers/references. In the end, you need to check for nil-ness anyway, and there's no mechanism that will help you ensure the check is made. Algebraic data types with pattern matching, that is, Maybe/option is really the proper solution.


NIL is objectively, unquestionably safer than a raw machine pointer that is null. It's a first class object with a type; a symbol with a name, and instance of the NULL class and so on.

Of course, if you want to use NIL as a character string or floating-point value, there is an exception. That's your fault: you needed a way to indicate "I don't have a string here" or "I don't have a number", and you bungled the logic.


"Of course, if you want to use NIL as a character string or floating-point value, there is an exception. That's your fault: you needed a way to indicate "I don't have a string here" or "I don't have a number", and you bungled the logic."

And there's the rub: Just like NULL. And, in fact, that's the major problem with NULL. NIL is better than NULL, but not by much.


> NIL is better than NULL, but not by much.

NIL is probably the best you can get once you realize that the empty state is unavoidable. My personal problem with this kind of rant is that it is full of hate but generally lacking in alternatives for representing an empty state.

The author mentions optional types as an alternative. That's fine for parameters, but it doesn't cover the case where you need an empty reference.


NIL is fine ... as long as you have some container for the opposite.

There should be a way to express Maybe<Maybe<T>>, aka an optional optional value.

The Store example in the article shows when this would be needed: a cache optionally has an optional phone number for a person.

If Lisp has the equivalent of Maybe<Maybe<T>> (perhaps a singleton list of a singleton list?), then the optionality is composable. Otherwise, the optionality is not really composable, and you will have problems.
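In a language that does have Maybe/Option, the composition falls out on its own. A minimal Rust sketch of that composed optionality (the cache contents and names here are invented purely for illustration): HashMap::get already returns an Option, so a cache of optional phone numbers naturally yields Option<Option<T>>, and the two kinds of absence stay distinct.

    use std::collections::HashMap;

    fn main() {
        // Hypothetical phone-number cache: the stored value is itself optional,
        // because a person can be known to have no phone number.
        let mut cache: HashMap<&str, Option<String>> = HashMap::new();
        cache.insert("alice", Some("555-1234".to_string()));
        cache.insert("bob", None); // cached, and known to have no number

        // HashMap::get returns Option<&V>, so a lookup yields
        // Option<&Option<String>>: the two layers of absence compose.
        assert_eq!(cache.get("carol"), None);      // not in the cache at all
        assert_eq!(cache.get("bob"), Some(&None)); // in the cache, no number
        assert!(matches!(cache.get("alice"), Some(Some(_)))); // cached, has a number
    }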


The optionality is composable, because we can have the nonexistence of a cache entry (cache lookup fails, perhaps indicating so by returning nil instead of a cache entry) paired with the possibility that a cache entry which exists (successful lookup) has a nil value for a phone number.


Though the philosophy is different, I'm not sure I see an advantage for NIL when you just look at the effect it has on a programmer's code. You can still have a "NULL-pointer dereference" in CL, except the error is rephrased as "SYSTEM::%STRUCTURE-REF: NIL is not a structure of type X" or whatever. And you still have to have "(if (not (null x)) ...)" which doesn't do much for you, regardless of the underlying theory.


Nope. You have choices:

  (defmethod quack ((foo string))
    ... ;; foo is never anything but a string here. 
    ... ;; definitely not nil!
    )

  (defmethod quack ((foo null))
    ... ;; foo is definitely nil here
    )
Unlike in languages where you have a String reference that can be null, a CLOS method parameter specialized to a string cannot be NIL!

We can also have this piece of minimalism, quite often recurring in Lisp code:

   ;; yield x if it is not nil, else yield y
   (or x y)
Or play hardball:

  ;; if calculation signals an error,
  ;; catch it and keep going
  (ignore-errors (calculation x))
After programming in Lisp for a while you would never write:

  (if (not (null x)) ...)
because it is a verbose way of expressing:

  (if x ...)
Every value is its own test for non-nullness, since every value is a generalized boolean which stands for falsehood if it is nil, and truth otherwise.


Well, except that the Lisp's NIL (and Python's None) only works with dynamic types, where you never had any kind of guarantee to start with.

C and Haskell, for example, have a void type that is not null, and in Haskell it even has a value. But a static void is completely different from a null.


Dynamic types in fact provide something which qualifies as a guarantee. If properly implemented, they prevent an incorrectly typed operation from being applied to a value, which would otherwise result in nonportable, unpredictable or fatal consequences, possibly without any diagnostic. Dynamic types, in and of themselves, do this at the last possible moment. Dynamic typing doesn't preclude the application of static checking with type inference and/or optional declarations, however.


Lisp nil is really unit right? Nothing to do with null as otherwise used?

Can you describe when you want to directly have an object that isn't there? If you want a "decoy object" that faults if used... why isn't Maybe perfect for that?


Ruby's nil is similar in that it's an instance of NilClass, though calling a method on nil that it doesn't define is a common error.

I like how Swift handles things: a strong type system that makes it a little painful to do things unsafely, and guarantees that make it a more powerful and safer language than Objective-C. I think they lost a few dynamic patterns in the transition, though that's just what I recall from a blog post and the details are a little fuzzy.


There is a reason why in Lisp the type of NIL is NULL and not NIL. NIL does in fact name a type also. That is different from the NULL type/class.

NULL has one instance, the object NIL. NULL isn't derived from anything; its superclass is the master superclass T.

The NIL type has an empty domain: no object is of the NIL type. (Somewhat like in C and C++, no object is of the void type, for those who like pointless C and C++ analogies that don't lead anywhere.) Furthermore, the NIL type is a subtype of every other type, the same way that T is a supertype of every other type. That is to say, every type is implicitly a supertype of nil. This goes hand in hand with the fact that the set of instances of any type is a superset of the empty set. E.g. set of integers (a type) has an empty subset (every set does), and that empty subset can be identified as the NIL type, whose domain is the empty set.


I agree, Swift's optionals are really great. But I also like Objective-C's nil: you can send messages to it and it won't do anything. That too solves a ton of problems with NULL.


What solves problem with null is if you can send messages to it, and have it do exactly what you want depending on the nature of message.


We don't need to get rid of null, we need more, type-specific and context-carrying nulls: types need a way to signal various error states via an enumeration of instance constants that carry the error state in a type-compatible manner without some syntactically crappy mechanism like Maybe.

  class Connection {

     //Error instances
     BAD_ADDR("The given address was not correct...")
     ...

  }
These constants should throw, just like a null value would, but with the specific error-state message, and should be testable for both the general and specific cases.

Try/catch has its place, but the persistence of null-as-error-state behavior among programmers shows that signalling-via-return-value is a practical and intuitive way to handle error states. We should make that better, rather than simply turning our nose up at null.


If you have a language with enum types (Haskell, ML, etc., or any of the languages inspired by them like Rust, Scala, etc.), that's straightforward. In e.g. Rust, you'd have

    enum Connection {
        ValidConnection {fd: RawFd, address: SockAddr, ...},
        BadAddr,
        BadPort,
        ...
    }
In fact Rust's representation of the maybe type is just a generic

    enum Option<T> {
        Some(T),
        None,
    }
so all you're doing is getting rid of the two layer Option<Connection> syntax and making it a single Connection, so you're not writing Option<this>, PossiblyBadAddress<that>, etc. everywhere.

Then all your call sites can do one of two things. Either they can check for specific errors, like

    match conn {
        ValidConnection {fd, address, ..} => write(fd, ...),
        BadAddr => /* handle error */,
        BadPort => /* handle error */,
    }
or you can write a simple function that throws an exception, if you really want an exception-handling style:

    impl Connection {
        fn get_fd(&self) -> RawFd {
            match *self {
                ValidConnection {fd, ..} => fd,
                BadAddr => panic!("The given address was not correct"),
                BadPort => panic!("The given port was not correct"),
            }
        }
    }
and then call conn.get_fd(). For a simple command-line app, panicking and aborting is pretty much what you'd want to do anyway. For a test suite, panics are caught per test case (and you can even test that a function does panic with a given message), so it's very testable.


You shouldn't have to specify the error states in each method, this is why what I'm advocating is different than a simple enum:

- You don't need to define a constructor to take the error string

- You don't need to implement method dispatch (all methods of an error state instance automatically throw, like null does)

Enums are close, but not quite right.


Hm, I think what you might want is an enum / sum type, but with the ability to state that a particular variable is statically a particular variant of that enum. So, for instance, I can write fn serve(conn: &Connection::ValidConnection), and it's a type error to pass a generic unchecked Connection to it. Then serve() can go and call other functions that take a ValidConnection without doing any further error handling.

You might, for convenience, have a single method that turns a Connection to a ValidConnection, or else throws, and you can encode your error messages in one place in this method.

I think this proposal permits such a thing, although I haven't read it closely yet: http://smallcultfollowing.com/babysteps/blog/2015/08/20/virt...


Rust's error handling (try! and error interoperability) does this well.
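For anyone unfamiliar, a minimal sketch of what that looks like, using the ? operator (the successor to the try! macro); the file name and function are made up for illustration:

    use std::fs::File;
    use std::io::{self, Read};

    // `?` returns early with the error variant, converting error types via
    // From where needed, so the happy path reads straight through.
    fn read_config(path: &str) -> io::Result<String> {
        let mut file = File::open(path)?;
        let mut contents = String::new();
        file.read_to_string(&mut contents)?;
        Ok(contents)
    }

    fn main() {
        match read_config("config.toml") {
            Ok(text) => println!("{} bytes of config", text.len()),
            Err(e) => eprintln!("could not read config: {}", e),
        }
    }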


> We don't need to get rid of null, we need more, type-specific and context-carrying nulls: types need a way to signal various error states via an enumeration of instance constants that carry the error state in a type-compatible manner without some syntactically crappy mechanism like Maybe.

Sounds like you want to go a step beyond Maybe to Either.


I don't think enumeration is the problem with handling error states, I think handling error states is the problem with error states.

That is to say, there's nothing wrong with null: There's something wrong with null-as-error, but it's not the blankness of the null that's the problem.

The user is frustrated by this: Errors are fixable, so more important than more types and more maybes is that the errors communicate how to resolve them and what happens next:

    connection new_connection(address) {
      if(!address_valid(address)) return new_connection(prompt("The given address was not correct...", address));
      ...
Look: The user is prompted to correct the address in an interactive implementation, but a non-interactive (command-line) instance can implement prompt() as a combination of perror() and exit().

Then consider the following:

* Start writing a large file

* Run out of disk space

* Delete the whole file and return an error

The user has been frustrated by this for decades! It should be:

    again: ret=write(...);
    if(ret == -1) switch(errno) {
    case ENOSPC: wait(ENOSPC); goto again;
    ...
and yet very few languages or environments have useful implementations of prompt() and wait() despite how trivial it is (even in C)! Even fewer libraries make it easy to shoehorn correct error handling in. Error handling just isn't part of our (collective) programming culture, and if we're going to change something, we should fix it right.


I think Eithers handle this really well (and in fact this is more or less how it's done with error monads). You basically return Either Value [Error], which has the benefit of being much more composable than something that breaks control flow like try/catch.

For example, if you have something like

    x = emptyStringOnDoesntExist(read(path)) // "" on not exist error BUT NOT permission error
    x = emptyStringOnAnyError(read(path)) // "" on any error
So right now this doesn't seem THAT different from using try/catch to select values depending on the type of error "thrown". But it's critical to notice that we are going through normal program control flow here (we pass in the result value, not some weird wrapper function that wraps read and inserts a try/catch). How useful this is becomes much more apparent when you have more interesting tasks:

    var filenames = [..,...,..];
    var concated = filenames.reduce(pipe(read, emptyStringOnDoesntExist, concat), "");
So now we've done something really neat: we are concatenating a bunch of files, and accepting that some may not exist, BUT if any of them have a permission error the whole thing will return Either _ [PermissionError]. Now our errors are ACTUALLY composable.


I agree with the article, yet despite what the table at the end suggests, I've had significantly fewer problems with NullPointerExceptions since I switched from Java to C++ for my work.

C++ has an inbuilt alternative: references. References in conjunction with the Null Object Pattern have solved my problems until now.

References simply cannot be null -- and I generally do not use raw pointers in my code unless some library forces me to.


A null pointer can be dereferenced into a "null reference" which can be passed around, however. Theoretically that's UB; in practice it's typically represented identically to the null pointer and behaves the same when used. It still helps in that the root cause of a null bug can hide in fewer places (modulo memory overwriting, which adds root causes unimaginable in Java).


The title originally matched the blog post ("The worst mistake of computer science"), but then it was changed by a moderator.

Maybe it could have been a less drastic change?

"NULL, the worst mistake of computer science"

or

"The worst mistake of computer science: NULL"


Here is what is wrong with Option.

option.ifPresent(x -> System.out.println(x));

So instead of just checking to see if it is NULL you want me to create an instance of a specialized class that holds my variable that has a method that acts like an "if" statement that I need to pass an anonymous function to that receives the value I already have?

Why not just do:

x && System.out.println(x)

or:

System.out.println(x) if x

Or if you want to skip over the rest of the logic if it is null:

return unless x

System.out.println(x)

I don't see why I need to introduce a new type system and complicate things.

For default values you want me to create an instance of specialized class that holds my variable and has a method that allows me to get my value or return a default value?

option.orElseGet(5)

So instead of a memory lookup I now need the overhead of an entire function call?

Why not just do:

x ||= 5

Or better yet, just put it in your function declaration as a default value:

myFunc = (x=5, y, z) -> ...

The one benefit I see is type safety but it is extremely rare for experienced programmers working in dynamic languages to have bugs related to type.

You are layering on abstractions and forcing programmers to go through hoops just to get at their data. It is more complicated and increasing the cognitive load.

There's also the fact that you are going through functions and classes instead of just memory accesses. This makes code less performant as well.


> it is extremely rare for experienced programmers working in dynamic languages to have bugs related to type

That's a good one. I really needed that ;)


> The one benefit I see is type safety but it is certainly not easier

Type safety is one benefit, but composability is an important benefit (as, you know, is extensively discussed in the source article.)


Composability comes from functions, macros, and inheritance. What does that have to do with types?

"In fact, composibility is really the fundamental issue behind many of these problems. For example, the Store API returning nil for non-existant values was not composable with storing nil for non-existant phone numbers."

That's only because you invented some abstraction that got in your way in the first place. NULL is a perfectly acceptable value in dynamic languages and even in most document store databases.


This is not contrived.

(1) A cache returns something or NULL. It's generic; i.e. implementation doesn't care what it is storing: integers, strings, etc.

(2) A particular value of interest may be NULL or non-NULL.

But now I can no longer use my generic cache and my values of interest together.

A very real example of this is Java's Map interface. It's completely up to the implementation how it handles null, making interpreting null results difficult.

While often you use NULL and everything works, you're still limited to a certain set of circumstances. You can't pick up thing A and thing B and just use them together. Hence non-composable.


You're conflating the meaning of NULL in that case. That's not a problem of NULL, you're using NULL to represent multiple different things. You can use exceptions to handle that scenario.

  cache = {}
  getFromCache = (key) ->
    return cache[key] if cache[key]?
    throw "not found"
  
  doSomethingWithValue = (key) ->
    try
      value = getFromCache(key)
    catch
      value = getValueFromNonCache(key)
    return _doSomethingWith(value)


You can, but resorting to exceptions for what may not be an exceptional circumstance is a workaround for not being able to think of a way to communicate results correctly through the return value. Which illustrates the need here as effectively as the ambiguous use of null in the article.


You don't like exceptions so your solution is creating an instance of a specialized class that holds my variable that has a method that acts like an "if" statement that I need to pass an anonymous function to that receives the value I already have?


I think your failure to use Option correctly cannot be blamed on Option alone, given that many people use it successfully.


I think your failure to use NULL correctly cannot be blamed on NULL alone, given that many people use it successfully.


The whole problem with NULL is that practically no one can use it correctly.


C++ introduced non-null references. But they didn't enforce them. There are still "I'm so l33t I can use null references" people. The new move semantics use null references for things moved from. C++ is trying to do Rust-like borrow checking without a borrow checker. Errors result in de-referencing null and crashing.

Rust doesn't have null, but it has Option<T>, which is often syntactic sugar for null.

It's not that null is bad in itself. It's that having variables which might be null need to be distinguished from ones which can't be null.


"Rust doesn't have null, but it has Option<T>, which is often syntactic sugar for null."

In a sense, yes. But you have to do something positive before you can pretend that nothing is something. And the unwrap can be swapped for an expect whose argument documents why it can never fail, or why it's unimportant if it panics; unwrap can even be outlawed by a style guide.

"It's not that null is bad in itself. It's that having variables which might be null need to be distinguished from ones which can't be null."

Amen, brother.


The new move semantics use null references for things moved from.

No. When you move the contents out of somewhere like an std::vector instance, that object is left in a valid but unspecified state. You're free to continue working with that object (though the only meaningful thing you can usually do without undefined behaviour is destruction or the equivalent of clear). That's very different from null references.


Right; you can't have an array of references, and it's "unique_ptr", not "unique_ref". But pointers do get set to null.

If p1 is a unique_ptr, and you do

    p2 = std::move(p1);
p1's pointer is set to null. Further uses of p1 will fail, or crash, or something.

    p1.get()
returns the underlying pointer or null, apparently escaping the unique_ptr protection.

It's far inferior to Rust's borrow checker.


This mistake is fixed in Haskell.


Rust, too!

Edit: Btw, I disagree with how the article categorizes Rust in comparison to Haskell. It shows that Rust has std::ptr::null, but neglects the fact that Haskell has Foreign.Ptr.nullPtr. Either both should be "5 stars" or both should be "4 stars".


Came here to say this. In fact, I think every language that they give 5 stars to has some form of "foreign pointer", "raw pointer", "unsafe pointer" or the like that is nullable, for FFI and other low level tasks.

I think that anything which has no null in normal, idiomatic code, outside of "unsafe", "ffi", or similar subsets, should get 5 stars. The distinction is really about whether you need to worry about any possible value, or any possible reference, being null, which you do in languages like C or Java where all references are nullable.

Giving Java and Rust the same 4 star rating because they both have some form of null and some form of Maybe/Option is a bit misleading. In Rust, Option is what you use in any normal code, and so you don't need to check for Null everywhere. In Java, it's the other way around; nullable references have existed since the beginning and are not segregated in any particular way, while Optional is a recent addition.

Likewise, I think that in Scala and Swift, null is only present for compatibility purposes, and idiomatic code does not use them. I'm not sure about F#. Clojure does use nil idiomatically, and also has '(), which may even count as "multiple NULLs" according to this rubric, though I guess that in Clojure '() is just treated as an empty list, rather than a null value like it is in other Lisps.


> Likewise, I think that in Scala and Swift, null is only present for compatibility purposes, and idiomatic code does not use them. I'm not sure about F#.

I believe though that with the exception of Swift, null can infect these languages quite easily via the FFI, and the guarantees aren't as strong. I haven't used them much though, so I could be wrong.


Yeah, I guess it depends on how often you use the FFI, and I'm not familiar enough with these languages in practice to know how often that comes up. Part of the point of each of these is that they can utilize an existing ecosystem in which null is common (Java/JVM for Scala, C#/CLR for F#, Objective-C/Cocoa for Swift).

I know that in practice in Rust, the general approach is to write safe wrappers around any C libraries you're using, so the use of nullable pointers is generally confined to those wrappers.


Author here. I didn't know that about Haskell.

Admittedly, the "rating" is pretty rough, maybe even a bad idea.

I've seen std::ptr::null more than I ever have Foreign.Ptr.nullPtr.

But really, both are usually used for compatibility with external libraries/programs/runtimes, not for idiomatic language programming.

Both great languages in my book :)


> I've seen std::ptr::null more than I ever have Foreign.Ptr.nullPtr.

Probably because we need to work with the C API a great deal right now, but that should change as more and more things are written in Rust. Still, the unsafe boundary helps by removing the ability to dereference a raw pointer in safe code - something that Haskell doesn't have.
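A tiny sketch of that boundary, for anyone who hasn't seen it: safe code can create and pass around a raw pointer, but the dereference only compiles inside an unsafe block (the values here are arbitrary).

    fn main() {
        let p: *const i32 = std::ptr::null();

        // println!("{}", *p); // rejected by the compiler: dereferencing a raw
        //                     // pointer is only allowed inside an `unsafe` block

        unsafe {
            // Inside `unsafe`, the programmer takes responsibility for the check.
            if p.is_null() {
                println!("null raw pointer, not dereferencing");
            } else {
                println!("points at {}", *p);
            }
        }
    }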


FYI, I fixed it. std::ptr::null isn't really comparable, so I removed it. You can't really use it in Rust like you could in other languages.


Ya, all of the 4 star ones basically include it in their ffi compatibility layers. The five star ones do too.



The star chart at the end of the article lists Haskell, OCaml, and Standard ML as languages that omit NULL. (Where no null means five stars, of course.)

I think Rust should also have 5 stars in this comparison as it forbids null pointers.


That's mentioned in the article. It gives Haskell's null handling a score of 5 stars, along with OCaml and Standard ML.

EDIT: The articles definition of 5 star null handling is "Does not have NULL." :)


Not completely. non-nullable by default is nice, ignoring possible nulls is nice, but Haskell's Maybe still suffers from premature generality by conflating all forms of absence. A 'Maybe T' is fundamentally, context-sensitively, not equivalent to any other 'Maybe T' in the same way all 'T's are. This is bad.

edit: carsongross beat me to what I'm talking about with a better explanation: https://news.ycombinator.com/item?id=10149129


Can't you just handle this with something like `Either ErrCode T`?

(And if not, please do explain why!)


You can, but the existence (and use in libraries!) of Maybe is still bad for the same reason having 'null' is bad. It's an anti-feature that would enrich the language by not existing.


When you need to distinguish missing values, that's why Haskell has Either. Maybe is for when you only care whether there is a good result or not; Either covers the case when you need to distinguish different kinds of "not".


Haskell has `undefined` bombs, which are a similar problem.


>NULL is a terrible design flaw, one that continues to cause constant, immeasurable pain

Exaggeration much? Certainly not "the worst mistake of computer science". IPv4 is much worse, just for one example. NULL isn't even visible to end-users, many mistakes in CS are quite visible and really impact non-programmers' lives. NULL is just the color of the wallpaper in the engine room.


It is vastly more destructive than IP v4. This affects end users directly, every single day. The number of times applications have crashed due to NULL related errors is probably in the tens or maybe hundreds of billions. Each time is an interruption of people's work and in some cases it destroys hours of work.


> NULL isn't even visible to end-users

In the same way HIV isn't visible; sure, the cause isn't visible to most people, but the adverse effects are.


> IPv4 is much worse

How is it worse, technically, beyond a constrained keyspace?

IPv4 is an unfortunate, entrenched reality.

NULL doesn't need to be, and isn't, in some ecosystems.


No mention of SQL, where the standard's NULL semantics can lead to quirky behavior that I've seen bite back in poorly designed systems. I've included a brief illustrative example. The expectation is that the UNION of two WHERE clauses, one using IN and the other using NOT IN, should be equivalent to the same SELECT without any WHERE:

    WITH NullCTE AS
        (SELECT a.*
        FROM
            (VALUES (NULL), (1), (2)) a (Number)
        )
    ,One AS
        (SELECT Number = 1)

    SELECT *
    FROM NullCTE nc
    WHERE nc.Number NOT IN
        (SELECT Number
        FROM One)
    UNION
    SELECT *
    FROM NullCTE nc
    WHERE nc.Number IN
        (SELECT Number
        FROM One)
Running this will give you back a two-row table, containing 1 and 2, but the NULL is excluded from both WHERE conditions.

The naive expectation is that the combination of a condition and the NOT of that condition covers all possible circumstances; with SQL's three-valued logic, though, a comparison involving NULL evaluates to UNKNOWN, which fails both the condition and its negation.


I thought of including SQL NULL. Unfortunately, I felt I would do such a poor job of enumerating the problems or describing them in a somewhat comprehensive way that I didn't even try.


I was with the article until the part about null terminators on strings. Null terminators are nothing like a NULL reference. We could just as easily have dollar-sign-terminated strings and no one would be conflating that idea with the concept of NULL.

Also, NULL pointers are the least problematic type of pointer if you ask me. If a pointer is NULL it is not likely a security problem, and certainly not on its own. Accidentally using a NULL pointer will cause a crash, but accidentally using any other pointer could cause unlimited damage.


To quote the article:

> This is a bit different than the other examples, as there are no pointers or references. But the idea of a value that is not a value is still present, in the form of a char that is not a char.

It has nothing to do with the similarity in name (NULL / NUL). As you said, it could be terminated with $. The similarity is that they are both sentinel values. NULL is a sentinel for types; NUL is a sentinel for char arrays.

In both cases, they create exceptional and non-composable situations.

Good explanation here: https://www.reddit.com/r/programming/comments/3j4pyd/the_wor...


I know the author claims that there's a meaningful similarity beyond the names. But that's where I disagree.

I get that a NULL pointer is in some meaningful way not a pointer. Because it literally isn't pointing to anything. It is not the case that NUL is a char that is not a char. NUL is a perfectly good char. Some functions treat it specially, but other than that it's just a regular char.

Even the sentinels similarity is quite weak. NULL is a sentinel that indicates "no valid object" whereas NUL is a sentinel that indicates the end of a perfectly valid object. Notice how we're not having this discussion about '\n' or the whitespace chars in general, even though they're also treated as sentinels by some functions.


Many modern languages (Kotlin, Rust, Swift) handle this problem well.

Though I'm not sure if that problem is really that huge. Bad code will break in many ways. And breaking on nulls usually isn't that dangerous: data isn't corrupted and the stack is safe.

There's another mistake of computer science, in my opinion: inefficient array bounds checking and implicit integer overflow behavior. And those mistakes are more dangerous; they lead to data corruption and exploits.


Well, the article doesn't talk about performance, but at least for the double maybe in the store example, there is one level of extra indirection in the machine code: return a pointer to the maybe, then read the string pointer out of it.

If you care about performance, you really need two distinct sentinel values to cover this case. You could have (void *)0 for no entry, and (void *)1 for an entry that exists but is blank.


I've never had a problem with NULL as a "value". Null is the absence of content, a container that is truly empty, and that is frequently useful. As an example it allows you to differentiate between a numerical field that was never populated vs. one specifically set to Zero. This is often an important distinction.


> If x is known to be a String, the type check succeeds; if x is known to be a Socket, the type check fails.

So the author is trying to explain what a null pointer exception is, but uses the concept of sockets in his example. I wonder how many people know what a socket is but don't know what a null pointer exception is?


NULL is okay as long as you pretend it doesn't exist. I mean, in these languages, uninitialized variables exist at some point (fields start uninitialized in constructor bodies, etc), and that's why there's null, instead of defining them with some garbage value that has undefined behavior. But the right solution for users is to just pretend that it can't exist, and that uninitialized variables have a garbage value.

Of course, that's idealistic, most languages don't have some Option<T> type without a performance penalty, and pre-existing libraries exist. (Usually you should design around needing Option<T> too.)

The right language design decision for these languages (managed languages like Java) might have been to make uninitialized references have a garbage value that reliably throws an exception when used (i.e. null) -- you can copy the reference and pass it around, but you can't compare it for equality and any attempt to inspect its value results in an exception.


> The right language design decision for these languages (managed languages like Java) might have been to make uninitialized references have a garbage value that reliably throws an exception when used (i.e. null) -- you can copy the reference and pass it around, but you can't compare it for equality and any attempt to inspect its value results in an exception.

That's just a different kind of null. You still have the original problem: this thing is declared as a T, but sometimes it's not a T, so you have to inspect how the value is used before you can conclude whether it's a T or not.

The correct solution is to structure the language so access can't occur before initialization. For example, you could place severe constraints on constructors so they must initialize all the fields and do nothing else until that's done. Think initializer lists from C++ (but stricter), or tagged unions from Haskell.


That is not an option. Things like arrays need to be allocated with a default initialization and then filled.


You can absolutely work around that (see the sketch after this list):

- Require an explicit initial value to be provided.

- Or require that a collection/iterator/generator of values be provided (and fail if it's not large enough).

- Or have a built-in ArrayBuilder<T> type that accumulates all the needed values before allowing you to retrieve the array.

- Or make your language dependently typed, and change the signature of Array.get to take both an index and a proof that said index has been initialized.
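A minimal Rust sketch of the first and third options, which is roughly how Rust itself avoids uninitialized array elements; the values and sizes are arbitrary:

    fn main() {
        // Option 1: require an explicit initial value when allocating.
        let zeros = vec![0u32; 8]; // every slot holds a real value from the start

        // Option 3: accumulate values in a builder, then freeze it into an array.
        let mut builder: Vec<u32> = Vec::with_capacity(8);
        for i in 0..8u32 {
            builder.push(i * i); // capacity is reserved, but elements exist only once pushed
        }
        let squares: Box<[u32]> = builder.into_boxed_slice();

        assert_eq!(zeros.len(), squares.len());
        assert_eq!(squares[3], 9);
    }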


- Require an explicit initial value to be provided.

Let's say you want an array of FileHandles. Under this proposal, you'd need to create an "empty" FileHandle value that you can use to fill arrays with and such. You'd have to create some "empty" state for every type. This is worse software engineering than having the possibility of a null pointer exception. (Also, you couldn't write generic code to implement an ArrayList<T> efficiently without some way to generically default-construct the T to fill most of the array entries, or the constructor would have to take a filler value.)

- Or require that a collection/iterator/generator of values be provided (and fail if it's not large enough).

This is a needlessly complicated way to use an array, and it still has the same downsides.

- Or have a built-in ArrayBuilder<T> type that accumulates all the needed values before allowing you to retrieve the array.

And how do you implement your own ArrayBuilder? What if you need something with a slower growth rate, or some other data structure that you could build out of an array with uninitialized elements, such as a deque?

- Or make your language dependently typed, and change the signature of Array.get to take both an index and a proof that said index has been initialized.

A far worse alternative.

Late edit: Generally speaking I think you're completely missing the problem of null pointers. The problem is not that a class of error exists. It's that people try to use null values in surprising ways in their APIs. If you make the null value impractical as a sentinel, you remove the mismatch between what programmers expect about whether a value can be null and whether it actually can be.


(1) It's true that a sentinel value like NULL can cut cycles. That's low-level code though, not high-level code. I'd suggest that most code is (or should be) high level.

(2) It is true that you may have uninitialized values. There are several ways to address it. Java will not allow you to use uninitialized local variables. C/C++ leaves the values of some uninitialized variables undefined. There are several possible approaches. Treating null as a normal thing is not a good approach, though.


You can pretend all you want, and sure if you force yourself to always use Option then many problems will go away. It still is a huge hole in your type checking and will still cause mistakes to occur.


It's not a huge hole and not such a big deal. Just don't design APIs to use null. I've done it, and I can tell you, it works! (But to be fair, I haven't worked on database-backed software in half a decade.) When mistakes do occur they're the trivial kind where you get a stack trace and can work it out.

It's wrong to call null pointers a billion dollar mistake, too -- if we didn't have null (and in the absence of generics that permit Option, as things were long ago), programmers would end up using in-band sentinel values instead. That would be a much worse mistake.


Another reason that, even despite sharing a lot of syntax with PHP, Hack is one of my favorite languages:

http://docs.hhvm.com/manual/en/hack.nullable.php


As a mathematician I still think making "-1 % 7 == 6 % 7" evaluate to false is worse.


So you like Python.


For statically typed languages this is definitely an issue, but for dynamic languages less so. In Python, I wouldn't use an optional value,

  x is None
seems to be just fine. I'm still waiting for

  std::optional
for C++.


It's only not an issue in dynamic languages in the sense that you have the same issue anyway without null because you have no type safety at all.


You have a ton of type safety in Python, which is one reason why Python programmers are cranking out large bodies of working code representing all kinds of applications that Just Work.

"No type safety at all" means you can misuse a value of one type using an operation that is appropriate for a different type, and some nonsensical behavior silently occurs (or perhaps some nonportable behavior that you sneakily intended). This characterizes assembly languages, certain machine-oriented languages like BCPL, and some immature dynamically typed languages which omit run-time-type checks (the type information is there, but not checked, so that string_length happily tries to operate on an integer, so that a type problem occurs in the interpreter's kernel itself.)


Python has some safety checks that are missing in other languages, and this is a good thing. But it's misleading to call them "type safety", because they aren't connected to anything that can be meaningfully called a type system - all those checks happen at the value level (e.g. you can make a particular integer amenable to string_length, by monkeypatching the relevant methods onto it).


"Python: What You Gain In Prototyping Efficiency And Speed, You Lose In Having To Exhaustively Unit Test Every Line Of Executable Code, Because the Interpreter Cannot Tell You That You Typo'd A Function Name Until It Tries And Fails To Call It At Runtime(TM)" ;)


In dynamically typed languages, there are still problems with flat Null/Nil/None that are addressed by optional values (the biggest comes when you use multiple operations that can return null but the single null value loses the source of the null; using optional values these often are differentiated easily as being either an "outer" null -- e.g., None -- or an "inner" null -- Some(None).)

In fact, the example in the article of the Ruby K/V store illustrates this problem. If optional values were used instead of an ambiguous nil for both cases, there would be two distinct results:

* Some(nil) -- or something of similar shape -- where there is a value for the key, and it is nil,

* nil where there is no value.

Obviously, the typesafety issue doesn't exist for dynamic languages, but the composability/API quality issues do -- from the numbered issues in the article, I'd say at least #s 2, 3, 4, and 7 apply to dynamic languages in general.


But as the writer mentions, optionals are a lot like lists with either zero or one elements, so it seems like many of these issues could be dealt with by simply returning a list, which would have zero elements if (in that example) the key was missing, or one element if the key was present. (And that single element would be nil if that was the value for that key.)


Sure, lists make a decent "poor man's Maybe", and many dynamic languages have facilities that make that a decent solution. In dynamic OO languages, there are some Maybe/Optional-specific operations you might want, to more clearly express certain patterns, which makes it worth having a specific type.



Well, for now we're using something like

    template<class T>
    using optional = boost::optional<T>;  // stand-in until std::optional is available


Assuming x is not a boolean, I think

    not x
would be sufficient in the case of Python.


in Python (even for non-Booleans)

  not x 
is not equivalent to "x is None", since empty lists, zero, etc. are falsy in a Boolean context even though they are not None.


Ok, yeah, you're right -- I forgot those other cases. Thanks for the clarification.


lol, the null / undefined issue in javascript is further exacerbated by the fact that there is no int type and everything is just a "number"... yet the number 0 is still treated like null/undefined by truthiness checks such as "if". This is particularly hilarious because it leads to null-check bugs like:

if ( user.getScrollPosition() ) { whatever(); } else { die(); }

99% of the time the code would be fine, but if the user scrolls just right (back to position 0), the whole thing would die. Stuff like this is death to debug because the bug looks nondeterministic and is hard to reproduce.


Why would you check the value instead of the presence of the value? Rookie mistake.

user.getScrollPosition() != undefined

or in CoffeeScript just use:

user.getScrollPosition()?

Typed arrays have various types of ints and floats of various word lengths and signedness.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Type...


> 4. NULL makes poor APIs

Then don't use it. The example given is a bad design. No phone number and not in cache should not return the same value. That bad design has nothing to do with nil. Return '' for no phone number.


When I look at my coworker's code, or an open source project's code, there is going to be usage of NULL. Your strategy is not pragmatic. The problem with language features is that people use them.


Reading this article is like listening to a republican arguing against taxes.


I'm still not entirely convinced that NULL is a problem. But how NULL (and pointer types in general) work in C leads to much, much greater problems.

Though, I think uninitialized variables or memory might be just as bad.


NUL-terminated strings aren't that bad:

* unlike Pascal-style strings, they can be usefully sliced, especially if you can modify them strtok-style.

* unlike (ptr,len) "Modern C buffers"/Rust-style strings, references to them are pointer-sized, and they can be used as a serialization format.

This makes the kind of application that is based on cutting pieces of a string and passing them around a good measure faster, especially compared to say C++'s "atomically reference-counted, re-allocating at the slightest touch" std::string.

This style of programming is not particularly popular nowadays, so buffer-strings are better-fitting. Its main problem is its multitude of edge-cases, which tend to demonstrate C's "every bug is exploitable" problem well.


> unlike Pascal-style strings, they can be usefully sliced, especially if you can modify them strtok-style.

Slicing Pascal-style strings is also easy and constant-time: just track the buffer, offset, and length of the slice of characters you want. Java used to do it implicitly whenever you called `substring`.

> unlike (ptr,len) "Modern C buffers"/Rust-style strings, references to them are pointer-sized, and they can be used as a serialization format.

Every C method that takes a character buffer either a) has a corresponding length parameter or b) is avoided because of the security risks. In practice this means that C also stores the length information, just on the side instead of combined into a struct with the buffer.


> Slicing Pascal-style strings is also easy and constant-time: just track the buffer, offset, and length of the slice of characters you want. Java used to do it implicitly whenever you called `substring`.

That's just coercing into a "modern C buffer" and slicing it. It has the disadvantage that coercion is not equality or subtyping - i.e. you will have to do lots of wrappings and unwrappings in mixed code.

> Every C method that takes a character buffer either a) has a corresponding length parameter or b) is avoided because of the security risks. In practice this means that C also stores the length information, just on the side instead of combined into a struct with the buffer.

You are surely talking about the buffer's capacity, not the string's length. These are distinct concepts. Anyway, functions that only read strings, and structs that only store them read-only, aren't interested in the capacity of any buffer.

Anyway, C strings aren't responsible for the fixed-size buffers of Cold War-era code - that code uses fixed-size buffers for everything. Their main claim to fame is their popularity in parsing code, which is edge-case- and bug-prone.


std::string isn't reference-counted in a conforming implementation (that doesn't do atomic ops just for fun).


well C++11 strings are just "reallocate when you look at them funny". Or you use shared_ptr and are back to square 1.


I'm not a c++ expert, but doesn't

  char *myChar = 0;
  std::cout << *myChar << std::endl; // runtime error
compile because it's undefined behavior?


the point is that it's a straightforward-looking piece of code that has really strange behavior.


In case the author is reading this thread:

The entry for C++ in the final tables is:

    C++  | NULL  | boost::optional, from Boost.Optional 
It also ignores that C++ has had nullptr since C++11.


> if (str == null || str == "")

Why not have null (and the empty string) evaluate to false? And

    if not str:
       str = 'wordup'


The best phrase from the article: "confusing sources of sloppy nullish behavior."


[deleted]


You don't need multiple return for this - that's exactly what Maybe is for. (You should have multiple return via tuples, but for other reasons.)



