Wait. I love standards and I do believe the world is far better with them. Just sayin' that no standard is perfect and motivating something that looks like a bug with “that's standard behavior” is pretty lame.
The (SQL) problem
While investigating a database issue today I ran the following queries:
SELECT id, "user", event FROM payment_log WHERE id = 3210; id | user | event ------+-------+------- 3210 | 11545 | 51613 (1 row)
Okay, so it exists.
SELECT * FROM participant WHERE payment_log = 3210; (0 rows)
So it's not referenced in "participant".
SELECT id, "user", event FROM payment_log WHERE id NOT IN (SELECT payment_log FROM participant); (0 rows)
After fiddling with it perplexed for like 20 minutes, I realized that I had NULL values in "participant.payment_log" and I remembered I've hit the issue before (about an year ago maybe; for some reason it didn't stick to mind, but I hope this time it will). So I had to write the last query like this, in order to fix it:
SELECT id, "user", event FROM payment_log WHERE id NOT IN (SELECT payment_log FROM participant WHERE payment_log IS NOT NULL);
I didn't care to investigate this last year, but today I was a bit intrigued. Same issue, re-occurring, because I do something the intuitive way, but for some reason it doesn't work. It's true that I have NULL values in "participant.payment_log", but it's also damn obvious that the value 3210 is NOT IN there. Which part of NOT IN wasn't clear in the first query?
I was determined today to join the PostgreSQL mailing list and report the problem, but I thought I'd google it first. Sadly, I found an explanation. I understand now that it's “standard behavior”, that most (all?) databases behave similarly and that there's a sound, beautiful technical reason behind it—my only problem is that, as far as I am concerned, it's a blatant bug.
We couldn't live without them, but we must understand and accept that standards are generally flawed. That's the only way to move forward. Current standards were driven by implementations—you know, people who spent years getting some stuff done, who maybe didn't have time to establish families or to shave their beards very often. Common Lisp was a tremendous effort to standardize among a few different Lisp implementations that existed prior the standard. There is no way you can standardize something before it exists.
For this reason, standards should be revised (a lot more often than every decade) and that's the reason why Common Lisp seems obsolete these days—it wasn't revised in about 20 years, and it's NOT perfect. However, it is so good that it's still relevant after 20 years of technical advances.
This little SQL problem (which is “standard behavior”) could proliferate into millions of bugs in client applications. I don't care about the technical reasons behind it, it's just too unintuitive, seems plain wrong and should be addressed. And in order for the standards to change, with their elephantine committees and political slowness, implementations should lead the way.
Wait, one more example: if Douglas Crockford didn't invent JSON, we would all be using XML today (and SOAP! the horror...). When did JSON become a standard? After he wrote a spec, and implemented parsers in a few languages, and a lot of people started to use it.
It would be okay if the story finished here, but I think the sad conclusion is that we, developers, suck. Since, as I said, I believe that it's developers who push standards forward, then the reason why standards suck is that developers suck. Now of course, there's not much I could personally do to address this specific SQL issue; there are only a few solid relational databases, and PostgreSQL is one of them. If at least one would care to break the standard and fix this stupidity, it's likely that in the next 10 years someone will consider fixing the standard. But developers seem to believe that the standard is authoritative and should not be broken, forgetting the fact that it's them who should drive the standard. And that's pretty sad.