Big-O Misconceptions

In computer science and sometimes mathematics, big-O notation is used to talk about how quickly a function grows while disregarding multiplicative and additive constants. When classifying algorithms, big-O notation is useful because it lets us abstract away the differences between real computers as just multiplicative and additive constants.

Big-O is not a difficult concept at all, but it seems to be common even for people who should know better to misunderstand some aspects of it. The following is a list of misconceptions that I have seen in the wild.

But first a definition: We write

$f(n) = O(g(n))$

when $f(n) \le M g(n)$ for sufficiently large $n$, for some positive constant $M$.

Misconception 1: “The Equals Sign Means Equality”

The equals sign in

$f(n) = O(g(n))$

is a widespread travestry. If you take it at face value, you can deduce that since $5 n$ and $3 n$ are both equal to $O(n)$, then $3 n$ must be equal to $5 n$ and so $3 = 5$.

The expression $f(n) = O(g(n))$ doesn’t type check. The left-hand-side is a function, the right-hand-side is a … what, exactly? There is no help to be found in the definition. It just says “we write” without concerning itself with the fact that what “we write” is total nonsense.

The way to interpret the right-hand side is as a set of functions:

$ O(f) = \{ g \mid g(n) \le M f(n) \text{ for some \(M > 0\) for large \(n\)}\}. $

With this definition, the world makes sense again: If $f(n) = 3 n$ and $g(n) = 5 n$, then $f \in O(n)$ and $g \in O(n)$, but there is no equality involved so we can’t make bogus deductions like $3=5$. We can however make the correct observation that $O(n) \subseteq O(n \log n)\subseteq O(n^2) \subseteq O(n^3)$, something that would be difficult to express with the equals sign.

Misconception 2: “Informally, Big-O Means ‘Approximately Equal’"

If an algorithm takes $5 n^2$ seconds to complete, that algorithm is $O(n^2)$ because for the constant $M=7$ and sufficiently large $n$, $5 n^2 \le 7 n^2$. But an algorithm that runs in constant time, say 3 seconds, is also $O(n^2)$ because for sufficiently large $n$, $3 \le n^2$.

So informally, big-O means approximately less than or equal, not approximately equal.

If someone says “Topological Sort, like other sorting algorithms, is $O(n \log n)$", then that is technically correct, but severely misleading, because Toplogical Sort is also $O(n)$ which is a subset of $O(n \log n)$. Chances are whoever said it meant something false.

If someone says “In the worst case, any comparison based sorting algorithm must make $O(n \log n)$ comparisons” that is not a correct statement. Translated into English it becomes:

“In the worst case, any comparison based sorting algorithm must make fewer than or equal to $M n \log (n)$ comparisons”

which is not true: You can easily come up with a comparison based sorting algorithm that makes more comparisons in the worst case.

To be precise about these things we have other types of notation at our disposal. Informally:

$O()$:Less than or equal, disregarding constants
$\Omega()$:Greater than or equal, disregarding constants
$o()$:Stricly less than, disregarding constants
$\Theta()$:Equal to, disregarding constants

and some more. The correct statement about lower bounds is this: “In the worst case, any comparison based sorting algorithm must make $\Omega(n \log n)$ comparisons”. In English that becomes:

“In the worst case, any comparison based sorting algorithm must make at least $M n \log (n)$ comparisons”

which is true. And a correct, non-misleading statement about Topological Sort is that it is $\Theta(n)$, because it has a lower bound of $\Omega(n)$ and an upper bound of $O(n)$.

Misconception 3: “Big-O is a Statement About Time”

Big-O is used for making statements about functions. The functions can measure time or space or cache misses or rabbits on an island or anything or nothing. Big-O notation doesn’t care.

In fact, when used for algorithms, big-O is almost never about time. It is about primitive operations.

When someone says that the time complexity of MergeSort is $O(n \log n)$, they usually mean that the number of comparisons that MergeSort makes is $O(n \log n)$. That in itself doesn’t tell us what the time complexity of any particular MergeSort might be because that would depend how much time it takes to make a comparison. In other words, the $O(n \log n)$ refers to comparisons as the primitive operation.

The important point here is that when big-O is applied to algorithms, there is always an underlying model of computation. The claim that the time complexity of MergeSort is $O(n \log n)$, is implicitly referencing an model of computation where a comparison takes constant time and everything else is free.

Which is fine as far as it goes. It lets us compare MergeSort to other comparison based sorts, such as QuickSort or ShellSort or BubbleSort, and in many real situations, comparing two sort keys really does take constant time.

However, it doesn’t allow us to compare MergeSort to RadixSort because RadixSort is not comparison based. It simply doesn’t ever make a comparison between two keys, so its time complexity in the comparison model is 0. The statement that RadixSort is $O(n)$ implicitly references a model in which the keys can be lexicographically picked apart in constant time. Which is also fine, because in many real situations, you actually can do that.

To compare RadixSort to MergeSort, we must first define a shared model of computation. If we are sorting strings that are $k$ bytes long, we might take “read a byte” as a primitive operation that takes constant time with everything else being free.

In this model, MergeSort makes $O(n \log n)$ string comparisons each of which makes $O(k)$ byte comparisons, so the time complexity is $O(k\cdot n \log n)$. One common implementation of RadixSort will make $k$ passes over the $n$ strings with each pass reading one byte, and so has time complexity $O(n k)$.

Misconception 4: Big-O Is About Worst Case

Big-O is often used to make statements about functions that measure the worst case behavior of an algorithm, but big-O notation doesn’t imply anything of the sort.

If someone is talking about the randomized QuickSort and says that it is $O(n \log n)$, they presumably mean that its expected running time is $O(n \log n)$. If they say that QuickSort is $O(n^2)$ they are probably talking about its worst case complexity. Both statements can be considered true depending on what type of running time the functions involved are measuring.