That is actually my point. Remember the goal is to nub an array into a set.
Perfect hash tables, where there are no collisions, are O(1), but they're not suitable for casual use: you have to guarantee your data set never collides in a hash bucket. That is why most implementations of "maps" in programming languages switch to very wide trees after a certain size. A common example is the HAMT (Bagwell is famous for these).
They're flexible and perform well on modern hardware for arbitrary key lookup.
If your constant-time hash table is taken to the worst case (as we do in big-O analysis), every insert collides into the same bucket and requires a potentially full linear scan of everything already inserted to decide whether the element belongs in your oversubscribed bucket. Over n inserts, that's O(n^2).
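To make that concrete, here's a minimal sketch (the class and names are my own illustration, not the linked code) of a chained hash set whose hash function sends every key to the same bucket, so each insert degenerates into a linear scan:

    # Illustrative worst case: a broken/adversarial hash maps every key
    # to the same bucket, so chaining degenerates into a linear scan.
    class DegenerateHashSet:
        def __init__(self, num_buckets=8):
            self.buckets = [[] for _ in range(num_buckets)]

        def _bucket(self, key):
            return 0  # every key collides into bucket 0

        def add(self, key):
            chain = self.buckets[self._bucket(key)]
            if key not in chain:   # up to O(len(chain)) comparisons per insert
                chain.append(key)

    # Deduplicating n distinct items this way costs up to 1 + 2 + ... + n
    # comparisons, i.e. O(n^2) overall.
    s = DegenerateHashSet()
    for x in [3, 1, 3, 2, 1]:
        s.add(x)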
You could do better by keeping things sorted as you insert. That gets you O(n log n), and not better, because it's comparison-based.
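For comparison, a sketch of the sort-based route: sort the array, then one pass keeps each element that differs from its predecessor. The sort dominates at O(n log n). (Unlike Haskell's nub, this doesn't preserve the original order.)

    def nub_by_sorting(xs):
        # O(n log n) comparison sort, then a single O(n) pass that keeps
        # only elements that differ from their predecessor.
        out = []
        for x in sorted(xs):
            if not out or x != out[-1]:
                out.append(x)
        return out

    print(nub_by_sorting([3, 1, 3, 2, 1]))  # [1, 2, 3]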
I've provided a much better solution and links to code that does it. Maps and hash tables don't solve this problem any better than just sorting the array.
It's worth noting that if we ignore preprocessing and the domain is well constrained, we could probably get O(1) per operation when doing a bunch of these ops in bulk, if we really want to go off the deep end. But that's cheating, because we'd be ignoring construction costs, as you are here.
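A sketch of what that "cheating" looks like, assuming the domain is small non-negative integers: a preallocated boolean table gives O(1) membership checks, but only because we wave away the cost of building and clearing the table.

    def nub_bounded(xs, domain_size):
        # Assumes every x is an int in [0, domain_size). The 'seen' table
        # costs O(domain_size) to allocate -- the construction cost being
        # ignored in the "O(1) per op" claim.
        seen = [False] * domain_size
        out = []
        for x in xs:
            if not seen[x]:      # O(1) check per element
                seen[x] = True
                out.append(x)
        return out

    print(nub_bounded([3, 1, 3, 2, 1], domain_size=4))  # [3, 1, 2]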