A better implementation could be to perform a full hash (not CRC32, though; or even a byte-by-byte comparison) whenever the fuzzy fingerprints match, which should happen rarely anyway.
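Concretely, something like this (a minimal Python sketch of the idea; `confirm_duplicate` and `full_sha256` are hypothetical names, not part of the tool):

```python
import filecmp
import hashlib

def full_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Full-file hash, streamed so large files never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def confirm_duplicate(path_a: str, path_b: str) -> bool:
    """Byte-by-byte comparison; only invoked on the rare fingerprint collisions."""
    # shallow=False forces filecmp to compare actual contents,
    # not just the os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Since the fuzzy fingerprints collide so rarely, the expensive check runs on a tiny fraction of files, so the overall cost stays close to the cheap scheme.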
As the author of the tool, thanks a lot for the wonderful input! Many of the comments are actionable; I'll incorporate them into the code soon.
Now to address a few concerns:
# The tool doesn't delete anything -- as the name suggests, it just finds duplicates. Check it out.
# File uniqueness is determined by file extension + file size + CRC32 of the first 4 KiB, the middle 2 KiB, and the last 2 KiB (see the sketch after this list)
# The above may not seem like much. But on my portable hard drive with >172K files (a mix of video, audio, pictures, and source code), it produced the same number of collisions as hashing each entire file with SHA-256. (By the way, I'm planning to add an option to the tool to do exactly that.)
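For anyone curious what that fingerprint looks like in practice, here's a rough Python sketch. The exact slice offsets and how the three CRCs are combined are my assumptions, not necessarily what the tool does, and files shorter than 8 KiB would need special-casing:

```python
import os
import zlib

def fingerprint(path: str) -> tuple:
    """Cheap fingerprint: extension + size + CRC32 of three small slices.

    Assumed offsets: first 4 KiB, 2 KiB centered on the midpoint, last 2 KiB.
    Slices may overlap on small files; a real implementation would handle that.
    """
    size = os.path.getsize(path)
    ext = os.path.splitext(path)[1].lower()
    with open(path, "rb") as f:
        head = f.read(4096)                    # first 4 KiB
        f.seek(max(0, size // 2 - 1024))
        middle = f.read(2048)                  # middle 2 KiB
        f.seek(max(0, size - 2048))
        tail = f.read(2048)                    # last 2 KiB
    crcs = tuple(zlib.crc32(chunk) for chunk in (head, middle, tail))
    return (ext, size, crcs)
```

Files are then candidate duplicates only when the whole tuple matches; pairing that with a byte-by-byte confirmation like the one sketched above would eliminate false positives entirely.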
This sounds interesting and should probably be able to run on a whole system. What if you run it on the files of the OS itself, e.g., the whole C: drive, or wherever the Linux system files live? Will there be any collisions?