Anti-Twin's Technical Details


Technical Details

Users often ask me: "Why is Anti-Twin so fast?" or "Why is Anti-Twin so slow?"

How fast Anti-Twin seems to a user depends heavily on how it is used, and in particular on the number of files it has to compare.

Anti-Twin is particularly fast when comparing file names and when searching for 100% identical files. Similarity comparisons (95% or less) and pixel-based image comparisons, however, are considerably slower.

In addition, the number of files acts as a multiplier, because every file has to be compared with every other file. For just 100 files, Anti-Twin has to run through almost 5,000 comparisons (exactly 4,950): the first file is compared with the remaining 99 files, the second with the following 98, and so on. In general, n files require n × (n - 1) / 2 comparisons, so the workload grows with the square of the number of files.
For 1,000 files, Anti-Twin has to perform about half a million comparisons, and for 10,000 files the program already has to carry out about 50 million. If you want to search an entire hard disk with, e.g., 100,000 files, the number of comparisons comes to nearly 5 billion!
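The comparison counts follow directly from the formula n × (n - 1) / 2. The short Python sketch below reproduces them; the function name `comparison_count` is hypothetical and chosen only for illustration.

```python
def comparison_count(n: int) -> int:
    """Pairwise comparisons needed for n files: each file is compared once
    with every file after it, i.e. (n-1) + (n-2) + ... + 1 = n*(n-1)/2."""
    return n * (n - 1) // 2

for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:,} files: {comparison_count(n):,} comparisons")
# → 100 files: 4,950 comparisons
# → 1,000 files: 499,500 comparisons
# → 10,000 files: 49,995,000 comparisons
# → 100,000 files: 4,999,950,000 comparisons
```

Multiplying the number of files by 10 multiplies the number of comparisons by roughly 100.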

Comparing names is the easiest and quickest function, because no file content has to be loaded or inspected. However, file names are not always meaningful: a text file and a video file, for example, might have the same name while having completely different contents.
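A minimal Python sketch of a name comparison, assuming that names are matched case-insensitively (an assumption, not documented Anti-Twin behavior). It only reads directory listings via `os.walk` and never opens a file. Instead of comparing every pair of names, it groups files in a dictionary keyed by name, a common shortcut that is not necessarily what Anti-Twin does internally. `find_name_duplicates` is a hypothetical name.

```python
import os
from collections import defaultdict

def find_name_duplicates(root: str) -> dict[str, list[str]]:
    """Group all files below `root` by file name (case-insensitive, an
    assumption). Only directory entries are read; no file is opened."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            by_name[name.lower()].append(os.path.join(dirpath, name))
    # Keep only names that occur more than once
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Because the contents are never read, two files reported here may still be completely different, which is exactly the limitation described above.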

The 100% content comparison is another fairly easy task for Anti-Twin, because the program uses two tricks. First, only files of the same length need to be compared at all. Second, the content of each file can be condensed into a checksum. Files with different checksums cannot be identical, so Anti-Twin can tell "in passing" whether two files could be duplicates at all. A full byte-by-byte comparison is only performed when necessary, to make absolutely sure that the files really are duplicates.
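The three steps (same length, same checksum, byte-by-byte confirmation) can be sketched in Python as follows. This is an illustration under assumptions, not Anti-Twin's code: SHA-256 stands in for whatever checksum Anti-Twin actually uses, and the helper names `file_checksum` and `find_exact_duplicates` are hypothetical. The final step uses the standard library's `filecmp.cmp(..., shallow=False)`, which compares the actual file contents.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

def file_checksum(path: str, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of the file content, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(paths: list[str]) -> list[tuple[str, str]]:
    # Step 1: only files of the same length can be identical
    by_size: dict[int, list[str]] = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique length means no duplicate is possible
        # Step 2: files with different checksums cannot be identical
        by_sum: dict[str, list[str]] = defaultdict(list)
        for p in same_size:
            by_sum[file_checksum(p)].append(p)
        for candidates in by_sum.values():
            # Step 3: confirm matching checksums with a byte-by-byte comparison
            for i, a in enumerate(candidates):
                for b in candidates[i + 1:]:
                    if filecmp.cmp(a, b, shallow=False):
                        duplicates.append((a, b))
    return duplicates
```

Each file is read at most once to compute its checksum, and the expensive byte-by-byte check is only reached for files that already share both length and checksum.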

Unfortunately, checksums cannot be used for a similarity comparison: a checksum only tells whether two contents are identical, not how similar they are, because changing even a single byte typically changes the checksum completely. Instead, the entire content of each file must be compared with the content of every other file. This also means that files must be loaded again and again, which is hard work for Anti-Twin and for the hard disk. For this reason, you should not run a similarity comparison on many thousands of files, or, if you do, be prepared for it to take a long time.
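The following Python snippet illustrates why checksums do not help here. Changing one letter produces an unrelated SHA-256 digest, whereas a content-based measure still reports the two texts as almost identical. `difflib.SequenceMatcher` is used only as an example of a similarity measure; Anti-Twin's own similarity algorithm is not described in this text. Like any content-based measure, it needs both complete contents at hand.

```python
import hashlib
from difflib import SequenceMatcher

original = b"The quick brown fox jumps over the lazy dog"
edited = b"The quick brown fox jumps over the lazy cog"  # one byte changed

# Checksums: one changed byte yields a completely different digest, so the
# digests say nothing about *how* similar the two contents are.
print(hashlib.sha256(original).hexdigest()[:16])
print(hashlib.sha256(edited).hexdigest()[:16])

# A similarity measure has to examine both contents. ratio() returns 2*M/T,
# where M is the number of matching bytes and T the total length of both.
ratio = SequenceMatcher(None, original, edited).ratio()
print(f"similarity: {ratio:.0%}")  # → similarity: 98%
```

Since no single value per file captures "closeness", every pair of files has to be examined in full, which is why this mode scales so poorly.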

Pixel-based image comparison is a special case. When the search starts, all images are loaded first, which takes a lot of time, especially for large pictures; even in a graphics program, loading a single large picture can take several seconds. Once all images are loaded, the comparison itself is relatively quick. Because Anti-Twin works with miniature versions of the images (thumbnails), it may miss small details, but this is entirely sufficient for a rough comparison.
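A minimal sketch of thumbnail-based comparison in pure Python, with a grayscale image represented as a list of pixel rows. The 8 × 8 thumbnail size, the block averaging, and the percentage formula based on the mean absolute difference are all assumptions made for illustration; Anti-Twin's actual thumbnail size and comparison metric are not documented here. The function names are hypothetical.

```python
# Hypothetical representation: a grayscale image as rows of 0..255 values
Image = list[list[int]]

def thumbnail(img: Image, size: int = 8) -> Image:
    """Shrink an image to size x size pixels by averaging blocks of pixels."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    thumb = []
    for ty in range(size):
        y0, y1 = ty * h // size, (ty + 1) * h // size
        row = []
        for tx in range(size):
            x0, x1 = tx * w // size, (tx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))  # average gray level of the block
        thumb.append(row)
    return thumb

def similarity(a: Image, b: Image, size: int = 8) -> float:
    """Similarity in percent (100 = identical thumbnails), based on the
    mean absolute difference between corresponding thumbnail pixels."""
    ta, tb = thumbnail(a, size), thumbnail(b, size)
    diff = sum(abs(p - q) for ra, rb in zip(ta, tb) for p, q in zip(ra, rb))
    return 100.0 * (1 - diff / (255 * size * size))

# Usage: a 64 x 64 gradient and a copy with one pixel changed
img = [[(x + y) * 2 for x in range(64)] for y in range(64)]
modified = [row[:] for row in img]
modified[10][10] = 255  # a small detail: one bright pixel
print(f"{similarity(img, modified):.2f}%")  # → 99.98%
```

The expensive part is building the thumbnails (which requires loading every full image); comparing two 8 × 8 thumbnails afterwards is cheap. The usage example also shows the trade-off: a single changed pixel shifts one thumbnail cell by only a few gray levels, so small details barely affect the result.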