Flexense Data Management Software

Duplicate Files Search Performance Options

DupScout is optimized for modern multi-core and multi-CPU systems and can search for duplicate files stored on multiple disks, directories or network shares in parallel, using all CPUs installed in the computer. DupScout provides a number of performance optimization options that allow duplicate files search operations to be tuned for user-specific hardware and storage configurations.

To customize the duplicate files search performance options, open the duplicate files search operation dialog, press the 'Options' button and select the 'Advanced' tab. The 'Dup Files Search Threads' option controls how many parallel threads are used to search duplicate files. The 'Directories Scanning Threads' option controls how many parallel threads are used to scan input disks, directories and network shares. In the 'Fault-Tolerant' directory scanning mode, DupScout uses an individual processing thread for each input disk, directory or network share, but limits the maximum number of parallel scanning threads to the specified value. In the 'High-Performance' directory scanning mode, DupScout always uses the specified number of parallel directory scanning threads, even when processing a single input disk, directory or network share.
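The difference between the two directory scanning modes can be expressed as a small rule. The following Python sketch is only an illustration of the behavior described above, not DupScout code; the function name, the mode strings and the parameter names are assumptions made for this example.

```python
def scanning_threads(mode: str, num_inputs: int, configured_threads: int) -> int:
    """Return how many directory scanning threads would run in parallel.

    Hypothetical helper: models the documented behavior of the two modes.
    """
    if num_inputs < 1 or configured_threads < 1:
        raise ValueError("num_inputs and configured_threads must be >= 1")
    if mode == "fault-tolerant":
        # One thread per input disk, directory or network share,
        # capped at the configured maximum.
        return min(num_inputs, configured_threads)
    if mode == "high-performance":
        # Always the configured number of threads, even for a single input.
        return configured_threads
    raise ValueError(f"unknown mode: {mode!r}")
```

With a single input directory and 8 configured threads, this model gives 1 scanning thread in the 'Fault-Tolerant' mode and 8 in the 'High-Performance' mode.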

NVMe SSD Disks Duplicate Files Search Performance

For example, when searching duplicate files stored on a high-speed NVMe SSD disk, DupScout reaches up to 3,000 files/sec using a single search thread. With two parallel search threads, the performance scales up to 6,000 files/sec and with four parallel search threads, the performance increases up to 11,000 files/sec, showing a very good level of multi-threaded performance scalability. With six search threads the duplicate files search performance reaches up to 14,000 files/sec and with eight search threads the performance increases up to 16,000 files/sec. This allows one to quickly process large numbers of files and identify how many files are duplicates and how much disk space these duplicates are using.
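To put the NVMe figures in perspective, the speedup relative to a single thread and the per-thread efficiency can be computed directly from the numbers above. This is a minimal Python sketch; the dictionary and function names are hypothetical, and the rates are the "up to" values quoted in the text, so the results are approximate.

```python
# "Up to" files/sec figures quoted above for the NVMe SSD benchmark,
# keyed by the number of parallel search threads.
NVME_FILES_PER_SEC = {1: 3000, 2: 6000, 4: 11000, 6: 14000, 8: 16000}

def scaling_table(rates):
    """Return (threads, speedup, efficiency) rows relative to one thread."""
    base = rates[1]
    rows = []
    for threads in sorted(rates):
        speedup = rates[threads] / base   # how many times faster than 1 thread
        efficiency = speedup / threads    # 1.0 would be perfect linear scaling
        rows.append((threads, round(speedup, 2), round(efficiency, 2)))
    return rows

for row in scaling_table(NVME_FILES_PER_SEC):
    print(row)
```

Under these figures the scaling is linear up to two threads, about 92% efficient at four threads and about 67% efficient at eight threads.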

When searching duplicate files stored on regular SATA SSD drives, which are significantly slower than NVMe SSD drives, the performance of the duplicate files search process reaches up to 2,000 files/sec using a single search thread and scales up to 4,400 files/sec with four parallel duplicate files search threads. With eight parallel threads, the performance reaches up to 5,900 files/sec, which allows large numbers of files to be processed relatively quickly.

SATA SSD Disks Duplicate Files Search Performance

Searching duplicate files stored on a NAS storage device via a network is a more complicated task because the user needs to take into account the speed and the latency of the network. If the computer on which DupScout is installed is connected to the NAS storage device via a high-speed, low-latency network, the performance of the duplicate files search operations may reach up to 200 files/sec with one duplicate files search thread, scale up to 684 files/sec with four parallel search threads and increase up to 1,057 files/sec with eight parallel duplicate files search threads.

NAS Server Duplicate Files Search Performance

On the other hand, if DupScout needs to access network shares via the Internet or via a long-distance, high-latency network, the performance of the duplicate files search operations will be relatively slow. One option to increase the performance of the duplicate files search operations in such configurations is to set the 'High-Performance' directory scanning mode and increase the number of parallel duplicate files search threads to 16 or even 32, regardless of how many CPUs are actually installed in the computer.
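The reason more threads than CPUs can help on a high-latency network is that each thread spends most of its time waiting for the network rather than using a CPU, so the waits can overlap. The following Python sketch illustrates this general effect with a simulated delay; it is not DupScout code, and the function names, the 20 ms latency and the file counts are assumptions chosen purely for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed network round-trip delay per file (illustration only).
SIMULATED_LATENCY_S = 0.02

def fetch_file_info(index: int) -> int:
    """Stand-in for a remote file access: the thread just waits on the network."""
    time.sleep(SIMULATED_LATENCY_S)
    return index

def timed_scan(num_files: int, threads: int) -> float:
    """Process num_files simulated remote files and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(fetch_file_info, range(num_files)))
    assert len(results) == num_files
    return time.perf_counter() - start

if __name__ == "__main__":
    for threads in (1, 4, 16):
        print(threads, "threads:", round(timed_scan(32, threads), 2), "sec")
```

Because the simulated work is almost entirely waiting, the elapsed time drops roughly in proportion to the number of threads, independently of the number of CPUs. Real networks and NAS devices impose their own limits, so actual gains will be smaller.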

Searching duplicate files stored on one or more NAS servers may be a very time-consuming operation and one of the ways to speed up the duplicate files search process is to use a 2.5 Gigabit Ethernet network. With 2.5 Gigabit Ethernet, the performance of the DupScout duplicate files search operations continues to scale up to 3,800 files/sec with 8 parallel duplicate files search threads, which represents a 69% improvement compared to the standard Gigabit Ethernet.

2.5 Gigabit NAS Server Duplicate Files Search Performance

Due to the very wide adoption of laptops and NAS servers with built-in WiFi network interfaces, many users may consider searching duplicate files stored on NAS servers via a wireless network. However, the latency of a wireless network is much higher and therefore it will take much more time to complete the duplicate files search operation. The question is how much longer the user will need to wait and whether searching duplicate files via a wired network saves a significant amount of time.

WiFi NAS Server Duplicate Files Search Performance

Based on our benchmarks, via a 5 GHz wireless network, DupScout reaches up to 54 files/sec with a single duplicate files search thread and scales up to 400 files/sec with 8 parallel duplicate files search threads, which is approximately 6 times slower than the standard Gigabit Ethernet and approximately 10 times slower than the 2.5 Gigabit Ethernet. So, if the user needs to search duplicate files in a NAS server with 100,000 files or more, a low-latency Gigabit Ethernet or 2.5 Gigabit Ethernet is required.
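As a rough worked example, the quoted rates can be turned into time estimates for a NAS server with 100,000 files. The Python sketch below only divides the file count by the "up to" rates from the text; the names are hypothetical and real search times depend on file sizes, the NAS device and the network.

```python
# "Up to" files/sec figures quoted in the text.
RATES_FILES_PER_SEC = {
    "WiFi 5 GHz, 1 thread": 54,
    "WiFi 5 GHz, 8 threads": 400,
    "2.5 Gigabit Ethernet, 8 threads": 3800,
}

def scan_seconds(num_files: int, files_per_sec: float) -> float:
    """Best-case time in seconds to process num_files at a given rate."""
    return num_files / files_per_sec

for name, rate in RATES_FILES_PER_SEC.items():
    seconds = scan_seconds(100_000, rate)
    print(f"{name}: about {seconds:.0f} sec ({seconds / 60:.1f} min)")
```

For 100,000 files this gives about 31 minutes over WiFi with a single thread, about 4 minutes over WiFi with 8 threads and under half a minute over 2.5 Gigabit Ethernet with 8 threads, and the gap grows in proportion to the number of files.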

Modern USB flash drives provide plenty of storage space and are reasonably fast, allowing one to store vast amounts of data for backup purposes. Sometimes, it may be necessary to search duplicate files on a USB flash drive in order to free up used disk space. When searching duplicate files stored on a USB flash drive, DupScout can reach up to 357 files/sec with a single search thread. With two parallel search threads, the performance increases up to 644 files/sec, with four parallel threads the performance increases up to 1,008 files/sec and with eight parallel duplicate files search threads the performance scales up to 1,174 files/sec.
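The USB flash drive figures show diminishing returns as threads are added, which can be made explicit by computing the gain per additional thread between consecutive measurements. This is a small Python sketch using only the numbers quoted above; the names are hypothetical.

```python
# (threads, "up to" files/sec) pairs quoted above for the USB flash drive.
USB_FILES_PER_SEC = [(1, 357), (2, 644), (4, 1008), (8, 1174)]

def marginal_gain_per_thread(points):
    """Return (from_threads, to_threads, extra files/sec per added thread)."""
    gains = []
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        gains.append((t0, t1, (r1 - r0) / (t1 - t0)))
    return gains

for t0, t1, gain in marginal_gain_per_thread(USB_FILES_PER_SEC):
    print(f"{t0} -> {t1} threads: +{gain} files/sec per added thread")
```

Each added thread contributes 287 files/sec when going from one to two threads, 182 files/sec from two to four and only 41.5 files/sec from four to eight, which suggests that on this device most of the benefit is reached with about four threads.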

USB Flash Drive Duplicate Files Search Performance

Modern IT environments widely deploy virtual servers and/or virtual workstations. Most of the popular virtualization platforms provide a high level of performance but, depending on the target hardware and software platforms, significant performance degradation is inevitable when a duplicate files search operation is executed on a guest virtual machine compared to the same operation executed directly on the host computer.

Virtual Machine Duplicate Files Search Performance

For example, when a virtual machine with 4 virtual CPUs is stored on an SSD disk and searches duplicate files stored on a virtual local disk drive, which is physically stored on the same SSD disk, the performance of the duplicate files search operations reaches up to 223 files/sec using a single search thread. With two parallel search threads, the performance scales up to 443 files/sec and with four parallel search threads, the performance increases up to 770 files/sec.