
Duplicate file finder?


Silver_Pharaoh

So my OS was corrupted beyond repair last night after benching my memory. (This is why benchers keep a separate OS, unlike me :p )

So I re-imaged my rig and all is good. But to keep my recently downloaded files, I copied my user profile folder from my backup over my old profile folder.
Now I'm extremely limited on disk space on my backup drive, so I'd like to know if there is a program that can identify duplicate files with different names.

This is because I make copies of files and Windows names them like this: File1.zip --> Copy --> File1(1).zip
But they're both the same size, so there has to be some program that can find files of the same size, right?


I don't think the duplicate file finder in Glary Utilities will work because it searches by filename, not size...
Maybe you guys have a favorite tool you use?
 
Sorting the folder by file size will show you that... click on the file size column to sort it that way.
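(Or, if you'd rather let PowerShell do the clicking, the same size-grouping works as one pipeline. Just a sketch with a made-up path, and keep in mind same-size files are only candidates, not confirmed dupes:)

Code:
# List only the files whose byte sizes collide with at least one other file
Get-ChildItem -Path "C:\Users\YourName" -Recurse |
    Where-Object { !$_.PSIsContainer } |
    Group-Object -Property Length |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group | Select-Object FullName, Length }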
 

True, but that's an awful lot of work to comb through my entire drive, let alone my profile folder :p
That's why I'm hoping there's a program for that.
 
It seems like any time I set out to accomplish something myself, within just a couple days there's a relevant post here about it.

Just yesterday I began working on a PowerShell script to do exactly this. I have several copies of my pictures and music scattered across a few different hard drives, copies of copies, etc. My initial thought was to check file name and byte size, but it's possible for different files to have the same name (for example, DSCXXXX.JPG on a new camera or SD card), and it's possible (though perhaps unlikely) for different binary files to have identical byte sizes. Instead, I plan on using a new feature in PowerShell 4, Get-FileHash, to calculate a file's checksum and verify that it really is a duplicate (only when the file name or byte size matches). It's probably an extra unneeded step in my case, but oh well.
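The rough shape of it as a one-off pipeline would be something like this (made-up path, and not the finished script):

Code:
# Bucket by size first, then hash only the files whose sizes collide
Get-ChildItem -Path "D:\Pictures" -Recurse | Where-Object { !$_.PSIsContainer } |
    Group-Object -Property Length | Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        $_.Group | Group-Object -Property { (Get-FileHash $_.FullName).Hash } |
            Where-Object { $_.Count -gt 1 } |
            ForEach-Object { $_.Group | Select-Object -ExpandProperty FullName }
    }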

I didn't have much time to work on it yesterday, but when I finish it I'll post it if you want to modify it to fit your needs. I'm sure there's already software out there for this, but I generally regard that sort of thing as malware and just write it myself.
 

That sounds great!

My only issue is the PowerShell 4 requirement. Win 7 only goes up to PowerShell 3.0, right?
Or did M$ release a 4.0 update for Windows 7?

@mimart7
Thanks for the program, I'm letting it scan over my drives right now.
 
I'm pretty sure it's available for Win 7 as well (Windows Management Framework 4.0 runs on Win 7 SP1). But even so, you can still get checksums in PS 2 or 3; it's just a few more lines of code instead of the Get-FileHash command.
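Something along these lines, for instance (the function name is mine, and MD5 is just an example; SHA256 works the same way via [System.Security.Cryptography.SHA256]::Create()):

Code:
# Rough stand-in for Get-FileHash on PowerShell 2/3
function Get-FileHashCompat {
    param([string]$Path)
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        # Hash the file's contents and format the bytes as one hex string
        ([System.BitConverter]::ToString($md5.ComputeHash($stream))) -replace '-',''
    }
    finally {
        $stream.Close()
    }
}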
 
Here's what I came up with:

Code:
Add-Type -Language CSharp @"
public class item{
    public string FullName;
    public string Hash;
}
"@;

# One list for every file found, one for confirmed duplicates
$ITEMS = New-Object System.Collections.ArrayList
$DUPES = New-Object System.Collections.ArrayList
#$BASEDIR = "D:\Music\"
$BASEDIR = "C:\users\justin\documents"

$START = [DateTime]::Now

echo "Building file list..."
# Walk the tree, skip directories, and record each file's path and hash
Get-ChildItem -Path $BASEDIR -Recurse | Where-Object { !$_.PSIsContainer } | Select-Object FullName |
% {
    $temp = New-Object item
    $temp.FullName = $_.FullName.ToString()
    $temp.Hash = (Get-FileHash $_.FullName).Hash
    $ITEMS.Add($temp) > $null
  }
# Start the report file fresh
echo " " > dupeshash_music.txt
$count = $ITEMS.Count
# Progress bar: one '=' per 1/80th of the list (at least 1 so small sets don't break it)
$inc = [Math]::Max(1, [int]($count/80))
$step = 0
echo "Looking for duplicates in $($count) files..."
Write-Host "|=" -NoNewline
foreach ($item in $ITEMS)
{
    if($step -eq $inc)
    {
        Write-Host "=" -NoNewline
        $step = 0
    }
    $step += 1

    # Anything already flagged as a dupe has been handled; skip it
    if($DUPES.Contains($item.FullName))
    {
        continue
    }
    else
    {
        for ($i=0; $i -lt $ITEMS.Count; $i++)
        {
            # Same hash but a different path means a duplicate
            if( ($item.Hash -eq $ITEMS[$i].Hash) -and ($item.FullName -ne $ITEMS[$i].FullName))
            {
                $DUPES.Add(($ITEMS[$i]).FullName) > $null
                echo "Original: $($item.FullName)`tDupe: $($ITEMS[$i].FullName)" >> dupeshash_music.txt
            }
        }
    }

}
Write-Host "|"
#$DUPES | % { "$_ is a duplicate file" }
$STOP = [DateTime]::Now
$E = ($STOP - $START)
echo "Found $($DUPES.Count) dupes in $E"

Sample console output
Code:
PS C:\Users\Justin\Desktop\dedupe> C:\Users\Justin\Desktop\dedupe\dedupe2.ps1
Building file list...
Looking for duplicates in 1063 files...
|==================================================================================|
Found 33 dupes in 00:00:11.5479961

I have it dumping the results out to a file so I can review them before removing anything. It runs surprisingly quickly: I deduped a folder that was 21 GB and 24k files down to 14 GB and 18k files, and it took 18 minutes to run. It's not polished, but it works for my purposes. I'm about to make a pass over my 146 GB music collection; I anticipate it'll drop significantly after deduping.
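If the nested loop ever gets too slow on a bigger set, grouping on the hash would do the same comparison in a single pass. Something like this could replace the foreach block (untested sketch):

Code:
# Every group with more than one member is one original plus its dupes
$ITEMS | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        $orig = $_.Group[0]
        $_.Group | Select-Object -Skip 1 | ForEach-Object {
            $DUPES.Add($_.FullName) > $null
            echo "Original: $($orig.FullName)`tDupe: $($_.FullName)" >> dupeshash_music.txt
        }
    }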
 
I'll be honest, I have no idea what that script does. Looks confusing to my eyes o_O
Scripting/programming is not my strength at all. :p

Thank you for posting it! I think I can modify it to point at my directories and probably get the user to enter a DIR instead of specifying one for them.
At least, I think I can program that in :shrug:
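From the little searching I've done, swapping the hard-coded $BASEDIR line for something like this might do it (no promises that's the "right" way):

Code:
# Ask for the directory instead of hard-coding it
$BASEDIR = Read-Host "Enter the directory to scan"
if (-not (Test-Path $BASEDIR)) {
    Write-Host "Path not found: $BASEDIR"
    exit
}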
 
Thanks Janus!

I ran the Auslogics one yesterday, and it found a few GB of dupes lying around. So :thup: to that program.
Now I'm going to run the one Janus linked to and see how it looks and works.
 