
Duplicate file finder?


Silver_Pharaoh

So my OS was corrupted beyond repair last night after benching my memory. (This is why benchers keep a separate OS, unlike me :p )

So I re-imaged my rig and all is good. But to keep my recently downloaded files, I copied my user profile folder from my backup over my old profile folder.
Now I'm extremely limited on disk space on my backup drive, so I'd like to know if there is a program that can identify duplicate files with different names.

This is because I make copies of files and Windows names them like this: File1.zip --> Copy --> File1(1).zip
But they're both the same size, so there has to be some program that can find files of the same size, right?


I don't think the duplicate file finder in Glary Utilities will work because it searches by filename, not size...
Maybe you guys have a favorite tool you use?
 
Sorting the folder by file size will show you that... click on the file size column to sort it that way.
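(Or, if you'd rather let PowerShell do the clicking, the same size-grouping works as one pipeline. Just a sketch with a made-up path, and keep in mind same-size files are only candidates, not confirmed dupes:)

Code:
# List only the files whose byte sizes collide with at least one other file
Get-ChildItem -Path "C:\Users\YourName" -Recurse |
    Where-Object { !$_.PSIsContainer } |
    Group-Object -Property Length |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group | Select-Object FullName, Length }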
 

True, but that's an awful lot of work to comb through my entire drive, let alone my profile folder :p
That's why I'm hoping there's a program for that.
 
It seems like any time I set out to accomplish something myself, within just a couple days there's a relevant post here about it.

Just yesterday I began working on a PowerShell script to do exactly this. I have several copies of my pictures and music scattered across a few different hard drives, copies of copies, etc. My initial thought was to check file name and byte size, but it's possible for different files to have the same name (for example, DSCXXXX.JPG on a new camera or SD card), and it's possible (though perhaps unlikely) for different binary files to have identical byte sizes. Instead, I plan on using a new feature in PowerShell 4, Get-FileHash, to calculate a file's checksum and verify that it really is a duplicate (only when the file name or byte size matches). It's probably an extra unneeded step in my case, but oh well.
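The rough shape of it as a one-off pipeline would be something like this (made-up path, and not the finished script):

Code:
# Bucket by size first, then hash only the files whose sizes collide
Get-ChildItem -Path "D:\Pictures" -Recurse | Where-Object { !$_.PSIsContainer } |
    Group-Object -Property Length | Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        $_.Group | Group-Object -Property { (Get-FileHash $_.FullName).Hash } |
            Where-Object { $_.Count -gt 1 } |
            ForEach-Object { $_.Group | Select-Object -ExpandProperty FullName }
    }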

I didn't have much time to work on it yesterday, but when I finish it I'll post it if you want to modify it to fit your needs. I'm sure there's already software out there for this, but I generally regard that sort of thing as malware and just write it myself.
 

That sounds great!

My only issue is the PowerShell 4 requirement. Win 7 only goes up to PowerShell 3.0, right?
Or did M$ release a 4.0 update for Windows 7?

@mimart7
Thanks for the program, I'm letting it scan over my drives right now.
 
I'm pretty sure it's available for Win 7 as well (Windows Management Framework 4.0 runs on Win 7 SP1). But even so, you can still get checksums in PS 2 or 3; it's just a few more lines of code instead of the Get-FileHash command.
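Something along these lines, for instance (the function name is mine, and MD5 is just an example; SHA256 works the same way via [System.Security.Cryptography.SHA256]::Create()):

Code:
# Rough stand-in for Get-FileHash on PowerShell 2/3
function Get-FileHashCompat {
    param([string]$Path)
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        # Hash the file's contents and format the bytes as one hex string
        ([System.BitConverter]::ToString($md5.ComputeHash($stream))) -replace '-',''
    }
    finally {
        $stream.Close()
    }
}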
 
Here's what I came up with:

Code:
Add-Type -Language CSharp @"
public class item{
    public string FullName;
    public string Hash;
}
"@;

# One list for every file found, one for confirmed duplicates
$ITEMS = New-Object System.Collections.ArrayList
$DUPES = New-Object System.Collections.ArrayList
#$BASEDIR = "D:\Music\"
$BASEDIR = "C:\users\justin\documents"

$START = [DateTime]::Now

echo "Building file list..."
# Walk the tree, skip directories, and record each file's path and hash
Get-ChildItem -Path $BASEDIR -Recurse | Where-Object { !$_.PSIsContainer } | Select-Object FullName |
% {
    $temp = New-Object item
    $temp.FullName = $_.FullName.ToString()
    $temp.Hash = (Get-FileHash $_.FullName).Hash
    $ITEMS.Add($temp) > $null
  }
# Start the report file fresh
echo " " > dupeshash_music.txt
$count = $ITEMS.Count
# Progress bar: one '=' per 1/80th of the list (at least 1 so small sets don't break it)
$inc = [Math]::Max(1, [int]($count/80))
$step = 0
echo "Looking for duplicates in $($count) files..."
Write-Host "|=" -NoNewline
foreach ($item in $ITEMS)
{
    if($step -eq $inc)
    {
        Write-Host "=" -NoNewline
        $step = 0
    }
    $step += 1

    # Anything already flagged as a dupe has been handled; skip it
    if($DUPES.Contains($item.FullName))
    {
        continue
    }
    else
    {
        for ($i=0; $i -lt $ITEMS.Count; $i++)
        {
            # Same hash but a different path means a duplicate
            if( ($item.Hash -eq $ITEMS[$i].Hash) -and ($item.FullName -ne $ITEMS[$i].FullName))
            {
                $DUPES.Add(($ITEMS[$i]).FullName) > $null
                echo "Original: $($item.FullName)`tDupe: $($ITEMS[$i].FullName)" >> dupeshash_music.txt
            }
        }
    }

}
Write-Host "|"
#$DUPES | % { "$_ is a duplicate file" }
$STOP = [DateTime]::Now
$E = ($STOP - $START)
echo "Found $($DUPES.Count) dupes in $E"

Sample console output
Code:
PS C:\Users\Justin\Desktop\dedupe> C:\Users\Justin\Desktop\dedupe\dedupe2.ps1
Building file list...
Looking for duplicates in 1063 files...
|==================================================================================|
Found 33 dupes in 00:00:11.5479961

I have it dumping the results out to a file so I can review them before removing anything. It runs surprisingly quickly: I deduped a folder that was 21 GB and 24k files down to 14 GB and 18k files, and it took 18 minutes to run. It's not polished, but it works for my purposes. I'm about to make a pass over my 146 GB music collection; I anticipate it'll drop significantly after deduping.
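If the nested loop ever gets too slow on a bigger set, grouping on the hash would do the same comparison in a single pass. Something like this could replace the foreach block (untested sketch):

Code:
# Every group with more than one member is one original plus its dupes
$ITEMS | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        $orig = $_.Group[0]
        $_.Group | Select-Object -Skip 1 | ForEach-Object {
            $DUPES.Add($_.FullName) > $null
            echo "Original: $($orig.FullName)`tDupe: $($_.FullName)" >> dupeshash_music.txt
        }
    }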
 
I'll be honest, I have no idea what that script does. Looks confusing to my eyes o_O
Scripting/programming is not my strength at all. :p

Thank you for posting it! I think I can modify it to point at my directories and probably get the user to enter a DIR instead of specifying one for them.
At least, I think I can program that in :shrug:
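From the little searching I've done, swapping the hard-coded $BASEDIR line for something like this might do it (no promises that's the "right" way):

Code:
# Ask for the directory instead of hard-coding it
$BASEDIR = Read-Host "Enter the directory to scan"
if (-not (Test-Path $BASEDIR)) {
    Write-Host "Path not found: $BASEDIR"
    exit
}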
 
Thanks Janus!

I ran the Auslogics one yesterday, and it found a few GB of dupes lying around. So :thup: to that program.
Now I'm going to run the one Janus linked to and see how it looks and works.
 