fdupe: a Perl script for finding duplicate files

程序员文章站 2022-04-10 22:33:27

The code is as follows:

#!/usr/bin/perl
#
# fdupe tool - finding duplicate files
#
# $Id: fdupe,v 1.7 2011/10/14 20:11:21 root Exp root $
#
# Source code Copyright (c) 1998,2011 Bernhard Schneider.
# may be used only for non-commercial purposes with
# appropriate acknowledgement of copyright.
#
# file :        fdupe
# description : script finds duplicate files.
# author:       Bernhard Schneider <bernhard@neaptide.org>
# hints, corrections & ideas are welcome
#
# usage: fdupe.pl <path> <path> ...
#        find / -xdev | fdupe.pl
#
# how to select and remove duplicates:
#   redirect output to >file, edit the file and mark lines you
#   wish to move/delete with a preceding dash (-)
#   use following script to delete marked files:
#   #!/usr/bin/perl -n
#   chomp; unlink if s/^-//;
#
# history:
# 12.05.99 - goto statement replaced with next
# 14.05.99 - minor changes
# 18.05.99 - removed confusing 'for $y'
#            included hash-search
# 20.05.99 - minor changes
# 02.03.00 - some functions rewritten, optimized for speed
# 10.01.01 - hint-fix by ozzie |ozric at kyuzz.org|
# 05.03.02 - fixed hangups by reading block/char-devices
# 08.09.11 - skips checking of hard links
# 14.10.11 - accept file names from stdin
#
#use strict; # uncomment for debugging

$|=1;
local (*F1,*F2); my %farray = (); my $statf1;

# ------------------------------
# traverse directories
sub scan ($) {
    my ($dir) = $_[0];
    opendir (DIR, $dir) or die "($dir) $!:$@";
    # group regular files by size; skip symlinks, sockets, pipes and devices
    map {
          (-d) ? scan ($_) : push @{$farray{-s $_}},$_
             unless (-l or -S  or -p or -c or -b);
    } map "$dir/$_", grep !/^\.\.?$/, readdir (DIR); closedir (DIR);
}

# ------------------------------
# get chunk of bytes from a file
sub getchunk ($$) {
  my ($fsize,$pfname) = @_;
  my $chunksize = 32;
  my ($nread,$buff);

  return undef unless open(F1, '<', $$pfname);

  $statf1 = [(stat  F1)[3,1]];   # link count and inode number
  binmode F1;
  $nread = read (F1,$buff,$chunksize);
  ($nread == $chunksize || $nread == $fsize) ? "$buff" : undef;
}

# ------------------------------
# compare two files
sub mycmp ($) {
  my ($fptr) = $_[0];
  my ($buffa, $buffb);
  my ($nread1,$nread2);
  my $statf2;
  my ($buffsize) = 16*1024;

  return -1 unless (open(F2, '<', $$fptr));

  $statf2 = [(stat  F2)[3,1]];   # link count and inode number

  # hard links to the same inode are reported without comparing contents
  return 0
   if ($statf2->[0] > 1 && $statf1->[1] == $statf2->[1]);

  binmode F2;
  seek (F1,0,0);

  do {  $nread1 = read (F1,$buffa,$buffsize);
     $nread2 = read (F2,$buffb,$buffsize);

     if (($nread1 != $nread2) || ($buffa cmp $buffb)) {
         return -1;
        }
  } while ($nread1);

  return 0;
}

# ------------------------------

print "collecting files and sizes ...\n";

if (-t STDIN) {
 $ARGV[0] = '.' unless $ARGV[0]; # use wd if no arguments given
 map scan ($_), @ARGV;
} else {
 while (<STDIN>)  {
  s/[\r\n]+$//;   # strip trailing CR/LF
  push @{$farray{-s $_}},$_
   unless (-l or -S  or -p or -c or -b);
 }
}

print "now comparing ...\n";
for my $fsize (reverse sort {$a <=> $b} keys %farray) {

  my ($i,$fptr,$fref,$pnum,%dupes,%index,$chunk);

  # skip files with unique file size
  next if $#{$farray{$fsize}} == 0;

  $pnum  = 0;
  %dupes = %index = ();

  NX:
  for (my $nx=0;$nx<=$#{$farray{$fsize}};$nx++) # $nx runs over all files
  {                                             # with the same size
    $fptr = \$farray{$fsize}[$nx];              # ref to the current file
    $chunk = getchunk $fsize,$fptr;
    if ($pnum) {
      for $i (@{$index{$chunk}}) {
        $fref = ${$dupes{$i}}[0];
        unless (mycmp $fref) {
          # found duplicate, collecting
          push @{$dupes{$i}},$fptr;
          next NX;
        }
      }
    }

    # nothing found, collecting
    push @{$dupes{$pnum}},$fptr;
    push @{$index{$chunk}}, $pnum++;
  }
  # show found dupes for the current size
  for $i (keys %dupes) {
    $#{$dupes{$i}} || next;
    print "\n size: $fsize\n\n";
    for (@{$dupes{$i}}) {
        print $$_,"\n";
    }
  }
}

close F1;
close F2;
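
The header comment describes how to remove duplicates: redirect the output to a file, put a dash (-) in front of each line you want deleted, then run a two-line helper over that file. Below is a slightly more defensive sketch of that helper. The name `marked_paths` and the file name `delete-marked.pl` are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
# delete-marked.pl (hypothetical name): remove files whose lines in an
# edited fdupe listing were marked with a preceding dash.
use strict;
use warnings;

# Return the paths from the lines that start with a dash, with the
# marker and any trailing CR/LF removed. Unmarked lines are ignored.
sub marked_paths {
    my @paths;
    for my $line (@_) {
        (my $path = $line) =~ s/[\r\n]+\z//;
        push @paths, $path if $path =~ s/^-//;
    }
    return @paths;
}

# Only act when a listing file is named on the command line.
if (@ARGV) {
    for my $path (marked_paths(<>)) {
        unlink $path or warn "cannot remove $path: $!\n";
    }
}
```

Usage, assuming the edited listing was saved as `dupes.txt`: `perl delete-marked.pl dupes.txt`. Unlike the one-liner in the header, this version reports files that could not be removed instead of failing silently.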