  for file in *.pdf; do convert -verbose -colorspace RGB -resize 800 -interlace none \
    -density 300 -quality 80 "$file" "$(echo "$file" | sed 's/\.pdf$/.jpg/')"; done
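The extension rewrite that the sed call performs can also be done with bash parameter expansion, which avoids spawning two extra processes per file. A minimal sketch, using a made-up filename:

```shell
# Sketch: map a .pdf name to a .jpg name with parameter expansion
# instead of piping through sed; "report.pdf" is just an example name.
file="report.pdf"
jpg="${file%.pdf}.jpg"   # strip the .pdf suffix, append .jpg
echo "$jpg"
```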

===== Find duplicate files in Linux =====

Say you have a folder with 5000 MP3 files you want to check for duplicates, or a directory containing thousands of EPUB files, all with different names, that you suspect contains duplicates. cd into that folder in a console and run:

<code>
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
</code>

This outputs the duplicate files, grouped by their MD5 hash.
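To see what each stage of the one-liner contributes, it can be unrolled on a small scratch directory (the file names and contents below are made up for illustration):

```shell
# Unrolled sketch of the duplicate-finding pipeline on a scratch directory.
# a.txt and b.txt are identical; c.txt differs and must not be reported.
tmp=$(mktemp -d)
printf 'same' > "$tmp/a.txt"
printf 'same' > "$tmp/b.txt"
printf 'other' > "$tmp/c.txt"

# Stage 1: print the size of every non-empty file, largest first
find "$tmp" -not -empty -type f -printf "%s\n" | sort -rn > "$tmp/sizes"

# Stage 2: keep only sizes that occur more than once -- a file of a
# unique size cannot have a duplicate, so it is never hashed at all
uniq -d "$tmp/sizes" > "$tmp/dup_sizes"

# Stage 3: hash only the candidate files, then group identical checksums
dups=$(xargs -I{} -n1 find "$tmp" -type f -size {}c -print0 < "$tmp/dup_sizes" \
  | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate)

echo "$dups"
rm -rf "$tmp"
```

The size pre-filter is what makes the pipeline fast on large collections: hashing happens only for files that share a byte count with at least one other file.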
Another way is to install fdupes and run:

   fdupes -r ./folder > duplicates_list.txt

The -r flag tells fdupes to search recursively. Afterwards, open duplicates_list.txt in a text editor to see the list of duplicate files.

  
  
tips/threelinestip.txt · Last modified: 2015/01/07 07:47 by mrizvic
CC Attribution-Share Alike 4.0 International