
I have two folders/directories: C:\MyData and C:\MyDataBackup. The person that owns those two folders/directories does not remember if they have edited the files in the original or in the backup.

I want to get rid of C:\MyDataBackup, so I have to find all the files in there that are identical to their siblings in C:\MyData and delete them, and then have the owner handle the handful of remaining files manually.

The duplicate detection tools I have used so far usually have the shortcoming of

- searching for duplicates inside C:\MyData and inside C:\MyDataBackup as well, i.e. they mark C:\MyData\task1\done.txt as identical to C:\MyDataBackup\task1\done.txt and to C:\MyDataBackup\task57\done.txt. That is not allowed! Those files must not be deleted, since they are there intentionally. And since the data piles are huge, it would slow down the search for weeks.
- not doing complete byte-by-byte comparisons but just relying on hash sums.

How can I achieve a search for duplicates

- in two folders/directories, finding only pairs over both and not within each,
- with complete comparison (byte-by-byte),
- and with restriction to the same path inside the respective folder/directory?

I am using Windows, but have Cygwin, so I can use bash magic as well.

(I've posted this question on StackOverflow today as well by accident.)
Test the solution on some expendable pair of directories first.

Everything below is meant to be run in a shell (like bash) provided by Cygwin. (The shell is important, see this question.) Run this snippet to set the variables:
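A sketch of that setup; only the `reference` assignment is spelled out in the answer, so the `mutable` path below is an assumption based on the backup directory named in the question:

```sh
# Directory whose files are kept untouched.
reference='/cygdrive/c/MyData'
# Directory duplicates will be deleted from; assumed to be the backup from the question.
mutable='/cygdrive/c/MyDataBackup'
```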
If you ever need to apply this answer to other directories then it's enough to change the variables; the commands that follow are static. (Single-quotes are not necessary in this particular case; however, users without experience who want to process directories with spaces in their names will probably appreciate the quotes being already in the right places.)

Then change into the mutable directory. If the below command fails for any reason, abort.
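A minimal sketch of that step (the exact invocation is not preserved here):

```sh
# Work from inside the mutable directory so find produces relative paths.
cd "$mutable"    # if this cd fails for any reason, stop here
```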
This is a command that does the real work: find. A sketch is given below; the notes after it explain the individual parts.
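The exact command is likewise not preserved, so the following is a reconstruction consistent with those notes: start from `.`, restrict to regular files, print progress, require a regular-file counterpart in the reference directory, compare byte by byte with cmp, and only then delete. GNU find (as shipped with Cygwin) substitutes `{}` even inside a longer argument such as `"$reference"/{}`:

```sh
# Delete every regular file under the current (mutable) directory that has a
# byte-for-byte identical counterpart at the same relative path under "$reference".
find . -type f -print \
       -exec test -f "$reference"/{} \; \
       -exec cmp -s "$reference"/{} {} \; \
       -delete
```

The tests and actions are joined by an implicit AND, so `-delete` only fires for regular files whose counterpart in "$reference" exists and compares equal byte for byte; files that differ or have no counterpart stay behind for the owner to review.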
Notes:

- `.` defines our starting point, the current working directory. Thanks to the prior cd this will be the mutable directory. We don't use `"$mutable"` as the starting point, because we need find to consider relative paths, so we can concatenate them with the path to the reference directory later. Our find will try to test all files under (and including) `.`, descending to subdirectories of any depth.
- `-type f` is a test that checks if the currently considered file is a regular file. The purpose of this test is to avoid giving files of other types to cmp later; we don't want to use cmp with directories.
- `-print` prints the pathname of the currently considered file. This is only to give an indication of progress; you can omit `-print` if you want.
- `-exec test -f "$reference"/{} \;` checks that a regular file with the same relative path exists in the reference directory (this check is not so easy to replace with a plain find test).
- Maybe you also want to delete symlinks and such, or directories that end up empty (`-empty`); I won't elaborate.
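For example, directories that are left empty after the deletions could be removed with a separate pass like the following (a sketch, not part of the original procedure):

```sh
# Remove directories under the mutable tree that are now empty, deepest first.
find . -mindepth 1 -depth -type d -empty -delete
```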