
Good evening,

I am trying to figure out a way to automatically delete files with duplicate filenames from a Google Drive folder. I’d like to keep the file with the oldest creation date, treat the others as duplicates, and delete the duplicates.

I feel close to achieving this with the Apps Script code below (from this site: https://hackernoon.com/how-to-find-and-delete-duplicate-files-in-google-drive). The only issue is that the code seems to keep the newest file instead of the oldest one. Do you know what I can change so that this code keeps the oldest file instead? If you have other code in mind to achieve my goal, please share that as well.

Thank you!


// ID of the folder to check for duplicates
const FOLDER_ID = "INSERTIDHERE";

/**
 * Looks for duplicate files (same name and same size) in the designated folder
 * and moves every duplicate after the first one found to the trash.
 */
function removeDuplicateFile() {
  let folder = DriveApp.getFolderById(FOLDER_ID);

  let files = folder.getFiles();

  let fileList = [];

  // if the folder has no files, there is nothing to do
  if (!files.hasNext()) {
    return;
  }

  // else
  while (files.hasNext()) {
    let file = files.next(),
      name = file.getName(),
      size = file.getSize();

    // the first file seen with each name and size is kept; any later match is trashed
    if (isDuplicateFile(fileList, name, size)) {
      file.setTrashed(true);
    } else {
      fileList.push([name, size]);
    }
  }
}

/**
 * Helper for removeDuplicateFile().
 * Returns true if lst already holds an entry with the same name and size, false otherwise.
 * @param {List} lst
 * @param {String} name
 * @param {Number} size
 * @returns {Boolean}
 */
function isDuplicateFile(lst, name, size) {
  for (let i = 0; i < lst.length; i++) {
    if (lst[i][0] === name && lst[i][1] === size) return true;
  }
  return false;
}


/**
 * Deletes all of the project's triggers, if there are any.
 */
var deleteTrigger = () => {
  let triggersCollection = ScriptApp.getProjectTriggers();
  if (triggersCollection.length <= 0) {
    console.log(`This project has no triggers to delete`);
  } else {
    triggersCollection.forEach((trigger) => ScriptApp.deleteTrigger(trigger));
  }
  return;
};

/**
 * Entry point: deletes any existing project triggers, then removes duplicate files.
 */
function removeDuplicateFileTrigger() {
  // First, delete existing triggers
  deleteTrigger();

  // Now remove duplicate files
  removeDuplicateFile();
}
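The script above never looks at creation dates: the first file it sees with a given name and size is kept and every later match is trashed, so which copy survives depends only on the order in which getFiles() returns the files. A minimal sketch of the same check, run on plain objects so it works outside Apps Script, makes this visible. The function firstSeenWins and the { name, size, created } objects are hypothetical stand-ins for Drive File objects, not part of the original script.

```javascript
// Hypothetical stand-in for the question's loop: files is an array of
// { name, size, created } objects in the order the iterator would return them.
// Returns the files that would be trashed.
function firstSeenWins(files) {
  const seen = []; // [name, size] pairs already kept, like fileList
  const trashed = [];
  for (const file of files) {
    const isDup = seen.some(([n, s]) => n === file.name && s === file.size);
    if (isDup) {
      trashed.push(file); // a later file with the same name and size is trashed
    } else {
      seen.push([file.name, file.size]); // the first one seen is kept
    }
  }
  return trashed;
}

// Same two files, different arrival order: the survivor changes.
const older = { name: "report.pdf", size: 100, created: new Date("2023-01-01") };
const newer = { name: "report.pdf", size: 100, created: new Date("2023-06-01") };
console.log(firstSeenWins([newer, older]).map(f => f.created.toISOString())); // the 2023-01-01 copy is trashed
```

Swapping the array to [older, newer] trashes the newer copy instead, which is why sorting by creation date (as the answers below do) is needed.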


Answers


  1. I believe your goal is as follows.

    • You want to remove duplicate files and, in each set of duplicates, keep the file with the oldest creation date.

    Your script never checks the creation date, so it keeps whichever duplicate it happens to encounter first. How about the following modification of removeDuplicateFile()?

    Modified script:

    function removeDuplicateFile() {
      const folder = DriveApp.getFolderById(FOLDER_ID);
      const files = folder.getFiles();
      if (!files.hasNext()) {
        return;
      }

      // Group files by filename and file size.
      const list = {};
      while (files.hasNext()) {
        const file = files.next();
        const name = file.getName();
        const size = file.getSize();
        const date = file.getDateCreated().getTime();
        // JSON.stringify keeps e.g. ("a1", 23) and ("a12", 3) from sharing a key.
        const key = JSON.stringify([name, size]);
        if (!list[key]) {
          list[key] = [];
        }
        list[key].push({ file, size, date });
      }

      // In each group, sort oldest first and collect every file after the first one.
      const removeFiles = Object.values(list).reduce((ar, v) => {
        if (v.length > 1) {
          const [, ...f] = v.sort((a, b) => a.date - b.date);
          ar.push(...f.map(({ file }) => file));
        }
        return ar;
      }, []);

      // Trash every file except the oldest one in each group.
      removeFiles.forEach(f => f.setTrashed(true));
    }
    
    • When this script runs, it groups the files in the folder by filename and file size, then moves every file in each group to the trash except the one with the oldest creation date.
    • This modification no longer uses your isDuplicateFile function.
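The grouping step is the core of this answer. Here is a minimal sketch of it on plain objects so it can run outside Apps Script; filesToTrash and the { name, size, created } shape are hypothetical. It also shows why the group key should not be built by plain concatenation: name + size produces the same key for "a1" with size 23 and "a12" with size 3.

```javascript
// Hypothetical pure version of the grouping step: files are { name, size, created }.
// Returns every file except the oldest one in each (name, size) group.
function filesToTrash(files) {
  const groups = new Map();
  for (const file of files) {
    // JSON.stringify([name, size]) cannot collide the way name + size can.
    const key = JSON.stringify([file.name, file.size]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(file);
  }
  const trash = [];
  for (const group of groups.values()) {
    // Oldest first; everything after index 0 is a duplicate.
    group.sort((a, b) => a.created - b.created);
    trash.push(...group.slice(1));
  }
  return trash;
}

console.log("a1" + 23 === "a12" + 3); // true: concatenated keys collide
const sample = [
  { name: "a1", size: 23, created: new Date("2023-02-01") },
  { name: "a12", size: 3, created: new Date("2023-01-01") },
  { name: "a1", size: 23, created: new Date("2022-12-01") },
];
console.log(filesToTrash(sample).map(f => f.created.toISOString().slice(0, 10))); // only the 2023-02-01 "a1" copy
```

With a concatenated key, all three files would land in one group and the unrelated "a12" file would be trashed as well.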

  2. function removeDuplicateFile() {
      const folder = DriveApp.getFolderById("fid");
      const files = folder.getFiles();
      // Collect all files with the same name into one array per name.
      // A Map avoids clashes between filenames and object property names.
      const groups = new Map();
      while (files.hasNext()) {
        const f = files.next();
        const n = f.getName();
        const dv = f.getDateCreated().valueOf();
        if (!groups.has(n)) {
          groups.set(n, []);
        }
        groups.get(n).push({ name: n, value: dv, id: f.getId() });
      }
      groups.forEach(group => {
        group.sort((a, b) => a.value - b.value); // sort ascending by date created (oldest first)
        group.forEach((ob, i) => {
          if (i > 0) {
            Drive.Files.remove(ob.id); // permanently deletes every file except the oldest
          }
        });
      });
    }
    

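    You need to enable the Drive API advanced service, because Drive.Files.remove() comes from it. Note that it deletes files permanently instead of moving them to the trash.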
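Because the loop deletes every entry after index 0, the sort comparator alone decides which file survives. A small sketch on plain objects shaped like the { name, value, id } entries built in this answer shows both directions; the ids and dates are made up, and survivor is a hypothetical helper.

```javascript
// Entries mimic this answer's { name, value, id } objects;
// value is the creation time in milliseconds. The ids are made up.
const group = [
  { name: "notes.txt", value: Date.UTC(2023, 4, 1), id: "id-may" },
  { name: "notes.txt", value: Date.UTC(2021, 0, 1), id: "id-jan" },
  { name: "notes.txt", value: Date.UTC(2022, 6, 1), id: "id-jul" },
];

// Returns the id at index 0 after sorting a copy, i.e. the one file the loop does not delete.
function survivor(entries, compare) {
  return [...entries].sort(compare)[0].id;
}

const ascending = (a, b) => a.value - b.value;  // oldest first
const descending = (a, b) => b.value - a.value; // newest first

console.log(survivor(group, ascending));  // logs id-jan: the oldest file is kept
console.log(survivor(group, descending)); // logs id-may: the newest file is kept
```

So b.value - a.value keeps the newest file, and a.value - b.value keeps the oldest one, which is what the question asks for.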

  3. Alternative solution that retains the oldest file and deletes the duplicate files in Google Drive.

    Note: you need to enable the Drive API advanced service for Drive.Files.remove().

    function keepOldestFilesOfEachNameInAFolder() {
      const folder = DriveApp.getFolderById("INSERT FOLDER ID");
      const files = folder.getFiles();
      // Map of filename -> array of File objects with that name
      const fO = new Map();
      while (files.hasNext()) {
        const file = files.next();
        const n = file.getName();
        // Organize file info in fO
        if (!fO.has(n)) {
          fO.set(n, []);
        }
        fO.get(n).push(file);
      }
      // Sort each group with the same name
      fO.forEach(group => {
        group.sort((a, b) => {
          const va = a.getDateCreated().valueOf();
          const vb = b.getDateCreated().valueOf();
          return va - vb; // ascending (oldest first); modified to retain the oldest file instead of the newest
        });
        // Keep the oldest one and permanently delete the rest
        group.forEach((f, i) => {
          if (i > 0) {
            Drive.Files.remove(f.getId());
          }
        });
      });
    }
    

    Reference: find duplicate files in one folder and removing the oldest one? (Google Script)
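If only the oldest file per name has to be kept, sorting each group is not strictly necessary. As an alternative technique (not taken from the answers above), a single pass can track the oldest file seen so far for each name and mark everything else as a duplicate. This sketch uses hypothetical plain objects { id, name, created } in place of Drive files; in Apps Script the values would come from getId(), getName() and getDateCreated(), and the returned ids would go to whichever delete call you choose.

```javascript
// Hypothetical input shape: { id, name, created } with created as a Date.
// Returns the ids of every file that is not the oldest of its name.
function duplicateIdsKeepingOldest(files) {
  const oldest = new Map(); // name -> oldest file seen so far
  const duplicates = [];
  for (const file of files) {
    const current = oldest.get(file.name);
    if (current === undefined) {
      oldest.set(file.name, file);
    } else if (file.created < current.created) {
      // The new file is older: the previous holder becomes a duplicate.
      duplicates.push(current.id);
      oldest.set(file.name, file);
    } else {
      // Same age or newer: on a tie, the file seen first is kept.
      duplicates.push(file.id);
    }
  }
  return duplicates;
}

const folderFiles = [
  { id: "x1", name: "a.txt", created: new Date("2023-03-01") },
  { id: "x2", name: "a.txt", created: new Date("2020-03-01") },
  { id: "x3", name: "b.txt", created: new Date("2021-01-01") },
  { id: "x4", name: "a.txt", created: new Date("2022-03-01") },
];
console.log(duplicateIdsKeepingOldest(folderFiles)); // x1 and x4 are duplicates; x2 and x3 are kept
```

This does one comparison per file instead of sorting each group, which only matters for very large folders.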
