Efficiently delete a million files on Linux servers



It happens to the best of us: some script rockets sky-high and gives the system administrator an instant headache, because some folder – typically a sessions folder – got stuffed with millions of files. Linux is not quite happy with that: deleting the folder is not an option, and the loyal “rm -rf” command decides to call it a day. To make things even worse, you only want to remove files that are older than a few days… so what are the options?

Find is your friend

The Linux “find” command is a possible solution; many will go for:

find /yourmagicmap/ -type f -mtime +3 -exec rm -f {} \;

The command above lists all files older than 3 days and passes each one to the rm command. It has one problem though: it takes a long time, since calling rm a million times is not exactly what you would call efficient.
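If you do want to stick with -exec, find can also batch the arguments by terminating the command with + instead of \;, so rm is started once per large batch of files rather than once per file (the path is just the example used above):

find /yourmagicmap/ -type f -mtime +3 -exec rm -f {} +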

A better option is:

find /yourmagicmap/ -type f -mtime +3 -delete

This adds the -delete flag to the find command, telling find to throw the matching files away itself. Do the right thing and put the command in a cronjob if you need to clean out the folder on a regular basis.
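As a sketch, a nightly crontab entry for this could look like the line below; the path and the 03:00 schedule are only placeholders, adjust them to your own setup:

0 3 * * * find /yourmagicmap/ -type f -mtime +3 -delete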

The rsync alternative!

rsync is without doubt one of the handiest commands when it comes to file operations. Rsync can sync entire volumes – as you may know – but it also provides a way to empty a folder.
The example below assumes you have a folder named /tmp/empty/ which contains no files, and a folder /tmp/session/ that contains way too much crap. The command below removes those files:

rsync -a --delete /tmp/empty/ /tmp/session/
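Putting it together, and assuming the empty folder does not exist yet, the whole cleanup comes down to two commands (paths as in the example above):

mkdir -p /tmp/empty
rsync -a --delete /tmp/empty/ /tmp/session/

The trailing slash on the source matters: it tells rsync to sync the contents of the empty folder into /tmp/session/, so --delete removes everything on the receiving side that is not in the (empty) source.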

Which is the fastest? 

rm: deleting millions of files is a no-can-do!

find -exec: an option, but slower!

find -delete: a fast and easy way to remove loads of files.

rsync --delete: without doubt the quickest!
