Whatever I feel like the whole Internet should know.

Friday, April 11, 2008

how to back up a 50GB directory to DVDs with bash one-liners

The following one-liner produces numbered text files (01-disc.txt, 02-disc.txt, and so on) in the current directory. Each file lists the directories under the current directory that go on one DVD.

rm -f -- *-disc.txt; n=1; a=0; for i in *; do size=$(du -sk -- "$i" | awk '{print $1}'); tent=$((a + size)); if [ "$tent" -ge 4589843 ]; then echo "$i pushes disc size to $tent; pushing to next disc"; n=$((n + 1)); a=$size; else a=$tent; fi; nn=$(printf "%02d" "$n"); echo "$i" >> "$nn-disc.txt"; done

Variables:

  • n is the number of the text file we're presently writing to.

  • a is the (accumulated) size of this disc so far.

  • i is the directory we're on (i doesn't stand for anything; it's just what my fingers type after "for").

  • size is the size of a directory. (Or file, I suppose: my directory only contains other directories.)

  • tent is how big this disc would be if we stuck that directory on it (tentative).

  • 4589843 isn't really a variable - it's 4.7 * 1000 * 1000 * 1000 / 1024 rounded down, i.e. the number of whole kibibytes in 4.7 gigabytes. Kibibytes are the unit (GNU) du reports by default. (See wiki:IEEE 1541 for more; the issue is whether sizes are expressed in powers of ten or of two.)

  • nn is n padded to two digits with leading zeros as necessary (e.g. 01, 04, 07, 10).
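
For readability, here is the same greedy packing unrolled into a function. Treat it as a sketch, not a drop-in replacement: pack_discs and cap are names I made up for illustration, and the function reads "SIZE NAME" lines on stdin (the shape of du -sk output) so it can be tried without touching a real directory.

```shell
# Capacity of a 4.7 GB (decimal) DVD in KiB; integer division rounds down.
capacity=$((4700000000 / 1024))   # 4589843

# pack_discs CAPACITY   (hypothetical helper, not part of the one-liner)
# Reads "SIZE NAME" lines on stdin (SIZE in KiB, as printed by du -sk) and
# prints "NN NAME" lines, where NN is the two-digit disc number.
# Same greedy rule as the one-liner: if adding an item would reach the
# capacity, start the next disc with that item.
pack_discs() {
    cap=$1
    n=1   # number of the disc we're presently filling
    a=0   # accumulated size of this disc so far
    while read -r size name; do
        tent=$((a + size))   # tentative size if this item goes on the disc
        if [ "$tent" -ge "$cap" ]; then
            n=$((n + 1))
            a=$size
        else
            a=$tent
        fi
        printf '%02d %s\n' "$n" "$name"
    done
}

# Possible usage against real directories (not run here):
#   du -sk -- * | pack_discs "$capacity"
```

Like the one-liner, this never checks whether a single item is bigger than a whole disc; such an item simply gets bumped to the next disc and still won't fit.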


The following one-liner tests the work of the first.

(for d in *-disc.txt; do du -ch --si $(cat "$d") | sed "s/total/all in $d/"; done) | grep 'all in'

Note that the --si makes (GNU) du give numbers as powers of 10, so if the output says something greater than "4.7G" you have a disc that's too big. But it doesn't, because the previous one-liner worked properly.
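
If you'd rather not eyeball the human-readable totals, the same check can be done numerically. This is a minimal sketch, assuming list files with one name per line; check_disc and its two arguments are my own invention for illustration.

```shell
# check_disc LISTFILE LIMIT_KIB   (hypothetical helper)
# Sums du -sk over every name listed in LISTFILE, prints the total, and
# succeeds only if the total is within LIMIT_KIB.
check_disc() {
    list=$1
    limit=$2
    total=0
    while IFS= read -r name; do
        # du -sk prints "SIZE<tab>NAME"; keep only the size in KiB.
        size=$(du -sk -- "$name" | awk '{print $1}')
        total=$((total + size))
    done < "$list"
    echo "$total KiB in $list"
    [ "$total" -le "$limit" ]
}

# Possible usage (not run here):
#   for d in *-disc.txt; do check_disc "$d" 4589843 || echo "$d is too big"; done
```

Reading the list line by line also means names with spaces survive, which the cat-based one-liner above doesn't handle.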

The following one-liner burns the discs.

for i in *-disc.txt; do echo "******** insert blank disc ********"; read; nn=${i%-disc.txt}; growisofs -Z /dev/dvd -r -V "flac-backup-$nn" -graft-points $(cat "$nn-disc.txt" | sed 's:^\(.*\)$:/\1=\1:'); done

${i%-disc.txt} evaluates to $i with the -disc.txt stripped off the end. It isn't bash-specific: it's standard POSIX shell parameter expansion.

growisofs runs genisoimage to create the image that goes on the DVD. genisoimage takes the directories handed to it and puts all their files in the image. So to make the directories be directories on the DVD, we have to tell genisoimage to graft their contents under their names, by specifying -graft-points and giving parameters like /divine-discontent=divine-discontent. That's what the cat-sed thingy does.
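
To see what the sed expression actually hands to growisofs, here is a small dry-run sketch. graft_points and show_burn are hypothetical helper names, and another-album in the usage note is a made-up directory; nothing here touches the DVD drive, it only prints the command.

```shell
# graft_points LISTFILE   (hypothetical helper)
# Prints one /NAME=NAME graft point per line of LISTFILE.
graft_points() {
    sed 's:^\(.*\)$:/\1=\1:' "$1"
}

# show_burn LISTFILE   (hypothetical helper)
# Prints the growisofs command for one disc instead of running it.
show_burn() {
    nn=${1%-disc.txt}
    # $(...) is left unquoted on purpose so that each graft point becomes
    # its own argument; this breaks on names containing whitespace.
    echo growisofs -Z /dev/dvd -r -V "flac-backup-$nn" -graft-points $(graft_points "$1")
}
```

With a 01-disc.txt containing the lines divine-discontent and another-album, show_burn 01-disc.txt prints a command ending in -graft-points /divine-discontent=divine-discontent /another-album=another-album.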

So. Things about bash that average shell users may not know:

  • $(command) evaluates to the output of command, just like `command` does; but $() nests without any backslash escaping.

  • ${variable#prefix} strips prefix off of the value of variable.

  • ${variable%suffix} strips suffix off of the value of variable. There's more to these two; see the bash man page, and look for the "Parameter Expansion" section.

  • $((expression)) does math. The older $[expression] form also works in bash but is deprecated; see the "Arithmetic Expansion" section of the man page.
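
The four items above in one place, as a small sketch you can paste into a shell. The file name, the sizes, and the /music/flac path are made-up values for illustration.

```shell
file=03-disc.txt

nn=${file%-disc.txt}     # strip a suffix: 03
rest=${file#03-}         # strip a prefix: disc.txt

# Arithmetic expansion; variable names need no $ inside $(( )).
a=4000000
size=700000
tent=$((a + size))       # 4700000

# Nested command substitution: the inner $() is evaluated first.
parent=$(basename "$(dirname /music/flac/divine-discontent)")   # flac

echo "$nn $rest $tent $parent"
```

basename and dirname only manipulate the path string, so the directory doesn't need to exist for this to run.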


Some people complain that bash is bloatware. I think they're right, but as long as we've nearly all got it on our Linux boxen, might as well use its full capability, right?

Finally, I'd like to note that I tried to do all of this in Haskell first. Instead of calling du, I wrote my own. It was pretty, but long. Instead of merely tossing something onto the next disc when this one's too big, I tried to find an optimal packing. It was going to be my next Haskell project. Well, that was too big and involved, and when it didn't happen in one night, it got postponed indefinitely, and now, months later, I got it done in two hours including blogging it. Maybe next time, Haskell.
