Friday, May 13, 2011

Multiple, Recursive File Search/Replace and Corrupting Your Repository

Today I was working on a project of mine, and I needed to rename a function. The function name was fine when I wrote it, but I needed to make a function with some very similar functionality. Naturally I broke it down, abstracted the old function and created a few new ones which called it in slightly different ways. Anyway, that's not important. What's important is that I needed to rename a function that's used all over the place in a package.

Now, I'm developing this aforementioned project in a Linux environment, using Emacs. While I can edit proficiently with Emacs, I'm not an M-x butterfly-using guru by any means. So I thought I'd just do a recursive search and replace using find and sed:

find ./ -type f | xargs sed -i 's/dict_format/dict_to_object/'

I found it online somewhere. I figured, "what's the harm? My project is under version control at BitBucket, and I'll learn some bash skills." So I ran the command, did an hg diff and saw that it worked well. I was pleased. "Time to commit these changes," I thought.

carson@vmdev:~/py/aforementioned_project$ hg ci -m "Renamed that pesky function"
abort: index data/.../format.py.i is corrupted!

Dear lord, no!

Of course, I knew what had happened: find had walked straight into .hg, and sed had happily edited Mercurial's own files along with mine. "What's an index file, anyway? I'll just change it back." It turns out an index file is a binary file. Since I couldn't update to an earlier revision due to the corrupted index, I cloned a different copy of the repository and took a look. There is no human-readable text in there whatsoever, and no hint as to how the new name of my function wound up smudged in there.

"Oh well," I said, "I have this new index file, let's see what happens if I just copy it into my .hg." Well, that doesn't work.

"OH WELL," I said, "I'll just delete everything, and use this new clone." Except that I had been committing to my local repo, and hadn't bothered to push the changes to BitBucket before I did the search/replace. D'oh!

I only lost about an hour and a half of work, but it could have been much worse. I've gone several days without pushing before. I realize there are probably several things I could have done to rectify the situation without losing the work, such as the obvious one: copying my changed files to the new clone and re-committing them. I'm sure there are more technical solutions involving some hg-fu, but I'm not aware of them at this point.
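For what it's worth, the "obvious one" could look roughly like the sketch below. It is not what I actually ran, just an illustration. The helper name copy_worktree and both directory arguments are made up, and it assumes GNU find and cp (for --parents). It copies every file except the damaged .hg directory from the broken working copy into a fresh clone, leaving the clone's own .hg untouched.

```shell
#!/usr/bin/env bash
# Sketch only: copy_worktree is a hypothetical helper, not a Mercurial command.
# Usage: copy_worktree BROKEN_DIR FRESH_CLONE_DIR
copy_worktree() {
    local src=$1 dst
    # Resolve the destination to an absolute path before changing directory.
    dst=$(cd "$2" && pwd) || return 1
    (
        cd "$src" || exit 1
        # Skip the damaged .hg directory; copy every other regular file,
        # recreating its relative path under the destination (GNU cp --parents).
        find . -name .hg -prune -o -type f -exec cp --parents {} "$dst" \;
    )
}

# Afterwards, inside the fresh clone:
#   hg status      # review what differs from the last pushed revision
#   hg addremove   # start tracking any files that are new
#   hg commit -m "Recover unpushed work"
```

The intermediate local commits are still gone this way, but the work itself ends up in a single new commit on top of whatever was last pushed.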

My lesson is this: push often! Learn to use your IDE to do automated refactoring for you, and make sure it ignores your .hg/.git/.svn directory. Things like this happen to everyone at some point, but it still sucks.
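If you'd rather stay on the command line, here is a minimal sketch of a search/replace that keeps out of version-control directories. The function name safe_replace is made up for illustration, and it assumes GNU find, xargs, and sed. Unlike the command I ran, it also adds the g flag so every occurrence on a line is replaced.

```shell
#!/usr/bin/env bash
# Sketch only: safe_replace is a hypothetical helper, not a standard tool.
# Assumes GNU find, xargs, and sed (-print0, -0/-r, and in-place -i).
# OLD and NEW are pasted into the sed expression as-is, so they should be
# plain identifiers without slashes or regex metacharacters.
# Usage: safe_replace OLD NEW [DIR]
safe_replace() {
    local old=$1 new=$2 dir=${3:-.}
    # -prune stops find from entering .hg, .git, or .svn at any depth;
    # only regular files outside those directories reach sed.
    find "$dir" \( -name .hg -o -name .git -o -name .svn \) -prune \
        -o -type f -print0 |
        xargs -0 -r sed -i "s/$old/$new/g"
}
```

Calling it as `safe_replace dict_format dict_to_object path/to/package` (the path is a placeholder) also limits the damage to one package rather than the whole project.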

The worst part is, there weren't even any changes that needed to be made in the top-level directory of my project. I could have just run the command in the affected package and avoided this whole catastrophe. Oh well. We live, we learn.