So, the other Steph and I were kvetching earlier about the lousy Word HTML we have to clean up all day… and I remembered something. A long time ago, I’d tried to use AppleScript to make the Word Unmunger’s batch mode easier to use. At the time, AppleScript defeated me… but now there’s Automator, and it’s a lot better.
Voilà… the Word Unmunger Automator script. You’ll need to grab the Unmunger itself, of course, and edit the workflow so its path points to wherever you saved the script. (I renamed mine fix.py because I was constantly typing the file name in Terminal.)
Now if only Dreamweaver’s commands were available to Automator. See, the Unmunger sometimes can’t handle HTML from Word files created on a Mac, and running it through Dreamweaver’s Clean Up Word HTML command first solves the problem.
Oh well. This is still going to make my professional life a lot easier.
Update, July 2009
Luke has kindly updated the Unmunger to work in newer versions of Python, which means the script now works in Leopard. I’ve also added a Growl notification to let you know when the files are done.
So, here’s how to use this thing. Download the Automator script and the Unmunger. Open the script in Automator and adjust the path in the last step so it points to wherever you saved the Unmunger. Save the workflow as an application. Run your new application, and choose the Word-created HTML file(s) you want to clean up. Be warned: the cleaned-up files are saved over the originals.
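If you’d rather skip Automator, the same batch idea can be roughed out in plain Python. Treat this as a sketch, not the actual workflow: it assumes the Unmunger (saved as fix.py at a made-up path) takes the file to clean as its single command-line argument, which you should check against your copy. The function names are invented for illustration, and there’s no Growl notification here.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location of the Unmunger script; change this to your own path.
UNMUNGER = Path.home() / "bin" / "fix.py"


def build_command(html_file, script=UNMUNGER, python=sys.executable):
    """Build the argv list for one file.

    Assumption: the script is invoked as `python fix.py <file>`.
    """
    return [python, str(script), str(html_file)]


def clean_files(paths, script=UNMUNGER, run=subprocess.run):
    """Run the script once per .htm/.html file and return the files handled.

    Anything without an HTML extension is skipped. `check=True` makes a
    failing run raise an error instead of passing silently.
    """
    processed = []
    for path in map(Path, paths):
        if path.suffix.lower() not in {".htm", ".html"}:
            continue
        run(build_command(path, script), check=True)
        processed.append(path)
    return processed


if __name__ == "__main__":
    # Usage (hypothetical file name): python batch_unmunge.py one.html two.html
    for done in clean_files(sys.argv[1:]):
        print(f"cleaned {done}")
```

Since the originals get overwritten, it’s worth trying this on copies first.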
If you just need to fix one or two files, or you can’t run Python scripts, wordoff.org is a great alternative.
Phil Freo says
The .zip file to the script is a 404. I’d love to see this once the link is updated.
Stephanie says
I seem to have misplaced the file entirely. I’ll keep looking….
Stephanie says
Found it! I’ve fixed the link.