Automator script for Word HTML cleanup

So, the other Steph and I were kvetching earlier about the lousy Word HTML we have to clean up all day… and I remembered something. A long time ago, I’d tried to use AppleScript to make the Word Unmunger’s batch mode easier to use. At the time, AppleScript defeated me… but now it’s Automator, and it’s a lot better.

Voilà… the Word Unmunger Automator script. You’ll need to grab the Unmunger itself, of course, and edit the workflow so its path points to wherever you saved the script. (I had renamed mine fix.py because I was constantly typing the file name in Terminal.)

Now if only Dreamweaver’s commands were available to Automator. See, the Unmunger sometimes can’t handle HTML from Word files created on a Mac, and running it through Dreamweaver’s Clean Up Word HTML command first solves the problem.

Oh well. This is still going to make my professional life a lot easier.

Update, July 2009

Luke has kindly updated the Unmunger to work in newer versions of Python, which means the script now works in Leopard. I’ve also added a Growl notification to let you know when the files are done.

So, here’s how to use this thing. Download the Automator script and the Unmunger. Open the script in Automator and adjust the path in the last step to point at wherever you saved the Unmunger. Save the workflow as an application. Run your new application and choose the Word-created HTML file(s) you want to clean up. Be warned: it saves over the originals.
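If you’d rather skip Automator, or just want to see roughly what the workflow boils down to, here’s a minimal Python sketch of the same batch idea. It is not the workflow itself, and it makes two assumptions: that your copy of the Unmunger lives at the hypothetical path in SCRIPT, and that the script accepts a single HTML file path as its argument and rewrites that file in place. Check the Unmunger’s own usage notes before relying on either.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: a hypothetical location for your copy of the Unmunger
# (renamed fix.py, as in the post). Change this to your own path.
SCRIPT = Path.home() / "bin" / "fix.py"


def select_html(paths):
    """Keep only .htm/.html files, ignoring case in the extension."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in {".htm", ".html"}]


def build_command(script, html_file, python=sys.executable):
    """Build the argument list for one cleanup run.

    Assumption: the script takes the file to clean as its only argument.
    """
    return [python, str(script), str(html_file)]


def clean_all(paths, script=SCRIPT):
    """Run the cleanup script once per HTML file, stopping on the first failure."""
    for html_file in select_html(paths):
        subprocess.run(build_command(script, html_file), check=True)


if __name__ == "__main__":
    # Usage: python batch_fix.py page1.html page2.html ...
    clean_all(sys.argv[1:])
```

Since files get overwritten, try it on copies first.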

If you just need to fix one or two files, or you can’t run Python scripts, wordoff.org is a great alternative.
