
Stephanie Leary

Writer, Front End Developer, former WordPress consultant


Automator script for Word HTML cleanup

November 22, 2005 Stephanie Leary 3 Comments

So, the other Steph and I were kvetching earlier about the lousy Word HTML we have to clean up all day… and I remembered something. A long time ago, I’d tried to use AppleScript to make the Word Unmunger’s batch mode easier to use. At the time, AppleScript defeated me… but now it’s Automator, and it’s a lot better.

Voilà… the Word Unmunger Automator script. You’ll need to grab the Unmunger itself, of course, and edit the workflow to match your path to the script. (I had renamed mine fix.py because I was constantly typing the file name in Terminal.)

Now if only Dreamweaver’s commands were available to Automator. See, the Unmunger sometimes can’t handle HTML from Word files created on a Mac, and running it through Dreamweaver’s Clean Up Word HTML command first solves the problem.

Oh well. This is still going to make my professional life a lot easier.

Update, July 2009

Luke has kindly updated the Unmunger to work in newer versions of Python, which means the script now works in Leopard. I’ve also added a Growl notification to let you know when the files are done.

So, here’s how to use this thing. Download the Automator script and the Unmunger. Open the script in Automator and adjust the path in the last step to match wherever you saved the Unmunger. Save it as an application. Run your new application and choose the Word-created HTML file(s) you want to clean up. It will save over the originals.
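If you’re curious what the workflow boils down to, it takes the files you pick, runs a cleanup pass on each one, and writes the result back over the original. Here’s a rough Python sketch of that batch-and-overwrite pattern. To be clear, this is not the Unmunger’s code: `clean_word_html`, `clean_files_in_place`, and the three regex rules are hypothetical stand-ins for the kind of Word cruft a cleanup script might strip, and the final `print` takes the place of the Growl notification.

```python
import re
import sys
from pathlib import Path

# Hypothetical cleanup rules, NOT the Unmunger's actual logic. They only
# illustrate the sort of Word-generated markup a cleanup pass targets.
_RULES = [
    # Office namespace tags such as <o:p></o:p>
    (re.compile(r"</?o:p[^>]*>", re.IGNORECASE), ""),
    # Conditional comments like <!--[if gte mso 9]>...<![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.IGNORECASE | re.DOTALL), ""),
    # A lone Word class such as class="MsoNormal" (quoted or unquoted)
    (re.compile(r'\s+class=(?:"Mso\w*"|Mso\w*)', re.IGNORECASE), ""),
]


def clean_word_html(html: str) -> str:
    """Apply each hypothetical cleanup rule in order and return the result."""
    for pattern, replacement in _RULES:
        html = pattern.sub(replacement, html)
    return html


def clean_files_in_place(paths):
    """Clean each file and save over the original, as the workflow does.

    Bytes are decoded with surrogateescape so non-UTF-8 bytes (common in
    Word exports) survive the round trip, and line endings are preserved.
    """
    cleaned = []
    for path in map(Path, paths):
        text = path.read_bytes().decode("utf-8", "surrogateescape")
        path.write_bytes(clean_word_html(text).encode("utf-8", "surrogateescape"))
        cleaned.append(path)
    return cleaned


if __name__ == "__main__":
    # Usage: python batch_clean.py file1.html file2.html ...
    done = clean_files_in_place(sys.argv[1:])
    # Stand-in for the Growl notification: report when the files are done.
    print(f"Cleaned {len(done)} file(s)")
```

Like the Automator version, this overwrites your files, so run it on copies until you trust the rules.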

If you just need to fix one or two files, or you can’t run Python scripts, wordoff.org is a great alternative.

Macs, Techy Goodness, Web Design



Comments

  1. Phil Freo says

    December 20, 2008 at 7:00 pm

    The link to the script’s .zip file returns a 404. I’d love to see this once the link is updated.

  2. Stephanie says

    December 20, 2008 at 9:22 pm

    I seem to have misplaced the file entirely. I’ll keep looking….

  3. Stephanie says

    February 10, 2009 at 9:34 pm

    Found it! I’ve fixed the link.




I’m a front end developer at Equinox OLI, working on open source library software. I was previously a freelance WordPress developer in higher education. You can get in touch here or on LinkedIn.

Copyright © 2025 Stephanie Leary