HTML Import 2.0 beta

The new version of the HTML Import plugin is just about done. It’s completely rewritten and has several big new features:

It imports linked images. (OMG, I KNOW.) It can handle all of the following paths, as long as they lead to actual image files:

<img src="http://example.com/images/foo.jpg" />
<img src="/images/foo.jpg" />
<img src="../../images/foo.jpg" />
<img src="foo.jpg" />
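
As a rough illustration of what handling those four forms involves, here is a minimal sketch. It is not the plugin's actual code: the function names `normalize_path()` and `resolve_image_path()`, their parameters, and the Unix-style paths are all assumptions made for the example.

```php
<?php
// Illustrative sketch only; these helpers are hypothetical, not the plugin's code.

// Collapse "." and ".." segments so "/site/a/b/../../images" becomes "/site/images".
function normalize_path($path) {
    $absolute = ($path !== '' && $path[0] === '/');
    $parts = array();
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // skip empty and current-directory segments
        }
        if ($segment === '..') {
            array_pop($parts); // step up one directory
            continue;
        }
        $parts[] = $segment;
    }
    return ($absolute ? '/' : '') . implode('/', $parts);
}

// Map an <img> src to a local file path, or return null for images on other hosts.
// $html_file is the file being imported, $root_dir is the local copy of the old
// site's document root, and $old_site_url is the old site's address.
function resolve_image_path($src, $html_file, $root_dir, $old_site_url) {
    $root_dir = rtrim($root_dir, '/');
    $old_site_url = rtrim($old_site_url, '/');
    if ($src === '') {
        return null;
    }
    if (preg_match('#^https?://#i', $src)) {
        // Full URL: only resolvable locally if it points at the old site.
        if ($old_site_url === '' || stripos($src, $old_site_url . '/') !== 0) {
            return null;
        }
        $src = substr($src, strlen($old_site_url)); // now root-relative
    }
    if ($src[0] === '/') {
        // Root-relative: resolve against the document root.
        return normalize_path($root_dir . $src);
    }
    // Relative: resolve against the directory of the HTML file.
    return normalize_path(dirname($html_file) . '/' . $src);
}
```

With an HTML file at `/site/a/b/page.html`, a root of `/site`, and an old site URL of `http://example.com`, the first three forms above all resolve to `/site/images/foo.jpg`, and the last resolves to `/site/a/b/foo.jpg`.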

Using custom post types? The importer now has options to import posts as any public post type. This means the plugin now requires at least WordPress 3.0.

It has supported custom taxonomies for several versions now, but in 2.0, the hierarchical ones (including categories) are displayed as nice little lists of checkboxes. Much nicer.

There’s now an option to enter your old site URL, which will be used to generate accurate .htaccess redirects. You can also retrieve the redirects again later if you need to, or regenerate them after you’ve changed your permalink setting, because the old URLs are now stored as custom fields of the imported posts.
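
To make the idea concrete, here is a hedged sketch of how redirect lines could be built from stored old URLs. It is not the plugin's code: `build_redirect_lines()` is a hypothetical helper, the input array stands in for whatever the plugin stores in its custom fields, and the output uses Apache's standard `Redirect` directive from mod_alias.

```php
<?php
// Illustrative sketch only; build_redirect_lines() is a hypothetical helper.
// $old_site_url is the old site's address (it may include a subdirectory).
// $old_to_new maps each old file path, relative to the old site, to the
// new permalink of the imported post.
function build_redirect_lines($old_site_url, array $old_to_new) {
    $base = rtrim($old_site_url, '/');
    $lines = array();
    foreach ($old_to_new as $relative_path => $new_url) {
        $old_url = $base . '/' . ltrim($relative_path, '/');
        // The Redirect directive matches on the URL path, not the full URL.
        $old_path = parse_url($old_url, PHP_URL_PATH);
        if (!is_string($old_path) || $old_path === '') {
            continue; // skip anything that does not parse to a usable path
        }
        $lines[] = 'Redirect 301 ' . $old_path . ' ' . $new_url;
    }
    return $lines;
}

// Example: the old site lived in a /docs subdirectory.
$lines = build_redirect_lines(
    'http://example.com/docs/',
    array('about/team.html' => 'http://new.example.org/about/team/')
);
echo implode("\n", $lines), "\n";
// Redirect 301 /docs/about/team.html http://new.example.org/about/team/
```

The subdirectory case shows why the old site URL matters: without it, the redirect would be written for `/about/team.html` and would never match a request for the old page. Because the new permalink is looked up at generation time, rerunning this after a permalink change produces updated targets.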

The user interface is completely different. The settings have their own page, and the importer has been moved into Tools → Import. The settings that could make things go disastrously wrong, like a beginning directory path that doesn’t exist, will now trigger some error messages.

There’s a help tab on the settings page, and there’s a new User Guide (still in progress; please forgive the giant images).

Last-minute feature addition: default/index files for parent pages! In the past, the importer created empty placeholder pages when it encountered a directory, and then it would import all the files in that directory as child pages. In 2.0, you can specify the name of your default/index file (usually index.html on Apache servers or default.htm on IIS), and the importer will replace the empty placeholder with the index file’s contents. If there is no index file, the parent page remains empty.
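
The behavior described above boils down to a small decision per directory. This sketch is only an approximation under assumptions: `parent_page_content()` is a hypothetical name, and the real importer also has to extract the title and body from the file rather than return raw HTML.

```php
<?php
// Illustrative sketch only; parent_page_content() is a hypothetical helper.
// Returns the index file's contents for a directory, or an empty string
// when the directory has no index file (the parent page stays empty).
function parent_page_content($dir, $index_name = 'index.html') {
    $candidate = rtrim($dir, '/') . '/' . $index_name;
    if (is_file($candidate) && is_readable($candidate)) {
        $html = file_get_contents($candidate);
        return ($html === false) ? '' : $html;
    }
    return '';
}
```

On an IIS-style site the second argument would be `default.htm` instead of the default `index.html`.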

I need testers!

If you’re willing to try out the beta, grab the development version from the Download page.

Here’s what it looks like…

34 thoughts on “HTML Import 2.0 beta”

  1. Awesome! I used your plugin in the past and am excited to see an update to make it even better. :)

    I tried it out today and the import is great. I love the new title feature that lets us remove certain parts from the title of each page.

    I am having some problems getting the image upload to work. It seems to work for about 15 items, but not for all the rest (300 or so pics). The error I am getting is similar to this:

    Could not find the right path to 4277d749212993f5293f6d71b08ea732.jpg (tried /Users/jeffvandrimmelen/Desktop/exss/files/cache/4277d749212993f5293f6d71b08ea732.jpg). It could not be imported. Please upload it manually.

    I am using MAMP on a local machine. I scraped the live site to test this. The .jpg is certainly at that location. It shows up when navigating through Finder. It also shows up when browsing the html site locally: file:///Users/jeffvandrimmelen/Desktop/exss/exss.unc.edu/files/cache/4277d749212993f5293f6d71b08ea732.jpg

    Maybe I am doing something wrong in the settings. Any thoughts you have would be appreciated. I am running WordPress 3.2 locally on a Mac using MAMP.

    • Stephanie

      Jeff, I’ve fixed all the bugs except the image path problem. What’s the live site you scraped for your test? I’d like to do the same.

  2. I am also having a second problem that I’m not sure is connected to this, or if there is a workaround. Take a look at this picture of the file structure of the site:

    http://vanswebsites.com/uploads/2011-07-07_1457.png

    You will see that there is a file that is about 4 levels deep, but because there are no index (or HTML) pages above it in the structure, it puts it as a top-level page. Is there any way to create the top levels? Do I need to add blank index.html pages to each folder to help keep the hierarchy?

    • Stephanie

      Definitely not your CMS! I had a reeeeeally dumb typo. Embarrassingly stupid and obvious once I looked at it!

      I just ran a perfect import on that site. Grab a fresh download — it should say it’s beta 3 — and see if all your problems are solved. :)

  3. Stephanie

    All — beta 3 is up, and fixes all the problems that have been reported here and on wordpress.org. I’m going to be out of town later this week, so I’m planning to release this on Monday. If you find any more issues in the meantime, please feel free to report them here, or you can try out the shiny new support forum!

    Thanks for all your help! You’ve found some great (embarrassing) bugs.

  4. savitha

    Hi,

    When I try to import an HTML page containing an image using the HTML Import 2 plugin, the HTML content is imported but the image is not imported into the media library.

    I have applied all the settings given in the user guide for the plugin.

    The image import gives the following error:

    “Could not find the right path to sample.jpg (tried /Users/savitha/Sites/savitha/sample.jpg). It could not be imported. Please upload it manually.”

    I have given the proper path for the image in the HTML.

    Can you please tell me which steps I am missing?

    Thanks in advance.

  5. Kiku

    Hi Stephanie, thanks for creating this fantastic plugin !!

    I’m Kiku from Spain, and I have a big problem.

    Before importing, my post titles and permalinks look like this:

    ¿Y Tú CóMO ORDENAS LOS PAPELES DE OTROS AñOS? (2 PARTE)

    http://www.mydomain.com/y-t-cmo-ordenas-los-papeles-de-otros-aos-2-parte/index.html

    (permalink eliminated characters including accents)

    When importing with the HTML Import 2.0 plugin, the result is:

    ¿Y Tú CóMO ORDENAS LOS PAPELES DE OTROS AñOS? (2 PARTE) (is perfect)

    but the permalink …

    http://www.mydomain.com/y-tu-como-ordenas-los-papeles-de-otros-anos-2-parte/index.html

    Characters with accents were replaced by unaccented letters :(

    Is this problem caused by your plugin, or by new versions of WordPress?

    Do you know how I can fix it?

    Thank you very much and happy life for you and your family :)

  6. I’m excited to see such a robust import tool for WordPress. Thank you for creating this plugin and making it available. I downloaded the beta and tried out the slug — works well.

    I have a few questions. Is there any way to make the new import simply overwrite the existing files? Or do you have to first delete the old pages completely before importing a new set of files? I ask because I want to use WordPress as a help platform, but I want to author content in another tool and continually run an import process into WordPress.

    Thanks,

    Tom

  7. Allan

    My import of 700+ pages worked wonderfully, but for some reason when the imported pages show up in the WordPress “search results”, the content excerpt is always displayed simply as the text “Array”. However, if I copy one of those pages’ HTML from within the editor and paste it into a new page that I manually add, that same content displays just fine in the “search result” content excerpt. Any ideas what might be the difference between a page created by “HTML Import 2” and a native WordPress one? Thanks for creating such a powerful plugin!

  8. jad

    I have installed your plugin but I can’t get it to work. Could you please upload a video of the plugin so everyone can understand how it works? If you already have one, could you please send the link?

  9. David

    My pages are all PHP, with a PHP variable in the code called “$content_title”. I’d like to be able to pull the value of $content_title into the title.

    Any thoughts on how I can do that with this plugin? Or does a field or some programming need to be added to your plugin to get that to work?

  10. We are converting our old web site of about 3700 pages to WordPress. I have written a program to strip out all of the old included information, etc. Once the files are “clean” I am sending them through your program to import them into WordPress. The only tags I am leaving in the files are html, title, strong, br, p, meta text, head, body, div id=main, doctype, and rel.

    When I attempt to import a large number of files, the program seems to “hang”. In actuality, the program errors and exits with no clue as to what the problem is. The message in the Apache error log is: “[Sat Dec 29 10:56:21 2012] [error] [client 192.168.2.4] PHP Fatal error: Call to a member function xpath() on a non-object in /usr/share/wordpress/wp-content/plugins/import-html-pages/html-importer.php on line 407, referer: http://www.archivexparanormalstories.com/wp-admin/admin.php?import=html”

    There is no clue as to what the problem is, what file was being operated on, and what caused the error. My only recourse is to feed the files in individually to identify the file that is causing the problem. At this point, I am forced to create the page by hand, using cut and paste. As this is happening frequently, this is causing quite a problem. The source file where the error occurred is available, should you want it.

    Please note, I am an experienced PHP, html, and Perl programmer (just not with the WordPress API). If you could tell me exactly what is causing the problem, I can correct the old files appropriately.

  11. You’re doing great work. I used an earlier version of the HTML importer last year. After the first use, I ran into problems; we exchanged emails several times, but the problem remained unsolved. By chance I came across the development version of the plugin today. I followed your instructions as usual, and all the work left undone for so many months was uploaded at the click of a button.

    This tells me you are spending sleepless nights working on perfecting the plugin. We are still gathering articles for my website. I hope to get stabilised and send you something for coffee (a promise).

  12. Rik

    Great plugin! Has anyone tried converting HTML pages with tags to video files? I’m converting HTML -> PHP, but the pages use tags to mp4 video files.

    Any hint…. ?

    Thank you…

  13. Tommy Klausen

    Hi!
    First of all I would like to say that this is a great plugin. Helps me a lot.

    I have an issue with regards to language characters.
    I’m from Norway and we have, e.g., the letter ø. It seems like the parser stops at some point when this character is found and leaves the imported page broken.
    Is this issue already supported, or do I need to fix it on my end? Any advice?

  14. Tommy Klausen

    Hi.

    Can you take a look at the code below and see what could be wrong with it? I suspect something in the stylesheet part messes things up. A sentence starting with P.mypar is displayed for the page, and after that the page is broken. Is there something I haven’t configured the right way?

    P.mypar {text-align: center}
    {font-style: italic;size=small} {display: block}
    TH {background-color:#FFFF66}

    Serierenn 2
    Serierenn 2
    Tylldal IL Skigruppa.

    13.02.13

    Resultatliste G09Return

    Plass
    Navn

    Klubb                 
    Tid
    Horten,Brage Tylldal IL 06:49

    Nystuen,Iver Tylldal IL 05:15

    13.02.13 21:00:58 eTiming versjon 3.0 Emit as Lisensen tilhører: TYLLDAL IL

  15. David

    Hi Stephanie! Thanks so much for making this plugin! It’s a big bright ray of hope in my current situation. I’m trying to do a conversion of a Dreamweaver-created HTML site to WP and this utility has got it 90% of the way there :)

    A quick note before I ask a question about the remaining 10%. Did you know your link to the HTML Importer on your code/wordpress/ page goes to the delicious page instead? Just wanted to fyi you on that.

    Re my question, I found that the site I’m converting does not always have an HTML file in its directory structure. This causes the page structure the importer is creating to “skip” that part, resulting in a broken page structure.

    Example: there are HTML files in directories 1 and subdirectory 1/1, but not in 1/2. But directory 1/2 has subdirectories 1/2/1 and 1/2/2 and they do have HTML files in them.

    The page structure the importer creates has top parent 1 and child 1/1, then the structure breaks and you get top parents 1/2/1 and 1/2/2. Directory 1/2 is skipped because it has no HTML file, and the importer basically treats the next HTML file under the empty directory as a top parent.

    I am working on a way to populate the empty directories with blank index files so I get a good import, but I wanted to ask if there’s an easy change I could make to the plugin instead? The basic case would be “if there’s a subdirectory, make a blank page even if it has no HTML file in it”.

    Thanks again for making such a cool tool!

    David

    • Hi, David. I keep meaning to write up a long guide to preparing a site for use with this importer, which would include things like seeding empty directories with dummy HTML files to force a parent page to be created.

      But if you’re comfortable mucking around in the code, you can definitely adjust the importer to create pages for all the directories. (And I do mean all; this might create more than you really want.)

      In html-importer.php, starting at line 292, replace your elseif block with this:

      elseif (is_dir($path) && is_readable($path)) {
          if (!in_array($val, $this->skip)) {
              // The original check, which only created a placeholder page when the
              // directory contained files with an allowed extension, is removed here.
              // (Wrapping it in /* ... */ does not work, because it already contains
              // a block comment and PHP block comments do not nest.)

              // create a parent page for all directories, whether or not files
              // with the allowed extensions are present
              if (is_post_type_hierarchical($options['type'])) {
                  $this->get_post($path, true);
              }

              // handle the files in this directory -- recurse!
              $this->get_files_from_directory($path);
          }
      }
      

      I think that should work.

      • David

        Thank you so much for the quick response Stephanie! “Mucking around” exactly describes my approach to code at this stage :) I’ll give that a shot straightaway.

        It’s fine if I get more empty pages than necessary to keep the structure. I have to review everything anyways, so cleaning up extra blank pages is much less of an issue than recreating the content structure by hand :)

        I’ll let you know how things go!

        David

      • David

        Hi Stephanie. Sorry to report it didn’t work :( I’ll take this over to the support forums now per your suggestion to someone above.

        Thanks so much for the quick response!

        David

  16. kea

    Hi Stephanie,

    I have an HTML site and want to import all the HTML content from my subfolder into WordPress…

    I have a question regarding importing images… For some reason importing content goes well, but when I look at my media library, nothing has been imported. How do I get around this issue?

    I am trying to import 1000 pages… Do you think I should just import one directory at a time to make things easier, so it doesn’t hang?

    Lastly, regarding internal links: would it be possible to replace them in bulk?
