How to Convert WordPress Database to UTF-8

There is a very good plugin UTF-8 Database Converter by Yihui Xie that can help you to convert all the MySQL tables in a WordPress installation to use UTF-8. Even if the plugin is not updated for long time, I have partially used it to update a large (and old) WordPress database with a few minor changes. Basically I have used the plugin to get the list of tables and changes to do and then run the queries manually in the database.

The plugin URL is here: http://yihui.name/en/2009/05/convert-mysql-database-to-utf-8-in-wordpress/

The reason why I ran the queries manually to update the MySQL database is that the first time I ran the plugin in a local database with all the tables replicated from the production site, I faced an error in one of the most important tables wp_posts.

Column 'post_content' cannot be part of FULLTEXT index

In order to solve this issue, I needed to temporarily remove the index, run the queries and then create the indexes again. Hopefully this worked, but this made me to get a list of all the queries and prepare a .sql file with the ALTER queries.

How I converted a WP database to UTF8?

For my own records (in case I need to do it again in the future), here are the steps I followed to convert my DB tables to UTF-8.

  • Installed the plugin
  • I have edited the plugin and added:
    • An $sql array to save all the ALTER queries and changed identified by the plugin.
    • At the end of the script, generated an output with all these previous SQL queries into a textbox so it is easier to copy and paste. Remember to add the “;” at the end so you have the queries prepared for later. If not, then you can update in mass using a text editor like Notepad++ to add the colon at the end.
  • Run the script in localhost the same replicated database I had in production.
    • I keep track of all the queries running and then created a master .sql with changes.
    • For every failed query, I identified what was the error. For example, the “Column ‘post_content’ cannot be part of FULLTEXT index” was resolved by removing the indexes before running this ALTER query and then re-generating the indexes.
  • Once everything is running in localhost, I ran the master .sql again in localhost just to make sure everything was running smoothly again.
  • Once I confirmed everything is running well, I run the script in production.

Why is important to use UTF-8 tables, fields and database in WordPress

You can read a bit of History here and realize that if you are using a very old installation of WordPress it might have the original latin1 tables instead of UTF-8. In recent versions of WP two additional settings were added to the configuration file.

Setting the DB_CHARSET and DB_COLLATE values in wp-config.php causes WordPress to create the database with the appropriate charset settings. The default is UTF8, the standard charset for modern data which supports all internet-friendly languages.

If you plan to support multi-language in your WordPress blog, for example using the WPML Plugin for WordPress and translate all your content to Spanish, Chinese, Italian, Portuguese, etc. then having UTF-8 format in your database consistently will be a good move.

With the original latin1 tables instead of UTF-8 the output was full of question marks and strange characters. If you want to fix this problem when translating the content from English to Spanish or any other language, then this change will have immediate results.

Resources to check:

UTF8 DB Converter with Changes (254)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>