开发者

Migrate from MySQL to PostgreSQL on Linux (Kubuntu)

A long time ago on a system far, far away...

Trying to migrate a database from MySQL to PostgreSQL. All the documentation I have read covers, in great detail, how to migrate the structure. I have found very little documentation on migrating the data. The schema has 13 tables (which have been migrated successfully) and 9 GB of data.

MySQL version: 5.1.x

PostgreSQL version: 8.4.x

I want to use the R programming language to analyze the data using SQL select statements; PostgreSQL has PL/R, but MySQL has nothing (as far as I can tell).

A New Hope

Create the database location (/var has insufficient space; also dislike having the PostgreSQL version number everywhere -- upgrading would 开发者_开发问答break scripts!):

  1. sudo mkdir -p /home/postgres/main
  2. sudo cp -Rp /var/lib/postgresql/8.4/main /home/postgres
  3. sudo chown -R postgres.postgres /home/postgres
  4. sudo chmod -R 700 /home/postgres
  5. sudo usermod -d /home/postgres/ postgres

All good to here. Next, restart the server and configure the database using these installation instructions:

  1. sudo apt-get install postgresql pgadmin3
  2. sudo /etc/init.d/postgresql-8.4 stop
  3. sudo vi /etc/postgresql/8.4/main/postgresql.conf
  4. Change data_directory to /home/postgres/main
  5. sudo /etc/init.d/postgresql-8.4 start
  6. sudo -u postgres psql postgres
  7. \password postgres
  8. sudo -u postgres createdb climate
  9. pgadmin3

Use pgadmin3 to configure the database and create a schema.

The episode continues in a remote shell known as bash, with both databases running, and the installation of a set of tools with a rather unusual logo: SQL Fairy.

  1. perl Makefile.PL
  2. sudo make install
  3. sudo apt-get install perl-doc (strangely, it is not called perldoc)
  4. perldoc SQL::Translator::Manual

Extract a PostgreSQL-friendly DDL and all the MySQL data:

  1. sqlt -f DBI --dsn dbi:mysql:climate --db-user user --db-password password -t PostgreSQL > climate-pg-ddl.sql
  2. Edit climate-pg-ddl.sql and convert the identifiers to lowercase, and insert the schema reference (using VIM):
    • :%s/"\([A-Z_]*\)"/\L\1/g
    • :%s/ TABLE / TABLE climate./g
    • :%s/ on / on climate./g
  3. mysqldump --skip-add-locks --complete-insert --no-create-db --no-create-info --quick --result-file="climate-my.sql" --databases climate --skip-comments -u root -p

It might be worthwhile to simply rename the tables and columns in MySQL to lowercase:

  1. select concat( 'RENAME TABLE climate.', TABLE_NAME, ' to climate.', lower(TABLE_NAME), ';' ) from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA='climate';
  2. Execute the commands from the previous step.
  3. There is probably a way to do the same for columns; I changed them manually because it was faster than figuring out how to write the query.

The Database Strikes Back

Recreate the structure in PostgreSQL as follows:

  1. pgadmin3 (switch to it)
  2. Click the Execute arbitrary SQL queries icon
  3. Open climate-pg-ddl.sql
  4. Search for TABLE " replace with TABLE climate." (insert the schema name climate)
  5. Search for on " replace with on climate." (insert the schema name climate)
  6. Press F5 to execute

This results in:

Query returned successfully with no result in 122 ms.

Replies of the Jedi

At this point I am stumped.

  • Where do I go from here (what are the steps) to convert climate-my.sql to climate-pg.sql so that they can be executed against PostgreSQL?
  • How to I make sure the indexes are copied over correctly (to maintain referential integrity; I don't have constraints at the moment to ease the transition)?
  • How do I ensure that adding new rows in PostgreSQL will start enumerating from the index of the last row inserted (and not conflict with an existing primary key from the sequence)?
  • How do you ensure the schema name comes through when transforming the data from MySQL to PostgreSQL inserts?

Resources

A fair bit of information was needed to get this far:

  • https://help.ubuntu.com/community/PostgreSQL
  • http://articles.sitepoint.com/article/site-mysql-postgresql-1
  • http://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#MySQL
  • http://pgfoundry.org/frs/shownotes.php?release_id=810
  • http://sqlfairy.sourceforge.net/

Thank you!


What I usually do for such migrations is two-fold:

  • Extract the whole database definition from MySQL and adapt it to PostgreSQL syntax.
  • Go over the database definition and transform it to take advantage of functionality in PostgreSQL that doesn't exist in MySQL.

Then do the conversion, and write a program in whatever language you are most comfortable with that accomplishes the following:

  • Reads the data from the MySQL database.
  • Performs whatever transformation is necessary on the data to be stored in the PostgreSQL database.
  • Saves the now-transformed data in the PostgreSQL database.

Redesign the tables for PostgreSQL to take advantage of its features.

If you just do something like use a sed script to convert the SQL dump from one format to the next, all you are doing is putting a MySQL database in a PostgreSQL server. You can do that, and there will still be some benefit from doing so, but if you're going to migrate, migrate fully.

It will involve a little bit more up-front time spent, but I have yet to come across a situation where it isn't worth it.


Convert the mysqldump file to a PostgreSQL-friendly format

Convert the data as follows (do not use mysql2pgsql.perl):

  1. Escape the quotes.

    sed "s/\\\'/\'\'/g" climate-my.sql | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" > escaped-my.sql

  2. Replace the USE "climate"; with a search path and comment the comments:

    sed "s/USE \"climate\";/SET search_path TO climate;/g" escaped-my.sql | sed "s/^\/\*/--/" > climate-pg.sql

  3. Connect to the database.

    sudo su - postgres
    psql climate

  4. Set the encoding (mysqldump ignores its encoding parameter) and then execute the script.

    \encoding iso-8859-1
    \i climate-pg.sql

This series of steps will probably not work for complex databases with many mixed types. However, it works for integers, varchars, and floats.

Indexes, primary keys, and sequences

Since mysqldump included the primary keys when generating the INSERT statements, they will trump the table's automatic sequence. The sequences for all tables remained 1 upon inspection.

Set the sequence after import

Using the ALTER SEQUENCE command will set them to whatever value is needed.

Schema Prefix

There is no need to prefix tables with the schema name. Use:

SET search_path TO climate;


If you've converted a schema then migrating data would be the easy part:

  • dump schema from PostgreSQL (you said that you've converted schema to postgres, so we will dump it for now, as we will be deleting and recreating target database, to have it cleaned):

    pg_dump dbname > /tmp/dbname-schema.sql
    
  • split schema to 2 parts — /tmp/dbname-schema-1.sql containing create table statements, /tmp/dbname-schema-2.sql — the rest. PostgreSQL needs to import data before foreign keys, triggers etc. are imported, but after table definitions are imported.

  • recreate database with only 1 part of schema:

    drop database dbname
    create database dbname
    \i /tmp/dbname-schema-1.sql
    -- now we have tables without data, triggers, foreign keys etc.
    
  • import data:

    (
       echo 'start transaction';
       mysqldump --skip-quote-names dbname | grep ^INSERT;
       echo 'commit'
    ) | psql dbname
    -- now we have tables with data, but without triggers, foreign keys etc.
    

    A --skip-quote-names option is added in MySQL 5.1.3, so if you have older version, then install newer mysql temporarily in /tmp/mysql (configure --prefix=/tmp/mysql && make install should do) and use /tmp/mysql/bin/mysqldump.

  • import the rest of schema:

    psql dbname
    start transaction
    \i /tmp/dbname-schema-2.sql
    commit
    -- we're done
    


Check out etlalchemy. It allows you migrate from MySQL to PostgreSQL, or between several other databases, in 4 lines of Python. You can read more about it here.

To install: pip install etlalchemy

To run:

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# Migrate from MySQL to PostgreSQL
src = ETLAlchemySource("mysql://user:passwd@hostname/dbname")
tgt = ETLAlchemyTarget("postgresql://user:passwd@hostname/dbname",
                          drop_database=True)
tgt.addSource(src)
tgt.migrate()
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜