Not long ago, Amazon added PostgreSQL support to their RDS (Relational Database Service).  It seemed to be a perfect solution for start ups and small companies who do not have resources to give a lot of attention to relational database management such as scaling, replication, and tuning. Also, their multi-AZ support, provisioned IOPS and monitoring features looked phenomenal. You can check this page if you want to learn more about it.

This morning I migrated all the data residing in a database server running on an EC2 instance. The importing process wasn’t as smooth as the instruction on the official Amazon help page. It was mainly because of the differences of the two environments. The major incompatibility came from the fact that the master user that you log in as to the RDS instance is not a superuser, and certain statements in the dumped SQL cannot be run because of that. I’ll explain this as I go through each steps.

Here are the steps for the migration process that eventually worked out for me. The RDS’s PostgreSQL version was 9.3 and the old server’s version was 9.1, so some things I had to do in these steps may have come from the non-matching versions.

 

1. Preparation from the old database that you are migrating from.

I don’t know if this step is really necessary, but doing this made the process a lot easier for me.

Just make sure all your tables in the database have the same owner. Because you will not run the dumped statements as a super user, if your user name does not match the table owner’s name, the COPY statement will fail, and it will spew a billion lines of errors at you because all the lines that come after this COPY statement will also fail. The work around is that you grant INSERT and other privileges to all users by issuing the following statement as the owner of that table.

GRANT ALL PRIVILEGES ON table_name TO PUBLIC;

I still prefer having all the tables have the same owner, and maybe change it later when all the data are migrated over.

 

2. Creating the RDS instance.

This is the easiest part. I did it through the RDS console (the web interface), and I’ve not found any ambiguous or unexplained fields to fill out. The only important thing to remember in this step is that you want to put the correct database name and the master user name. The master user name is what you will be using to run psql remotely, and if it does not match with your old database (that you are migrating from) some statements can fail. This is what I talked about in step 1. You can put the master user name anything you want and later create a role that matches the table owner, but I rather save myself that step by having the master user name same as the table owner name in step 1. Database name should also match the name in your old database server.

UPDATE: I almost forgot another really important thing in this step. Do not turn on Multi-AZ or automated backup. It will make the import process very slow. Turn them off for now, and turn them on after the migration is complete.

 

3. Creating the back up files with pg_dump and pg_dumpall.

I initially only did it with pg_dumpall, but later realized that the resulting file needs some editing, and it’s not fun to edit 20 gb back up dump file. ALTER ROLE statements are the ones that needed editing so I made two separate files for globals and data. (Also make sure you are running this in a mounted storage that has enough space for the resulting SQL dump files).

You can get the global only SQL dump file with the following command.

pg_dumpall --globals-only > globals.sql

And get the actual database portion with the following command.

pg_dump {name_of_your_db} > data.sql

That globals.sql file needs some editing for my case. I just had to remove the following attributes:

NOSUPERUSER, SUPERUSER, NOREPLICATION

Because superuser related attributes are not to be run from a non superuser and NOREPLICATION attribute cannot be altered with other attributes (so did the error messages say).

 

4. Exporting the data to the RDS instance using the SQL dump files.

Simply run these commands. It can take a while.

psql -f globals.sql --host={host} --port=5432  --username={username} --dbname={dbname}
psql -f mb_data.sql --host={host} --port=5432  --username={username} --dbname={dbname}

After they are done, you have a fully restored database in your RDS instance except some failed statements… For me, my database had some custom functions written in C language, and since you cannot define functions based on anything other that SQL as a non superuser, these failed and the functions were not created. :( Also, some extension related statements also failed because of the permission issue as well. For my case, these weren’t that big of issues.

 

Some extra advices

  • If you have errors that say “invalid command”, most likely, your earlier COPY statement failed and all the lines that come after that make no sense to PostgreSQL now. You should hunt down the original error and fix it. This happened to me when the the user name did not match the table owner name (things I explained in step 1).
  • I was really looking forward to having the Read Replica feature, but it turned out it is not currently supported for PostgreSQL. The vanilla PostgreSQL 9 has excellent replication features that support read only slave servers so you can do this yourself if you are not using RDS. My guess is that RDS will eventually support it because it is theoretically doable. But now, only MySQL can do this.