All right, friends, it’s time for some serious development talk. You and a couple of your friends are building the next it, the Web 2.0 app of the future. You have a source control system set up, you even have a bug tracking system, you even have a spec and someone (not you!) who tests your software (may I remind you of the famous Joel Test?). You’re heading for success.
Your web app has a database back end, and you already have a few users out there, playing with your system; their data lives in your “production” database. You also have local dev boxes, with complete copies of the production environment, with development versions of the database. Anyone can mess with the development databases without causing havoc to real users’ data.
If you don’t have any of that, don’t read this article. Instead, go read Joel Spolsky’s blog or book “Joel on Software”. He’ll explain why all of these pieces are necessary.
If you do, however, you’re on the right track. But.
There’s always a “but” (or a butt?).
How do you propagate the changes to the database schema, if there are any, to the production version of the database? You’re still doing development on the system, schema and data are changing with each build, you can’t afford a DBA to do crazy DIFF’s between your dev database and production database…
Simple, and flawed, approach
You already have a version control system. Let’s assume you’re using MySQL as your database (I am). You can just use mysqldump to dump the schema and data of your development database, and check in those .sql files into source control. Then, your build script just drops the production database, and recreates it from scratch using the .sql files from the source tree.
Note that you definitely want two dumps: one for the schema, and one for the data. This way it’s much easier to track schema changes in your source control system.
Only one, but very big problem: user-generated content gets lost.
You may say “copy over user generated content to the dev box before checking stuff in”. But the build isn’t instantaneous, and everyone who’s been doing something on your site during the time between your “sync” and the build will lose their data. Not good.
You may say “screw user content”. No, you didn’t say that. I just heard it from some weirdo in the back of the room. Screwing your users is a safe way to lose early adopters, who are the only way for you to deliver a quality app fast. If you don’t get feedback from those enthusiasts (and you won’t if you piss them off by continuously throwing away the data that they’ve put in), you’ll lose.
Complex, and successful, approach
Identify which tables in your database will contain user-generated content. For the cocktail builder site, these are the mainly the tables with user ratings and user actions, and table for reporting crashes.
Your build script will no longer drop the entire database and rebuild it from scratch; it will drop all tables except for tables with user content. That is, as a part of the check-in process, split your output .sql files as following:
schema.sql: DDL for all tables except those with user-generated content
schema-extra.sql: DDL for tables with user-generated content
data.sql: actual INSERT statements for all tables other than those with user-generated content
data-extra.sql (optional): you get the drill. This is optional because there’s not always a point in including user-generated data into your build – just back it up on the production database side, and sync it to the dev boxes once in a while.
Your script to dump the dev databases may look like this file below (this is a batch script – I do my development on a Windows box; please don’t flame me, I’m built of a fire-retardant):
REM Full path to the MYSQL utilities. Required.
REM Database settings
REM Producing schema and data files for the build. They are
REM called "schema.sql" and "data.sql".
REM Using MySQLDump. Path is specified in the variable above.
REM ----- Delete old versions of the files.
echo Generating core schema and data files...
REM ----- Using settings specified in the configuration file, ignore
REM ----- a few tables that shouldn't go into the incremental build
REM ----- (user-generated content should not be discarded on upgrade).
REM ----- The schemas for those tables will go into a separate file
REM ----- that can be used to initiate a "full from scratch" build.
REM ----- Content for those is not a part of the build.
echo The following tables will not be included in the core build:
FOR /F %%t IN (userContentTables.conf) DO (
echo - %%t
SET ResultSkipString=!ResultSkipString! --ignore-table=%DBNAME%.%%t
REM ----- Dump the actual SQL schema/data creation scripts...
SET CommonDumpSettings=--comments=false --user=%DBUSER% --single-transaction --no-set-names --skip-add-locks -B %DBNAME%
"%MYSQLDUMP%" --result-file=schema.sql --no-data %ResultSkipString% %CommonDumpSettings%
echo USE `%DBNAME%` > schema-extra.sql
"%MYSQLDUMP%" --no-data %CommonDumpSettings% %ResultIncludeString% >> schema-extra.sql
"%MYSQLDUMP%" --result-file=data.sql --no-create-db --no-create-info --skip-extended-insert --skip-disable-keys %ResultSkipString% %CommonDumpSettings%
And the mentioned file userContentTables.conf is just a newline-separated list of table names that contain user content.