Jenkins + Drush + Dropbox = Easily share sanitized database projects

I recently wrote about setting up Jenkins. My next step was making it do something useful to help our team become more efficient. In most any team it's likely that you'll get some folks for whom "just use drush sql-sync" is not a reasonble solution.

My goal: get a database backup into dropbox on a regular basis and make sure no sensitive customer data is in that backup.

Make a Database backup of the live site

We're running jenkins on a non-production server (for a variety of reasons). So, we get a backup of the live database into a temporary scratch database using the drush aliases feature. That process sanitizes it a bit using the sql-sanitize feature of drush. Then we dump out that database.

  1. Start with an up to date checkout of your live site's Drupal code
  2. Use the multisite feature and create a sites/example.prod/settings.php where the $databases array has a set of read-only credentials to the production database
  3. A second "site" at sites/example.scratch/settings.php
  4. Setup a Drush alias that points to those two sites inside the Drupal - be sure to use the 'uri' element so that drush knows which set of credentials to use inside the sites/ folder:

    $aliases['example.prod'] = array(
    'root' => '/var/lib/jenkins/example_scripts/example_com_checkout_for_drush/',
    'uri' => 'example.backup',
    );
    $aliases['example.scratch'] = array(
    'root' => '/var/lib/jenkins/example_scripts/example_com_checkout_for_drush/',
    'uri' => 'example.scratch',
    );
  5. The example.scratch credentials should point to a "scratch" database that is used just for these purposes.
  6. Finally a line in the Jenkins job to copy the database from the live site to the backup.
    drush sql-sync @sitename.prod @sitename.backup
  7. Run the "drush sql-sanitize" command on it to get drush's default sanitization. If you have the paranoia module installed it will do a few extra sanitization steps. For more on some ways to sanitize a Drupal database checkout this Drupal Scout article.
  8. Bonus points: Write your own sitename_drush_sql_sync_sanitize hook just like the paranoia module does - you can clear out data that is important to your site
  9. Once that's done, you export the sanitized database:
    drush sql-dump > sitename.prod_sanitized_backup.sql

Send the result to Dropbox

There's Dropbox page for Linux that even has command line instructions. If you're not sure if you need 64 or 32 bit do a "uname -a" on your server. x86_64 is 64 bit.

I also downloaded their dropbox.py script to manage dropbox. I wanted the script to work regardless of whether the dropbox daemon has properly restarted (it feels like a likely point of failure). So, we have a Dropbox account that is connected just to the server and frequently our jobs will have a last step of moving their contents to ~/Dropbox/something and then they kick off the command ~/bin/dropbox.py start

To be honest, the Dropbox CLI mode has been a real pain. Some random TIF files don't synch to the server.

Bits and bobs

For us, this script runs every morning. After a few minutes Dropbox has synched it to all their machines and they can easily import it to a local environment using drush, mysql command line, PHPMyAdmin, or whatever their favorite tool is.

It does a few extra things:

  • It bzips the output before putting it in Dropbox.
  • It does an "rm -f *.sql; rm -f *.bz2" at the beginning and end to remove any leftover cruft.
  • A downstream project uses the output of this step to rebuild a test site where we get the latest code from the testing branch, the latest prod database and people can test it out without having to have a local Drupal installation.

Related: Jenkins is CARD.com open source project spotlight article.

Thanks to c4rl for his help editing.

Comments

What's wrong with sql-sync?

Hi! Thanks for sharing. We have almost the same setup, including Jenkins and Dropbox. But we store the alias files in Dropbox, not the actual dumps: http://www.bariswanschers.com/blog/working-drush-alias-files-teams

Just wondering why saving dumps instead of the alias files would be better.

I think it depends a bit on

greggles's picture

I think it depends a bit on the team. My assumption is that eventually every team will get someone who is comfortable dealing with a dump file, but not comfortable dealing with the command line. Or at least, that's been the case with most every team I've worked on ;)

Read-only access

If your prod site is set up with read-only access to the database, how will you be able to create content on the site? Or is your prod site in this example a clone of the actual prod site connecting to the prod database remotely?

Those credentials are for the

greggles's picture

Those credentials are for the drush user in the sandbox environment. It connects to the prod db to make a dump, but shouldn't have the ability to write to the database.

The settings.php of the prod site has credentials for the production database that are read/write.

Thanks for sharing this

Thanks for sharing this program with us. My wife is a software engineer and I will refer this to her. She could give a better explanation on this. Keep posting more updates in your blog. We would like to hear more from you.
http://seomarketingsrv.com/

Post new comment

The content of this field is kept private and will not be shown publicly.