Disasters happen. When they do, are you ready to handle them with grace? In general, people get good at handling events that they experience regularly, but high-risk disasters are managed so they don't happen often. It's not every day that power goes out at the primary data center, but when it does you want to be sure that your auto-fail-over actually works. You want to be sure your backups actually work. However, testing disaster preparedness often takes too much time and creates little organizational value.
Once you've gone beyond a trivial number of Jenkins jobs you can get into a situation of not knowing which job does which thing. You might say "I know some job is running a query on a table, but which one?" In that case it can be helpful to search your Jenkins jobs.
Managing Jobs by script size
We have two strategies for managing Jenkins jobs: put short scripts into the job itself, but move longer jobs into code somewhere else.
Most jobs use the "execute shell command" Build Step. I also use and recommend the JobConfigHistory plugin to see how a job has changed over time (or who fixed/broke things). When the number of lines in the "shell script" section of a job gets over maybe 3 or 4, I think it's time to move things to "real code" that is managed by a revision control system with more power than the JobConfigHistory plugin (i.e. git). So, we tend to put things into one of four places: R scripts, Pentaho jobs, Drush commands or bash scripts. All of the R scripts, Drush commands and bash scripts are managed with git. The working checkouts of those directories are updated periodically (...by Jenkins jobs :)). So, if I want to grep that external code to see where a particular table is being modified, it is very easy to do that. But...what about the one or two line shell scripts that are inside Jenkins jobs?
Searching Jenkins Job config files
First, you have to know about the Jenkins directory structure. The job configurations are stored in files called config.xml, one per job, under the jobs directory of the Jenkins home directory (often /var/lib/jenkins/). So, if you have a job named production_deploy then the config file for it is located at /var/lib/jenkins/jobs/production_deploy/config.xml
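Given that layout, searching the job configs comes down to scanning every config.xml for a string. Here is a minimal sketch of that idea in Python. The function name `find_jobs_mentioning` is made up for illustration, and it assumes jobs sit directly under the jobs directory (no folders) with the Jenkins home at /var/lib/jenkins unless you pass another path.

```python
from pathlib import Path


def find_jobs_mentioning(term, jenkins_home="/var/lib/jenkins"):
    """Return the sorted names of jobs whose config.xml contains `term`.

    Hypothetical helper: assumes the layout <jenkins_home>/jobs/<job>/config.xml.
    """
    jobs_dir = Path(jenkins_home) / "jobs"
    matches = []
    for config in jobs_dir.glob("*/config.xml"):
        # Read leniently so one oddly encoded file doesn't stop the search.
        text = config.read_text(encoding="utf-8", errors="replace")
        if term in text:
            matches.append(config.parent.name)
    return sorted(matches)


# Example call (the table name is only an illustration):
# find_jobs_mentioning("users_summary")
```

Keep in mind that config.xml is XML, so characters like & and < in a shell script are stored escaped. Searching for a plain table or command name tends to work better than searching for a whole command line.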
For the past 1 year, 1 month, 1 week and 1 day I've been working at CARD.com. I love it. I've had a lot of great jobs in my career so far, but this is one that is truly extraordinary.
I'm currently pretty enthusiastic about a set of quotes from Jeff Bezos compiled at fool.com, so I'm sprinkling some of those through this post.
What is CARD.com doing?
Our CEO put it like this in a recent interview he gave:
CARD.com is the world’s first likeable financial company. We make payments fun, fair and fashionable. CARD.com offers Visa cards and MasterCard cards featuring card art and amazing perks from the best brands in the world, like Star Trek, Elvis or The Walking Dead.
And...that's a good description of what we do. But, what do I think we're doing that is exceptional?
- We're using a ton of open source software and contributing back where we can. That just warms my heart :)
- We're doing everything with an eye towards scalability. We have a lot of card designs and many more are coming. Some of our designs are big and some are small. We still want to delight the people with a "small" brand because to them that brand is their life.
- Bezos said "Your margin is my opportunity." and we're following that. We aren't aiming to be the cheapest provider, but we are undercutting a lot of other providers with what we believe is a much better product. That will help us scale and as we scale big we win. It feels great to provide a product that is competitive with other options available to our typical cardholder.
- Since we're scaling big, we sweat the small stuff. We review contracts to see how we can squeeze pennies or fractions of pennies out of different transactions.
We were recently reviewing proposals from two vendors. One vendor claimed 100% uptime. Another vendor claimed 99.95% uptime. Our SLA to customers is below both of those numbers, but 100% feels better than 99.95%, right? So we should go with 100%, right?
My experience is that the uptime number in an SLA is purely for marketing purposes. Pure. Marketing. Purposes. If you read 100% and think the service will be online for 100% of the time? Shame on you.
The really important thing is the detail behind the SLA. Here are a few tricks I've seen that make a 99.999% SLA roughly worth nothing.
- What are the exclusions? Most service providers are hosted somewhere (Amazon? Physical space?) that has its own uptime guarantee. If that provider goes down, is your SLA still in effect? Many SLAs exclude acts of nature like a hurricane that can take down a single provider.
- What do you get when the number is broken? Some contracts give you a credit. Some give you cash. Some give you a credit that is worth your monthly cost multiplied by the percent of time they were offline. Is that worth much to you?
- Do you get more if the outage is persistent? If a service dies for an hour that's a problem. If it dies for a day that is horrible. I want to be compensated more if the outage is prolonged.
- Whose monitoring counts? What kind of monitoring? I've had times where my monitoring (Pingdom) showed a site was offline for hours, but internal monitoring showed it was fine. I got no credits.
- What counts as "down"? If the service is online but taking 10 times longer than normal to process requests, is that OK? What if the service is online but network connectivity is degraded?
- How are periods of downtime calculated? An SLA I read only counted a full hour of continuous downtime as real downtime. Many outages are 10 minutes here, 20 minutes there. I want to be compensated for those as well.
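To put rough numbers on a few of these tricks, here is a small Python sketch. It is an illustration under stated assumptions, not any vendor's actual formula: it assumes a 30-day month, the $500 monthly cost and the list of outages are invented, and all the function names are made up.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month (43,200 minutes)


def allowed_downtime_minutes(uptime_percent):
    """Minutes per month a provider can be down and still meet the number."""
    return MINUTES_PER_MONTH * (1 - uptime_percent / 100)


def counted_downtime_minutes(outages, minimum_minutes=60):
    """Sum only the outages long enough to count under the contract.

    Models the "only a full hour of continuous downtime counts" style of rule.
    """
    return sum(minutes for minutes in outages if minutes >= minimum_minutes)


def prorated_credit(monthly_cost, downtime_minutes):
    """Credit equal to the monthly cost times the fraction of time offline."""
    return monthly_cost * downtime_minutes / MINUTES_PER_MONTH


# Hypothetical month: four short outages, 75 minutes in total.
outages = [10, 20, 15, 30]
budget = allowed_downtime_minutes(99.95)           # about 21.6 minutes
counted = counted_downtime_minutes(outages)        # none reach 60 minutes
credit = prorated_credit(500, sum(outages))        # under one dollar
```

In this made-up month the service was down for more than three times what 99.95% allows, yet under the full-hour rule none of it counts. Even if every minute counted, the prorated credit on a $500 bill would be less than a dollar.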
Below is a ladder of feedback, ranked by usefulness in ascending order:
Levels of useful feedback:
- Silence (i.e. not giving feedback)
- X is bad.
- X is bad because Y
- ...instead I suggest Z
- ...instead I suggest Z because Q
- ...instead I suggest Z because Q. I'm happy to help with that.
- ...instead I suggest Z because Q. I've already done some (or all) of the work.
Earning bonus points
Regardless of which level you're on, you can get bonus points by following some simple tips:
OK, so it's not really "new." It's a little over a year old, but the Drupal community on Gittip just recently got over 150 people, which is the threshold where Gittip starts displaying members of the community.
So, congratulations to the Drupal Community! (and to Gittip).
Why is Gittip a useful tool for the Drupal community?
I think a lot of people want to give a little support (via money) to the people who work on Drupal, but there is a lot of friction in the way of giving money. Does the person use PayPal? What is their PayPal email? What if you only want to give someone $10? The friction of that whole transaction far outweighs the $10 gift.
There's also a mismatch in the action: most payment systems like a check or PayPal are used one-time (again, because of the friction) but the benefit from a contribution and the desire to pay back that benefit live on for as long as you have a Drupal site. A lot of these topics are discussed in an old Lullabot Podcast where they point out that if every active user of Views gave Earl a few dollars for their use of it, he'd have a bit over $2 million.
My perspective is that it hasn't been easy enough for people to give that money and the friction is holding them back. Gittip solves a lot of these friction problems and the timing problem by making it easy to do small recurring payments to anyone on GitHub or Twitter.
What does the evolution of our use of Gittip look like?
Among the people who have joined the Drupal community on Gittip:
I recently had to set up OpenSWAN on Ubuntu to be part of a site-to-site VPN with a Cisco ASA 5520. There are a few resources I used to get me there. It was hard to find these resources, so I'm keeping track of them for myself and in the hope that it helps someone else.
- this amazing page, about the same thing on ec2 vpc
- man ipsec
- configuring openswan ipsec server - great advice including some "gotchas" and troubleshooting ideas
- linuxjournal backgrounder on ipsec and vpns
- openswan installation and configuration tutorial
- A nice chart about cidr notation in ip addresses
- A gist about setting up openswan site-to-site on ec2
- A serverfault article about two people doing this behind some routers
My requirements were:
- local ike peer IP address: 22.214.171.1241
- remote ike peer IP address: 126.96.36.199
- remote: also want all addresses in 123.45.0/24 to be addressable
- Authentication: pre-shared key
- Encryption Scheme IKE
- Diffie Hellman Group: Group 2
- Encryption Algorithm: AES-256
- Hashing Algorithm: SHA1
- IKE Negotiation Mode: Main mode
- Lifetime (for renegotiation): 480 minutes
- Phase 2 Encapsulation: ESP
- Phase 2 Encryption Algorithm: AES-256
- Phase 2 Hashing Algorithm: SHA1
- Perfect Forward Secrecy: No PFS
- Lifetime (for renegotiation): 480m
And here is roughly what my /etc/ipsec.d/connection.conf looks like: