MTTR > MTBF

To paraphrase John Allspaw:

it’s more important to have good monitoring of your system, so that you are aware when an error occurs and therefore act on it. A lot of systems fail without the relevant parties ever being aware that something’s wrong.

The MTTR > MTBF idea is true for most types of failures.
The more complex a system is, the more difficult it is to predict failures, making it more important to be notified of failures, & recover quickly from there
Worth noting, MTTR is one of the key metrics used in State of Devops reports, & the book Accelerate, when measuring some stats from high performing vs low performing teams.

Other takeaways from the Infrastructure As Code video

Immuatble server, rebuilt on each config change. Less likely to have a snowflake server if you keep on rebuilding it.
- A snowflake server is one that people avoid touching, in fear of messing things up.
- What we’re after is a phoenix server, one you can confidently destroy & rebuild. It allows you to lean into the strengths of deploying in the cloud. Create & destroy as you need, as opposed to long lived servers that are never rebuilt.
No ssh’ing into the server to change configs
small changes rather than batches: results in less errors, easier rollback, less risk
Keep services available continuously. No going down for maintenance
- Blue green deployment