Operability.io 2016 – Operations is crucial (Day 1)

This post was written by Jovile Bartkeviciute.

After the success last year, the Operability.io returned to London on Monday 19th September. The two-day conference was single line, same as last year, and the speakers were hand-picked and carefully chosen by the organisers. Day 1 started with an opening speech by Marco Abis, with Mark Burgess delivering the first talk.

Buy the Team Guide to Software Operability book – operabilitybook.com

Here are some quotes and key ideas from each of the talks:

HOW DO YOU KNOW YOUR SYSTEM IS WORKING WELL? – Mark Burgess

How do you know your system is working well? Generally, the answer is more complicated than the usual “The system is working well when the users are happy”. First, we need to define every single word in this sentence – what does “we”, “know”, “system”, “working” and “well” mean?
Modularity is a red herring – we are taught that modularity is a good thing, but in practice it often runs flat. (Tidiness is not godliness)
Basics of promise theory:
- The agent can only make a promise about it’s own behaviors.
- Obliging others is an ineffective strategy.
- An agent can only assess other’s promises from its own perspective.
- Any reliance (dependency) on another agent invalidates a promise.
Clearly defined desired state and desired outcome allows the system to converge.
- Clear definitions are essential – need to have a clear set of promises.
Local time trumps space, because space is non-local time!
“Systems are made of people as well as machines”.
When designing a system, we usually base it on what a human can do, but to scale it, we need to take the human out of the picture and design for the machines.
Repairing systems quickly is the best thing you can do to know if your system is working well.
A promise is not a guarantee – there are no guarantees.
The Netflix approach keeps systems human-sized so that they always have clear human intent.

To conclude:

Know – a relationship between the system curator and its agents
System – a collection of agents collaborating by promises
Working – promises made and kept
Well – what stats about promise keeping?

If we cannot formalize these, we cannot answer the question.

Slides – http://www.slideshare.net/MarkBurgess32/september16

SERVICES: BEYOND MICROSERVICES – Kaimar Karu

Popular view of operations team – “No” people.
If the developers can do everything themselves, why do we need operations?
- Why operations? -good definition by Charity Majors
  - Scalability
  - Resiliency
  - Availability
  - Maintainability
  - Simplicity in complex systems
  - Instrumentation and visibility
  - Graceful degradation
Operations is still relevant – NoOps is not the best idea. While your system might be working fine without operations, when you will want to scale – you will most likely fail.
“Blameless is hard to achieve, but we can start by being more blame-aware.”
When you try to implement something new, you need to make sure that the changes match the vision of the entire organisation.
Scientific method for experimenting with operations:

“WHY WOULD ANYONE DO OUT-OF-HOURS SUPPORT FOR FREE?” – Sarah Wells

Sarah Wells talked about why DevOps, what they did and the hard stuff.

To do Devops properly you need to be ready to change the culture, which is the most difficult part.
Main ideas of how they implemented DevOps:
- Automation and Infrastructure as code
- Collaboration – tech ops joined other teams.
- Created “Ops cops” – and dedicated Kanban line
- You build it – you run it – if you aren’t doing this, you aren’t doing DevOps

Operations spend their entire career stopping developers from breaking stuff. Development teams needed to prove that they care to the operations. Trust has to be built.
Tools and standards are better than platforms – they let you pick the things for you, platforms are too generic.
Don’t treat every decision like it’s irreversible.

Slides – https://speakerdeck.com/sarahjwells/why-would-anyone-do-out-of-hours-support-for-free

ACHIEVING CLOUD-NATIVE OPERABILITY – Casey West

Old and busted? – Not cloud native. New and good? – Cloud native.
Definition:
- Continuous delivery, DevOps, and microservices descrive the why, how, and what of being cloud-native. In the most advanced expression of these concepts they are intertwined to the point of being inseparable.”
“A microservice is an application small enough that an engineer new to the source code can reason about it in a day or less.” – Kenny Bastani
“There is nothing more expensive than going to the wrong direction.”
Cloud-Native Operability is:
- Microservices architecture
- Continuous delivery process
- Devops culture
- Platform automation
Microservices are for people, not machines.
It does not matter how beautiful your architecture is, how easy deployment is, or how great your culture is if production is tire fire.
DevOps culture – you can’t buy this: collaboration, automation, learning, measuring, sharing.
It is important that the people who build the system take responsibility for the operations.

Slides – https://speakerdeck.com/caseywest/operabilityio-2016-achieving-cloud-native-operability

TAMING YOUR CONTAINERS – Liz Rice

Liz Rice started her talk with a poem illustrating the power of naming things (and pets) clear and run a live demo on making a container image.

Naming is important, it makes promises.
Tags are good, but in your mind, read ‘latest’ as ‘default’, ‘latest’ – is not necessarily most recent.
Using Docker tags to mess with people’s minds – https://medium.com/microscaling-systems/using-docker-tags-to-mess-with-peoples-minds-367bb2c93bd0#.u7ti5y6lq
Labels – they are great, because they are immutable – people are more likely to be clear when using them.

THE ROAD TO GROWNOPS – Daniel Otte & Tom Shacham

Daniel Otte and Tom Shacham delivered the most creative talk of the day – they have prepared a double act retrospective on the topics of operability of the team, operability of the devops and operability of the community.

To improve the culture of an organisation – ‘kill it with kindness’ always works!
If, as a user, you see any bad things, errors, etc., – contribute, give criticism – this is how systems improve.
GrownOps:
- Working together over delivering fast
- Grow up and get to know each other
- Contribute instead of criticise
- You are always in a state of partial knowledge
- Ask for information don’t state judgement
- Systems are built on trust

Slides – http://www.slideshare.net/DanielOtte3/the-road-to-grownops

LESSONS FROM DATABASE FAILURE – Colin Charles

Colin Charles talked about backups (and verification), replication (and failover), security (and enryption).

Learning from database failures – “I think what we mostly learn from database outages is what NEVER TO DO AGAIN”
Why replicate:
- scale out,
- (automatic) [master] failover
- geographical redundancy across multiple data centres
- online schema changes.
In conclusion, to avoid system failure:
- Use semi-sync replication with failover solution that ensures you don’t failover too often.
- Make good backups. Test them. Save them.
- You’ll most definitely need to shard your data, use proven frameworks and get a proxy involved. Complete backups with multi-source replication when needed.
- Use mysqldump and xtrabackup together (and mydumper for parallel backup/restore; mysqlpump)
- Security is key: prevent SQL injections, encrypt your data at rest.

And the last talk of the day:

CENTRALISING THE RIGHT THINGS – Tom Booth

Build a central team to empower and support others.
Deploying continuously reduces risk and is better for users.
A team should be in control of its own architecture and infrastructure.
You need a team that enables great Ops not owns it directly.
Give teams room to experiment, to find what works for them.
Make the tools so easy to understand and so good that teams WANT to use them.
Work together and not separately – human aspect cannot be underrated.

Overall, containers are becoming the norm and many of the speakers mentioned that Ops is still relevant and NoOps is not the best idea, human interaction and improving company culture is key.

>> Day 2 <<

Here is what other people thought:

Great first day at @OperabilityIO , good talks and a really nice crowd 🙂 shoutout to @capotribu for an awesome single track conference!

— Daniel Otte (@MoanOps) September 19, 2016

The importance of name "promises" for autonomous agents (containers) by @lizrice – this is an important topic, much underrated! #OIO16

— Mark Burgess (@markburgess_osl) September 19, 2016

Enjoying @tombooth talking about devops at uSwitch. Lots of common ground with my experience at the FT #OIO16

— Sarah Wells (@sarahjwells) September 19, 2016

Funny double act retrospective from @tomshacham and @MoanOps at #OIO16 pic.twitter.com/Eeggyti4lL

— Matt Saunders (@cm6051) September 19, 2016

Following #OIO16 and #LGDSS from afar is really brightening my Monday. I'm learning a lot and really appreciate the sharing.

— Emma Allen (@_allenemma) September 19, 2016

Tools/standards vs platforms is a good debate. Answer depends on rate of scale-up of people/system I think. From @sarahjwells at #OIO16

— Gareth Rushgrove (@garethr) September 19, 2016

The good old self-immolation test of continuous delivery 😂😂😂 #OIO16 pic.twitter.com/crgjvNerfk

— Denise Yu (@deniseyu21) September 19, 2016

Recurring theme of operations as a platform at @OperabilityIO. Finally!

— Mark Imbriaco (@markimbriaco) September 19, 2016

Again, a strong focus on the need for measurement to understand system behaviour, from @markburgess_osl #OIO16 #operability @worldofchris

— Rob Thatcher (@robtthatcher) September 19, 2016

My favourite talk at #OIO16 today. Funny since that I generally consume the FT in print form. https://t.co/UnAyV3yK7O

— Scott Myц (@ScottMuc) September 19, 2016

First day at @OperabilityIO over. Good speakers, good vibe. Very informal and very friendly. Definitely recommended. #OIO16

— Kaimar Karu (@kaimarkaru) September 19, 2016

#Containers really becoming the norm? Almost everyone at #OIO16 raised their hands to say they're using them.

— Liz Rice 🇪🇺 (@lizrice) September 19, 2016

This may well be the database talk we deserve #operability #OIO16 @ Barbican Centre https://t.co/GN99evZjQY

— Meri “Exiled for the Good of the Realm” Williams (@Geek_Manager) September 19, 2016

Now onto Day 2!

Share this:

Related

Leave a comment Cancel reply