We already have plenty of evidence, empirical and anecdotal, to indicate that use of automation and orchestration in production environments is not an anomaly. In fact, it appears to be accelerating as NetOps teams try to catch up to their DevOps counterparts.
The pressure to reach automated parity with app development environments can lead to skipping the strategy and going right for the tactical approach to adopting a more agile, automated means of making changes to the production pipeline.
That’s not a good thing. Production is not development, and the blast radius is significantly larger in production where there are hundreds -- sometimes thousands -- of applications and business processes relying on shared networking services. You can’t fail fast enough to avoid incurring damages when something goes wrong.
So as automation and orchestration become the norm in production environments, NetOps teams should be mindful of which DevOps practices they embrace and which they don’t. Because when bad habits are really hard to break, the best option is simply to avoid forming them in the first place.
To help you out, here are the top three bad habits you should avoid when adopting DevOps for production network automation and orchestration:
3 Bad Habits NetOps Should Avoid
1. Skipping the code review
The State of Code Review 2017 from SmartBear, a supplier of software-quality tools for teams, notes that 74% of developers participate in code reviews. That sounds good, until you realize that means the other 26% aren’t. Unsurprisingly, the No. 1 reason cited for not reviewing code at desired levels is workload.
This is how defects and bugs (excuse me, "undocumented features") creep into software. These are logic and security-based mistakes that can lead to crashes, outages, memory leaks, and even breaches. When you’re writing scripts, and integrating multiple services to automate and orchestrate a process, you are writing code. And if you are writing code, it needs to be reviewed by someone other than you.
Remember, this isn’t testing or QA, where you can mess up without affecting the business’s bottom line. This will be production, and a single mistake can lead to all sorts of problems. Make the time to conduct code reviews. The benefits are well-documented and include:
- increased quality of code with higher chance of identifying and eliminating security flaws
- knowledge sharing -- others learn the process along with the code
- compliance (ISO 9000/9001)
2. Using whatever language you want
According to a 2016 survey conducted by Software Improvement Group and O’Reilly, 70% of respondents "believe that maintainability is the most important aspect of code to measure, even as compared to performance or security."
I hate Perl, and I’m not all that fond of Python. So I’m going to use Node.js instead. Or maybe I’m just going to craft some incomprehensible command-line magic with sed, awk, and my friend grep to push this change to that router. Problem is, no one else uses Node.js and that command line relies on my system-specific configuration.
That is not maintainable, and using “whatever language/tool/system” you want to build scripts and services to automate networking makes embracing code reviews really, really hard. It won’t go well for you. If no one else can maintain that code, it becomes yours. For life.
It’s like the goldfish you begged for when you were eight and now you’re stuck with it.
Standardizing on languages, tools, and systems early is important.
3. Ignoring security Rule Zero
Every AD&D (Advanced Dungeons & Dragons) player, at least every one I play with, knows about Rule Zero: “The Dungeon Master is the final arbiter of all rule decisions.” It supersedes all other rules in the game, hence the reason it is numbered as zero. In security, we also have a rule zero: “Thou shalt never trust user input. Ever.”
A number of high-profile outages have been caused by ignoring this rule, because command-line parameters passed to any script are, by default, user input. Ignoring it may trigger a resume-generating event by accidentally causing an outage of extreme proportions.
Never trust user input implicitly.
Whether that’s the IP address of a wiring closet switch or a variable passed to inform a firewall script which port to open or close, don’t blindly execute on it. Instead, always validate input and, if necessary, force the human invoker of the script to verify the input. After all, they might not have meant to push that configuration change to every switch.
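To make that concrete, here is a minimal sketch of what validating and confirming input might look like in Python. It is an illustration under stated assumptions, not a definitive implementation: the management subnet `10.20.0.0/24` and the helper names `parse_switch_ip`, `parse_port`, and `confirm` are all hypothetical and would need to match your own environment and tooling.

```python
import ipaddress

# Assumption: the only subnet this script is ever allowed to touch.
ALLOWED_NETWORK = ipaddress.ip_network("10.20.0.0/24")


def parse_switch_ip(raw: str) -> ipaddress.IPv4Address:
    """Return a validated switch address or raise ValueError."""
    # ip_address() raises ValueError on anything that is not a literal IP,
    # which also rejects strings such as "10.20.0.5; reboot".
    addr = ipaddress.ip_address(raw.strip())
    if addr.version != 4 or addr not in ALLOWED_NETWORK:
        raise ValueError(f"{raw!r} is outside {ALLOWED_NETWORK}")
    return addr


def parse_port(raw: str) -> int:
    """Return a port number in 1..65535 or raise ValueError."""
    text = raw.strip()
    # Accept plain ASCII digits only: no signs, underscores, or spaces inside.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"{raw!r} is not a port number")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is out of range")
    return port


def confirm(targets, action, ask=input) -> bool:
    """Make the human retype the device count before anything is pushed."""
    count = len(targets)
    answer = ask(f"About to {action} on {count} device(s). "
                 f"Type {count} to proceed: ")
    return answer.strip() == str(count)
```

Asking the operator to retype the device count, rather than just pressing "y", is one way to catch the case where a change meant for a single switch is about to go to all of them. The `ask` parameter exists only so the prompt can be replaced in tests.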
As you proceed with efforts to automate IT in 2018, pay close attention to the habits you’re forming. Avoiding these three bad habits will go a long way toward ensuring a successful and productive year.