CrowdStrike CEO George Kurtz was still CTO at McAfee when he was responsible for delivering his first tech disaster to the world. The scenario was the same as the current debacle: an update for security software, released without basic testing, crashed host operating systems. As I browse CrowdStrike's careers page today, I see no openings for Quality Managers, only a few openings for SDETs (Software Development Engineers in Test) and one SDET manager.
What I find most interesting about the current debacle is that it could have been caught by a simple test. Anyone on a development team could have designed, implemented, and executed that test. All it would have taken was a single person asking, "What would happen if someone shipped an empty or malicious .sys file?" It's even possible that someone wrote an automated test for this simple condition. If such a test exists, however, it wasn't part of the strict release criteria. It should have been.
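Here is a minimal sketch of that kind of test, written in Python. CrowdStrike's real content-file format isn't public, so the file layout (the `CHNL` magic bytes, the 8-byte header, the expected field count of 20) and the names `validate_content_file` and `run_release_gate` are all hypothetical. The point is the shape of the check: feed the loader empty, zero-filled, truncated, and inconsistent files, and block the release unless every one of them is rejected cleanly instead of crashing.

```python
import struct

# HYPOTHETICAL file layout for illustration only; the real format is not public.
# Header: 4 magic bytes followed by a little-endian unsigned 32-bit field count.
MAGIC = b"CHNL"
HEADER = struct.Struct("<4sI")
EXPECTED_FIELDS = 20  # hypothetical number of fields the loader understands


class ContentFileError(ValueError):
    """Raised when a content update fails validation and must not be loaded."""


def validate_content_file(data: bytes, expected_fields: int = EXPECTED_FIELDS) -> int:
    """Reject malformed input with an error instead of crashing; return the field count."""
    if len(data) < HEADER.size:
        raise ContentFileError("file is empty or shorter than its header")
    if not any(data):
        raise ContentFileError("file contains only zero bytes")
    magic, field_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ContentFileError(f"unexpected magic bytes {magic!r}")
    if field_count != expected_fields:
        raise ContentFileError(
            f"file declares {field_count} fields, loader expects {expected_fields}"
        )
    return field_count


def run_release_gate() -> list[str]:
    """Return the names of failed checks; an empty list means the release may ship."""
    bad_inputs = {
        "empty": b"",
        "all_zeros": bytes(4096),
        "truncated_header": MAGIC[:2],
        "wrong_magic": HEADER.pack(b"XXXX", EXPECTED_FIELDS),
        "field_count_mismatch": HEADER.pack(MAGIC, EXPECTED_FIELDS + 1),
    }
    failures = []
    for name, payload in bad_inputs.items():
        try:
            validate_content_file(payload)
        except ContentFileError:
            continue  # rejected cleanly, which is the desired behavior
        except Exception as exc:  # any other exception stands in for a crash
            failures.append(f"{name}: crashed with {type(exc).__name__}")
            continue
        failures.append(f"{name}: accepted a malformed file")

    # A well-formed file must still pass, or the gate is useless.
    try:
        validate_content_file(HEADER.pack(MAGIC, EXPECTED_FIELDS))
    except ContentFileError:
        failures.append("valid_file: rejected")
    return failures


if __name__ == "__main__":
    failures = run_release_gate()
    if failures:
        raise SystemExit("Release blocked: " + "; ".join(failures))
    print("Release gate passed")
```

In a real pipeline, the equivalent check would exercise the actual parser that ships in the product rather than a stand-in, and a non-empty failure list would stop the rollout outright.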
We may never know the details of why the update was shipped. I've sat in these meetings, watching engineering leaders at unhealthy companies scramble to shift the blame onto another team. All too often, the teams primarily tasked with testing are the favorite targets. I've also seen the process handled well in healthy organizations, where root-cause analysis aims to correct the underlying system issues instead of blaming individuals. Make no mistake: some poor individual made the decision to roll out this update. Clearly, they did so without enough information.
The economic cost of this particular failure will be counted in billions of dollars, perhaps far more. CrowdStrike may suffer short-term damage to its brand and sales in the wake of this entirely avoidable outage, but it will not change its corporate structure. It won't fix the problem. When software quality professionals report to managers with a strictly engineering focus, they lack the leverage to create change. The ultimate culprit, the CEO (or CTO) pushing for lower and lower costs, will escape accountability. Often these executives staff their quality departments with the cheapest (and least experienced) options available, because they know they are protected by pages and pages of legalese in their user agreements. I'm certain (without even looking) that the user agreements for both Windows and CrowdStrike software absolve the vendors of any actual financial responsibility for disrupting their customers' productivity worldwide.
I also know that neither company will add an executive board member dedicated to enterprise quality process, because we consumers have no leverage. That said, from now on I'm going to be adamant about asking for proof of testing when executing global releases or evaluating vendor solutions. You should question any corporation that does not offer proof of testing, although very few could deliver it.