Sunday 27 March 2022

Mainframes and wicked problems

I know what mainframes are, but what’s the definition of a wicked problem? Well, Wikipedia tells us: “In planning and policy, a wicked problem is a problem that is difficult or impossible to solve because of incomplete, contradictory, and changing requirements that are often difficult to recognize. It refers to an idea or problem that cannot be fixed, where there is no single solution to the problem; and ‘wicked’ denotes resistance to resolution, rather than evil. Another definition is ‘a problem whose social complexity means that it has no determinable stopping point’. Moreover, because of complex interdependencies, the effort to solve one aspect of a wicked problem may reveal or create other problems.”

So, what is the wicked problem that I have in mind? The answer is ransomware. What happens when the IT team suddenly realize that there has been a breach on their network? How do they react to this crisis situation?

According to IBM’s “Cost of a Data Breach Report” 2021, the average cost of a breach increased from $3.86 million in 2020 to $4.24 million in 2021. The report also said that on average there are 212 days between an attack starting and a ransomware demand appearing. And then it takes on average another 75 days to contain the breach.

And if you think that your mainframe is probably safe because hackers usually focus on fintech companies, think again. The annual X-Force Threat Intelligence Index 2022 from IBM Security looks back over 2021. It points out that manufacturing became the most attacked industry in 2021, suffering 23% of all ransomware attacks. In previous years, financial services and insurance companies had always been the biggest targets. The report also found that the exploitation of unpatched software accounted for 44% of ransomware attacks in 2021.

So, why is ransomware a wicked problem? The answer is all down to ambiguity. For the IT team, there are a lot of things they don’t know, such as when the attack started, what data has been affected, and, perhaps more importantly, what system and application software (infrastructure) have been compromised by the hackers.

It’s a bit like the Johari window. There are things you know you know. Things you don’t know you know. Things you know that you don’t know. And things you don’t know you don’t know. These are the unknown unknowns. And that’s what makes dealing with a hack so hard. It’s even worse if the ransomware messages start arriving on every terminal and printer because management (and everyone else) is shouting at IT for answers. Who is responsible for the breach? When did it occur? What files have been corrupted? Has our infrastructure been compromised? And these (and many other) questions need to be answered very quickly because senior management have called a meeting to decide whether to pay the ransom or not.

In addition to the wicked problem, there’s now the stress of everyone blaming IT and demanding results. How do the brains of IT staff respond? In an ideal world, people would weigh up the most logical answers and come to a conclusion. They would channel their inner Mister Spock. But the ambiguity and pressure mean that they are not in the logical part of their brain (the cerebral cortex); they are in the more primitive part, the limbic system. Here, messages reaching the brain are run past the amygdala, which then decides on the appropriate emotional response – like joy, anger, disgust, sadness, surprise, and fear – and in an ambiguous situation, fear is the likely response. And a fear response causes the hypothalamus to respond by getting the body ready for a fight-or-flight response.

In fact, the hypothalamus does two things. Firstly, it uses autonomic nerves to cause the adrenal medulla to start releasing adrenalin and noradrenalin. This is called the SAM (sympathomedullary) pathway. Secondly, and slightly more slowly, it releases hormones that cause the pituitary gland to release hormones that cause the adrenal cortex to produce cortisol – better known as the stress hormone. This is called the HPA (hypothalamic-pituitary-adrenal) axis. And, as your body fills with these fight-or-flight hormones, you get amygdala hijack – where the logical part of your brain gradually gets cut off from this primitive limbic part. And, so, logical thought becomes almost impossible. Your body has gone into survival mode. Some people call it cognitive narrowing.

What can be done to prevent the people who are needed to logically solve the problem of a breach going into survival mode and becoming almost incapable of logical thought? What does the fire service do so that it can send firefighters into dangerous situations regularly?

The first thing to do is to practice the situation over and over again until it becomes second nature. It’s often suggested that organizations set up an incident response team that trains together regularly. That way they know what the other team members are going to do in a situation and they can do their part. Habits are stored in a part of the brain called the basal ganglia. Accessing them doesn’t involve the logical part of the brain. So, they can be used in the worst situations – like having the CEO standing behind you while you try to resolve the situation.

A useful addition is the use of decision trees, particularly fast and frugal trees. These are like flowcharts, where the answer to one question leads either to another question or to a decision. They are often used in high-stress situations (like firefighting or operating theatres) to make sure that decisions don’t overlook important information, an omission that could otherwise result in a tragedy. The fast and frugal tree provides a simple way to indicate that each task has been completed and nothing important to the task has been forgotten. One of the first tasks for the incident response team would be to create the fast and frugal tree. And then keep it up to date as training proceeds.
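To make the idea concrete, here is a minimal sketch of a fast and frugal tree in Python. In such a tree, the questions are asked in a fixed order, each question has one answer that leads straight to a decision, and only the last question falls through to a default. The questions, fact names, and actions below are invented purely for illustration; a real incident response team would write its own.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Cue:
    """One yes/no question in the tree, with a single exit."""
    question: str   # what the responder asks (hypothetical wording)
    fact: str       # key to look up in the observed facts
    exit_on: bool   # the answer that ends the tree at this cue
    decision: str   # action taken when the tree exits here


def decide(cues: Sequence[Cue], facts: Mapping[str, bool], default: str) -> str:
    """Walk the cues in order and stop at the first one whose exit fires."""
    for cue in cues:
        if facts[cue.fact] == cue.exit_on:
            return cue.decision
    # No cue exited, so take the remaining branch of the final question.
    return default


# Hypothetical triage tree: every cue and action here is an assumption.
TRIAGE_TREE = [
    Cue("Is the malware still spreading?", "still_spreading", True,
        "isolate affected systems from the network"),
    Cue("Are clean offline backups available?", "clean_backups", False,
        "escalate to senior management and external responders"),
]
DEFAULT_ACTION = "restore from backups and begin forensic review"

if __name__ == "__main__":
    observed = {"still_spreading": False, "clean_backups": True}
    print(decide(TRIAGE_TREE, observed, DEFAULT_ACTION))
```

Because each question can end the process immediately, the person using the tree never has to hold more than one question in mind at a time, which is the point when logical thought is under pressure.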

Ransomware is a whole-company problem. Organizations need to create and train incident response teams so that a breach, which would otherwise be a wicked problem, doesn’t cause team members to experience amygdala hijack and stop the ransomware attack being dealt with effectively. And IT teams need to ensure that they have taken all possible preventative measures to stop a breach occurring in the first place.
