The cost of human error

The UK Club identified in its first Analysis of Major Claims in 1990 that ‘human error’ accounted for 58% of all its claims over US$100,000. In the years since, despite marked falls in certain identified causes (for instance, structural failure), human error has remained stubbornly high as the prime cause of accidents and claims. As a consequence, the Club has for some years sought a methodology for both defining and analysing human error in the maritime context, in the hope of finding ways of tackling this seemingly intractable problem. A close study of the work of researchers at Manchester and Leiden Universities in the 1990s, on behalf of Shell, has resulted in the Club adopting their methodology and producing its most complex DVD to date in order to illustrate the underlying concepts.

The DVD No Room for Error graphically illustrates a proven methodology that shipowners can use to identify the propensity for human error to arise in their shipping operations. It also gives the Club the tools with which to analyse its extensive statistics to show where – and why – human error is most likely to arise.

No Room for Error seeks to differentiate between the acts and omissions of people at the sharp end and the latent system faults generated by the culture created, and the decisions made, by those in authority in the shipowner’s offices. Unlike the Club’s previous videos and publications, which addressed trade-specific issues, No Room for Error is intended to form part of, or to supplement, a company’s long-term training programme. It is also intended to stimulate debate within the company, aimed at reinforcing the effectiveness of the ISM Code through the development of a permanent and robust safety and environmental protection culture.

Human error

Over the past two decades, there has been a growing appreciation of the many and varied ways that people contribute to accidents in hazardous industries, or simply in everyday life. Not long ago most of these would have been lumped together under the catch-all label ‘human error’. Nowadays it is apparent that this term covers a wide variety of unsafe behaviours.

Most people would agree with the old adage ‘to err is human’. Most too would agree that human beings are frequent violators of the ‘rules’ whatever they might be. But violations are not all that bad – through constant pushing at accepted boundaries they got us out of the caves!

Assuming that the rules, meaning safe operating procedures, are well-founded, any deviation will bring the violator into an area of increased risk and danger. The violation itself may not be damaging but the act of violating takes the violator into regions in which subsequent errors are much more likely to have bad outcomes. This relationship can be summarised quite simply by the equation:

The resultant situation can sometimes be made much worse because persistent rule violators often assume, somewhat misguidedly, that nobody else will violate the rules, at least not at the same time! Violating safe working procedures is not just a question of recklessness or carelessness by those at the sharp end. Factors leading to deliberate non-compliance extend well beyond the psychology of the individual in direct contact with working hazards and include such organisational issues as:

  • The nature of the workplace
  • The quality of tools and equipment
  • Whether or not supervisors or managers turn a ‘blind eye’ in order to get the job done
  • The quality of the rules, regulations and procedures
  • The organisation’s overall safety culture, or indeed its absence

Violations are usually deliberate, but can also be unintended or even unknowing. They can also be mistaken in the sense that deliberate violations may bring about consequences other than those intended, as at Chernobyl. In this case, out of the seven unsafe acts (active failures) leading up to the explosion, six were a combination of a rule violation and an error (a misventure). Here was a sad and remarkable case in which a group of well-motivated and exceedingly expert operators destroyed an elderly but relatively well-defended reactor without the assistance of any technical failures.

The distinction between errors and violations is often blurred but the main differences are shown in the table below:

Contrasting errors & violations in human error incidents

ERRORS: Stem mainly from informational factors; incorrect or incomplete knowledge; either in the head or in the world.
VIOLATIONS: Stem mainly from motivational factors. Shaped by attitudes, beliefs, social norms and organisational culture.

ERRORS: They are unintended and may be due to a memory failure (a 'lapse') or an attentional failure (a 'slip').
VIOLATIONS: They usually involve intended or deliberate deviations from the rules, regulations and operating procedures.

ERRORS: They can be explained by reference to how individuals handle information.
VIOLATIONS: They can be understood only in a social context.

ERRORS: The likelihood of mistakes can be reduced by improving the relevant information: training, roadside signs, the driver-vehicle interface etc.
VIOLATIONS: Violations can only be reduced by changing attitudes, beliefs, social norms, organisational cultures that condone non-compliance (culture of evasion).

ERRORS: Errors can occur in any situation. They need not, of themselves, incur risk.
VIOLATIONS: Violations, by definition, bring their perpetrators into areas of increased risk, i.e. they end up nearer the 'edge'.

While errors may be simple memory or attentional failures, they can be exacerbated by:

Routinisation – the mark of a craftsman whereby the individual becomes so expert at exercising a particular skill, that he/she no longer consciously thinks about it allowing the mind to wander and the unexpected to happen – drivers who regularly travel the same route to the station each day suffer from this – ‘am I here already?’

Normalisation – the process of forgetting to be afraid – interestingly most accidents on mountains happen on the way down from the summit – only a relatively small number happen on the way up.

Intrinsic hazard – no matter how well you defend yourself the dangers ‘out there’ never go away – move outside your protective ‘bubble’ and something or someone will get you!

Creeping entropy – systems, policies and procedures grow old or fail to adjust to changing external factors thus increasing the propensity for accidents to happen.

Murphy’s Law – if it can happen it will happen, but there is also Schultz’ Law. Mr Schultz merely said that Murphy was an optimist!

The rules

We have already spoken about breaking the ‘rules’ but what precisely are they? Basically they are procedures written to shape people’s behaviour so as to minimise accidents. They are, if you like, standards designed to form part of the system defences against accidents. Defences are installed to protect the individual, the asset or the natural environment (all ‘objects of harm’) against uncontrolled hazards and generally appear in two forms:

  • ‘Hard’ defences, provided by fail-safe designs, engineered safety features and mechanical barriers.
  • ‘Soft’ defences provided by procedures, rules, regulations, specific safety instructions and training.

'Soft' defences are more easily circumvented than 'hard' defences and thus constitute a major challenge to any safety management system.

To remain totally relevant and to avoid falling victim to ‘creeping entropy’, procedures are continually being amended and updated to cover matters such as changed working conditions, new legislation, new equipment and, most particularly, to prohibit actions that have been implicated in some recent accident. Following an accident, how often have you heard people (usually senior) exclaim “and what did the procedures say?” It is a matter of fact, therefore, that over time these procedural changes become increasingly restrictive, yet the actions necessary to get the job done haven’t changed and often extend far beyond these permitted behavioural boundaries.

Ironically then, one of the effects of continually tightening-up procedures in order to improve system safety is to increase the likelihood of violations being committed. The scope of permitted or allowable action shrinks to such an extent that the procedures are either routinely violated or violated whenever operational necessity demands. In either case, the procedures are often regarded as unworkable by those whose behaviour they are supposed to govern. Whereas errors can arise from various kinds of informational under-specification, many violations are prompted by procedural over-specification – a classic own goal you might say!

Getting to grips with the human factor

Performance levels

Now we come to the scientific bit. Error types can be classified at three levels:

  • At the skill-based level, we carry out routine, highly-practised tasks in a largely automatic fashion, except for occasional checks on progress. This is what people are very good at for most of the time.
  • We switch to the rule-based level when we notice a need to modify our largely pre-programmed behaviour in line with some change in the situation around us. This problem is often one that we have encountered before and for which we have some pre-packaged solution. It is called rule-based because we apply stored rules of the kind: if (this situation) then do (these actions).
  • The knowledge-based level is something we come to very reluctantly. Only when we have repeatedly failed to find a solution using known methods do we resort to the slow, effortful and highly error-prone business of thinking things through on the spot. Given time, trial and error learning can often produce good solutions. In an emergency, however, because consciousness is also very limited in its capacity to hold information, usually not more than two or three distinct items at a time, our brain behaves like a sieve, forgetting some things as we turn our attention to other matters. In addition, we can also be plain scared, and fear (like other strong emotions) has a way of replacing reasoned action with ‘knee-jerk’ or sometimes over-learned responses.
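The three performance levels can be pictured as a simple decision order: routine first, stored rules second, on-the-spot reasoning last. The sketch below is only an illustration of that order. The situations, the actions and the function name `respond` are hypothetical and do not come from the research.

```python
# Hypothetical stored rules of the kind "if (this situation) then do (these actions)".
STORED_RULES: dict[str, list[str]] = {
    "engine room alarm": ["acknowledge alarm", "check gauge", "inform duty engineer"],
    "reduced visibility": ["reduce speed", "post extra lookout", "sound fog signal"],
}


def respond(situation: str, routine: set[str]) -> tuple[str, list[str]]:
    """Return the performance level used and the resulting actions."""
    if situation in routine:
        # Skill-based: a highly practised task carried out largely automatically.
        return "skill-based", ["carry on as practised"]
    if situation in STORED_RULES:
        # Rule-based: a familiar problem with a pre-packaged solution.
        return "rule-based", STORED_RULES[situation]
    # Knowledge-based: no stored rule fits, so it has to be thought through
    # on the spot, which is slow, effortful and error-prone.
    return "knowledge-based", ["think it through on the spot"]


if __name__ == "__main__":
    for s in ("steady course", "reduced visibility", "unfamiliar fault"):
        print(s, "->", respond(s, routine={"steady course"}))
```

The knowledge-based branch is deliberately the last resort, mirroring the point that people come to it only when known methods have failed.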

Classifying violations

Case and field studies suggest that violations can be grouped into four categories namely: routine violations, optimising violations, situational violations and exceptional violations. The relationship of these to both the performance levels and error types is summarised in the table below with definitions following:

Routine violations – almost invisible until there is an accident (or sometimes as the result of an audit), routine violations are promoted by a relatively indifferent environment; i.e. one that rarely punishes violations or rewards compliance – “we do it like this all the time and nobody even notices”.

Optimising violations – corner-cutting; i.e. following the path of least resistance, sometimes also thrill seeking – “I know a better way of doing this”.

Situational violations – standard problems that are not covered in the procedures – “we can’t do this any other way”. An excellent example concerns railway shunters: the rule book prohibits shunters from remaining between wagons when wagons are being connected. Only when the wagons are stopped can the shunter get down between them to make the necessary coupling. On some occasions however, the shackle for connecting the wagons is too short to be coupled when the buffers are fully extended. The job can only therefore be done when the buffers are momentarily compressed as the wagons first come in contact with each other. Thus the only way to join these particular wagons is for the shunter to remain between them during the connection. The result can be fatal.

Exceptional violations – unforeseen and undefined situations – “now this is what we got trained for”. A simple example on an oil-rig illustrates the point: a pair of engineers are inspecting a pipeline. One of them jumps into an inspection pit and is overcome by hydrogen sulphide fumes. His companion, fully trained to handle such situations, raises the alarm but then jumps down to help his partner, whereupon he too is overcome. Exceptional violations often involve the transgression of general survival rules rather than specific safety rules. Survivors of such exceptional violations are often treated as heroes. Exceptional violations can also be seen as an exercise of initiative, sometimes even provoking reward if, that is, you get away with it!

Given that human beings are able to circumvent both controls and defences with sometimes quite remarkable cunning, the problem can be summed up as follows:

Finally there is the theory of sheep and wolves. Studies have identified two sorts of people – sheep and wolves. Wolves accept rule violation as a norm, sheep do not. This results in:

  • Sheep in sheep’s clothing
  • Wolves in wolf’s clothing
  • Sheep in wolf’s clothing
  • But the largest group are wolves in sheep’s clothing – they haven’t violated the rules. Yet!

Accidents

An accident or incident is an unplanned chain of events which has, or could have, caused injury or illness and/or damage to people, assets, the environment or reputation. Modern research has shown that the basic components of an accident can be shown as the simple ‘formula’:

And that by adding the concept of breached, or missing, controls and defences a simple accident can be shown diagrammatically:

But accidents are not as simple as this, because usually there are several breached or missing controls and defences. More importantly, almost all accidents consist of a series of interlinking ‘events’, in which each event becomes either a new hazard or a new target in its own right. In the presence of further targets or hazards and new and further breaches of defences and controls, a second event is created and so on. During accident investigations it is not uncommon to identify five, six or even seven interlinking events before the final event or accident becomes a reality.

The concept of the ‘event chain’ or ‘incident trajectory’ is shown in the diagram below:

Note the original (first) event resulted in a fire. In the presence of two new ‘targets’, i.e. an operator and a piece of equipment, the resultant double event led to a badly burnt operator (injury) and damaged equipment (asset damage). Because the immediate aftercare of the injured operator (first aid or paramedic treatment) was ineffective (new hazard), the operator’s injuries resulted in a partial disability (final event).
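The injury branch of this example can be sketched as a small data structure, in which an event only happens when a hazard meets a target and every control or defence in between has been breached. This is a minimal sketch, not the Tripod notation itself: the class name `Event`, the function `trajectory` and the barrier names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    hazard: str
    target: str
    # Barrier name -> True if that control or defence was breached or missing.
    barriers: dict[str, bool]

    def happens(self) -> bool:
        # One intact control or defence is enough to stop the event.
        return all(self.barriers.values())


def trajectory(chain: list[Event]) -> list[str]:
    """Follow the event chain until an intact barrier stops it."""
    reached = []
    for event in chain:
        if not event.happens():
            reached.append(f"stopped before: {event.name}")
            break
        reached.append(event.name)
    return reached


def burn_chain(first_aid_effective: bool) -> list[Event]:
    # Barrier names here are assumptions made for illustration only.
    return [
        Event("operator badly burnt", hazard="fire", target="operator",
              barriers={"protection of the operator (hypothetical)": True}),
        Event("partial disability", hazard="ineffective aftercare",
              target="injured operator",
              barriers={"first aid or paramedic treatment": not first_aid_effective}),
    ]


if __name__ == "__main__":
    print(trajectory(burn_chain(first_aid_effective=False)))
    print(trajectory(burn_chain(first_aid_effective=True)))
```

With effective aftercare the chain stops after the burn injury; with ineffective aftercare it runs on to the final event.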

Reverting to the simple accident diagram and the ‘formula’ in the text box on the facing page, if one of the controls or defences had not been breached there would not have been an accident. If detected, the resultant ‘near-miss’ or ‘dangerous occurrence’ could still have been reported, investigated and acted upon as if it were the real thing.

The usual mechanism whereby controls and defences are breached is an unsafe act by an individual at the sharp end. Occasionally, they may be breached by an inherent unsafe condition but these too will invariably have been caused by the acts or omissions of people, which may be nothing more than a simple and unintentional mistake. As has already been mentioned, such unsafe acts or unsafe conditions are generally referred to as active failures.

While active failures are interesting – indeed much can be learnt from them – a lot more can be learnt, and more effective remedial measures put in place, by addressing the sick camel in the first place.

'Conventional' view of accidents

Conventional wisdom (above) dictates that in order for an accident to happen, defences of some kind will have been breached, usually by an unsafe act, carried out in a specific situation and in the presence of hazards of some kind.

What changed this long-established view, which as a basis for the new model is still correct, was some highly original research sponsored by one of the oil majors and carried out at two major universities, one in the UK and one in the Netherlands. The research originally set out to establish the role of the human being in the accident equation but very quickly established an ‘alternative’ theory of accident causation. Because of the triangular shape of the basic model of the theory, it became known as the ‘Tripodian’ view of accident causation. Basically it uses the ‘conventional’ diagram shown opposite, but adds a third component: general failure types (GFTs).

This ‘alternative’ model of accident causation is shown below:

The research accepts that, properly investigated, there is much in a reactive sense to be learnt from accidents. It also recognises that unsafe acts or active failures can be reduced using tools aimed at modifying human behaviour. The research suggested that the problem with attempting to learn solely from active failures is that: (a) there are potentially millions of them; (b) they will rarely be repeated in the same way; and (c) the circumstances in which they occurred will never be exactly the same. More importantly, the research established once and for all that the ‘sick camel’ could be made considerably healthier by managing what are called the general failure types (GFTs), of which there are just eleven. Using a medical analogy, the GFTs could be considered as the vital organs of the ‘safety body’. If properly managed in terms of their inherent health or strength, these could actually help prevent large numbers of accidents from ever happening at all. Once again, in medical terms it’s a bit like having a healthy heart and preventing heart attacks, or being vaccinated against pneumonia or ‘flu – all designed to prevent illness in the first place. Thus rather than acting in response to an incident we seek instead to act before an incident.

The research delved deep into the causation theory in order to establish a concrete link between breached defences and controls, and active and latent failures. Thus the Tripod causation model was born – see diagram below:

The interesting point about this model is that it introduces two new elements into the causation chain. First, it provides a linking mechanism, known as the precondition, though sometimes referred to as the ‘psychological precursor’, between the active and latent failures.

Secondly, it introduces the policy maker at the very start of the chain, thus illustrating the clear relationship between commitment by the policy makers at the beginning of the chain and the results at the end of the day.

By comparing the diagram of the Tripod causation model above and the simple accident diagram on page 2, it should become obvious that the link between the two is established through failed defences (for the target) and failed controls (for the hazard). Thus the combined accident model, known as the Tripod-BETA tree, complete with all basic components, looks like this:

Bearing in mind that any accident consists of a series of interlinking events, a completed accident tree can be exceedingly complex indeed.

Active failures

Both defences and controls are breached by ‘active failures’. Active failures are the failures close to the accident event that defeat the controls and defences on the hazard and target trajectories. In many cases, these are the actions of people, i.e. unsafe acts. Human errors are implicated in at least four out of five active failures, but human error, as we have already seen, is a broad term that includes a number of different sources of error. Not all active failures are human actions. Physical failures of controls and defences also occur due to conditions such as over-stress, corrosion or metal fatigue. These are often referred to as ‘unsafe conditions’. Having said that, human actions are often implicated as contributory causes to this form of active failure but they are not, in themselves, unsafe acts. For instance, a designer may have failed to identify the need to use a particular high-tensile material in a specific circumstance, thus causing component failure some time later.


Latent failures

As already mentioned, latent failures are the ‘vital organs’ of the safety equation. Latent failures are deficiencies, or anomalies, that create the preconditions that result in the creation of active failures. Management decisions (those of the so-called policy or decision makers) often involve the resolution of conflicting objectives. Decisions taken using the best information available at that moment may prove to be fallible with time. Also, the future potential for adverse effects of decisions may not be fully appreciated, or circumstances may change that alter their likelihood or magnitude.

The accident-producing potential of latent failures may lie dormant for a long time, only becoming apparent when they combine with local triggering factors – active failures, technical faults, abnormal environmental conditions or abnormal system states, some of which even the best HSE management systems will have absolutely no control over whatsoever.

Rather than dealing with an infinite number of active failures, it is reassuring to note that there are just eleven latent failures on which to work to ensure absolute good health.

The eleven latent failures, which constitute the general failure types (GFTs) are:

  • HARDWARE
  • DESIGN
  • MAINTENANCE MANAGEMENT
  • PROCEDURES
  • ERROR-ENFORCING CONDITIONS
  • HOUSEKEEPING
  • INCOMPATIBLE GOALS
  • COMMUNICATIONS
  • ORGANISATION
  • TRAINING
  • DEFENCES

Preconditions

Preconditions are the environmental, situational or psychological ‘system states’, or even ‘states of mind’, that promote, or directly cause, active failures. Preconditions form the link between active and latent failures and can be viewed as the sources of human error. They are best summed up in the following table which shows the connection between unsafe acts and typical preconditions.

The Tripod causation model can be further expanded to show the various ways of learning: (a) from accidents themselves; (b) from what are called observed unsafe acts; and (c) by proactively measuring or assessing the state of health of the eleven GFTs.

Note that all the improvement loops go straight back to the decision or policy makers. Note also the specific mention of ‘unsafe act awareness’, which is only one of many safety tools aimed at modifying human behaviour.

The same basic model can be used to find where in the event chain accountabilities would normally lie, a useful factor to consider when carrying out accident investigations (see diagram opposite).

The research led to the development of two useable tools: a proactive safety (and HSE) health check called Tripod-DELTA and a reactive accident investigation and analysis tool called Tripod-BETA.

Tripod-DELTA seeks to measure the inherent health of each of the eleven latent failures and displays the information as a DELTA Profile.
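As a rough illustration of what a profile across the eleven GFTs might look like, the sketch below ranks them by a score and draws a text bar for each. It is not the Tripod-DELTA method: the 0 to 100 scale (higher meaning more cause for concern), the example scores and the function name `delta_profile` are all assumptions made for this example.

```python
# The eleven general failure types, in the order listed in the text.
GFTS = (
    "Hardware", "Design", "Maintenance management", "Procedures",
    "Error-enforcing conditions", "Housekeeping", "Incompatible goals",
    "Communications", "Organisation", "Training", "Defences",
)


def delta_profile(scores: dict[str, int], width: int = 20) -> list[str]:
    """Return one text row per GFT, worst (highest hypothetical score) first."""
    missing = [g for g in GFTS if g not in scores]
    if missing:
        raise ValueError(f"no score for: {', '.join(missing)}")
    ranked = sorted(GFTS, key=lambda g: scores[g], reverse=True)
    return [
        f"{g:<28}{'#' * round(scores[g] * width / 100)} {scores[g]}"
        for g in ranked
    ]


if __name__ == "__main__":
    # Invented scores, purely to show the shape of the output.
    example = {g: 30 for g in GFTS}
    example["Procedures"] = 80
    example["Training"] = 55
    print("\n".join(delta_profile(example)))
```

Ranking the rows puts the GFTs most in need of attention at the top, which is the kind of prioritisation a proactive health check is meant to support.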

Tripod-BETA is a highly disciplined tool, aimed at establishing the total event tree and then each of the active and latent failures, plus their linking preconditions. The principal differences between a ‘conventional’ investigation and an accident investigation carried out in line with the Tripod methodology are summarised in the table overleaf.

Trainer's guide

These two sections, Human error and Accidents, are taken from the ‘train the trainers’ guide – part of the workshop package (see page 4). For further information, please contact your local UK Club representative.

Calendar

The 2012 Loss Prevention Calendar continues to focus on the Pro Active Risk Management system carried out by the Club's own risk assessors. Based on a proven, simple methodology, the department has worked with the Club's claims executives, underwriters and Members' crews to focus on those things that frequently cause claims and to assess the likelihood of those claims occurring.

Global Network

Bermuda
+1 441 292 4724

New Jersey
+1 201 557 7300

San Francisco
+1 415 956 6537

London
+44 20 7283 4646

Piraeus
+30 210 429 1200

Isle of Man
+44 1624 645200

Tokyo
+81 3 5442 6110

Hong Kong
+852 2832 9301

Singapore
+65 6323 6577

Shanghai
+86 21 6321 7001

Beijing
+86 10 6310 1147