teaching/learning systems

feature – to get the results we want from a system, or prevent it from generating undesirable events, we need to understand how it works and behaves in the real world. that’s exactly what jidoka does.

words: michael ballé

errors – let’s call them undesirable events – usually follow the same pattern and the analysis of them usually identifies three obvious culprits: a faulty part or piece of equipment, a person at fault doing their job poorly due to wrong assumptions or confusion, or information that was lost, misdirected or simply wrong. the last two adverse events i witnessed in a hospital were a cardiac problem, undetected because the ecg machine was malfunctioning, and a psychiatric patient sent to a psychiatric ward with a badly deteriorating pneumonia because his psychiatrist remembered him healthy as a horse at their last visit three months prior and didn’t think to check whether there could be something biologically wrong with him. in both cases, the information was in the system but failed to connect with the decision-makers.

finding what went wrong is a matter of patiently going through every event and spotting when the actual deviated from the expected – relatively easy to do in hindsight, extremely hard in foresight because it’s impossible to know in advance which variation will have which effect. insisting on following every procedure to the letter usually results in bogging things down until nothing at all happens and every step takes infinite time and heartache. on the other hand, procedures are there for a reason and one never knows which variance can or will have catastrophic effects.

however, when one asks why things went wrong, answers are immediately murkier. few people commit actual violations – deliberately doing something wrong for their own gain. mostly, they make mistakes, just like parts fail with use and information gets miscommunicated. looking for answers often leads to the “culture” or the “system”, both driven by incompetent, immoral or negligent leaders who have repeatedly ignored calls for more work on safety or pushed so hard for performance that people, understandably, have been driven to take shortcuts and – wham. such systemic explanations are undeniably true in some instances and there are documented cases of noxious cultures encouraging toxic managers and bullying people into doing the wrong thing, but the wrong thing also happens when the culture is neither particularly bad nor obsessed with performance. culture-level explanations are riveting, yet hardly helpful since no one quite knows how to change cultures other than by changing leaders – and even that has unexpected, uncontrolled effects (culture eats strategy – and leadership – for breakfast, as the saying goes).

what kind of “in between” explanations could we build to explain failure? if we put aside both the “broken part, incompetent person” and the “culture” accounts, how could we build a rationale for adverse events that avoids both extremes?

we can recognize that, first, systems do have a mechanistic goal-oriented aspect – they’re designed that way in any case. second, the system’s processes are always actuated in context, in local conditions, that can range from “in control” (far from the cliff’s edge) to “out of control” (one step away from chaos), and these local conditions will significantly affect how the system performs. finally, humans are voluntary agents: they act on their own will, according to their lights, and choose to get involved or not, to express their intuitions or feelings or not, to act out or not. by looking at how these three factors converge, we can draw a different picture of how things happen.

human systems rarely behave as they were designed, but are designed, nonetheless. an emergency ward is designed to process patients in terms of urgency and pathology type. it is not designed to handle large swings in influx, unlike a supermarket checkout where more tills can be opened or closed according to the size of the lines. systems have a mechanistic “process” dimension that gets them to achieve one or two main goals with a long list of constraints, such as regulations, budget allocation, and available resources. the system is meant to handle generic cases and satisfy its client population overall – this also means that it is rarely conceived to take care of you and your specific circumstances. the emergency personnel will hardly care that you have a flight to catch if your complaint looks benign (to them).

systems have a baseline. they have goals and subgoals. they have defined processes and activities to achieve these goals. they have controls, to make sure performance, regulations and processes are carried out according to standards. they are designed for average cases in average conditions, with specific procedures to deal with special circumstances (which they rarely do well). to understand how the system works, we can start by outlining its mechanical set point: what it will try to achieve in any circumstance – whether it’s appropriate or not.

secondly, systems are actuated in local conditions. although the system is designed for average circumstances, very often real life happens, and conditions are not what was expected. conditions cannot be systematized (and included in the mechanics of the system) precisely because they vary. one needs to rely on human experience and judgement to figure out conditions: is it a normal day or an abnormal one? was the chief physician in a good mood or a bad mood? is the situation clear or are there so many things happening at the same time that everyone is confused? is there rapid access to experts who understand these conditions or are people left to make decisions on their own? is there a culture of high or low expectations? and so on: the questions can be endless. local conditions will translate into, first, friction: difficulties in achieving the process step due to ambiguity, chance, uncertainty, or simple entropy and all the wild cards the real world can throw at a process. friction is the key to understanding local conditions rather than having a priori ideas, because friction is recognizable, and it is possible to ask what condition creates this friction.

locally, friction can lead to repurposing: people adapting the system to local conditions for their own use. maintenance of the same type of equipment, for instance, can vary widely according to local conditions: how far are they from the manufacturer and how easy is it to get expert coaching and local training, how keen is the local manager to spend the maintenance budget on maintenance or is she using it for something else, etc. repurposing of the system is an adaptation to local conditions where people change the goals just to get by.

repurposing is normally gradual – it doesn’t happen all at once, but more in a slow drift as people are defeated by local conditions and progressively accept abnormal conditions as normal. habituation is a driving force in human beings and people will basically get used to anything and stop being aroused by situations they know should be intolerable, but which have become routine. starting with friction and leading into repurposing, local conditions are the second element that can help understand system behavior, if we look into them closely enough (and with enough of an open mind).

thirdly, people have intentional actions: they will willfully intervene or, conversely, willfully decide not to. silence or violence are typical reactions to conflict and frustration, and people can be expected to react very differently to the same situations. “don’t text and drive” is a simple rule and a no-brainer in terms of safety, and yet people will interpret it very differently according to how they see themselves, their identity or sense of their own character, according to their focus in the moment and the logic of the situation in their eyes (long stretch of road, slow traffic, etc.) and also according to their emotional state – whether they’re focused on driving, bored out of their skulls, or driven insane by an urgent problem that needs to be resolved right now, for instance.

we are, by nature, creatures of mixed feelings. what we want changes from one moment to the next. what we think about a situation in the morning can change by evening. it is not by accident that most ancient philosophies seek ways to find some kind of peace – our minds are simply not put together for it. in our current world of machines and systems, we often treat people as if they were machines themselves, following set instructions and processes like robots, but that requires extreme pressure – it’s simply not how people think and behave. there is no one way of describing the complexity of human behavior, motivation, and decision-making, but for lack of a better system we still often break it down into the three age-old components of pathos, logos, ethos:

pathos: how emotionally aroused is the person? are they calm and reasoning things through, weighing arguments and considering perspectives? or are they emotional and reducing the situation to what they need to do right now, impulsively and instinctively, driven by their mood and subjective experience of right now? the same person will react very differently according to their emotional state (feeling good/feeling lousy) and how intense it is (feeling calm/being emotional).
logos: how solid is their logic? according to their understanding of the situation (and how strong their emotions and intuitions are), people can make either logical cause-and-effect analyses, where their behavior will logically lead towards a desired goal based on facts, or wild claims and “hail mary” intuitive jumps based on shaky assumptions.
ethos: how responsible and trustworthy are they generally? some people can be depended on more than others to accept accountability and do what’s best for the group, encouraging cooperation and mutual support as opposed to letting others down by doing what’s best for them in the moment – acting impulsively or negligently and not completing tasks or assignments as promised, and then blaming others for their shortcomings. can this person usually be relied on – or not?

with these three situational drivers in mind, we can now go beyond the “bad part, human error, wrong information” analysis of incidents and look deeper at: 1) how far from its baseline did the system behave? 2) what were the specific local conditions? and 3) who were the people present and how did they react? did they choose to intervene or not? did their interventions make sense or not?

if we look at it this way, we can see that a fourth element makes a critical difference to the outcome: how much did the people understand how the system works? system knowledge of what its baseline is and how it reacts in different conditions will greatly impact both the emotionality of the situation and the logic of the responses. if people know what the system does, they are more likely to feel able to intervene and to do so sensibly in order to avoid (or mitigate) an accident.

in most of the hospital incidents i see, doctors and nurses often don’t know how the computer and administrative systems work, and when they choose to intervene, they go for a workaround that can make things worse – or conversely ignore something that should be done. in the average hospital, nurses will rarely isolate equipment at the first sign of trouble and aggressively pursue a maintenance fix through the system. they are overwhelmed with tasks and know that any maintenance action is likely to take a long time, so they will continue to use the equipment until it fails completely – adding uncertainty and risk as they do so. they usually have very vague mental models of how steeply equipment degrades (or how badly minor measurement errors affect patients) and of what it takes to get the equipment fixed or replaced.

wrong theories about how things work lead to both illogical and more emotional reactions as people feel more threatened by the situation and will react more intuitively – a common path to adverse effects. what people know about the systems they use has a significant impact on outcomes the moment the system is no longer in its baseline conditions.

consequently, systems can be designed in two ways: either as a black box where the user has no way to understand how the system works – most complex systems these days – or as teaching devices where the user can look into the system to understand it as they use it. we have seen cars move from teaching systems, where car owners could fix basic stuff in their automobiles, to increasingly black boxes made to keep the driver – and indeed, to a large extent, the mechanic – out of the engine.

black boxes are attractive to system designers because they feel – rightly, no doubt – that the less people tinker with the system, the better the chances it achieves its baseline behavior. the assumption here is to use the system until it breaks and then either call maintenance or replace it. this assumption, however, does not take into account all the intermediate states users find themselves in when the system is neither in baseline conditions nor has catastrophically broken yet – exactly where action can avert disaster if only users knew what actions to take.

jidoka systems, with devices like andon, are designed to educate users. from sakichi toyoda’s original automatic loom that would stop when a thread in the cotton broke to today’s andon-driven factories, such systems are made to spot their own abnormality, stop, and call a user so that they can look into and replace the faulty component (or retrain the operative). in doing so, such systems teach users how the system itself works, helping them build a more accurate mental picture of both its baseline and, through experience, how it behaves in different external conditions.
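the stop-and-call behavior described above can be sketched in a few lines of code. this is only an illustrative toy, not a description of any real loom or andon installation: the names `Andon`, `Loom`, `weave`, `restart` and `run` are hypothetical, and the single "broken thread" check stands in for whatever abnormality a real machine would detect.

```python
from dataclasses import dataclass, field


@dataclass
class Andon:
    """Hypothetical signal board: keeps a record of every call for help."""
    calls: list = field(default_factory=list)

    def call(self, station: str, reason: str) -> None:
        self.calls.append((station, reason))


@dataclass
class Loom:
    """Hypothetical machine that stops itself when it spots an abnormality."""
    name: str
    andon: Andon
    running: bool = True
    woven: int = 0

    def weave(self, thread_ok: bool) -> bool:
        """Weave one pick; stop and call instead of producing a defect."""
        if not self.running:
            return False  # stays stopped until a person intervenes
        if not thread_ok:
            self.running = False  # spot the abnormality and stop
            self.andon.call(self.name, "broken thread")  # notify a person
            return False
        self.woven += 1
        return True

    def restart(self, cause_fixed: bool) -> None:
        """Only someone who has dealt with the cause restarts the machine."""
        if cause_fixed:
            self.running = True


def run(loom: Loom, thread_states: list) -> int:
    """Feed a sequence of thread states to the loom; return good picks woven."""
    for ok in thread_states:
        loom.weave(ok)
    return loom.woven
```

the design choice worth noticing is that the machine refuses to continue after the first abnormality and leaves a visible record of why it stopped. the person who answers the call sees the cause in context, which is where the learning happens.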

as systems become more complex and sophisticated with ever newer tech, system designers must ask themselves a further question: how much should users learn about the system? many managers are now part of, and in charge of, incredibly complex systems that have been designed as black boxes. managers are now trained to follow processes, not understand system behaviors, and specifically not their own system’s behaviors. as a result, although systems increase in sophistication, outcomes are questionable everywhere – and certainly the sense of human agency and control is diminishing rapidly.

lean’s jidoka has more to teach the world than simply a few mistake-proofing devices or clever ways to detect problems before they occur. it was originally about endowing machines with some aspects of human intelligence – namely telling a good part from a bad part or a worn component from a valid one. but in doing so, jidoka (spot every defect, notify every defect, react immediately and correct the situation) also changes the human relationship to systems: it creates teaching/learning systems where people can understand more about how the system works and behaves in the real world, and what to do about it to get the results they seek. in every complex system like a hospital or a plant where intentional behavior impacts safety, we need to know more about how the system works, not less – to make it safer.
