Tuesday, March 2, 2010

It's All About The Failure Modes

While there are many places where new RCM facilitators fall down when it comes to facilitating a thorough and useful RCM analysis, most errors in the process start at the failure mode level. Writing good failure modes requires an expert level of understanding of hundreds of different components: what each component is intended to do (Function) and the ways that component can fail (Failure Modes).

Through the years I have tried a couple of different ways to teach how to write good failure modes. In performing hundreds of RCM Blitz™ analyses with different facilitators and practitioners for companies around the world, we have come to understand that good failure modes should be written in three parts.

Part - Problem - Specific Cause of Failure

As an example: Cooling water pump bearing (Part) seizes (Problem) due to lack of lubrication (Specific Cause of Failure)
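For those who keep their analyses in a spreadsheet or database, the three-part structure can be sketched as a simple record. This is only a minimal illustration in Python; the class name `FailureMode` and its methods are my own hypothetical names, not part of any RCM software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """Hypothetical record holding the three parts of a failure mode."""
    part: str            # where the failure begins
    problem: str         # the undesired condition that results
    specific_cause: str  # the cause a maintenance task must address

    def statement(self) -> str:
        # Join the three parts into the usual written form.
        return f"{self.part} {self.problem} due to {self.specific_cause}"

    def is_complete(self) -> bool:
        # A failure mode missing any of the three parts is not complete.
        return all(s.strip() for s in (self.part, self.problem, self.specific_cause))


fm = FailureMode("Cooling water pump bearing", "seizes", "lack of lubrication")
print(fm.statement())
```

A check like `is_complete()` only confirms that all three parts are present. It cannot tell you whether the specific cause is written at the correct level; that still takes an experienced team.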

The part is the location or source where the failure mode begins. Looking at the cooling water pump listed above, a rookie facilitator might be tempted to say the cooling water pump failed, and while this is true, where did the failure begin? It began at the bearing, when the bearing was not lubricated.

The problem portion of the three-part failure mode is the undesired condition that results from the specific cause of failure. If we neglect to lubricate the cooling water pump bearing, it will vibrate, heat up and eventually seize. While the bearing has been failing for some time, it is when it seizes that we have a problem.

The third part of a good failure mode is the specific cause of failure. As we write each failure mode we should recognize that the purpose of RCM is to develop a task that will clearly mitigate the cause. If we don't get the specific cause written at the correct level, your team will never develop a good mitigating task. Again, with the end in mind: if we miss the specific cause, the outcome of your analysis will surely miss the failure mode.

So what exactly is a specific cause of failure? This is where experience in Root Cause Analysis or Cause Mapping becomes extremely valuable. Failure modes are all about understanding the relationship between cause and effect. The trick is to learn to discuss each failure mode at a level where a sound maintenance task can mitigate or eliminate the failure mode. To understand this, let's go back to the cooling water pump.

Cooling Tower Pump Fails - Some would consider this a failure mode. I would not; it contains only two pieces of a three-part failure mode: the pump and, at a high level, the problem. How would one mitigate this failure? Is there a maintenance task to detect, reduce or eliminate this failure mode? Would this task be applicable and effective in detecting, eliminating or mitigating this failure mode? Being honest, this failure mode is nearly useless. The only way to deal with this failure mode is to replace the pump.

Cooling Tower Pump Bearing Fails - Again, just two parts here, and not enough information to make a sound task decision. Some would say that we could perform vibration analysis and detect the bearing failure. While in most cases this might be true, without knowing the specific cause we cannot be sure. For many specific causes of failure, vibration analysis is clearly not the best task for mitigating the failure mode. As an example, I don't want to use vibration analysis to tell me that we have not lubricated a bearing.

Cooling tower pump bearing seizes due to improper lubrication - While we have three parts here, how do I deal with this specific cause of failure? What does improper lubrication mean? There could be several specific causes buried within this one failure mode. For instance, improper lubrication could mean too much lubrication, not enough lubrication, the incorrect type of lubrication, or lubrication at the incorrect interval. It is extremely important to remember that we need to have the specific cause written at a level where we know the maintenance task will be both applicable and effective in eliminating the failure mode. Each of the separate causes listed in regard to lubrication would result in a different mitigating task. Combine the causes and we now risk missing a failure mode and a task.
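To make the point concrete, here is a small Python sketch that breaks the broad "improper lubrication" cause into one failure mode statement per specific cause, using the four causes named above. The helper name `split_failure_mode` is hypothetical, and the sketch deliberately assigns no tasks; selecting the mitigating task for each statement is still the team's job.

```python
def split_failure_mode(part, problem, specific_causes):
    """Write one three-part failure mode statement per specific cause."""
    return [f"{part} {problem} due to {cause}" for cause in specific_causes]


# The four specific causes hidden inside "improper lubrication".
lubrication_causes = [
    "too much lubrication",
    "not enough lubrication",
    "the incorrect type of lubrication",
    "lubrication at the incorrect interval",
]

statements = split_failure_mode(
    "Cooling tower pump bearing", "seizes", lubrication_causes
)
for statement in statements:
    print(statement)
```

One vague failure mode becomes four, each written at a level where a single task decision can be made.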

Remember, the failure modes we identify are the key to developing our complete maintenance strategy, and the most important failure modes to identify are those that result from the context and environment in which we operate our equipment.

More information on writing good failure modes can be found in my book, Reliability Centered Maintenance using RCM Blitz™.
