Thursday, May 27, 2010

Were we somehow smarter in 1984? Gulf Oil Spill Continued

With the number of comments and responses to the two recent articles I posted on this event I could not help going back to books on Reliability Basics. One of my favorite books for reference is "Reliability Toolkit: Commercial Practices Edition" released by the Reliability Analysis Center out of Rome, NY. While the book was released in 1988 the section I reference has been available to the public since 1984.

It seems that nearly every day I read an article that refers to the failure of the BOP (Blowout Preventer) Valve. With all that has been written about the BOP Valve, one would have to believe that this was a critical item in regard to this process. Keeping this in mind, I pulled out my favorite old textbook and took a look at what the research funded by our tax dollars recommended in regard to Critical Item Reliability. While I admit to being a bit of a geek, I found it quite interesting that we had a plan of recommendations in place to deal with critical items that dates back twenty-six years.

According to the Reliability Toolkit, a critical item is a component whose failure can significantly affect safety, operating success or repair costs. (I would guess that in today’s world we could add environmental impact to that statement.) It goes on to say that "Critical items include high valued components, new technologies, limited life items, single source or custom components and single failure points where failure causes a total loss of operating capabilities."

Wow! I don't claim to have a vast knowledge of oil platform components but I would say that the above paragraph clearly describes the component we have been reading about the last four weeks.

Going back to the book, the next section lists criteria for "What Causes Critical Items"

1. Use of necessary advanced technology or processes
2. Marginal component capability in adverse conditions
3. Low part or product reliability
4. Failures that cause other components to fail
5. High cost custom designed parts
6. Limited or single source vendors for critical applications
7. Severe Safety and Environmental Impact

I don't know about you, but I'm beginning to see some red flags here, so I elect to read on to learn about the "Critical Item Control Checklist". In this section we learn about major concerns and recommended actions. I would have to say that if I had a critical item in my design, I would want to take a close look at this checklist.

The first major concern asks the question: "Has a failure mode analysis been considered for critical items?" The recommended action is to "Develop failure mode identification procedures so that control of the item can be invoked". That sounds a lot like Reliability Centered Maintenance to me!

Concern - Have compensating features been considered for the design?

Action - Consider features like safety margins, overstress testing or fault tolerance

Concern - Have reliability improvements been considered?

Action - Evaluate special stress tests, checkouts, vendor quality procedures, alternate components and operating duty cycles.

Concern - Does the operating environment strain or exceed design limits?

Action - Include fault tolerant designs, safety margins and external changes

Concern - Does failure of the item jeopardize safety or does a single point of failure disrupt mission performance?

Action - A list of critical items and personnel responsible for controlling and reviewing procedures must be established!
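
For readers who like to see a checklist as something they can actually track, here is a minimal sketch of the five concern/action pairs above as a small data structure. The `addressed` flag and the function names are my own assumptions for illustration; nothing like this appears in the Reliability Toolkit itself.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    concern: str
    action: str
    addressed: bool = False  # hypothetical tracking flag, not from the Toolkit


def build_checklist():
    """Return the five concern/action pairs quoted in this post."""
    return [
        ChecklistItem(
            "Has a failure mode analysis been considered for critical items?",
            "Develop failure mode identification procedures",
        ),
        ChecklistItem(
            "Have compensating features been considered for the design?",
            "Consider safety margins, overstress testing or fault tolerance",
        ),
        ChecklistItem(
            "Have reliability improvements been considered?",
            "Evaluate stress tests, checkouts, vendor quality procedures, "
            "alternate components and operating duty cycles",
        ),
        ChecklistItem(
            "Does the operating environment strain or exceed design limits?",
            "Include fault tolerant designs, safety margins and external changes",
        ),
        ChecklistItem(
            "Does failure jeopardize safety, or is there a single point of failure?",
            "Establish a list of critical items and responsible personnel",
        ),
    ]


def open_concerns(items):
    """Return the checklist items nobody has acted on yet."""
    return [item for item in items if not item.addressed]


if __name__ == "__main__":
    checklist = build_checklist()
    checklist[0].addressed = True  # pretend the failure mode analysis is done
    for item in open_concerns(checklist):
        print(f"OPEN: {item.concern} -> {item.action}")
```

The point of the sketch is simply that every critical item should carry this list with it, and that an open concern should be visible to somebody who owns it.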

Let's just say that it has been an interesting afternoon of reading and affirmation; what I learned about reliability at RIT several years ago still applies to nearly everything I do today. When it comes to reliability, the tools and techniques made available by some very bright people are right at our fingertips. The experience and knowledge of the people who install, operate and maintain this equipment, so often ignored, hold the answers to making sure events like these never happen to begin with.

I find it hard to believe that we were smart enough to instruct people how to identify and address critical items in 1984, and in the year 2010 we quite simply were not smart enough to read and act.

Tuesday, May 25, 2010

Update on the Gulf Oil Spill

A few weeks back I posted my views regarding the events that took place on the offshore oil well that resulted in millions of gallons of oil leaking into the Gulf of Mexico and, more importantly, the deaths of 11 workers. At the time of the blog I also guessed that this event, while preventable, would result in a whole lot of finger pointing, the demand for a root cause analysis and a focus on one or two probable causes. What I didn't recognize was how the most recent effects on wildlife and environmental damages would draw our attention away from making sure an incident like this never happens again. Instead we are now focused on who is to blame in regard to keeping the oil from reaching the beaches as well as the marshes and wildlife that depend on a clean environment.

The most incredible thing in regard to this entire disaster is the amount of money companies and governments have to spend when they are reacting to catastrophic events in comparison to what they are willing to spend up front on a formal Reliability Centered Maintenance analysis that would have ensured that the failure NEVER happened to begin with or at a minimum reduced the probability of failure to close to zero.

Instead we have a mess: 11 dead workers, an uncontrolled oil spill 5,000 feet below the surface, hundreds of seafood workers who have lost their source of income, and the finger pointing goes on. As the pressure increases we make futile attempts to control the oil, stop the leak and protect our coastline, and I still have to wonder: could a thorough RCM analysis have prevented all of this? Did the 11 people who died on that platform know about the failure modes that cost them their lives?

My experience in facilitating Reliability Centered Maintenance answers yes to both questions. Reliability Centered Maintenance was designed to ensure the inherent designed reliability of an asset by developing a complete maintenance strategy based on known and probable failure modes identified by a cross-functional team of people who engineer, design, maintain and operate the equipment. I have yet to facilitate an RCM analysis with this type of team that didn't know the failure modes of their equipment as well as the risk and consequences of each. In identifying each failure mode and the potential consequences we would also be well aware of the effects each would have on our Health, Safety, Environment and the operational capability of the equipment. Reliability and Safety depend on a proactive culture that demands a thorough review of our design and the failure modes associated with each component in that design. When we are not proactive, when we elect to not perform a thorough RCM analysis, we are left with no choice but to look back and guess, hoping to solve what the LA Times called "a confluence of unfortunate events."

Friday, May 14, 2010

Implementing an effective condition monitoring program “optimizing reliability”

By Doug Plucknette, RCM Discipline Leader, Allied Reliability, Inc
GPAllied, LLC | www.gpallied.com
Walter Nijsen, Asst. Reliability Leader for Cargill Grain and Oilseeds Europe, spoke with Doug Plucknette, RCM Discipline Leader at Allied Reliability and author of RCM Blitz™ about what he has found is required of a company to ensure their asset reliability and integrity program is optimized. Walter has played an integral role, along with his business unit team members in setting up, performing, and implementing Condition Based Maintenance (CBM) Equipment Maintenance Plans (EMPs) which resulted from several Reliability Centered Maintenance - RCM Blitz™ analyses.

“Cargill Europe uses CBM to understand the different failure modes of their critical assets and how they fail. This enables them to eliminate the root cause and prevent equipment from failing again.”
-Walter Nijsen
When did the journey begin?
Cargill Oilseeds Europe started using CBM some 10 years ago, starting with mainly vibration monitoring for rotating equipment as most companies find themselves doing. After some experience, it was very clear that vibration did not cover all failure modes so we extended the CBM program using several technologies. Most companies do not truly understand the value of a true condition monitoring program because they use only a few of the tools and most of the time they are not focused on predicting or preventing a failure mode (Failure mode: how a part fails).

Benefits from Cargill's Condition Monitoring Program
To mention all the benefits Cargill Europe found would take more space than we have here; however, "the primary reason we are doing CBM is to anticipate the condition of our assets to respond pro-actively and to increase the resistance to failure and avoid failures", according to Walter. Walter stated that CBM helps their plants plan and schedule maintenance activities as well as identify and eliminate the causes for these failures, which will increase the efficiency and effectiveness resulting in increased reliability and lower cost. Cargill plans and schedules work based on defect severity (Defect = a point at which failure is identified on the PF Curve using PdM or PM methodologies) and asset criticality ranking.

Secondly, Walter stated Cargill uses CBM to understand the different failure modes of their critical assets and how they fail. This enables them to eliminate the root cause and prevent it from failing again.

Cargill found a majority of their failures were self-inflicted: they were happening during maintenance, installation, engineering, start-up, operation, etc. They recognized that the lack of effective use of CBM technologies, lack of effective procedures and lack of precision maintenance installation standards (alignment, balancing, bearing mounting, etc.) were the root causes of these self-inflicted failures.

Finding and replacing a damaged bearing is one thing; understanding that the failure mode was caused by lack of lubrication is another benefit of CBM. It reinforces the concept that knowing a shaft is properly aligned before start-up can save money and capacity. Understanding and eliminating infant mortality is the biggest benefit of a CBM program. All direct failures resulting from infant mortality are found and eliminated before collateral damage occurs by avoiding the failure in a proactive mode. The work is planned, scheduled, and executed using a repeatable work procedure which has the standards, specifications, etc. defined.

CBM technologies Cargill Uses
Cargill uses a number of Condition Monitoring Technologies. They use CBM in the same manner operators use condition inspections to assess product quality – in order to define the condition of their assets. The different types of CBM they use are:
• Quantitative visual inspections by both operators and maintenance personnel
• Vibration monitoring on rotating assets
• Ultrasound for slow speed bearings, compressed air leakage, electrical arcing in cabinets, and steam traps
• Infrared on heat exchangers, mechanical rotating equipment, electrical devices, and hydraulic reservoirs
• Oil and lubrication analyses including particle count and wear part finding
• Motor current analysis online and offline
• NDT for wall thickness of pipes and tanks, and vacuum leakage

How Cargill Identifies the CBM technology and the frequency it should be applied
In the beginning, Cargill used only vibration analysis on rotating equipment. In order to determine which CBM technology to apply to prevent or predict failure modes, Cargill uses two systematic approaches. The first is RCM Blitz™ (Cargill's preferred method of RCM), which is used on critical processes and assets. RCM Blitz™ is a similar process to traditional Reliability Centered Maintenance (RCM) using the seven questions of RCM; however, the results are identified in a shorter time and applied without delay. Cargill focuses on the systems that will give the best Return on Investment (ROI). Simply put, to Cargill, RCM Blitz™ is a slam dunk when it comes to return on investment for critical assets.

If you ever wonder how and where to apply RCM Blitz™, you begin by first identifying the top 10% of your most critical assets. Once this list has been identified, you should now begin to measure Overall Equipment Effectiveness (OEE) on these assets and then begin performing RCM analysis on those critical assets that have equipment-based operational, speed and quality losses. If you have selected a critical asset, your implemented RCM maintenance strategy will show measurable improvements in OEE with added improvements in Health, Safety and Environmental performance as well. Cargill demonstrates this in all of their best performing plants globally.
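
The article does not spell out how OEE is calculated, so here is a minimal sketch using the conventional definition (availability x performance x quality), which lines up with the operational, speed and quality losses mentioned above. The function name and the sample figures are hypothetical and are not Cargill data.

```python
def oee(availability, performance, quality):
    """Overall Equipment Effectiveness as the product of three ratios.

    Each argument is a fraction between 0 and 1:
      availability - run time / planned production time (operational losses)
      performance  - actual rate / ideal rate (speed losses)
      quality      - good units / total units (quality losses)
    """
    for name, value in (
        ("availability", availability),
        ("performance", performance),
        ("quality", quality),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


if __name__ == "__main__":
    # Made-up numbers for one critical asset, before and after an RCM analysis.
    before = oee(availability=0.85, performance=0.90, quality=0.97)
    after = oee(availability=0.92, performance=0.95, quality=0.99)
    print(f"OEE before: {before:.1%}, after: {after:.1%}")
```

Tracking the three factors separately is the useful part: it tells you whether the asset is losing capacity to downtime, to slow running or to scrap, which in turn tells you where the RCM analysis should look first.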

"As a general rule, the success of your first implemented RCM analysis will build the business case to complete RCM analysis on the remainder of your critical assets"

Second, Cargill uses failure mode mapping to determine failure modes for the rest of their assets. The software they use (Asset Health Matrix) maps each failure mode to a technology and determines which CBM technology is most effective to apply. See Figure 1.1 for the output of the Asset Health Matrix.



Figure 1.1 – Asset Health Matrix
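
To give a feel for what failure mode mapping means in practice, the sketch below uses a plain lookup table built from the technology list earlier in this article. It is not the Asset Health Matrix software and does not reproduce its logic; the failure mode names, the table and the function name are all hypothetical.

```python
# Hypothetical lookup table. The technologies come from the list in this
# article; the failure mode wording is illustrative only.
FAILURE_MODE_TO_TECH = {
    "slow-speed bearing wear": ["ultrasound"],
    "compressed air leak": ["ultrasound"],
    "electrical arcing in cabinet": ["ultrasound"],
    "steam trap failure": ["ultrasound"],
    "heat exchanger hot or cold spot": ["infrared"],
    "rotating equipment vibration": ["vibration monitoring"],
    "lubricant contamination or wear particles": ["oil and lubrication analysis"],
    "motor electrical fault": ["motor current analysis"],
    "pipe or tank wall thinning": ["NDT"],
}


def technologies_for(failure_mode):
    """Return the candidate CBM technologies for a failure mode.

    An unknown failure mode returns an empty list, which is a signal that
    the mode still needs to be analyzed rather than a sign that it is safe.
    """
    key = failure_mode.strip().lower()
    return list(FAILURE_MODE_TO_TECH.get(key, []))


if __name__ == "__main__":
    for mode in ("Steam trap failure", "Pipe or tank wall thinning", "Seal leak"):
        print(mode, "->", technologies_for(mode) or "not yet mapped")
```

A real matrix would also weigh detection interval, cost and asset criticality; the table above only shows the basic idea of going from failure mode to technology rather than the other way around.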

Walter's Recommendation for the Three CBM Technologies to Invest in First
Walter stated, based on what he has experienced in his business, he would invest in oil and lubricant analysis, vibration monitoring and infrared. This recommendation will vary depending on the business and the type of critical equipment one may have. With these three technologies, you may be able to cover most of the common failure modes. The technologies are cheap and widely available.



Infrared Image of Misaligned Shaft

There seems to be a learning curve associated with each technology when it comes to becoming proficient at detecting potential failures and identifying the failure mode that caused the failure. What did Cargill do to accelerate this learning curve?
Exact understanding of the technologies and getting the best benefits out of these is a challenge and takes some time. Within Cargill, they have done several things.

First, Cargill trained all their reliability leaders and maintenance managers in the process of moving from Preventive Maintenance Centric to Condition Monitoring Centric maintenance practices. This change requires changing leaders' thinking from a reactive to a pro-active state. Changing the way leaders think is key; they must learn that failures are considered unacceptable for any critical asset. So education is foundational.

Second, Cargill recommends that all maintenance and reliability leaders obtain knowledge in all CBM technologies used at their facility to a minimum of Level 1.

Third, they work closely together with companies who have CBM measurement as their core business. They allow their plants only to work with select suppliers and build up a relationship of trust to become strategic partners with each other.

Fourth, Cargill brings in key leaders from all plants in a region two times a year to meet with the CBM supplier to share experiences and look for improvements to avoid re-inventing the wheel all over again and to accelerate their learning.

Bringing Operations Leadership up to speed in terms of CBM
Changing from reactive to pro-active maintenance is a complete culture change and is difficult for the whole organization. Implementing CBM and showing the Return on Investment (ROI) is not enough. If you do not educate and have relentless leadership towards pro-active maintenance, you will fail. Cargill developed trainings for various parts of the organization, depending on the level or function in the organization. Today, they still educate and show the ROI on CBM; however, they have reached the maturity level where CBM has become a part of their culture. Instead of convincing, they are now teaching how to accelerate the CBM implementations.

Process Verification Techniques such as operating pressures, temperatures or flows with Distributed Control System Trend Alarms to Determine Potential Failures
Cargill uses these processes to identify the first point of a failure (P on the PF Curve). With Cargill, it depends on the maturity of the plant. They use PI (a software tool to monitor process parameters) to determine when heat exchanger performance drops or filters get blocked. In the more mature organizations, they use the amperage, flow and pressure to determine pump conditions. If you want to have 80% of your critical maintenance work identified by CBM, process verification techniques are a must. With the predictive technologies only, you are able to cover about 40-50%, and the disadvantage of these technologies today is you only see a snapshot. Several measurements need to be made at a set frequency to see trends and identify failures at an earlier stage of the PF curve. Using process verification techniques gives you information 24/7, which enables you to anticipate potential failures earlier and also to evaluate more in-depth what the conditions are and what the parameters are telling you.
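
As a rough illustration of a trend alarm, the sketch below flags the first reading of a process parameter that drifts more than a set fraction away from a baseline taken from early, known-good readings. This is a simplified assumption of how such an alarm might work; it is not how PI or any particular control system implements it, and the function name, parameters and sample data are hypothetical.

```python
from statistics import mean


def first_alarm_index(readings, baseline_count, tolerance):
    """Return the index of the first reading that deviates from baseline.

    readings       - process values in time order (e.g. pump discharge pressure)
    baseline_count - number of initial readings treated as the healthy baseline
    tolerance      - allowed deviation as a fraction of baseline (0.05 = 5%)

    Returns None if no reading after the baseline window exceeds the tolerance.
    """
    if baseline_count < 1 or baseline_count > len(readings):
        raise ValueError("baseline_count must be between 1 and len(readings)")
    baseline = mean(readings[:baseline_count])
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative tolerance")
    for index in range(baseline_count, len(readings)):
        deviation = abs(readings[index] - baseline) / abs(baseline)
        if deviation > tolerance:
            return index
    return None


if __name__ == "__main__":
    # Made-up pressure trend: steady at first, then slowly falling away.
    pressure = [100, 101, 99, 100, 100, 98, 97, 93, 88]
    hit = first_alarm_index(pressure, baseline_count=4, tolerance=0.05)
    print("First alarm at reading index:", hit)
```

The earlier that first alarm lands on the PF curve, the more time you have to plan and schedule the work instead of reacting to a failure.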

Walter, do you have an example where CBM detected a potential failure, and you were able to avoid a costly shutdown?
"We have several examples. Most of those examples are on the "hidden" equipment. The cooling tower main supply pump is one. This pump is mostly oversized, well-designed and installed and running for years without any problems. Typically, not many problems occur; therefore, it tends to be "forgotten". In one of our plants, we discovered during our CBM vibration monitoring rounds that the bearings were not sufficiently lubricated and would have reached a point of failure within days. We did not have a redundant unit installed; however, we did have a spare available. We then scheduled a short shutdown during a product change (optimizing downtime). After an investigation, the bearings were found to have completely worn out. Had this not been detected before failure, the plant would have had "forced downtime" for several hours with significant tonnage production loss."

Cargill's learning about CBM techniques
Cargill has found that there are several good places to understand and learn about CBM, such as industry conferences like Euro-maintenance, ICOMS Asset Management Conference-Australia, SMRP, BEMAS and The International Maintenance Conference. At these conferences, thought leaders from CBM solution providers present their work.

Measuring Success of a CBM Program
Changing a culture from reactive to pro-active can only be done if you show the value, benefits and success coming from your CBM program. Cargill has several measurements in place. These measurements are a mix of lagging (result) and leading indicators. Examples of some lagging indicators include:

• Maintenance cost per installed replacement asset value
• Plant reliability measured as Overall Equipment Effectiveness (OEE)

Their main focus is on the leading indicators. These indicators provide them with informative details about the condition of the plant. Examples include:

• Asset Health. This is the % of critical assets which have "No Identifiable Defect". We call an asset healthy if we cannot detect point P (P-F curve) for any critical failure mode with all applicable PdM technologies.
• Another measurement is the effectiveness of the maintenance organization, measured by actual maintenance hours spent on pro-active activities. Our goal is to spend 80% of the time on pro-active work (PM and PdM). This is work identified with CBM and the follow-up work.

In total, Cargill uses about 10 Key Performance Indicators (KPIs) to measure the success of their program.
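
The two leading indicators above are simple ratios, and a minimal sketch of the arithmetic looks like this. The asset tags, hour totals and function names are hypothetical examples and are not taken from a Cargill scorecard.

```python
def asset_health_percent(defects_by_asset):
    """Percent of critical assets with no identifiable defect.

    defects_by_asset maps an asset tag to the number of open defects found
    by the applicable PdM technologies; zero means "No Identifiable Defect".
    """
    if not defects_by_asset:
        raise ValueError("at least one critical asset is required")
    healthy = sum(1 for count in defects_by_asset.values() if count == 0)
    return 100.0 * healthy / len(defects_by_asset)


def proactive_percent(proactive_hours, total_hours):
    """Percent of maintenance hours spent on pro-active work (PM and PdM)."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * proactive_hours / total_hours


if __name__ == "__main__":
    # Made-up defect counts for four critical assets.
    defects = {"P-101": 0, "P-102": 1, "HX-201": 0, "C-301": 0}
    print(f"Asset Health: {asset_health_percent(defects):.0f}%")
    print(f"Pro-active work: {proactive_percent(640, 800):.0f}% (goal: 80%)")
```

Both numbers only mean something if the defect counts come from every applicable technology on every critical failure mode; an asset nobody has inspected is not a healthy asset.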



Figure 1.2 – Example of a Balanced Scorecard at one of Cargill's Sites

Using CBM technologies to verify quality following installation; i.e. alignment, balance, lube testing of new oil
In most of Cargill's commissioning checklists, they have specified the start-up condition of new and repaired equipment. This includes oil cleanliness and vibration levels, but also balancing and alignment standards. Pumps should be mounted stress-free, so before installing the pump, we check for the correct pipe support, etc. Vibration analysis, infrared or other technologies are used to validate that new equipment or components have no identifiable defect at the start of their life.

In conclusion, Cargill has developed a progressive maintenance and reliability program. They operate by procedures, standards and specifications, and rely on outside experts to help them be the best in their industry. Cargill's reliability initiatives, successes and continuous improvement efforts help provide their customers a quality product on demand in highly competitive marketplaces around the globe. Cargill has over 1300 plants in the world with over 160,000 employees and is one of the largest privately held companies in the world.

About the Author
Doug Plucknette is the founder of RCM Blitz™ and author of the book Reliability Centered Maintenance using RCM Blitz™. Doug has provided Reliability Training and services to numerous companies around the world, large and small, including such Fortune 500 companies as Cargill, Whirlpool, Honda, Coors Brewing, Energizer, Corning, Invista, and Newmont Mining. Doug has made key contributions to standard reliability measures for manufacturing, and reliability training programs for engineers, managers, technicians, and skilled trades. He has trained numerous client RCM Facilitators and performed RCM analyses on hundreds of pieces of manufacturing equipment.

About GPAllied
GPAllied, LLC is a joint venture with General Physics Corporation, a global performance improvement solutions provider of sales and technical training, e-Learning solutions, management consulting and engineering services, and Allied Reliability, Inc., a global engineering firm specializing in predictive maintenance and reliability engineering.

GPAllied provides the most diverse reliability and operations consulting and services globally available today. With offices in the Americas, Europe and the Asia-Pacific region, GPAllied has extensive experience in the specialty fields of Lean, Reliability Engineering, Six Sigma, Condition Monitoring, Leadership and Change Management, Maintenance Planning and Scheduling, Workforce Development and Maintenance Craft Skills training. GPAllied serves clients in asset-intensive industries, like petrochemical, mining, energy, manufacturing, food and beverage, and life sciences to name a few. GPAllied brings together unique capabilities and synergistic strengths of two thought leaders and allows for global implementation never before realized by the industry. The result is the joining of People, Processes and Technologies in one total package never before realized – now available to the global marketplace. Additional information may be found at http://gpallied.com.

If you have questions or would like more information about GPAllied, LLC or RCM Blitz™, please contact GPAllied Managing Director, Dirk DeNutte, at ddenutte@gpallied.com or via phone: +32.496.572.104.

Monday, May 3, 2010

RCM Blitz - What Can We Do Before Things Go Wrong?

It seems like every time a company is faced with tragic circumstances like those we have witnessed over the last few weeks, with the coal mine explosion in West Virginia that killed 25 workers and, more recently, the oil platform explosion in the Gulf of Mexico off the coast of Louisiana that killed 11 workers and has resulted in thousands of gallons of oil leaking into these waters on a daily basis, our news agencies and people around the world demand investigations as to what went wrong with each incident. Human nature demands we investigate what went wrong and who was responsible. There must be a cause, we must find someone to blame and that person should be held accountable. It would seem at times like these that finding those responsible gets more attention than making sure events like these never happen again.

In truth, the pain for the families who lost loved ones and the companies who will be held responsible has just begun. Moving forward, experts in each field will be hired to voice their opinions regarding the likely causes. The news coverage will likely focus on the one or two most likely potential causes, and several months down the road a figurehead for each company will proclaim that their company has now addressed these issues, removing the chance that this will ever occur again!

And, as a seasoned RCM (Reliability Centered Maintenance) practitioner, I will close my eyes, say a prayer, and hope that the one or two things they focused on are the only things that could have cost these American workers their lives. I say a prayer because I know, as other RCM and RCA (Root Cause Analysis) practitioners know, that events like these seldom have a single cause. In reality, tragedies like these are typically a series or chain of events that lead to catastrophic failure, and the only way to reduce the likelihood of these failures to an acceptable level is to identify and mitigate all the failure modes that could cause them.

The real shame comes in the understanding that what has happened didn't have to happen. While all failures might not be predictable, they are all preventable. Preventing failures takes leadership, structure, discipline, resources, expertise and patience. Being honest, these are characteristics seldom seen or displayed in companies as we face a very tough economy. That being said, we all have to make tough decisions, so the question always turns to: What would it have cost to put a team of experts together and identify every failure mode that COULD lead to catastrophic failure? And, had we put this team of experts together several years ago, would we now be in the position we find ourselves in today?

Reliability Centered Maintenance is a very structured process that asks a series of questions to discover and mitigate the failure modes that result in functional failure of your assets. In performing this process over the past 15 years I am continuously amazed at the unforeseen failure modes we uncover as a team and while this process is not perfect the companies who elect to perform and implement RCM always see an improvement in equipment reliability as well as a reduction in health, safety and environmental incidents and accidents.

To perform a thorough RCM analysis on your equipment you need to hire a seasoned RCM practitioner who, believe it or not, has little or no experience in the equipment you are about to analyze (experience brings bias and leads to missed failure modes), along with a team of process experts, engineers (Mechanical, Electrical, Process, Safety/Environmental), equipment operators and a cross section of trades people (Mechanical, Electrical, Instrument). This RCM team should be composed of experts who are respected by their peers and who are honest and open to change. In performing analyses of assets where failure could result in catastrophic events, this team will need the patience required to discuss the causes and effects of all failures that could lead to catastrophic events, and in discussing these failures we can then address tasks intended to mitigate each failure mode. The most important thing to remember as you assemble your RCM team is that the word "Expert" requires that this person actually has hands-on experience working with your equipment and the environment in which it operates.

So, while we all wait to find out what happened and why these events occurred, I really hope that we all take a step back and think what could happen at our workplace. When it comes to our people and our assets we have two choices, the first is to be proactive and identify a team to identify and mitigate failure modes, the second is to cross our fingers and let an outside team of "experts" identify a couple of things we did wrong and hope they were the only causes.