Defect Elimination Strategy


7.1 – Introduction to Defect Elimination

Two factors affect the reliability of an equipment item:
• The physical defects that it contains (flaws, weaknesses, imperfections, faults, deficiencies)
• The operating and maintenance environment it works in

No piece of equipment can be considered perfect. There are always small design, manufacturing and installation defects that could be improved to increase its effectiveness and reliability. Most of the time these physical defects are hidden.

No operating or maintenance environment is perfect either. There are always better ways to operate and maintain equipment to increase its effectiveness and reliability. These shortcomings are also called defects. Some of these operating and maintenance defects may be known, but often they are not.

Occasionally one or more defects become active and trigger a deterioration in the equipment's condition. This eventually leads to equipment failure. If the most important defects are eliminated, the effectiveness and reliability of the equipment will often be substantially improved.

One way to learn about the defects that are present is to measure or observe them directly through condition or other parameters. Fig 7.1 illustrates these ideas, with more detail covered in section 11 on Condition Monitoring Process in Chapter 1. Once a parameter shows a change from its normal level, this indicates that a more significant defect is present. If the cause of this level change can be determined, then actions can often be taken to slow further deterioration or to eliminate the defect entirely. Both options will improve equipment reliability.

Other ways to learn more about defects and their causes are post mortems on failures and analysis of failure statistics. Condition monitoring parameters usually show defect causes only indirectly. A troubleshooting or ‘defect cause understanding’ process is required to highlight the actual defect cause.
A post mortem or failure analysis can observe the physical evidence more directly. Condition information can be very important to a post mortem, as it can allow repair or replacement to occur before evidence is lost through catastrophic failure. It can also direct the focus of a post mortem to reduce the likelihood that the key evidence is missed.

Fig 7.1 illustrates that equipment repair is often an ideal opportunity to eliminate known physical defects in the equipment. Known operating and maintenance defects can often be eliminated at any time.

There is a substantial amount of generic good practice information on defect elimination in the wider maintenance community. Networking with other maintenance and equipment personnel within and outside your organisation is a key opportunity. Where this does not help with a specific defect, a wide range of defect elimination solutions can be developed through a Root Cause Analysis process.

For many businesses, product quality is even more important than equipment reliability. In most cases the defect elimination process for product quality defects is identical to that for equipment reliability defects. It is normal to find that the root cause of a product quality defect is an equipment-related issue. Even if operational or product quality issues are not your responsibility, you should always keep yourself informed of current issues and assist in defect elimination wherever possible. Sensible input into operational or quality problems by equipment specialists is often a major business improvement opportunity.

The rest of this chapter outlines a Defect Elimination Strategy and some defect elimination opportunities. It is called a strategy rather than a process because it must be driven by a strong management focus and is an approach rather than a procedure. It is all about using scarce resources of time and money to achieve the greatest possible level of improvement.
Defect elimination is often carried out as part of a broader business improvement strategy. Some widely used approaches are TPM, Six Sigma, KT and BIP. There is a significant amount of information available on these types of strategies and on defect elimination generally.

7.2 – Defect Elimination Strategy Details

The main parts of the defect elimination strategy are:
• Find defects
• Understand defects
• Eliminate defects

Some details of these three stages are given below.

Tools and Techniques for Defect Elimination

Find Defects – Time and resources have to be invested to understand what your most significant defects are. This should include process defects, specific equipment defects and defects for generic equipment.
• Determine Equipment Priority (Business Risk)
  o Collect and analyse data on equipment failures & process stoppages, product quality, operational & maintenance costs and safety & environmental issues
    ▪ Pareto and trend analysis by process, equipment and equipment type (automated monthly reports are ideal if possible)
  o Matrix risk analysis to determine equipment priority (Section 7.4)
• Use Manufacturing Excellence tools
  o 5S – Workplace Organization and Engagement
  o Work Teams – Performance & Issue Management
  o Value Stream Mapping
  o SMED – Quick Changeover
  o Mistake Proofing
  o Total Productive Maintenance
  o Process Control
  o Pull Systems
  o Theory of Constraints
• Maintenance practice audits for key equipment and processes
• Do operational and maintenance benchmark studies
• If not already in place, implement baseline equipment monitoring
• If required, implement PdM (& PM?) strategy into CMMS using:
  o Condition monitoring strategy decision matrix (Section 7.5)
  o Industry standards & expert advice
  o Local expertise and experience
  o FMECA (RCM) & decision making models
    ▪ Spreadsheet systems
    ▪ Software tools (PMO, RCM Turbo, RCM Cost)
• Monitor PdM and PM schedule achievement & strategy improvement process

Understand Defects

Diagnostics – Do when a defect is identified, to understand what its causes and root causes are.
• Use available troubleshooting/diagnostic knowledge, previous problem history, manuals, texts etc., and supplement this with knowledge from experts, suppliers & service providers
• Carry out diagnostics for specific equipment and processes as problems arise
• Monitor diagnostic success and the diagnostics improvement process

Root Cause Analysis – Do when a significant defect, problem or incident occurs.
• Identify Root Cause Analysis (RCA) opportunities (Pareto analysis or significant event)
• Develop Causal Tree (Focus and Find Causes in RCA Rt method)
  o Cross functional team of 3 to 8 including key stakeholders
• Collect data to verify and refine the causal tree and identify new causes and root causes
  o Involve external people with specialist knowledge
• Build diagnostics systems for key problematic equipment and processes, based on the RCA causal tree (Section 7.6)

Post Mortem/Failure Analysis – Do whenever practical while repairing and overhauling machines and components. Also disassemble components that are being replaced and scrapped, not just failed or failing items. Verify defects, identify hidden defects and further investigate defect causes.
• Ideally, before disassembly, a list of possible defects of interest is drawn up to focus observations. Use technical data, previous problem history and condition monitoring data to help identify defects of interest. The other key source of data is tradespersons with experience in disassembly and repair of the equipment.
• For equipment being overhauled or repaired externally, record the components and defects of interest for inspection and attach the document to the equipment in a clear plastic envelope. The document should specify what measurements, photos, information and components are to be returned on completion.
• On disassembly, mark components and their physical relationship to each other. Make all relevant measurements to verify tolerances, clearances etc. that may be of interest. Identify anything unusual or any variation from as-new condition (wear, damage, corrosion, discolouration, marking, cracks etc).
• The digital camera is one of the most valuable tools for recording defect data.
• Use observations and data to determine the causes and root causes of defects and failures (see RCA above). Record findings. Document causes and evidence/symptoms by building or adding to a causal tree.
• For serious physical defects or failures, specialist metallurgical or other expertise may be required.
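
The Pareto analysis named above, both for finding defects and for identifying RCA opportunities, can be sketched in a few lines of Python. This is only a minimal illustration and not a tool from this chapter: the record layout, the equipment names and the downtime hours are hypothetical values chosen for the example.

```python
from collections import Counter

# Hypothetical downtime records: (equipment, hours lost). Illustrative values only.
downtime_records = [
    ("Conveyor drive", 12.0),
    ("Slurry pump", 30.0),
    ("Hydraulic system", 6.0),
    ("Slurry pump", 18.0),
    ("Fan", 4.0),
    ("Conveyor drive", 10.0),
]


def pareto(records):
    """Return (item, total loss, cumulative %) rows, largest total first."""
    totals = Counter()
    for item, loss in records:
        totals[item] += loss
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    rows, running = [], 0.0
    for item, loss in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += loss
        rows.append((item, loss, round(100.0 * running / grand_total, 1)))
    return rows


pareto_rows = pareto(downtime_records)
for item, loss, cum_pct in pareto_rows:
    print(f"{item:18s} {loss:6.1f} h  {cum_pct:5.1f} % cumulative")
```

The items at the top of the list, which account for most of the cumulative loss, are the natural candidates for diagnostics or a Root Cause Analysis. The same function can be run per process or per equipment type by changing what is used as the first field of each record.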

Defect Elimination (DE) – Once you understand the causes and root causes of a problem, you need to find and implement a solution to eliminate or minimise the defect. There are always expensive solutions that people set their hearts on. On some occasions they are the most appropriate, but mostly they are not. The goal is a solution that is simple, effective, sustainable and within your scope. It is good practice to always generate a simple and cheap solution to compare with the more expensive ones before you make the final decision. You do not want to be waiting for years on funding approval to fix a problem when a simpler solution would have achieved 70% of what the ideal solution offers a long time ago.
• Identify solutions
  o Get suggestions for solutions from stakeholders and experts
  o Causal tree solutions analysis process
  o Identify generic proven DE solutions (see section below)
• Choose solution. Use a decision analysis process (eg KT) unless the choice is trivial (involve stakeholders)
• If the decision is not in your control, build a business and cost justification for the solution and get approval
• Plan and schedule solution
• Implement solution (involve stakeholders)
• Initial verification of success
• Celebrate success and recognise and thank all involved
• Longer term verification of success
• Document success
• Use the learning to spread success to similar applications elsewhere

Information on most of the elements of the above Defect Elimination Strategy is widely available and there is not much benefit in trying to duplicate it in this book. I have selected three elements of the defect elimination strategy that are not so well documented and have added some information on these tools at the end of the chapter.
They are:
• Matrix risk analysis for equipment priority
• Condition monitoring strategy decision matrix
• Developing diagnostics systems using RCA causal trees

7.3 – Defect Elimination Opportunities

There is nothing new under the sun (Ecclesiastes 1:9, RSV). The problems you are facing now have very likely been faced and solved by others elsewhere or in the past. There is a wide range of good practice defect elimination solutions available. The list below is just a very small sample of what is possible. Detection can be achieved with a monitoring program or with one-off baseline checks. Each item below names the defect and then gives a solution (S), the benefit (B), how the defect is detected (D) and any comment (C).


Pumps and Fans
• Shaft misalignment. (S) Laser alignment. (B) Increased bearing life. (D) Axial and radial velocity vibration, vibration spectrum analysis, verified by alignment checks. Also higher ultrasonic vibration at loaded bearings if highly misaligned. (C) Halving the load on a bearing can increase its life by 8 times.
• Pump gland leaks. (S) Replacement of glands by mechanical seals. (B) Gland and bearing reliability & reduced maintenance. (D) Gland maintenance frequency, shaft damage frequency & bearing failure post mortem.
• Pump mechanical seal failures. (S) Service provider support or supplier training. (B) Reduced early failures. (D) Failure frequency.
• Pump suction blocked (usually a plastic bag at the sump strainer), causing poor flow and impeller casing damage. (S) Primary and secondary suction strainers, or eliminate the debris source. (B) Fewer random, difficult to diagnose failures. (D) Failure history & risk review.
• Bearing failure/moisture entry/wash-down. (S) Improve equipment wash-down practices; install seal area splash shielding and lubricant checks. (B) Reduced bearing failures and relubrication frequency. (D) Lubricant analysis and bearing post mortem.
• Bearing failure by surface fatigue spalling/poor lubrication. (S) Implement lubrication schedule. Audit lubrication schedule, including inspection for on-site symptoms of good top-up and relubrication practice. (B) Reduced bearing failures. (D) High ultrasonic vibration.

Some other issues with well known root cause solutions are listed below.
• Contamination in oil lubricated system
  o Improved breathers
  o Improved shaft seals
  o Fix other compartment sealing issues (dipsticks, covers)
  o Improved filtration (recirculating oil system)
  o Regular filter cart filtration
  o Ensure top-up and change-out oil cleanliness is adequate (eg final filter on entry)
  o Positive air pressure (dry & clean) purge system
  o Replacing breather with bladder
  o Improve wash-down practices
  o Splash shields for sealing areas
  o Audit system for servicing (eg oil level checks & top-ups, filter change-outs)
• Contamination of grease lubricated housings > 400 rpm
  o Improve wash-down practices
  o Labyrinth seal grease purging
  o Improved shaft seals
• Bearing fitting
• Assembly tolerances
• Shaft misalignment
• Inadequate corrosion resistance
• Inadequate wear resistance
• Loading excessive
• Material stress excessive
• Operational practices
• Inadequate operational documentation
• High resistance electrical joints
• Inadequate cooling for electrical systems
• Excessive and inappropriate maintenance
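
The comment in the shaft misalignment item above, that halving the load on a bearing can increase its life by 8 times, is consistent with the basic rating life relation commonly used for rolling bearings (as in ISO 281). The symbols below are not taken from this chapter and are introduced only for this worked example: L_10 is the basic rating life in millions of revolutions, C is the basic dynamic load rating, P is the equivalent dynamic bearing load, and the exponent p is taken as 3 for ball bearings.

```latex
L_{10} = \left(\frac{C}{P}\right)^{p},
\qquad
\frac{L_{10}(P/2)}{L_{10}(P)}
  = \left(\frac{C/(P/2)}{C/P}\right)^{p}
  = 2^{p}
  = 8 \quad \text{for } p = 3 .
```

For roller bearings the exponent is usually taken as p = 10/3, in which case halving the load gives roughly a tenfold increase in rating life. Either way, a modest reduction in the load imposed by misalignment has a disproportionately large effect on bearing life.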


7.4 – Matrix Risk Analysis for Equipment Priority

Some plants have an excellent history of plant down time, machine problems, repair costs, material costs and OH&S issues. This makes it easy to do trend and Pareto analysis of the data to help decide where most of the defect elimination effort should go. For plants that are not in this position, the best source of data is often the memory of some of your key operations and maintenance personnel. Fig 7.2 is a spreadsheet system that enables you to quickly and rigorously collect information about how big the problems with equipment and processes are and how often they occur.

The problem categories in Fig 7.2 are Production Loss, Maintenance Cost and HS&E problems. For each category, 5 or 6 magnitude groupings are created in increasing order of magnitude, scaled to suit your organisation. You need a list of equipment, usually from the CMMS, that ideally has the equipment type categorised. Make sure the items are at a low enough level (eg hydraulic system, pump, conveyor, conveyor drive), but not at too low a level (eg bearing, cylinder, pump motor).

A group of two or three people is best for the analysis (the smaller the group, the faster the analysis will be). First get a consensus on some examples of equipment that would fall into each category. Loss can be taken from a known yearly cost or from the cost of a known or likely significant event. Before you start, you need an agreed average cost of lost operations (ensure this number is not underestimated). For each event you pick a frequency. As you only have to guess an order of magnitude, this is usually quite easy. As many equipment items tend to be repeats, the analysis can be done very quickly.

The spreadsheet automatically creates a risk/priority number by multiplying each Loss Factor (1 to 6) by its Frequency Factor (1 to 6) and adding the three category numbers together. The higher the number, the bigger the risk.
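
The risk/priority calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Fig 7.2 spreadsheet itself: the class name, the field names, the equipment list and every factor value are hypothetical. Only the rule itself (Loss Factor times Frequency Factor, summed over the three categories) comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three problem categories named in the text.
CATEGORIES = ("production_loss", "maintenance_cost", "hse")


@dataclass
class EquipmentRisk:
    name: str
    equipment_type: str
    # category -> (loss factor 1..6, frequency factor 1..6)
    factors: dict

    def score(self) -> int:
        """Sum of loss factor x frequency factor over the three categories."""
        total = 0
        for category in CATEGORIES:
            loss, freq = self.factors[category]
            if not (1 <= loss <= 6 and 1 <= freq <= 6):
                raise ValueError(f"{self.name}: factors must be between 1 and 6")
            total += loss * freq
        return total


# Hypothetical equipment list with illustrative factor values.
equipment = [
    EquipmentRisk("Slurry pump 1", "Pump",
                  {"production_loss": (4, 3), "maintenance_cost": (3, 4), "hse": (2, 1)}),
    EquipmentRisk("Slurry pump 2", "Pump",
                  {"production_loss": (3, 3), "maintenance_cost": (3, 3), "hse": (1, 1)}),
    EquipmentRisk("Conveyor drive A", "Conveyor drive",
                  {"production_loss": (5, 2), "maintenance_cost": (4, 2), "hse": (3, 1)}),
    EquipmentRisk("ID fan", "Fan",
                  {"production_loss": (6, 1), "maintenance_cost": (4, 1), "hse": (2, 1)}),
]

# Rank individual items from highest to lowest risk.
ranked = sorted(equipment, key=lambda item: item.score(), reverse=True)

# Total the risk carried by each equipment type.
risk_by_type = defaultdict(int)
for item in equipment:
    risk_by_type[item.equipment_type] += item.score()

for item in ranked:
    print(f"{item.name:18s} {item.score():3d}")
for equipment_type, total in sorted(risk_by_type.items(), key=lambda kv: -kv[1]):
    print(f"{equipment_type:18s} {total:3d} (type total)")
```

In this made-up data neither pump is the single highest-scoring item by a wide margin, yet pumps as a type carry the largest total, which is the kind of pattern a per-type total is meant to expose.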
Once the data collection is done, the equipment can be sorted to find the top 20% of highest risk items. The risk from the different equipment types should also be analysed. Often an equipment type will not have many individual high risk items but will have a high total risk. This is a large opportunity, as specific equipment types often have common causes of failure.

The intention of this type of analysis is to provide a quick way to focus people's attention on the areas of opportunity. If you do the same analysis again with a different group, or at a later time, you will very likely get a slightly different set of answers. Resist the temptation, for yourself or others, to waste time trying to ‘get it RIGHT’. As long as the analysis is more right than wrong it will achieve its goal.

7.5 – Condition Monitoring Strategy Decision Matrix

There are a lot of variables and options when deciding what condition monitoring to use and how often to do it. The Condition Monitoring Strategy Decision Matrix shown in the next two tables is a way to set some standards and provide guidance on where the different options should be used. The first table below defines the strategy options. The second table gives the recommended frequency in weeks against ‘risk’, expressed as your expected average Operate-To-Failure (RM) cost per year. It also gives the expected average yearly total cost to the business of using the particular strategy.

For a particular monitoring application (eg bearing monitoring on simple machines > 400 rpm), the matrix defines a range of good monitoring methods, from simple 5 senses to full on-line. I then carried out an optimisation analysis using what I hope are sensible inputs for PF interval, monitoring cost, capital cost etc. The matrix displays the monitoring method on the left and, along the top, the yearly average cost of an operate-to-failure strategy (only servicing such as lubrication is performed).
For example, if the total business cost of a bearing failure, including all direct and indirect costs, was $10,000 and the MTBF was 2 years, the value on the top axis would be $5,000. Each cell of the matrix gives the optimum monitoring frequency in weeks and the total yearly cost to the business of bearing failures under that strategy (including all maintenance costs, operational loss costs and OH&S costs), which represents the risk level.

One of the most significant influences in all CM decision matrices is the PF interval, which is the warning time given by the specific CM technique. For example, in the matrix below the PF interval for 5 Senses detection of bearing problems is much shorter than can be achieved with a well set up vibration data collector system. The other key influences are the cost of an unpredicted failure, the Mean Time Between Failures (MTBF), the cost of one monitoring & analysis routine, the cost of a predicted failure and, very importantly, the percentage of failures that the CM technique will miss entirely.

The model that created the matrix required certain decisions and assumptions, so the optimum frequencies and their total costs will not be true in all cases. Also, if an optimum frequency is given as 6 weeks, in many cases there may be only AUD $30 difference in the total yearly cost to the business between selecting 4 weeks and selecting 8 weeks as the monitoring frequency. It is usually much more important to form practical monitoring and inspection routes than to use an ideal optimum.

Another issue with this particular CM matrix system is that it does not fully define the cut-off point below which Reactive Maintenance should be used. This requires more information on the cost difference between a predicted and an unpredicted repair and on the opportunity cost of other competing plant issues.
Of interest in the matrix below is that, at lower risk levels, there is not a big difference in total cost between simple vibration monitoring and vibration data collector techniques. Also, if a sensible routine monitoring system is in place, on-line systems are hard to justify.
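
The way these influences interact can be explored with a small cost model. The sketch below is not the model that produced the matrix, whose details are not given here. It is a deliberately simplified illustration with its own assumptions: a failure is caught if an inspection falls within the PF interval, the chance of that is taken as PF interval divided by inspection interval (capped at 1), and a fixed fraction of failures is missed by the technique regardless. All function names and all input values other than the $10,000 failure cost and 2 year MTBF from the example above are hypothetical.

```python
WEEKS_PER_YEAR = 52.0


def annual_run_to_failure_cost(unpredicted_failure_cost, mtbf_years):
    """Yearly average cost of an operate-to-failure strategy (the matrix top axis)."""
    return unpredicted_failure_cost / mtbf_years


def detection_probability(interval_weeks, pf_interval_weeks, miss_fraction):
    """Assumed chance that a developing failure is caught before it fails."""
    coverage = min(1.0, pf_interval_weeks / interval_weeks)
    return coverage * (1.0 - miss_fraction)


def annual_strategy_cost(interval_weeks, pf_interval_weeks, miss_fraction,
                         cost_per_check, predicted_failure_cost,
                         unpredicted_failure_cost, mtbf_years):
    """Assumed yearly cost: monitoring effort plus expected cost of failures."""
    monitoring = (WEEKS_PER_YEAR / interval_weeks) * cost_per_check
    p_detect = detection_probability(interval_weeks, pf_interval_weeks, miss_fraction)
    failures_per_year = 1.0 / mtbf_years
    failures = failures_per_year * (
        p_detect * predicted_failure_cost
        + (1.0 - p_detect) * unpredicted_failure_cost
    )
    return monitoring + failures


# $10,000 and 2 years are the figures used in the text; the rest are made up.
inputs = dict(
    pf_interval_weeks=8.0,
    miss_fraction=0.1,
    cost_per_check=20.0,
    predicted_failure_cost=2_000.0,
    unpredicted_failure_cost=10_000.0,
    mtbf_years=2.0,
)

print("Run-to-failure cost per year:", annual_run_to_failure_cost(10_000.0, 2.0))
for weeks in (2, 4, 8, 12, 16):
    print(f"{weeks:2d} weeks: {annual_strategy_cost(weeks, **inputs):8.2f} per year")
```

Even this crude model shows a cost curve that changes little across a range of intervals shorter than the PF interval and rises quickly once the interval exceeds it. A more careful model would also allow for the lead time needed to plan a repair and for the capital cost of the monitoring equipment.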


7.6 – Build Diagnostics Systems using Causal Trees

With the availability of portable computers and PDAs it is possible to share and carry huge quantities of data. Yet when we are under pressure to solve an equipment problem, the information we need never seems to be accessible or in the right format. When you have a problem such as an equipment defect, you need to quickly link Symptoms > Causes > Root Causes > Solutions. As already noted, someone has likely done this before. So why don't all these experts pool their knowledge into a format we can all use? This has been attempted for years with expert systems and troubleshooting tables. Although these are often useful, they never seem to be flexible, detailed or accessible enough.

Figure 7.3 – Very Simple Causal Tree for a Fan Bearing Problem

A Root Cause Analysis causal tree (a simple example is shown in Fig 7.3) can link symptoms, causes, root causes and solutions together, but it does not usually display them in the way most suitable for diagnostics. Table 7.3 shows a table of causes and symptoms from the causal tree in Fig 7.3. You can see that the symptom text has been structured with a category header. This allows the table to be re-sorted into symptom order, as in Table 7.4. This format is far more useful for problem solving. For example, if I have a fan with an ultrasonic vibration in the bearing, I can easily find the two causes linked to this symptom. I would then go to Table 7.3 and check the other symptoms of these causes to see which symptom list is the best match for my problem, and do any further checks and measurements to confirm. I could then look at the causal tree and check the likely root causes for my problem and any root cause solutions stored there.

Table 7.3 – Cause to Symptom Report

Table 7.4 – Symptom to Cause Report

This diagnostic example is very simple and is used to illustrate the concept of using RCA software systems such as ‘RCA Rt’ as a troubleshooting/diagnostic development system. Causal trees are often developed to help solve a problem and are then discarded. With only a little extra input and review of evidence/symptom data, they can be turned into useful diagnostic systems. These can easily be added to, improved over time and shared. The software can hold thousands of causal trees, which can be grouped, for example as all fan problems. Files with this fan data could be loaded onto PDAs to make the data accessible while working on-site. With today's software systems, sorting records and finding the specific record you want by multiple word search is not a problem.
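
The step from a cause to symptom report to a symptom to cause report is a simple inversion of the same data, and can be sketched as follows. This is an illustration only and does not reproduce ‘RCA Rt’ or the contents of Fig 7.3, Table 7.3 or Table 7.4: the causes, the symptoms and the function names are hypothetical. The only convention taken from the text is that each symptom starts with a category header so that symptoms sort usefully.

```python
from collections import defaultdict

# Hypothetical cause -> symptoms table for a fan bearing problem.
# Each symptom starts with a category header ("Vibration:", "Lubricant:", ...).
cause_to_symptoms = {
    "Shaft misalignment": [
        "Vibration: high axial velocity",
        "Vibration: high ultrasonic level at bearing",
    ],
    "Poor lubrication": [
        "Vibration: high ultrasonic level at bearing",
        "Temperature: bearing housing hot",
    ],
    "Moisture entry during wash-down": [
        "Lubricant: water contamination",
    ],
}


def symptom_to_causes(table):
    """Invert a cause -> symptoms table into a sorted symptom -> causes index."""
    index = defaultdict(list)
    for cause, symptoms in table.items():
        for symptom in symptoms:
            index[symptom].append(cause)
    return {symptom: sorted(causes) for symptom, causes in sorted(index.items())}


def rank_causes(table, observed_symptoms):
    """List causes sharing at least one observed symptom, best match first.

    Each row is (cause, number of matching symptoms, number of listed symptoms).
    """
    observed = set(observed_symptoms)
    rows = []
    for cause, symptoms in table.items():
        matches = len(observed & set(symptoms))
        if matches:
            rows.append((cause, matches, len(symptoms)))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


symptom_index = symptom_to_causes(cause_to_symptoms)
for symptom, causes in symptom_index.items():
    print(f"{symptom} -> {', '.join(causes)}")

observed = ["Vibration: high ultrasonic level at bearing",
            "Temperature: bearing housing hot"]
for cause, matches, listed in rank_causes(cause_to_symptoms, observed):
    print(f"{cause}: {matches} of {listed} listed symptoms observed")
```

Looking up one symptom gives the candidate causes, and checking the remaining symptoms of each candidate narrows the list, which mirrors the manual procedure described for the two tables.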
