10 Tips to Fail Well

Steps to Increase the Probability of Having a Successful Failure Analysis

23 October 2018

At Intertek's Asset Integrity Management facility in Santa Clara, CA, we deal with a myriad of failed equipment daily.  Rapid unscheduled disassembly is traumatic and confusing.  If proper care is not taken while extracting parts of interest from the incident site, critical information can easily be lost before samples reach the hands of failure analysis experts.  Failure is rarely a neat and orderly affair, but our response to it must be well organized to maximize learning from unfortunate events.  Here are 10 basic steps that can be taken to increase the probability of having a successful failure analysis.

  1. Scene safety:  Ensure that equipment is shut down, electrical systems are de-energized, highly energetic and hazardous substances are removed, and appropriate safety procedures are followed to lock out the scene and mitigate any remaining hazards.  There is often an urge to get straight to the heart of a problem, but rushing in is almost never worth the risk.
  2. Document the scene:  Once the site is safe, taking pictures of the environment, including the failure site and its surroundings, is essential.  The more information gathered on site, the less likely it is that materials will be misrouted or misidentified.  Look around and document anything that seems amiss.  Gather any available sensor or performance data.  If there is nondestructive testing capability onsite, deploy it with care, because what is nondestructive during operations can sometimes disturb the evidence left by the chain of failure.  If you have questions, don't hesitate to call experts.  A quick phone call with a failure analysis professional can save information, time, and money.
  3. Get permission:  Ensure that approval from all interested parties is obtained prior to performing destructive removal or other activities which may disturb the condition of the site.
  4. Select items for removal:  Inevitably, some parts will need to be removed for laboratory examination.  Carefully mark or tag each sample to be removed with its identification, its orientation, and process information (flow direction, pressure, temperature, etc.).
  5. Be generous with sample size:  Where possible, give a wide berth to the regions of failure.  Anything damaged during removal will be useless for determining the cause of the failure.  Structure away from the failure within a part can provide critical information about part condition and failure mechanism.
  6. Remove specimens with caution:  It is important that any deposits or fluids not be disturbed during removal, or worse, transferred to other components.  If there are issues during removal, document them and inform the lab you are sending the samples to.  That way, time is not wasted examining damage introduced during extraction.
  7. Preserve condition:  Unless fluids or solid deposits are hazardous, do not remove them.  Do not clean the parts; above all, avoid aggressive mechanical action and used cleaning supplies.  Wrap parts carefully with a nonreacting material; polymer film is typical.  Protect the failure from the environment where possible, and note the conditions during removal, storage, and packing.
  8. Pack with care:  Shipping samples is expensive.  Take some time to pack each item so that it is not damaged during shipping, which would cause a loss of data.  Sturdy wooden crates with plenty of fresh, clean wrapping, padding, and desiccant to arrest the propagation of moisture damage are essential.  Be sure that each item is wrapped so that its contents can't migrate out into the packing material and packing material can't get into the specimen.
  9. Ship fast:  Some conditions can change over time.  The sooner the samples arrive in the lab, the less likely it is that they will be compromised.  Air freight is best if possible.  Timely results are impossible if the samples don't arrive quickly.  For very large samples, consider bringing a field team onsite to get data more quickly than shipping would allow.
  10. Supplement specimens with data:  Be sure to send any information on the operations and site, such as drawings, operations data, sensor data, and previous inspection reports.  Gather the information and send it in digital format if possible.  The more context that is available, the faster the failure analysis will progress, and such data is essential to even begin a root cause investigation.
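
As a rough illustration of steps 4, 6, and 10, the tags and removal notes could be captured in a small digital manifest that travels with the shipment.  The sketch below is one possible layout under stated assumptions, not a prescribed format: the `SampleTag` class, its field names, the units, and the `build_manifest` helper are all hypothetical names chosen for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SampleTag:
    """One removed specimen. All field names are illustrative assumptions."""
    sample_id: str                      # identification marked on the part
    orientation: str                    # how the part sat in the equipment
    flow_direction: str                 # process flow relative to the part
    pressure_kpa: Optional[float] = None
    temperature_c: Optional[float] = None
    removal_notes: List[str] = field(default_factory=list)  # issues during removal
    photos: List[str] = field(default_factory=list)         # photo file names


def build_manifest(tags: List[SampleTag]) -> str:
    """Serialize the tags to JSON, rejecting duplicate sample IDs."""
    seen = set()
    for tag in tags:
        if tag.sample_id in seen:
            raise ValueError(f"duplicate sample_id: {tag.sample_id}")
        seen.add(tag.sample_id)
    return json.dumps([asdict(tag) for tag in tags], indent=2)


if __name__ == "__main__":
    # Hypothetical example values, not data from a real investigation.
    tag = SampleTag(
        sample_id="S-001",
        orientation="top of pipe marked, arrow points north",
        flow_direction="west to east",
        pressure_kpa=850.0,
        removal_notes=["saw cut nicked outer surface 40 mm from fracture"],
    )
    print(build_manifest([tag]))
```

Leaving unknown process values as `null` rather than guessing keeps the record honest, and the duplicate check guards against the misidentification that step 2 warns about.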

Failures occur for a variety of complex reasons.  By following these steps alongside industry-specific best practices, you can hopefully preserve the complexity of your failure long enough for it to be fully understood, so that the failure can be prevented from needlessly recurring.

Dr. Hasier has experience ranging from analog electronics to mechanical design to materials characterization. His previous work includes rocket injector design and testing, paleoaltimetry, particle physics, energy storage, magnetic functional materials, material discovery, alloy design, and heat treatment. He has a bachelor's degree in physics from Caltech and a PhD in Materials Science and Engineering from the Illinois Institute of Technology.