By Farshad Bakhshi
1. Equipment Overview
The machine under study is a high-speed turboexpander/compressor operating in the off-gas section of a petrochemical unit. The machine was designed in accordance with the requirements of API 617 and is classified as highly sensitive rotating equipment with respect to operating conditions.
Due to the nature of the process gas, its critical role in the production process, and the significant economic consequences associated with unplanned shutdowns or failures, this machine was identified as a critical asset in the plant equipment criticality assessment.
2. Failure Occurrence Description
In 2023, the machine tripped on an emergency shutdown after the AMD (Axial Motion Detection) protection system was triggered while the machine was operating within its normal operating envelope. Prior to the alarm, no process instability or system faults had been reported.
Following the shutdown, the operating staff initiated a restart attempt. However, immediately after startup, vibration levels increased abnormally, leading to a second trip. Based on this behavior, it can be inferred that internal mechanical damage had already occurred prior to the alarm and became more severe during the restart.
3. Failure History
A review of the equipment’s operational and failure history shows that this turboexpander has experienced recurring mechanical failures over several years. Because the failures kept recurring, plant management initiated a focused investigation to identify the underlying root causes.
Accordingly, a Root Cause Analysis (RCA) team was formed, including representatives from operations, engineering, inspection, machinery maintenance, electrical, instrumentation, and maintenance planning departments.
As a first step, historical maintenance records were reviewed using the CMMS. The review showed that, in previous years, the machine had experienced serious mechanical failures about twice per year on average. These failures primarily involved damage to rotating components, particularly the impeller, and the same types of damage recurred across different operating periods.
The repeated nature of these failures suggests that the most recent failure was not an isolated incident, but was likely related to previous occurrences.
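The failure frequency noted above can be turned into an approximate mean time between failures (MTBF). The following is a minimal sketch, assuming the machine runs continuously so that calendar hours approximate operating hours; the function name and the 8,760 hours-per-year constant are illustrative choices, not part of the original analysis.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 h; assumption: ignores leap years and planned downtime


def mean_time_between_failures(failures_per_year: float) -> float:
    """Return an approximate MTBF in hours for an average annual failure count."""
    if failures_per_year <= 0:
        raise ValueError("failures_per_year must be positive")
    return HOURS_PER_YEAR / failures_per_year


# Approximate frequency reported in the CMMS review: two serious failures per year.
mtbf_hours = mean_time_between_failures(2.0)
print(f"Approximate MTBF: {mtbf_hours:.0f} h")
```

With two failures per year this gives about 4,380 hours, or roughly six months, between serious failures. A real calculation would use actual operating hours and failure dates from the CMMS rather than an annual average.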
4. Failure Observations
To assess the internal condition of the machine without initial disassembly, a borescope inspection was performed. Borescopy is a non-destructive inspection method that enables visual examination of internal components of rotating equipment through limited access points. During the inspection, clear evidence of impeller fracture was identified. Based on the nature of the damage and the operating condition of the unit, the machine was transferred to the workshop for a major overhaul.


Figure 1. Damaged impeller.
Post-disassembly inspections confirmed that, in addition to the impeller fracture, several stationary components adjacent to the damaged area were also affected. Although photographic documentation of the bearing assembly was not available, the observed conditions indicate that severe vibration associated with the impeller failure had adversely affected the integrity and performance of the bearing system.
5. Failure Mechanism Analysis
As described in the failure occurrence section, the failure occurred without clear warning signs while the machine was operating under normal conditions. The AMD alarm was triggered suddenly, even though process conditions were reported as normal. This behavior indicates that the machine abruptly entered an unstable operating region.
Given the high rotational speed of the machine, entering an unstable region can lead to amplified vibration and expose the impeller to resonant conditions. Under resonance, dynamic stresses acting on rotating components increase significantly, creating favorable conditions for crack initiation, fatigue crack growth, and ultimately catastrophic impeller fracture.
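The effect of resonance on vibration amplitude can be illustrated with the textbook dynamic amplification factor of a single-degree-of-freedom system. This is a simplified sketch, not a rotordynamic model of this machine; the damping ratio of 0.05 and the frequency ratios below are hypothetical values chosen only for illustration.

```python
import math


def amplification_factor(frequency_ratio: float, damping_ratio: float) -> float:
    """Dynamic amplification of a damped single-degree-of-freedom system.

    frequency_ratio: excitation frequency divided by natural frequency (r).
    damping_ratio:   fraction of critical damping (zeta), must be positive.
    """
    if damping_ratio <= 0:
        raise ValueError("damping_ratio must be positive")
    r, zeta = frequency_ratio, damping_ratio
    return 1.0 / math.sqrt((1.0 - r**2) ** 2 + (2.0 * zeta * r) ** 2)


# Hypothetical damping ratio; real values come from a rotordynamic analysis.
ZETA = 0.05
for r in (0.5, 0.9, 1.0, 1.1):
    print(f"r = {r:.1f}: amplification = {amplification_factor(r, ZETA):.1f}")
```

Under these assumptions the response is amplified about 1.3 times at half the natural frequency and 10 times at resonance (r = 1), which shows why even a small shift of the operating point toward a natural frequency can sharply raise dynamic stresses.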
The restart attempt following the initial trip likely aggravated the damage. The abnormal rise in vibration immediately after restart and the subsequent second trip suggest that mechanical damage had already developed before the first alarm, and that the restart mainly made the existing damage more severe and more apparent.
Overall, the evidence indicates that the dominant failure mechanism was driven by dynamic instability and entry into a resonant operating condition. In the absence of effective condition monitoring and early-warning capability, such a condition may develop with limited or no apparent precursors and lead to sudden, high-consequence failures.
Based on the failure mechanism classification approach defined in ISO 14224, this event is categorized as a mechanical failure, with the mechanism attributed to dynamic instability and resonance.
6. Root Causes of Failure
Based on the results of the analysis and with a clear distinction between the failure mechanism and its underlying causes, the primary root cause of the failure was identified as operation outside the stable operating envelope, resulting from inadequate control of operating conditions during normal operation. This condition ultimately led to the onset of resonant operating conditions.
In addition to the primary cause, several contributing factors played a role in the occurrence and escalation of the failure. The absence of a vibration monitoring system prevented early detection of initial signs of dynamic instability. Furthermore, the lack of timely action to identify and eliminate the root cause – despite the presence of recurring failure records in the CMMS – reflects weaknesses in failure management and maintenance practices.
Moreover, the decision to restart the machine following the initial trip, without verification of internal mechanical integrity, acted as an operation-related contributing cause, which intensified the failure and contributed to the progression of mechanical damage.
In accordance with the failure cause classification framework defined in ISO 14224, the primary cause identified in this study falls within the category of process and control related failure causes, while the additional factors are classified as contributing causes associated with condition monitoring, maintenance, and operational practices.
7. Corrective Actions
Following the identification of the root causes, the machine was transferred to the workshop and subjected to a major overhaul. Given the severity of the observed damage, the damaged impeller was replaced, and necessary repair and corrective actions were performed on the adjacent stationary components and associated parts. After completion of the repair activities and the required inspections, the machine was returned to service.
In addition to the technical repairs, the following corrective actions were implemented:
- Review and enhancement of maintenance strategies for critical equipment within the plant
- Development of operating procedures to improve response to unplanned trips and to revise decision-making criteria for machine restart following a trip
- Increased emphasis on preventing operation within unstable operating regions through continuous monitoring of operating conditions
- Implementation of vibration monitoring to enable early detection of developing failures
- Review of equipment key performance indicators (KPIs) and more effective use of CMMS data for failure history analysis and decision-making
- Periodic evaluation of the effectiveness of the implemented corrective actions over defined time intervals
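As an illustration of how vibration monitoring can give early warning, the sketch below compares the average of the most recent readings against a healthy baseline. The function name, the alert and danger ratios, the window length, and the sample readings are all hypothetical; actual limits should come from the OEM, the applicable standards, and the plant's own baseline data.

```python
from statistics import mean


def classify_vibration(readings, baseline, alert_ratio=1.5, danger_ratio=2.5, window=5):
    """Classify recent vibration relative to a healthy baseline.

    readings: chronological vibration amplitudes (any consistent unit).
    baseline: amplitude recorded when the machine was known to be healthy.
    The ratios and window are illustrative assumptions, not recommended limits.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if len(readings) < window:
        return "insufficient data"
    ratio = mean(readings[-window:]) / baseline
    if ratio >= danger_ratio:
        return "danger"
    if ratio >= alert_ratio:
        return "alert"
    return "normal"


# Hypothetical trend: amplitude drifting upward from a baseline of 20 units.
trend = [20, 21, 20, 22, 25, 29, 33, 36, 40]
print(classify_vibration(trend, baseline=20.0))
```

For the hypothetical trend above, the last five readings average 32.6 units, about 1.6 times the baseline, so the function reports an alert before the danger level is reached. Averaging over a window is one simple way to avoid reacting to a single noisy reading.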
8. Lessons Learned
This case study demonstrates that, in high-speed rotating machinery, certain failures may occur without obvious warning signs and under seemingly normal operating conditions. The key lessons learned from this case are as follows:
- For high-speed machinery, reliance solely on stable process conditions is not sufficient to ensure mechanical integrity.
- The absence of effective condition monitoring—especially vibration monitoring—significantly increases the risk of sudden and high-consequence failures.
- Decisions regarding machine restart following a trip should be based on a technical assessment of internal component integrity, rather than a rapid return to service.
- Recurring failure histories, when not supported by effective root cause analysis, may indicate the presence of hidden and unresolved issues in machine performance.
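The restart lesson above can be expressed as a simple permissive check that blocks a restart until the required assessments are complete. This is only a sketch: the three criteria and all names are hypothetical examples, and a real procedure would define its own checks and approval steps.

```python
from dataclasses import dataclass


@dataclass
class TripAssessment:
    """Hypothetical post-trip checklist; the fields are illustrative only."""
    trip_cause_identified: bool
    internal_inspection_passed: bool  # e.g. borescope inspection shows no damage
    vibration_within_limits: bool     # last recorded vibration below the alert limit


def restart_blockers(assessment: TripAssessment) -> list:
    """Return the reasons a restart should not proceed (empty if none)."""
    blockers = []
    if not assessment.trip_cause_identified:
        blockers.append("trip cause not identified")
    if not assessment.internal_inspection_passed:
        blockers.append("internal integrity not verified")
    if not assessment.vibration_within_limits:
        blockers.append("vibration above limits")
    return blockers


def restart_permitted(assessment: TripAssessment) -> bool:
    """A restart is permitted only when no blockers remain."""
    return not restart_blockers(assessment)


# Situation resembling the first trip in this case: nothing had been verified.
after_trip = TripAssessment(False, False, False)
print(restart_permitted(after_trip), restart_blockers(after_trip))
```

Returning the list of blockers, rather than only a yes or no answer, makes the reason for holding the restart visible to the operating staff.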
9. Conclusion
Effective maintenance management extends beyond purely technical actions and requires a structured, organizational, and systematic approach. The findings highlight the key role of management support for the maintenance planning department, the commitment of maintenance departments to the proper execution of maintenance programs, and operators’ familiarity with machine start-up and restart procedures.
In addition, this case underscores the importance of condition monitoring – particularly vibration monitoring – as a fundamental element in managing high-speed rotating machinery. Furthermore, the establishment and active involvement of cross-functional Root Cause Analysis (RCA) teams is identified as an effective means of uncovering hidden issues, preventing failure recurrence, and improving the reliability of critical equipment.
Overall, adopting an integrated, standards-based approach supported by cross-departmental collaboration can play a decisive role in reducing operational risks, preventing unexpected failures, and enhancing the long-term performance and reliability of rotating machinery in process industries.
About the author

Farshad Bakhshi is a Maintenance & Reliability consultant and CMMS implementation specialist with over 20 years of experience in asset-intensive industries. He helps organizations improve reliability performance through maintenance strategy, data governance, preventive maintenance optimization, and root cause analysis.













