Studying MitigateScan

The History and Objectives of RAMS

RAMS in a nutshell...

The world of engineering and technology has witnessed remarkable advancements over the years, and the quest for robust and dependable systems has been at the forefront of this evolution. The RAMS discipline, encompassing Reliability, Availability, Maintainability, and Safety, has played a pivotal role in shaping the reliability and performance of complex systems. We will review the history of RAMS, tracing its roots from the inception of reliability and exploring the subsequent development of interconnected notions.

Reliability - The Pioneer:

The concept of reliability emerged during the early 20th century in response to the growing complexity of industrial systems. Engineers sought a way to quantify the dependability of machinery, and thus the notion of reliability was born. The first systematic approach to reliability was introduced by the mathematician and engineer William W. Alberts in the 1920s, laying the foundation for a more standardized understanding of system dependability.

During World War II, reliability became a critical aspect of military operations. The failure of equipment in the field could have dire consequences, prompting a surge in research and development focused on improving the reliability of military hardware.

Reliability has emerged as the fundamental and pervasive concept in RAMS. Who wouldn’t be concerned about obtaining an unreliable product? This criterion ranks among the top factors influencing customers’ perception of quality. Therefore, it is meticulously quantified for each system susceptible to breakdown. This quantification can be conducted at different levels depending on the product’s value or complexity. The cheaper the product, the less likely a system-based approach will be adopted. Manufacturers opt instead for a large-scale test to failure, and the average time to failure is referred to as the “Mean Time To Failure”, or MTTF.

Consider an electric fan manufacturer aiming to demonstrate that their products achieve a satisfactory reliability level. They could power a batch of fans continuously until each one breaks down and record every running time; the average of those times gives the product’s MTTF.
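The fan test can be sketched in a few lines of Python. The running times below are invented for illustration, and the function name is our own; the only real content is the averaging step, which assumes every unit was run until it failed.

```python
from statistics import mean

# Hypothetical running times to failure, in hours, for five fans
# powered continuously until breakdown (illustrative values only).
hours_to_failure = [1200.0, 950.0, 1430.0, 1100.0, 1320.0]


def mean_time_to_failure(times):
    """Average observed time to failure; assumes every unit has failed."""
    if not times:
        raise ValueError("at least one failure time is required")
    return mean(times)


mttf_hours = mean_time_to_failure(hours_to_failure)
print(f"MTTF = {mttf_hours:.0f} h")  # prints "MTTF = 1200 h"
```

If some fans were still running when the test stopped, a plain average would underestimate the MTTF, and a method that handles censored data would be needed instead.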

On the contrary, the more expensive and complex the product, the more likely system reliability models will be adopted. This approach involves detailing the product into various subsystems or components, depending on the desired level of detail.
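As a rough sketch of what such a model can look like, the snippet below combines subsystem reliabilities for a system in which any single subsystem failure stops the whole product (a series arrangement). It assumes independent subsystems with constant failure rates; the subsystem names and rates are hypothetical, not data from a real product.

```python
import math


def component_reliability(failure_rate_per_hour, hours):
    """Probability of surviving `hours` under a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


def series_reliability(failure_rates_per_hour, hours):
    """Reliability of a series system: every component must survive."""
    reliability = 1.0
    for rate in failure_rates_per_hour:
        reliability *= component_reliability(rate, hours)
    return reliability


# Hypothetical subsystems and failure rates (failures per hour).
subsystem_rates = {"motor": 2e-5, "controller": 5e-6, "power supply": 1e-5}
mission_hours = 1000.0

system_reliability = series_reliability(subsystem_rates.values(), mission_hours)
print(f"System reliability over {mission_hours:.0f} h: {system_reliability:.4f}")
```

Because the reliabilities multiply, each added subsystem can only lower the result, which is why the level of detail of the breakdown matters.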

Would it be sensible to build the first prototypes of a new car from arbitrary parts, only to discover during testing that half of them are unreliable, when the development cost runs into the billions?

This choice generally depends on the available reliability data. It is generally accepted that the necessary level of detail is the one at which clear and quantified failure mode data have been identified. Failure modes, as we will discuss later, refer to the list answering the question “how does the component break down?”

For example, the failure modes of a keyboard key might include: no input detected, spurious (untimely) input detected, degraded operation (discussed later)…

Customer satisfaction lies at the core of the Reliability discipline and is deeply intertwined with quality and cost-effectiveness.

Availability - Maximizing Uptime:

As technology continued to advance, the importance of system availability became apparent. Availability, often measured as the percentage of time a system is operational, became a crucial metric, especially in fields where downtime could result in significant losses. The aviation industry, with its emphasis on aircraft availability, played a pivotal role in popularizing the concept of availability.

The Apollo 11 mission to the moon in 1969 marked a milestone for availability engineering. The meticulous planning and redundant systems ensured the spacecraft’s availability, enabling the historic lunar landing.

Availability is often confused with Reliability. A system that never fails has a reliability of 1 and 100% availability, but availability is also closely related to Maintainability, because it accounts for the time needed to restore the system after a failure. For instance, if a car spends equal time in the workshop and on the road, its operational availability is only 50%. Availability has various interpretations depending on what we mean by the term “available.” Do we refer to total time or operational time?

Depending on the need for quantification, the availability of a system can be assessed solely during operating time or over the total time. For example, you expect your car to start up every morning for your commute, even if it is not in operation yet. We will delve into this further in the dedicated module.
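The effect of the chosen time basis can be illustrated with a small calculation. This is only a sketch: the function name and all figures are hypothetical, and the split between the two bases is one possible reading of the distinction described above.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the considered time during which the system is usable."""
    total_hours = uptime_hours + downtime_hours
    if total_hours <= 0:
        raise ValueError("the observation period must be positive")
    return uptime_hours / total_hours


# The car that spends equal time in the workshop and on the road.
car_half = availability(uptime_hours=100.0, downtime_hours=100.0)

# Total-time basis (hypothetical week): 168 h, of which 4 h in the workshop.
week_hours = 168.0
workshop_hours = 4.0
total_time_basis = availability(week_hours - workshop_hours, workshop_hours)

# Operating-time basis (same hypothetical week): 10 h actually driven,
# 2 h of planned driving lost because the car was in the workshop.
driven_hours = 10.0
lost_driving_hours = 2.0
operating_time_basis = availability(driven_hours, lost_driving_hours)

print(f"Half in workshop:     {car_half:.1%}")
print(f"Total-time basis:     {total_time_basis:.1%}")
print(f"Operating-time basis: {operating_time_basis:.1%}")
```

The same week of use gives two quite different figures, so an availability target is only meaningful once the time basis has been stated.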

Additionally, two distinct notions can be identified: mean availability, as discussed earlier, and instantaneous availability. The latter is the probability that the system is operational at a given moment; it is generally quantified per hour, as the probability of having no failure during that hour. This concept is particularly applicable in safety, and we won’t delve further into it at this point.

Maintainability - Sustaining Performance:

The next logical step in the evolution of RAMS was the inclusion of maintainability. Engineers recognized that even the most reliable and available systems would require maintenance to sustain their performance over time. The field of maintenance engineering grew, focusing on minimizing downtime and optimizing the ease and efficiency of repairs.

The development of predictive maintenance in the 1980s marked a paradigm shift. Instead of reactive repairs, engineers began using data and analytics to predict when maintenance was needed, reducing downtime and extending the lifespan of critical systems.

Maintainability stands at the crossroads of all other RAMS fields. Regular maintenance plays a crucial role in enhancing reliability and availability, shielding the system from wear-out failures before they occur, especially during critical operational periods. It cannot prevent 100% of failures unless the entire system is restored to a new state, and such an approach would clearly be cost-prohibitive. The essence of maintainability analysis lies in striking the optimal balance. The graphic below illustrates the relationship between maintenance costs and failure costs: the goal is to find the level of maintenance at which the overall cost is lowest.

Graph preventive maintenance
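The trade-off can be made concrete with a toy cost model. Everything in the sketch below is an assumption chosen for illustration: the cost figures, and in particular the rule that the expected number of failures per year falls as 1 / (1 + visits). A real analysis would derive that curve from failure data.

```python
def total_annual_cost(visits, cost_per_visit=500.0,
                      cost_per_failure=8000.0, base_failures=2.0):
    """Maintenance cost plus expected failure cost for one year.

    Hypothetical model: with no maintenance the system fails
    `base_failures` times a year, and each extra preventive visit
    reduces the expected failures as base_failures / (1 + visits).
    """
    maintenance_cost = visits * cost_per_visit
    failure_cost = cost_per_failure * base_failures / (1 + visits)
    return maintenance_cost + failure_cost


# Try 0 to 12 preventive visits per year and keep the cheapest option.
candidate_visits = range(0, 13)
best_visits = min(candidate_visits, key=total_annual_cost)

for visits in candidate_visits:
    marker = " <- lowest overall cost" if visits == best_visits else ""
    print(f"{visits:2d} visits: {total_annual_cost(visits):8.2f}{marker}")
```

With these made-up numbers the minimum falls at five visits per year: fewer visits let failure costs dominate, while more visits cost more than the failures they prevent.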

Maintenance tasks are often dictated by safety requirements to meet specific targets. In such cases, the decision-making process is not directly influenced by cost, as failing to meet safety targets renders the product unsellable.

Various maintenance strategies, such as condition-based or predictive maintenance, exist to address these considerations.

Safety - Protecting Lives and Assets:

While reliability, availability, and maintainability addressed the operational aspects of systems, the importance of safety became increasingly evident. The integration of safety into the RAMS framework aimed to prevent accidents, protect human lives, and safeguard assets. This notion gained prominence in industries such as nuclear power, aerospace, and healthcare.

The Chernobyl disaster in 1986 underscored the critical role of safety in complex systems. The aftermath led to a reevaluation of safety practices and a renewed focus on integrating safety considerations into the design and operation of industrial systems.

Safety-related failures encompass those that could potentially impact the user’s physical well-being. The primary objective is to systematically identify all possible failures, isolate those with safety implications, and subsequently mitigate the associated risks to a reasonable level. This explains the choice of the website’s name, MitigateScan. Once mitigation measures are determined, they must be consistently applied and validated throughout the product development cycle, often extending beyond the product’s sale.

Low severity or low probability are the fundamental characteristics of low risk. For instance, a non-functional light in a refrigerator is highly probable but not serious. Conversely, a burning fridge, while serious, may be rare enough to consider the risk mitigated.
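One common way to formalise this reasoning is a risk matrix that combines severity (the gravity of the consequences) with probability. The sketch below uses a hypothetical four-level scale for each and an arbitrary acceptance threshold; real projects take their scales and thresholds from the applicable standard or from the customer.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale (higher is worse)."""
    MINOR = 1
    SERIOUS = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Probability(IntEnum):
    """Hypothetical probability scale (higher is more likely)."""
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    FREQUENT = 4


# Arbitrary threshold chosen for this illustration.
ACCEPTABLE_SCORE = 4


def risk_score(severity, probability):
    """Simple risk index: severity level times probability level."""
    return int(severity) * int(probability)


def is_acceptable(severity, probability):
    """True when the risk index does not exceed the chosen threshold."""
    return risk_score(severity, probability) <= ACCEPTABLE_SCORE


# Fridge light out: very likely, but minor.
print(is_acceptable(Severity.MINOR, Probability.FREQUENT))           # True
# Burning fridge: catastrophic, but rare enough.
print(is_acceptable(Severity.CATASTROPHIC, Probability.IMPROBABLE))  # True
# The same fire, if it happened occasionally, would need mitigation.
print(is_acceptable(Severity.CATASTROPHIC, Probability.OCCASIONAL))  # False
```

Mitigation then amounts to moving a failure towards the acceptable region, either by lowering its probability or by limiting its consequences.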

Considering safety early in the design process is crucial. It builds safety into the system itself, avoiding the need for corrective patches that might compromise system performance and introduce unnecessary complexity.

For example, designing a secure part fixation under a train from the outset is superior to retrofitting a casing later on, as the latter adds components and cost and reduces ground clearance.

Safety considerations may conflict with reliability, as they introduce complexity into the system, and increased complexity raises the likelihood of failure. Furthermore, safety measures can prevent the system from functioning if the required safe conditions are not met.

 

RAMS - A Unified Approach:

In recent decades, the RAMS discipline has evolved into a holistic and integrated approach, recognizing the interconnected nature of reliability, availability, maintainability, and safety. Today, RAMS engineering involves a multidisciplinary approach that considers all aspects of system performance, aiming to achieve optimal functionality while minimizing risks and ensuring the safety of users.

The development of standards such as IEC 61508 for functional safety and EN 50126 for railway RAMS reflects the global recognition of the importance of an integrated approach to risk management.

RAMS in the product development cycle

Integrating RAMS into the design from the outset of the development cycle is essential to create products that are inherently Reliable, Available, Maintainable, and Safe. In the product development cycle, commonly represented as a V-cycle, RAMS plays a crucial role at every stage:

Graph V-Cycle

See references

    • “Reliability Engineering” by A. Elsayed
    • “Sûreté de fonctionnement des systèmes industriels” by Alain Villemeur