Cynic's Blog


Showing posts from April, 2024

Reliability and Availability Metrics and Calculations

For a complex software solution, you usually have to meet the customer's reliability and availability requirements as defined in the SLA. For a monolithic appliance, these figures can be determined trivially, but most real-world applications run on multiple physical nodes, VMs, or machines. Extrapolating the reliability and availability figures for a complex multi-tier software system can be a challenge for an IT practitioner who is not familiar with reliability engineering. So, let's dive right into it.

Let's first define some key terms:

MTTF: Mean Time To Failure, aka 'the average time until failure of a non-repairable component'.

MTBF: Mean Time Between Failures, aka 'the average time between two consecutive failures of a repairable component'.

MTTR: Mean Time To Repair, aka 'the average time taken to repair a failed component'.

Now, let's establish the concept of failure rate (λ) as $\lambda = \frac{1}{MTBF}$ for a repairable component, or $\lambda = \frac{1}{MTTF}$ for a non-repairable one. The reliability function is defined as $R(t) = ...
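As a quick illustration of the failure-rate definition $\lambda = \frac{1}{MTBF}$, here is a minimal Python sketch. It assumes MTBF is estimated the usual way, as total operating (up) time divided by the number of observed failures; the function names and the one-year, four-failure figures are hypothetical and only for illustration.

```python
def mtbf_from_observations(total_operating_hours: float, failure_count: int) -> float:
    """Estimate MTBF as total operating (up) time divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return total_operating_hours / failure_count


def failure_rate(mean_time: float) -> float:
    """Failure rate: lambda = 1 / MTBF (repairable) or 1 / MTTF (non-repairable)."""
    if mean_time <= 0:
        raise ValueError("mean time must be positive")
    return 1.0 / mean_time


if __name__ == "__main__":
    # Hypothetical figures: a node was up for 8760 hours (one year) and failed 4 times.
    mtbf = mtbf_from_observations(8760.0, 4)
    lam = failure_rate(mtbf)
    print(f"MTBF = {mtbf:.0f} h, lambda = {lam:.6f} failures/h")
```

With these made-up numbers the script prints `MTBF = 2190 h, lambda = 0.000457 failures/h`. The same `failure_rate` helper works for a non-repairable component if you pass its MTTF instead.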