We present a method for the synthesis of system designs that satisfy strict quality requirements, are Pareto-optimal with respect to a set of
quality optimisation criteria, and are robust to variations in the system parameters.
To this end, we model the design space of the system under development as a parametric continuous-time Markov chain (pCTMC) with discrete and
continuous parameters that correspond to alternative system architectures and to the ranges of possible values for configuration parameters,
respectively. Given this pCTMC and required tolerance levels for the configuration parameters, our method produces a sensitivity-aware Pareto-
optimal set of designs, which allows the modeller to inspect the ranges of quality attributes induced by these tolerances, thus enabling the
effective selection of robust designs. Through application to two systems from different domains, we demonstrate the ability of our method to
synthesise robust designs with a wide spectrum of useful trade-offs between quality attributes and sensitivity.
An efficient method that combines probabilistic model synthesis and precise parameter synthesis to automate the
generation of sensitivity-aware Pareto fronts for quality engineering.
A parametric continuous-time Markov chain (pCTMC) defines the design space of the system under development.
A candidate designC(P',q) defines a fixed discrete parameter valuation and continuous parameter values from a small region.
A quality attribute over pCTMCs-defined design spaces is defined using continuous stochastic logic (CSL).
A quality requirement of a system with design space given by a pCTMC C(P,Q) can be:
An objective function fi, i ∈ I, that corresponds to a quality system attribute and defined in
terms of a CSL path formula φi, i ∈ I, such that:
A boolean constraint cj, j ∈ J, that corresponds to a quality system attribute and defined in
terms of a CSL path formula φi, i ∈ I, and a threshold ~j rj such that:
Parametric Markov Chain Synthesis Problem
Consider a system with design spaceC(P,Q), quality requirements given by
objective functions{fi}i ∈ I and constraints
{cj}j ∈ J,
and designer-specified tolerances {γk}k ∈ K for the continuous parameters of the system.
The set of feasible designs F for the system, i.e., of candidate designs that meet the tolerances {γk}k
∈ K
and satisfy the constraints {cj}j ∈ J is given by:
The parametric Markov chain synthesis problem consists of finding the Pareto-optimal set PS of candidate
designs (i.e. pCTMCs) with tolerances {γk}k ∈ K that satisfy the
constraints{cj}j ∈ J and are non-dominated with respect to the objective functions{fi}i ∈ I:
The sensitivity-aware dominance relation '≺' between two candidate designs is defined as follows.
Parametric CTMC Synthesis Method
Computing the Pareto-optimal design setPS is typically unfeasible, as the design spaceC(P,Q) is infinite due to its real-valued parameters. Moreover, every candidate designC(P′, q)
corresponds to an infinite set of CTMCs that cannot all be analysed to establish its quality and sensitivity.
Our pCTMC synthesis method, whose high-level steps are shown in Algorithm 1 below, combines search-based software engineering (SBSE) techniques with
techniques for the effective analysis of pCTMCs, and produces a close approximation of the Pareto-optimal design set.
Function AnalyseDesign
This function establishes the quality attributes and sensitivity of candidate designs
by employing precise parameter synthesis techniques to compute safe enclosures of the satisfaction probability of CSL formulae over pCTMCs.
Given a pCTMCC(P′,q) and a CSL path formula φ these techniques provide a safe under-approximation
∧qmin and a safe over-approximation ∧qmax of the minimal and maximal probability that C(P′,q)
satisfies φ:
This allows us to safely approximate the bounds {fi ⊥, fi ⊤}i ∈
I of the
objective functions, and the constraints{cj}j ∈ J.
Function GetCandidateDesigns
This function uses established multiobjective optimisation genetic algorithms (MOGAs), e.g., NSGA-II or SPEA2, and encodes a
candidate design C(P′,q) that satisfies a fixed set of tolerances {γk}k ∈ K
by the gene tuple (p, q), where p ∈ P is the centre point of the continuous parameter region P′.
Given a population of candidate designs, several GetCandidateDesigns executions evolve the population into
'fitter' designs using MOGA selection, crossover and mutation. The evolution of the design population terminates after a fixed number of design
evaluations or when a predetermined number of successive iterations generate populations with no significantly fitter designs.
Software Systems Used to Evaluate RODES
Google File System
The Google File System (GFS) represents the replicated file system used by
Google’s search engine.
GFS partitions files into chunks of equal size, and stores copies of each chunk on multiple chunk servers. A master server monitors the locations of
these copies and the chunk servers, replicating the chunks as needed. During normal operation, GFS stores CMAX copies of each chunk. However, as
servers fail and are repaired, the number c of copies for a chunk may vary from 0 to CMAX.
We adapted the CTMC model of the lifecycle of a GFS chunk by considering several continuous and discrete
parameters that a designer of the system needs to decide.
To evaluate our design synthesis method, we assume that GFS designers must select the hardware failure and repair rates cHardFail and
cHardRepair of the chunk servers, and the maximum number of chunks NC stored on a chunk server within the designer-specified ranges.
These parameters reflect the fact that designers can choose from a range of physical servers, can select different levels of service offered by a
hardware repair workshop, and can decide a maximum workload for chunk servers.
The GFS model is encoded in the PRISM modelling
language extended with the evolve constructs.
φ11 = ¬SL1 U[10,60] SL1, where SL1 = M_up ∧ c > 0 holds in states where service level 1 (master up and at
least one chunk copy available) is provided
f2
φ2 = C≤60, where a reward of 1 is assigned to the states where the number of running chunk servers is at least 0.5M.
c1
ψ1 = C≤60 with threshold ~1 r1 ≡ '≤ 5', where a transition reward of 1 is assigned to each
chunk replication transition.
Workstation Cluster
We extend the CTMC models of a cluster availability management system.
This CTMC models a system comprising two sub-clusters, each with N workstations and a switch that connects the workstations to a central
backbone. For each component, we consider failure, inspection and repair rates (where repairs are initiated only after an inspection detects
failures), and we assume that designers must decide these rates for workstations, i.e., the real-valued parameters wsFail, wsFail and
wsRepair for our pCTMC, respectively. Additionally, we assume that designers must select the sub-cluster size N, and must choose
between an expensive repair implementation (i.e., pCTMC module) with a 100% success probability and a cheaper repair module with 50% success
probability.
The cluster model is encoded in the PRISM
modelling language extended with the
evolve constructs.
The figure below shows the Pareto fronts obtained using the "lower bound" definition for the objective functions f1 (x-axis) and
f2 (y-axis) over candidate designs, and parameters
ε1 = ε2 = ε ∈ {0, 0.05, 0.1}.
Boxes represent quality-attribute regions, coloured by sensitivity (red: sensitive, blue: robust).
Red-bordered boxes and arrows indicate the sub-optimal robust designs.
For each front, we report mean sensitivity and mean volume.
The figure below shows the corresponding synthesised sensitivity-aware Pareto-optimal designs set.
Rectangles in x-y plane correspond to the continuous parameter regions (cHWF: hardware failure rate; cHWR: hardware repair rate) induced by the
tolerance γ.
The box heights show the value of the discrete parameter NC. Boxes are coloured by sensitivity.
Statistical Analysis
We also analysed the performance of our NSGA-II-based RODES against a tool variant that uses random search (RS).
For each tool variant and combination of γ ∈ {0.01, 0.02} and ε ∈ {0, 0.05, 0.1} we carried out 30
independent runs.
Then, we compared the Pareto fronts achieved by each variant against this reference front by using the following metrics
M1 = w x Iεnorm+ (1−w) x sensnorm
M2 = w x IIGD;norm+ (1−w) x sensnorm
which use a weight w ∈ [0, 1] to combine normalised versions of the normalised design sensitivity with the following established (but
sensitivity-agnostic) Pareto-front quality indicators:
Unary additive epsilon (Iε),
which measures the minimum additive term by which the elements of the objective vectors from
a Pareto front approximation must be adjusted so as to dominate the objective vectors from the reference front. Smaller values denote Pareto
front approximations with better convergence.
Inverted Generational Distance (IIGD),
which provides an "error measure" as the Euclidean distance in the objective space between the reference front and the Pareto front
approximation. Smaller values denote Pareto front approximations with better convergence and diversity.
The figure below compares RODES and RS across our ε, γ combinations using metrics M1 and M2 with w = 0.5.
Evaluating RODES on the Cluster Availability Management System
As for GFS, the figure below shows the Pareto fronts obtained using the "lower bound" definition for the objective functions f1 (x-axis) and f2 (y-
axis) over candidate designs, and parameters ε1 = ε2 = ε ∈ {0, 0.05, 0.1}.
Boxes represent quality-attribute regions, coloured by sensitivity (red: sensitive, blue: robust). Red-bordered boxes and arrows indicate the sub-
optimal robust designs. For each front, we report mean sensitivity and mean volume.
The figure below shows the corresponding synthesised sensitivity-aware Pareto-optimal designs set.
Rectangles in x-y plane correspond to the continuous parameter regions (cHWF: hardware failure rate; cHWR: hardware repair rate) induced by the
tolerance γ.
The box heights show the value of the discrete parameter NC. Boxes are coloured by sensitivity.
The table below shows the average design sensitivity for two variants of the synthesis problem corresponding to different ranges of possible values
for the parameter N, for several combinations of $γ and ε.
The results show that sensitivity-aware designs (i.e. where ε > 0) for N ∈ {10..15} have lower sensitivity than for N ∈
{11..14}, validating our observation that more robust designs are obtained for N=10 or N=15.