We present a method for the synthesis of system designs that satisfy strict quality requirements, are Pareto-optimal with respect to a set of
quality optimisation criteria, and are robust to variations in the system parameters.
To this end, we model the design space of the system under development as a parametric continuous-time Markov chain (pCTMC) with discrete and
continuous parameters that correspond to alternative system architectures and to the ranges of possible values for configuration parameters,
respectively. Given this pCTMC and required tolerance levels for the configuration parameters, our method produces a sensitivity-aware Pareto-
optimal set of designs, which allows the modeller to inspect the ranges of quality attributes induced by these tolerances, thus enabling the
effective selection of robust designs. Through application to two systems from different domains, we demonstrate the ability of our method to
synthesise robust designs with a wide spectrum of useful trade-offs between quality attributes and sensitivity.

An efficient method that combines probabilistic model synthesis and precise parameter synthesis to automate the
generation of sensitivity-aware Pareto fronts for quality engineering.

A parametric continuous-time Markov chain (pCTMC) defines the design space of the system under development.
A candidate designC(P',q) defines a fixed discrete parameter valuation and continuous parameter values from a small region.
A quality attribute over pCTMCs-defined design spaces is defined using continuous stochastic logic (CSL).

A quality requirement of a system with design space given by a pCTMC C(P,Q) can be:

An objective function f_{i}, i ∈ I, that corresponds to a quality system attribute and defined in
terms of a CSL path formula φ_{i}, i ∈ I, such that:

A boolean constraint c_{j}, j ∈ J, that corresponds to a quality system attribute and defined in
terms of a CSL path formula φ_{i}, i ∈ I, and a threshold ~_{j }r_{j} such that:

Parametric Markov Chain Synthesis Problem

Consider a system with design spaceC(P,Q), quality requirements given by
objective functions{f_{i}}_{i ∈ I} and constraints
{c_{j}}_{j ∈ J},
and designer-specified tolerances {γ_{k}}_{k ∈ K} for the continuous parameters of the system.
The set of feasible designs F for the system, i.e., of candidate designs that meet the tolerances {γ_{k}}_{k
∈ K}
and satisfy the constraints {c_{j}}_{j ∈ J} is given by:

The parametric Markov chain synthesis problem consists of finding the Pareto-optimal set PS of candidate
designs (i.e. pCTMCs) with tolerances {γ_{k}}_{k ∈ K} that satisfy the
constraints{c_{j}}_{j ∈ J} and are non-dominated with respect to the objective functions{f_{i}}_{i ∈ I}:

The sensitivity-aware dominance relation '≺' between two candidate designs is defined as follows.

Parametric CTMC Synthesis Method

Computing the Pareto-optimal design setPS is typically unfeasible, as the design spaceC(P,Q) is infinite due to its real-valued parameters. Moreover, every candidate designC(P′, q)
corresponds to an infinite set of CTMCs that cannot all be analysed to establish its quality and sensitivity.
Our pCTMC synthesis method, whose high-level steps are shown in Algorithm 1 below, combines search-based software engineering (SBSE) techniques with
techniques for the effective analysis of pCTMCs, and produces a close approximation of the Pareto-optimal design set.

Function AnalyseDesign

This function establishes the quality attributes and sensitivity of candidate designs
by employing precise parameter synthesis techniques to compute safe enclosures of the satisfaction probability of CSL formulae over pCTMCs.
Given a pCTMCC(P′,q) and a CSL path formula φ these techniques provide a safe under-approximation
∧^{q}_{min} and a safe over-approximation ∧^{q}_{max} of the minimal and maximal probability that C(P′,q)
satisfies φ:

This allows us to safely approximate the bounds {f_{i}^{ ⊥}, f_{i}^{ ⊤}}_{i ∈
I} of the
objective functions, and the constraints{c_{j}}_{j ∈ J}.

Function GetCandidateDesigns

This function uses established multiobjective optimisation genetic algorithms (MOGAs), e.g., NSGA-II or SPEA2, and encodes a
candidate design C(P′,q) that satisfies a fixed set of tolerances {γ_{k}}_{k ∈ K}
by the gene tuple (p, q), where p ∈ P is the centre point of the continuous parameter region P′.
Given a population of candidate designs, several GetCandidateDesigns executions evolve the population into
'fitter' designs using MOGA selection, crossover and mutation. The evolution of the design population terminates after a fixed number of design
evaluations or when a predetermined number of successive iterations generate populations with no significantly fitter designs.

Software Systems Used to Evaluate RODES

Google File System

The Google File System (GFS) represents the replicated file system used by
Google’s search engine.
GFS partitions files into chunks of equal size, and stores copies of each chunk on multiple chunk servers. A master server monitors the locations of
these copies and the chunk servers, replicating the chunks as needed. During normal operation, GFS stores CMAX copies of each chunk. However, as
servers fail and are repaired, the number c of copies for a chunk may vary from 0 to CMAX.
We adapted the CTMC model of the lifecycle of a GFS chunk by considering several continuous and discrete
parameters that a designer of the system needs to decide.
To evaluate our design synthesis method, we assume that GFS designers must select the hardware failure and repair rates cHardFail and
cHardRepair of the chunk servers, and the maximum number of chunks NC stored on a chunk server within the designer-specified ranges.
These parameters reflect the fact that designers can choose from a range of physical servers, can select different levels of service offered by a
hardware repair workshop, and can decide a maximum workload for chunk servers.
The GFS model is encoded in the PRISM modelling
language extended with the evolve constructs.

φ_{1}1 = ¬SL1 U[10,60] SL1, where SL1 = M_up ∧ c > 0 holds in states where service level 1 (master up and at
least one chunk copy available) is provided

f_{2}

φ_{2} = C≤60, where a reward of 1 is assigned to the states where the number of running chunk servers is at least 0.5M.

c_{1}

ψ_{1} = C≤60 with threshold ~_{1} r_{1} ≡ '≤ 5', where a transition reward of 1 is assigned to each
chunk replication transition.

Workstation Cluster

We extend the CTMC models of a cluster availability management system.
This CTMC models a system comprising two sub-clusters, each with N workstations and a switch that connects the workstations to a central
backbone. For each component, we consider failure, inspection and repair rates (where repairs are initiated only after an inspection detects
failures), and we assume that designers must decide these rates for workstations, i.e., the real-valued parameters wsFail, wsFail and
wsRepair for our pCTMC, respectively. Additionally, we assume that designers must select the sub-cluster size N, and must choose
between an expensive repair implementation (i.e., pCTMC module) with a 100% success probability and a cheaper repair module with 50% success
probability.
The cluster model is encoded in the PRISM
modelling language extended with the
evolve constructs.

φ_{1} = ¬premium U[20,100] premium, where premium denotes a system service where at least 1.25N workstations are
connected and operating

f_{2}

φ_{2} = C≤100, where the reward 1 is assigned only to the states where the number of operation clusters is between 1.2N and
1.6N.

c_{1}

ψ_{1} = C≤100 with threshold ~_{1} r_{1} ≡ '≤ 80', where transition rewards are associated with repair
actions of the workstations, switches and backbone.

The figure below shows the Pareto fronts obtained using the "lower bound" definition for the objective functions f_{1} (x-axis) and
f_{2} (y-axis) over candidate designs, and parameters
ε_{1} = ε_{2} = ε ∈ {0, 0.05, 0.1}.
Boxes represent quality-attribute regions, coloured by sensitivity (red: sensitive, blue: robust).
Red-bordered boxes and arrows indicate the sub-optimal robust designs.
For each front, we report mean sensitivity and mean volume.

The figure below shows the corresponding synthesised sensitivity-aware Pareto-optimal designs set.
Rectangles in x-y plane correspond to the continuous parameter regions (cHWF: hardware failure rate; cHWR: hardware repair rate) induced by the
tolerance γ.
The box heights show the value of the discrete parameter NC. Boxes are coloured by sensitivity.

Statistical Analysis

We also analysed the performance of our NSGA-II-based RODES against a tool variant that uses random search (RS).
For each tool variant and combination of γ ∈ {0.01, 0.02} and ε ∈ {0, 0.05, 0.1} we carried out 30
independent runs.
Then, we compared the Pareto fronts achieved by each variant against this reference front by using the following metrics

M_{1} = w x I_{εnorm}+ (1−w) x sens_{norm}

M_{2} = w x I_{IGD;norm}+ (1−w) x sens_{norm}

which use a weight w ∈ [0, 1] to combine normalised versions of the normalised design sensitivity with the following established (but
sensitivity-agnostic) Pareto-front quality indicators:

Unary additive epsilon (I_{ε}),
which measures the minimum additive term by which the elements of the objective vectors from
a Pareto front approximation must be adjusted so as to dominate the objective vectors from the reference front. Smaller values denote Pareto
front approximations with better convergence.

Inverted Generational Distance (I_{IGD}),
which provides an "error measure" as the Euclidean distance in the objective space between the reference front and the Pareto front
approximation. Smaller values denote Pareto front approximations with better convergence and diversity.

The figure below compares RODES and RS across our ε, γ combinations using metrics M_{1} and M_{2} with w = 0.5.

Evaluating RODES on the Cluster Availability Management System

As for GFS, the figure below shows the Pareto fronts obtained using the "lower bound" definition for the objective functions f1 (x-axis) and f2 (y-
axis) over candidate designs, and parameters ε_{1} = ε_{2} = ε ∈ {0, 0.05, 0.1}.
Boxes represent quality-attribute regions, coloured by sensitivity (red: sensitive, blue: robust). Red-bordered boxes and arrows indicate the sub-
optimal robust designs. For each front, we report mean sensitivity and mean volume.

The figure below shows the corresponding synthesised sensitivity-aware Pareto-optimal designs set.
Rectangles in x-y plane correspond to the continuous parameter regions (cHWF: hardware failure rate; cHWR: hardware repair rate) induced by the
tolerance γ.
The box heights show the value of the discrete parameter NC. Boxes are coloured by sensitivity.

The table below shows the average design sensitivity for two variants of the synthesis problem corresponding to different ranges of possible values
for the parameter N, for several combinations of $γ and ε.
The results show that sensitivity-aware designs (i.e. where ε > 0) for N ∈ {10..15} have lower sensitivity than for N ∈
{11..14}, validating our observation that more robust designs are obtained for N=10 or N=15.