The formal verification of finite-state probabilistic systems supports the engineering of software with strict quality-of-service (QoS) requirements.
However, its use in software design is currently a tedious process of manual multiobjective optimisation. Software designers must build and verify
probabilistic models for numerous alternative architectures and instantiations of the system parameters. When successful, they end up with feasible but
often suboptimal models. The EvoChecker search-based software engineering approach and tool introduced in our paper employ
multiobjective optimisation genetic algorithms to automate this process and considerably improve its outcome. We evaluate
EvoChecker for six variants of two software systems from the domains of dynamic power
management and foreign exchange trading. These
systems are characterised by different types of design parameters and QoS requirements, and their design spaces comprise between 2E+14 and 7.22E+86
relevant alternative designs. Our results provide strong evidence that EvoChecker significantly outperforms the current practice
and yields actionable insights for software designers.

EvoChecker allows software designers to explore different system designs using probabilistic model templates specified using an extension of the PRISM
modelling language with constructs that support:

Evolvable modules, i.e., the specification of n ≥ 2 alternative module implementations

evolve module mod implementation_{1} endmodule
...
evolve module mod implementation_{n} endmodule

Evolvable parameters, i.e., model parameters of type 'int' and 'double', and acceptable ranges for them

evolve int param [min..max];
evolve double param [min..max];

Evolvable probability distributions, i.e., an n-element
discrete probability distribution, and ranges for the n ≥ 2 probabilities of the distribution

evolve distribution dist [min_{1}..max_{1}] ... [min_{n}..max_{n}];

where [min_{i}..max_{i}] ⊆ [0,1] for all 1 ≤ i ≤ n is used to declare the n-element probability
distribution

Probabilistic model template (T): any valid PRISM probabilistic model (M) augmented with EvoChecker constructs 1-3
Design space of T (DS^{T}): the set of all probabilistic models that are instances of the probabilistic model
template T

Probabilistic Model Synthesis Problem

Consider a software system with a set of QoS requirements comprising n_{1}≥0 constraints and n_{2}≥1 optimisation
objectives:

Given a probabilistic model template T with design space DS^{T}, and the QoS requirements above, the probabilistic model synthesis
problem involves finding the Pareto optimal design set PS
of models from DS^{T} that satisfy the n_{1} constraints and are non-dominated with respect to the n_{2} optimisation
objectives:

EvoChecker Tool

The EvoChecker tool uses the implementations supported by JMetal for
the multiobjective optimisation Genetic algorithms and integrates an embedded instance of the probabilistic model checker
PRISM. The template parser was developed using the
Antlr parser generator.
The EvoChecker source code is available on
GitHub

Software Systems Used to Evaluate EvoChecker

Dynamic Power Management System

A software-controlled dynamic power management (DPM)
system
consisting of a service provider that handles requests generated by a service requester and stored in two request queues of
different priorities. The service provider has four states associated with different power usage, i.e., busy, idle, standby, sleep.
The figure above depicts the power usage of each state (in watts), the possible transitions between states, and the energy consumed by each
transitions (in joules). The task of a software power manager is to reduce power use while maintaining an acceptable service level for the system.

A DPM system designer must select:

the capacity of the request queues, Qmax_{H} and Qmax_{L}

one of several alternative power managers

the parameters associated with the selected power manager

such that the following QoS requirements are satisfied:

ID

Description

Type

R1

The steady-state utilisation of the high-priority queue should be less than 90%

Constraint

R2

The steady-state utilisation of the low-priority queue should be less than 90%

Constraint

R3

The system should operate with minimum steady-state power utilisation

Objective

R4

The number of messages lost at the steady state should be minimised

An FX customer (called a trader) can use the system in two operation modes. In the expert mode, FX executes a loop that analyses market
activity, identifies patterns that satisfy the trader's objectives, and automatically carries out trades. Thus, the Market watch service
extracts real-time exchange rates (bid/ask price) of selected currency pairs. This data is used by a Technical analysisi> service that
evaluates the current trading conditions, predicts future price movement, and decides if the trader's objectives are: (i) "satisfied" (causing the
invocation of an Order service to carry out a trade); (ii) "unsatisfied" (resulting in a new Market watch invocation); or
(iii) "unsatisfied with high variance" (triggering an Alarm service invocation to notify the trader about discrepancies/opportunities not
covered by the trading objectives). In the normal mode, FX assesses the economic outlook of a country using a Fundamental analysis
service that collects, analyses and evaluates information such as news reports, economic data and political events, and provides an assessment on
the country's currency. If satisfied with this assessment, the trader can use the Order service to sell or buy currency, in which case a
Notification service confirms the completion of the trade.

The FX designer has to select third-party implementations for each of the n ≥ 1 services from the workflow above for which in-house
implementations are not available, in order to meet the following QoS requirements:

ID

Description

Type

R1

Workflow executions must complete successfully with probability at least 0.9

Constraint

R2

The total service response time per workflow execution should be minimised

Objective

R3

The probability of a service failure during a workflow execution should be minimised

Objective

R4

The total cost of the third-party services used by a workflow execution should be minimised

We would like to thank our anonymous industrial collaborator for providing the model and QoS requirements for the FX system.

Evaluation

Research Questions

We evaluate the effectiveness of EvoChecker by answering the following research questions:

RQ1 (Validation):

How does EvoChecker perform compared to random search?

RQ2 (Comparison):

How do EvoChecker instances using different multiobjective optimisation genetic algorithms (NSGA-II, SPEA2, MOCell) perform compared to each other?

RQ3 (Insights):

Can EvoChecker provide insights into the trade-offs between the QoS attributes of alternative software
architectures and configurations?

System Variants

We evaluate EvoChecker using the system variants from the table below.
Note: The last two columns in this table contain links to the probabilistic model template T and the probabilistic temporal logic
formulae for the QoS attributes of these system variants.

* Assuming two-decimal precision for double-valued parameters of the probabilistic model templates
** Probabilistic Model Template

Quality Indicators

We evaluate the multiobjective optimisation algorithms used in EvoChecker (NSGA-II, SPEA2, MOCell, Random) using the established Pareto-front
quality indicators below.

Unary additive epsilon (I_{ε}),
which measures the minimum additive term by which the elements of the objective vectors from
a Pareto front approximation must be adjusted so as to dominate the objective vectors from the reference front. Smaller values denote Pareto
front approximations with better convergence.

Hypervolume (I_{HV}),
which measures the volume in the objective space covered by a Pareto front approximation with respect to the reference front. Larger values
denote Pareto front approximations with better convergence and diversity.

Inverted Generational Distance (I_{IGD}),
which provides an "error measure" as the Euclidean distance in the objective space between the reference front and the Pareto front
approximation. Smaller values denote Pareto front approximations with better convergence and diversity.

Results

DPM results

The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the DPM system variants, and the
Figure (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.

Variant

NSGA-II

SPEA2

MOCell

Random

I_{ε} (Epsilon)

DPM_Small

0.0209

0.0130

0.0242

0.1403

+

DPM_Medium

0.0225

0.0123

0.0489

0.1996

+

DPM_Large

0.0229

0.0147

0.0884

0.2497

+

I_{HV} (Hypervolume)

DPM_Small

0.4455

0.4458

0.4396

0.4022

+

DPM_Medium

0.4487

0.4499

0.4386

0.3946

+

DPM_Large

0.4528

0.4549

0.4395

0.3947

+

I_{IGD} (Inverted Generational Distance)

DPM_Small

23E-4

18E-4

16E-4

62E-5

+

DPM_Medium

24E-5

19E-5

28E-5

91E-5

+

DPM_Large

24E-5

20E-5

38E-5

11E-5

+

FX results (with fixed services characteristics)

The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the FX system variants, and the Figure
(bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.

Variant

NSGA-II

SPEA2

MOCell

Random

I_{ε} (Epsilon)

FX_Small

0.6258

0.5083

0.6745

2.2274

+

FX_Medium

1.6379

2.0105

2.0486

6.1529

+

FX_Large

3.8528

5.2777

4.6366

13.023

+

I_{HV} (Hypervolume)

FX_Small

0.611

0.628

0.608

0.593

+

FX_Medium

0.719

0.725

0.702

0.606

+

FX_Large

0.657

0.675

0.633

0.555

+

I_{IGD} (Inverted Generational Distance)

FX_Small

0.00123

0.00129

0.00125

0.00145

+

FX_Medium

0.00192

0.00207

0.00200

0.00316

+

FX_Large

0.00244

0.00255

0.00272

0.00395

+

FX results (with random services characteristics)

We carried out a second study for the FX system, where the characteristics of each concrete service (i.e., cost, success probability, response time)
per
experiment were randomly generated. With this study, we examine whether EvoChecker is capable of synthesising good probabilistic
model sets irrespective of the characteristics of the considered problem. We generated 30 random experiments per system variant and performed 10
independent runs per experiment per system variant (i.e., a total of 300 runs per system variant and 900 runs in total). To allow for a fair
comparison across the experiments and to avoid undesired scaling effects, we normalise the results obtained for each quality indicator per
experiment within the range [0,1].
The Table (left) reports the mean quality indicator values, and the Figure (right) depicts the boxplots for the FX system variants.

Variant

NSGA-II

SPEA2

MOCell

Random

I_{ε} (Epsilon)

FX_Small

0.2212

0.2009

0.2272

0.6200

+

FX_Medium

0.3393

0.3664

0.3645

0.7568

+

FX_Large

0.3396

0.3764

0.3625

0.7970

+

I_{HV} (Hypervolume)

FX_Small

0.9374

0.9914

0.9337

0.9016

+

FX_Medium

0.9514

0.9848

0.9219

0.8138

+

FX_Large

0.9467

0.9804

0.8962

0.7868

+

I_{IGD} (Inverted Generational Distance)

FX_Small

0.6365

0.5348

0.6390

0.8000

+

FX_Medium

0.5919

0.5790

0.6114

0.7957

+

FX_Large

0.5887

0.5622

0.6561

0.8884

+

Comparison with Model Repair

Network Virus Infection

Paper: Model Repair for Markov Decision Processes.
T. Chen, E. Hahn, T. Han, M. Kwiatkowska, H. Qu, and L. Zhang. In 7th Intl. Symposium on Theoretical Aspects of Software Engineering (TASE'13),
pages 85–92, 2013.
In this case study we established that EvoChecker can replicate/outperform the result obtained in the paper Model Repair for Markov Decision Processes by
Chen et al. This work performed model repair on the Network Virus Infection case study which considers a parametric
model of a computer virus infecting a network.
The network is a grid of N by N nodes, with each node connected to four neighbours (the nodes above, below, to the left and to the right),
except for the nodes on the border, for which some of the neighbours are not present. The model is an MDP that represents the situation where the
virus spawns/multiplies. More concretely, once a node is infected, the virus remains at that node and repeatedly tries to infect uninfected
neighbouring nodes. The goal is to repair the model such that the minimal expected number of attacks required by the virus until an infection of the
node (1, 1) starting at (N,N). The repairing action should have a minimum cost (i.e., a minimal modification to the original model).
The cost for the model repair solution obtained by Chen et al., and reported in their paper, is 0.02044.
EvoChecker generated a model repair solution with cost 0.0201473 in less than 3000 evaluations.
We provide the probabilistic model template for this case study and the PCTL
formulae associated with its QoS requirements.

IPv4 Zeroconf Protocol

Paper: Model Repair for Probabilistic Systems.
E. Bartocci, R. Grosu, P. Katsaros, C.R. Ramakrishnan, and S. A. Smolka. In 7th Intl. Conf. on Tools and Algorithms for the Construction and
Analysis of Systems (TACAS'11), pages 326-340, 2011.
The Zeroconf protocol is used for assigning IP addresses in
devices joining a network. When
a new host joins the network it randomly selects an IP address among K possible ones. With m hosts in the network, the collision probability is
q = m/K. A new host asks the other hosts whether the randomly selected IP address is already used and waits for an answer. The probability
that the new host does not get any answer is p, in which case it repeats the query. If after n tries there is still no answer, the
host will erroneously consider the chosen address as valid.
In paper Model repair for probabilistic systems, Bartocci et al. use a Max
Profit algorithm to repair the model given an initial q. For n = 3, p = 0.1, and initial q = 0.6, they reduced the expected number of
steps to termination from 6.15 to 5.1 setting q = 0.5002.
To replicate this in EvoChecker, we define q as an evolvable parameter (double) as follows:

evolve double q [0.5 .. 0.7]

EvoChecker found the optimal q = 0.50000964 and determined that the expected steps to termination are 5.105. EvoChecher obtained these results in
less than 1000 evaluations.
We provide the probabilistic model template for this case study and the
PCTL formula for the QoS requirement.