Search-Based Synthesis of Probabilistic Models for Quality-of-Service Software Engineering
The formal verification of finite-state probabilistic systems supports the engineering of software with strict quality-of-service (QoS) requirements.
However, its use in software design is currently a tedious process of manual multiobjective optimisation. Software designers must build and verify
probabilistic models for numerous alternative architectures and instantiations of the system parameters. When successful, they end up with feasible but
often suboptimal models. The
EvoChecker search-based software engineering approach and
tool introduced in our paper employ
multiobjective optimisation genetic algorithms to automate this process and considerably improve its outcome. We evaluate
EvoChecker for
six variants of two software systems from the domains of
dynamic power
management and
foreign exchange trading. These
systems are characterised by different types of design parameters and QoS requirements, and their design spaces comprise between 2E+14 and 7.22E+86
relevant alternative designs. Our results provide strong evidence that
EvoChecker significantly outperforms the current practice
and yields actionable insights for software designers.
The main contributions of this work are:
Our paper has been accepted at the 30th Intl. Conference on Automated Software Engineering (ASE 2015)
We have also made available the camera-ready version of our paper.
Software Systems Used to Evaluate EvoChecker
Dynamic Power Management System
A
software-controlled dynamic power management (DPM)
system
consisting of a
service provider that handles requests generated by a
service requester and stored in two
request queues of
different priorities. The
service provider has four states associated with different power usage, i.e.,
busy, idle, standby, sleep.
The figure above depicts the power usage of each state (in watts), the possible transitions between states, and the energy consumed by each
transitions (in joules). The task of a software power manager is to reduce power use while maintaining an acceptable service level for the system.
A DPM system designer must select:
- the capacity of the request queues, QmaxH and QmaxL
- one of several alternative power managers
- the parameters associated with the selected power manager
such that the following QoS requirements are satisfied:
ID |
Description |
Type |
R1 |
The steady-state utilisation of the high-priority queue should be less than 90% |
Constraint |
R2 |
The steady-state utilisation of the low-priority queue should be less than 90% |
Constraint |
R3 |
The system should operate with minimum steady-state power utilisation |
Objective |
R4 |
The number of messages lost at the steady state should be minimised |
Objective |
R5 |
The capacity of both queues should minimised |
Objective |
We consider
three different variants of the DPM system. We provide the
DPM probabilistic model
templates and the
CSL formulae associated with the QoS requirements for these DPM system variants.
Foreign Exchange Trading System
An FX customer (called a trader) can use the system in two operation modes. In the
expert mode, FX executes a loop that analyses market
activity, identifies patterns that satisfy the trader's objectives, and automatically carries out trades. Thus, the
Market watch service
extracts real-time exchange rates (bid/ask price) of selected currency pairs. This data is used by a
Technical analysisi> service that
evaluates the current trading conditions, predicts future price movement, and decides if the trader's objectives are: (i) "satisfied" (causing the
invocation of an
Order service to carry out a trade); (ii) "unsatisfied" (resulting in a new
Market watch invocation); or
(iii) "unsatisfied with high variance" (triggering an
Alarm service invocation to notify the trader about discrepancies/opportunities not
covered by the trading objectives). In the
normal mode, FX assesses the economic outlook of a country using a
Fundamental analysis
service that collects, analyses and evaluates information such as news reports, economic data and political events, and provides an assessment on
the country's currency. If satisfied with this assessment, the trader can use the
Order service to sell or buy currency, in which case a
Notification service confirms the completion of the trade.
The FX designer has to select third-party implementations for each of the
n ≥ 1 services from the workflow above for which in-house
implementations are not available, in order to meet the following QoS requirements:
ID |
Description |
Type |
R1 |
Workflow executions must complete successfully with probability at least 0.9 |
Constraint |
R2 |
The total service response time per workflow execution should be minimised |
Objective |
R3 |
The probability of a service failure during a workflow execution should be minimised |
Objective |
R4 |
The total cost of the third-party services used by a workflow execution should be minimised |
Objective |
We consider
three different variants of the FX system. We provide the
FX probabilistic model
templates and the
PCTL formulae associated with the QoS requirements for these FX system variants.
We would like to thank our anonymous industrial collaborator for providing the model and QoS requirements for the FX system.
Evaluation
Research Questions
We evaluate the effectiveness of EvoChecker by answering the following research questions:
RQ1 (Validation): |
How does EvoChecker perform compared to random search? |
RQ2 (Comparison): |
How do EvoChecker instances using different multiobjective optimisation genetic algorithms (NSGA-II, SPEA2, MOCell) perform compared to each other? |
RQ3 (Insights): |
Can EvoChecker provide insights into the trade-offs between the QoS attributes of alternative software
architectures and configurations? |
System Variants
We evaluate EvoChecker using the system variants from the table below.
Note: The last two columns in this table contain links to the probabilistic model template T and the probabilistic temporal logic
formulae for the QoS attributes of these system variants.
Variant |
Description |
Size* |
T ** |
QoS attributes |
DPM_Small |
QmaxH,L ∈ {1, ..., 10}, m = 2 |
2E+14 |
TDPMsmall |
CSLDPMsmall |
DPM_Medium |
QmaxH,L ∈ {1, ..., 15}, m = 2 |
4.5E+14 |
TDPMmedium |
CSLDPMmedium |
DPM_Large |
QmaxH,L ∈ {1, ..., 20}, m = 2 |
8E+14 |
TDPMlarge |
CSLDPMlarge |
FX_Small |
n = 4, n1 = n2 = n3 = n4= 3 |
4.98E+31 |
TFXsmall |
PCTLFXsmall |
FX_Medium |
n = 6, n1 = n2 = ... = n6 = 4 |
1.39E+65 |
TFXmedium |
PCTLFXmedium |
FX_Large |
n = 8, n1 = n2 = ... = n8 = 4 |
7.222E+86 |
TFXlarge |
PCTLFXlarge |
* Assuming two-decimal precision for double-valued parameters of the probabilistic model templates
** Probabilistic Model Template
Quality Indicators
We evaluate the multiobjective optimisation algorithms used in EvoChecker (NSGA-II, SPEA2, MOCell, Random) using the established Pareto-front
quality indicators below.
-
Unary additive epsilon (Iε),
which measures the minimum additive term by which the elements of the objective vectors from
a Pareto front approximation must be adjusted so as to dominate the objective vectors from the reference front. Smaller values denote Pareto
front approximations with better convergence.
-
Hypervolume (IHV),
which measures the volume in the objective space covered by a Pareto front approximation with respect to the reference front. Larger values
denote Pareto front approximations with better convergence and diversity.
-
Inverted Generational Distance (IIGD),
which provides an "error measure" as the Euclidean distance in the objective space between the reference front and the Pareto front
approximation. Smaller values denote Pareto front approximations with better convergence and diversity.
Results
DPM results
The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the DPM system variants, and the
Figure (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.
Variant |
NSGA-II |
SPEA2 |
MOCell |
Random |
|
|
Iε (Epsilon) |
DPM_Small |
0.0209 |
0.0130 |
0.0242 |
0.1403 |
+ |
DPM_Medium |
0.0225 |
0.0123 |
0.0489 |
0.1996 |
+ |
DPM_Large |
0.0229 |
0.0147 |
0.0884 |
0.2497 |
+ |
|
|
IHV (Hypervolume) |
DPM_Small |
0.4455 |
0.4458 |
0.4396 |
0.4022 |
+ |
DPM_Medium |
0.4487 |
0.4499 |
0.4386 |
0.3946 |
+ |
DPM_Large |
0.4528 |
0.4549 |
0.4395 |
0.3947 |
+ |
|
IIGD (Inverted Generational Distance) |
DPM_Small |
23E-4 |
18E-4 |
16E-4 |
62E-5 |
+ |
DPM_Medium |
24E-5 |
19E-5 |
28E-5 |
91E-5 |
+ |
DPM_Large |
24E-5 |
20E-5 |
38E-5 |
11E-5 |
+ |
FX results (with fixed services characteristics)
The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the FX system variants, and the Figure
(bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.
Variant |
NSGA-II |
SPEA2 |
MOCell |
Random |
|
|
Iε (Epsilon) |
FX_Small |
0.6258 |
0.5083 |
0.6745 |
2.2274 |
+ |
FX_Medium |
1.6379 |
2.0105 |
2.0486 |
6.1529 |
+ |
FX_Large |
3.8528 |
5.2777 |
4.6366 |
13.023 |
+ |
|
|
IHV (Hypervolume) |
FX_Small |
0.611 |
0.628 |
0.608 |
0.593 |
+ |
FX_Medium |
0.719 |
0.725 |
0.702 |
0.606 |
+ |
FX_Large |
0.657 |
0.675 |
0.633 |
0.555 |
+ |
|
IIGD (Inverted Generational Distance) |
FX_Small |
0.00123 |
0.00129 |
0.00125 |
0.00145 |
+ |
FX_Medium |
0.00192 |
0.00207 |
0.00200 |
0.00316 |
+ |
FX_Large |
0.00244 |
0.00255 |
0.00272 |
0.00395 |
+ |
FX results (with random services characteristics)
We carried out a second study for the FX system, where the characteristics of each concrete service (i.e., cost, success probability, response time)
per
experiment were randomly generated. With this study, we examine whether
EvoChecker is capable of synthesising good probabilistic
model sets irrespective of the characteristics of the considered problem. We generated 30 random experiments per system variant and performed 10
independent runs per experiment per system variant (i.e., a total of 300 runs per system variant and 900 runs in total). To allow for a fair
comparison across the experiments and to avoid undesired scaling effects, we normalise the results obtained for each quality indicator per
experiment within the range [0,1].
The Table (left) reports the mean quality indicator values, and the Figure (right) depicts the boxplots for the FX system variants.
Variant |
NSGA-II |
SPEA2 |
MOCell |
Random |
|
|
Iε (Epsilon) |
FX_Small |
0.2212 |
0.2009 |
0.2272 |
0.6200 |
+ |
FX_Medium |
0.3393 |
0.3664 |
0.3645 |
0.7568 |
+ |
FX_Large |
0.3396 |
0.3764 |
0.3625 |
0.7970 |
+ |
|
|
IHV (Hypervolume) |
FX_Small |
0.9374 |
0.9914 |
0.9337 |
0.9016 |
+ |
FX_Medium |
0.9514 |
0.9848 |
0.9219 |
0.8138 |
+ |
FX_Large |
0.9467 |
0.9804 |
0.8962 |
0.7868 |
+ |
|
IIGD (Inverted Generational Distance) |
FX_Small |
0.6365 |
0.5348 |
0.6390 |
0.8000 |
+ |
FX_Medium |
0.5919 |
0.5790 |
0.6114 |
0.7957 |
+ |
FX_Large |
0.5887 |
0.5622 |
0.6561 |
0.8884 |
+ |
Comparison with Model Repair
Network Virus Infection
Paper:
Model Repair for Markov Decision Processes.
T. Chen, E. Hahn, T. Han, M. Kwiatkowska, H. Qu, and L. Zhang. In 7th Intl. Symposium on Theoretical Aspects of Software Engineering (TASE'13),
pages 85–92, 2013.
In this case study we established that EvoChecker can replicate/outperform the result obtained in the paper
Model Repair for Markov Decision Processes by
Chen et al. This work performed model repair on the
Network Virus Infection case study which considers a parametric
model of a computer virus infecting a network.
The network is a grid of N by N nodes, with each node connected to four neighbours (the nodes above, below, to the left and to the right),
except for the nodes on the border, for which some of the neighbours are not present. The model is an MDP that represents the situation where the
virus spawns/multiplies. More concretely, once a node is infected, the virus remains at that node and repeatedly tries to infect uninfected
neighbouring nodes. The goal is to repair the model such that the minimal expected number of attacks required by the virus until an infection of the
node (1, 1) starting at (N,N). The repairing action should have a minimum cost (i.e., a minimal modification to the original model).
The cost for the model repair solution obtained by Chen et al., and reported in their paper, is 0.02044.
EvoChecker generated a model repair solution with cost 0.0201473 in less than 3000 evaluations.
We provide the
probabilistic model template for this case study and the
PCTL
formulae associated with its QoS requirements.
IPv4 Zeroconf Protocol
Paper:
Model Repair for Probabilistic Systems.
E. Bartocci, R. Grosu, P. Katsaros, C.R. Ramakrishnan, and S. A. Smolka. In 7th Intl. Conf. on Tools and Algorithms for the Construction and
Analysis of Systems (TACAS'11), pages 326-340, 2011.
The
Zeroconf protocol is used for assigning IP addresses in
devices joining a network. When
a new host joins the network it randomly selects an IP address among K possible ones. With m hosts in the network, the collision probability is
q = m/K. A new host asks the other hosts whether the randomly selected IP address is already used and waits for an answer. The probability
that the new host does not get any answer is
p, in which case it repeats the query. If after
n tries there is still no answer, the
host will erroneously consider the chosen address as valid.
In paper
Model repair for probabilistic systems, Bartocci et al. use a Max
Profit algorithm to repair the model given an initial
q. For n = 3, p = 0.1, and initial q = 0.6, they reduced the expected number of
steps to termination from 6.15 to 5.1 setting q = 0.5002.
To replicate this in EvoChecker, we define
q as an evolvable parameter (double) as follows:
evolve double q [0.5 .. 0.7]
EvoChecker found the optimal q = 0.50000964 and determined that the expected steps to termination are 5.105. EvoChecher obtained these results in
less than 1000 evaluations.
We provide the
probabilistic model template for this case study and the
PCTL formula for the QoS requirement.