EvoChecker

Search-Based Synthesis of Probabilistic Models
for Quality-of-Service Software Engineering

Simos Gerasimou, Giordano Tamburrelli, Radu Calinescu

The formal verification of finite-state probabilistic systems supports the engineering of software with strict quality-of-service (QoS) requirements. However, its use in software design is currently a tedious process of manual multiobjective optimisation. Software designers must build and verify probabilistic models for numerous alternative architectures and instantiations of the system parameters. When successful, they end up with feasible but often suboptimal models. The EvoChecker search-based software engineering approach and tool introduced in our paper employ multiobjective optimisation genetic algorithms to automate this process and considerably improve its outcome. We evaluate EvoChecker for six variants of two software systems from the domains of dynamic power management and foreign exchange trading. These systems are characterised by different types of design parameters and QoS requirements, and their design spaces comprise between 2E+14 and 7.22E+86 relevant alternative designs. Our results provide strong evidence that EvoChecker significantly outperforms the current practice and yields actionable insights for software designers.

The main contributions of this work are:

The EvoChecker extension of the PRISM modelling language, and the EvoChecker search-based model synthesis approach
The definition of the probabilistic model synthesis problem
The open-source EvoChecker tool, which we make freely available
An extensive evaluation of the EvoChecker approach and prototype tool in two case studies

Our paper has been accepted at the 30th Intl. Conference on Automated Software Engineering (ASE 2015)
We have also made available the camera-ready version of our paper.

EvoChecker

EvoChecker Modelling Language

EvoChecker allows software designers to explore different system designs using probabilistic model templates specified using an extension of the PRISM modelling language with constructs that support:

Evolvable modules, i.e., the specification of n ≥ 2 alternative module implementations
evolve module mod implementation₁ endmodule
...
evolve module mod implementation_n endmodule
Evolvable parameters, i.e., model parameters of type 'int' and 'double', and acceptable ranges for them
evolve int param [min..max];
evolve double param [min..max];
Evolvable probability distributions, i.e., an n-element discrete probability distribution, and ranges for the n ≥ 2 probabilities of the distribution
evolve distribution dist [min₁..max₁] ... [min_n..max_n];
where [min_i..max_i] ⊆ [0,1] for all 1 ≤ i ≤ n is used to declare the n-element probability distribution

Probabilistic model template (T): any valid PRISM probabilistic model (M) augmented with EvoChecker constructs 1-3
Design space of T (DS^T): the set of all probabilistic models that are instances of the probabilistic model template T

Probabilistic Model Synthesis Problem

Consider a software system with a set of QoS requirements comprising n₁≥0 constraints and n₂≥1 optimisation objectives:

Given a probabilistic model template T with design space DS^T, and the QoS requirements above, the probabilistic model synthesis problem involves finding the Pareto optimal design set PS of models from DS^T that satisfy the n₁ constraints and are non-dominated with respect to the n₂ optimisation objectives:

EvoChecker Tool

approach — High-level EvoChecker architecture

The EvoChecker tool uses the implementations supported by JMetal for the multiobjective optimisation Genetic algorithms and integrates an embedded instance of the probabilistic model checker PRISM. The template parser was developed using the Antlr parser generator.
The EvoChecker source code is available on GitHub

Software Systems Used to Evaluate EvoChecker

Dynamic Power Management System

A software-controlled dynamic power management (DPM) system consisting of a service provider that handles requests generated by a service requester and stored in two request queues of different priorities. The service provider has four states associated with different power usage, i.e., busy, idle, standby, sleep. The figure above depicts the power usage of each state (in watts), the possible transitions between states, and the energy consumed by each transitions (in joules). The task of a software power manager is to reduce power use while maintaining an acceptable service level for the system.

A DPM system designer must select:

the capacity of the request queues, Qmax_H and Qmax_L
one of several alternative power managers
the parameters associated with the selected power manager

such that the following QoS requirements are satisfied:

ID	Description	Type
R1	The steady-state utilisation of the high-priority queue should be less than 90%	Constraint
R2	The steady-state utilisation of the low-priority queue should be less than 90%	Constraint
R3	The system should operate with minimum steady-state power utilisation	Objective
R4	The number of messages lost at the steady state should be minimised	Objective
R5	The capacity of both queues should minimised	Objective

We consider three different variants of the DPM system. We provide the DPM probabilistic model templates and the CSL formulae associated with the QoS requirements for these DPM system variants.

Foreign Exchange Trading System

An FX customer (called a trader) can use the system in two operation modes. In the expert mode, FX executes a loop that analyses market activity, identifies patterns that satisfy the trader's objectives, and automatically carries out trades. Thus, the Market watch service extracts real-time exchange rates (bid/ask price) of selected currency pairs. This data is used by a Technical analysisi> service that evaluates the current trading conditions, predicts future price movement, and decides if the trader's objectives are: (i) "satisfied" (causing the invocation of an Order service to carry out a trade); (ii) "unsatisfied" (resulting in a new Market watch invocation); or (iii) "unsatisfied with high variance" (triggering an Alarm service invocation to notify the trader about discrepancies/opportunities not covered by the trading objectives). In the normal mode, FX assesses the economic outlook of a country using a Fundamental analysis service that collects, analyses and evaluates information such as news reports, economic data and political events, and provides an assessment on the country's currency. If satisfied with this assessment, the trader can use the Order service to sell or buy currency, in which case a Notification service confirms the completion of the trade.

The FX designer has to select third-party implementations for each of the n ≥ 1 services from the workflow above for which in-house implementations are not available, in order to meet the following QoS requirements:

ID	Description	Type
R1	Workflow executions must complete successfully with probability at least 0.9	Constraint
R2	The total service response time per workflow execution should be minimised	Objective
R3	The probability of a service failure during a workflow execution should be minimised	Objective
R4	The total cost of the third-party services used by a workflow execution should be minimised	Objective

We consider three different variants of the FX system. We provide the FX probabilistic model templates and the PCTL formulae associated with the QoS requirements for these FX system variants.

We would like to thank our anonymous industrial collaborator for providing the model and QoS requirements for the FX system.

Evaluation

Research Questions

We evaluate the effectiveness of EvoChecker by answering the following research questions:

RQ1 (Validation):	How does EvoChecker perform compared to random search?
RQ2 (Comparison):	How do EvoChecker instances using different multiobjective optimisation genetic algorithms (NSGA-II, SPEA2, MOCell) perform compared to each other?
RQ3 (Insights):	Can EvoChecker provide insights into the trade-offs between the QoS attributes of alternative software architectures and configurations?

System Variants

We evaluate EvoChecker using the system variants from the table below.

Note: The last two columns in this table contain links to the probabilistic model template T and the probabilistic temporal logic formulae for the QoS attributes of these system variants.

Variant	Description	Size^*	T ^**	QoS attributes
DPM_Small	Qmax_H,L ∈ {1, ..., 10}, m = 2	2E+14	T_{DPM_small}	CSL_{DPM_small}
DPM_Medium	Qmax_H,L ∈ {1, ..., 15}, m = 2	4.5E+14	T_{DPM_medium}	CSL_{DPM_medium}
DPM_Large	Qmax_H,L ∈ {1, ..., 20}, m = 2	8E+14	T_{DPM_large}	CSL_{DPM_large}
FX_Small	n = 4, n₁ = n₂ = n₃ = n₄= 3	4.98E+31	T_{FX_small}	PCTL_{FX_small}
FX_Medium	n = 6, n₁ = n₂ = ... = n_{6 = 4}	1.39E+65	T_{FX_medium}	PCTL_{FX_medium}
FX_Large	n = 8, n₁ = n₂ = ... = n_{8 = 4}	7.222E+86	T_{FX_large}	PCTL_{FX_large}

* Assuming two-decimal precision for double-valued parameters of the probabilistic model templates
** Probabilistic Model Template

Quality Indicators

We evaluate the multiobjective optimisation algorithms used in EvoChecker (NSGA-II, SPEA2, MOCell, Random) using the established Pareto-front quality indicators below.

Unary additive epsilon (I_ε), which measures the minimum additive term by which the elements of the objective vectors from a Pareto front approximation must be adjusted so as to dominate the objective vectors from the reference front. Smaller values denote Pareto front approximations with better convergence.
Hypervolume (I_HV), which measures the volume in the objective space covered by a Pareto front approximation with respect to the reference front. Larger values denote Pareto front approximations with better convergence and diversity.
Inverted Generational Distance (I_IGD), which provides an "error measure" as the Euclidean distance in the objective space between the reference front and the Pareto front approximation. Smaller values denote Pareto front approximations with better convergence and diversity.

Results

DPM results

The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the DPM system variants, and the Figure (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.

Variant	NSGA-II	SPEA2	MOCell	Random
	I_ε (Epsilon)
DPM_Small	0.0209	0.0130	0.0242	0.1403	+
DPM_Medium	0.0225	0.0123	0.0489	0.1996	+
DPM_Large	0.0229	0.0147	0.0884	0.2497	+

	I_HV (Hypervolume)
DPM_Small	0.4455	0.4458	0.4396	0.4022	+
DPM_Medium	0.4487	0.4499	0.4386	0.3946	+
DPM_Large	0.4528	0.4549	0.4395	0.3947	+

I_IGD (Inverted Generational Distance)
DPM_Small	23E-4	18E-4	16E-4	62E-5	+
DPM_Medium	24E-5	19E-5	28E-5	91E-5	+
DPM_Large	24E-5	20E-5	38E-5	11E-5	+

FX results (with fixed services characteristics)

The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the FX system variants, and the Figure (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.

Variant	NSGA-II	SPEA2	MOCell	Random
	I_ε (Epsilon)
FX_Small	0.6258	0.5083	0.6745	2.2274	+
FX_Medium	1.6379	2.0105	2.0486	6.1529	+
FX_Large	3.8528	5.2777	4.6366	13.023	+

	I_HV (Hypervolume)
FX_Small	0.611	0.628	0.608	0.593	+
FX_Medium	0.719	0.725	0.702	0.606	+
FX_Large	0.657	0.675	0.633	0.555	+

I_IGD (Inverted Generational Distance)
FX_Small	0.00123	0.00129	0.00125	0.00145	+
FX_Medium	0.00192	0.00207	0.00200	0.00316	+
FX_Large	0.00244	0.00255	0.00272	0.00395	+

FX results (with random services characteristics)

We carried out a second study for the FX system, where the characteristics of each concrete service (i.e., cost, success probability, response time) per experiment were randomly generated. With this study, we examine whether EvoChecker is capable of synthesising good probabilistic model sets irrespective of the characteristics of the considered problem. We generated 30 random experiments per system variant and performed 10 independent runs per experiment per system variant (i.e., a total of 300 runs per system variant and 900 runs in total). To allow for a fair comparison across the experiments and to avoid undesired scaling effects, we normalise the results obtained for each quality indicator per experiment within the range [0,1].

The Table (left) reports the mean quality indicator values, and the Figure (right) depicts the boxplots for the FX system variants.

Variant	NSGA-II	SPEA2	MOCell	Random
	I_ε (Epsilon)
FX_Small	0.2212	0.2009	0.2272	0.6200	+
FX_Medium	0.3393	0.3664	0.3645	0.7568	+
FX_Large	0.3396	0.3764	0.3625	0.7970	+

	I_HV (Hypervolume)
FX_Small	0.9374	0.9914	0.9337	0.9016	+
FX_Medium	0.9514	0.9848	0.9219	0.8138	+
FX_Large	0.9467	0.9804	0.8962	0.7868	+

I_IGD (Inverted Generational Distance)
FX_Small	0.6365	0.5348	0.6390	0.8000	+
FX_Medium	0.5919	0.5790	0.6114	0.7957	+
FX_Large	0.5887	0.5622	0.6561	0.8884	+

Comparison with Model Repair

Network Virus Infection

Paper: Model Repair for Markov Decision Processes. T. Chen, E. Hahn, T. Han, M. Kwiatkowska, H. Qu, and L. Zhang. In 7th Intl. Symposium on Theoretical Aspects of Software Engineering (TASE'13), pages 85–92, 2013.

In this case study we established that EvoChecker can replicate/outperform the result obtained in the paper Model Repair for Markov Decision Processes by Chen et al. This work performed model repair on the Network Virus Infection case study which considers a parametric model of a computer virus infecting a network. The network is a grid of N by N nodes, with each node connected to four neighbours (the nodes above, below, to the left and to the right), except for the nodes on the border, for which some of the neighbours are not present. The model is an MDP that represents the situation where the virus spawns/multiplies. More concretely, once a node is infected, the virus remains at that node and repeatedly tries to infect uninfected neighbouring nodes. The goal is to repair the model such that the minimal expected number of attacks required by the virus until an infection of the node (1, 1) starting at (N,N). The repairing action should have a minimum cost (i.e., a minimal modification to the original model).

The cost for the model repair solution obtained by Chen et al., and reported in their paper, is 0.02044.

EvoChecker generated a model repair solution with cost 0.0201473 in less than 3000 evaluations.

We provide the probabilistic model template for this case study and the PCTL formulae associated with its QoS requirements.

IPv4 Zeroconf Protocol

Paper: Model Repair for Probabilistic Systems. E. Bartocci, R. Grosu, P. Katsaros, C.R. Ramakrishnan, and S. A. Smolka. In 7th Intl. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS'11), pages 326-340, 2011.

The Zeroconf protocol is used for assigning IP addresses in devices joining a network. When a new host joins the network it randomly selects an IP address among K possible ones. With m hosts in the network, the collision probability is q = m/K. A new host asks the other hosts whether the randomly selected IP address is already used and waits for an answer. The probability that the new host does not get any answer is p, in which case it repeats the query. If after n tries there is still no answer, the host will erroneously consider the chosen address as valid.

In paper Model repair for probabilistic systems, Bartocci et al. use a Max Profit algorithm to repair the model given an initial q. For n = 3, p = 0.1, and initial q = 0.6, they reduced the expected number of steps to termination from 6.15 to 5.1 setting q = 0.5002.

To replicate this in EvoChecker, we define q as an evolvable parameter (double) as follows:

evolve double q [0.5 .. 0.7]

EvoChecker found the optimal q = 0.50000964 and determined that the expected steps to termination are 5.105. EvoChecher obtained these results in less than 1000 evaluations.

We provide the probabilistic model template for this case study and the PCTL formula for the QoS requirement.

EvoChecker

Search-Based Synthesis of Probabilistic Models for Quality-of-Service Software Engineering

Simos Gerasimou, Giordano Tamburrelli, Radu Calinescu

EvoChecker

EvoChecker Modelling Language

Probabilistic Model Synthesis Problem

EvoChecker Tool

Software Systems Used to Evaluate EvoChecker

Dynamic Power Management System

Foreign Exchange Trading System

Evaluation

Research Questions

System Variants

Quality Indicators

Results

DPM results

FX results (with fixed services characteristics)

FX results (with random services characteristics)

Comparison with Model Repair

Network Virus Infection

IPv4 Zeroconf Protocol

Search-Based Synthesis of Probabilistic Models
for Quality-of-Service Software Engineering