Search-Based Synthesis of Probabilistic Models for Quality-of-Service Software Engineering
            
            
        The formal verification of finite-state probabilistic systems supports the engineering of software with strict quality-of-service (QoS) requirements. 
        However, its use in software design is currently a tedious process of manual multiobjective optimisation. Software designers must build and verify 
        probabilistic models for numerous alternative architectures and instantiations of the system parameters. When successful, they end up with feasible but 
        often suboptimal models. The 
EvoChecker search-based software engineering approach and 
tool introduced in our paper employ 
        multiobjective optimisation genetic algorithms to automate this process and considerably improve its outcome. We evaluate 
        
EvoChecker for 
six variants of two software systems from the domains of 
dynamic power 
        management and 
foreign exchange trading. These 
        systems are characterised by different types of design parameters and QoS requirements, and their design spaces comprise between 2E+14 and 7.22E+86 
        relevant alternative designs. Our results provide strong evidence that 
EvoChecker significantly outperforms the current practice 
        and yields actionable insights for software designers. 
                    
            
The main contributions of this work are:
            
    
            
            Our paper has been accepted at the 30th Intl. Conference on Automated Software Engineering (ASE 2015)
            
            We have also made available the camera-ready version of our paper.
        
            
            
            
			
    	 
        
    		Software Systems Used to Evaluate EvoChecker
            
            
            Dynamic Power Management System
            
                        
             A 
software-controlled dynamic power management (DPM) 
            system 
            consisting of a 
service provider that handles requests generated by a 
service requester and stored in two 
request queues of 
            different priorities. The 
service provider has four states associated with  different power usage, i.e., 
busy, idle, standby, sleep. 
            The figure above depicts the power usage of each state (in watts), the possible transitions between states, and the energy consumed by each 
            transitions (in joules). The task of a software power manager is to reduce power use while maintaining an acceptable service level for the system.
            
            A DPM system designer must select:
            
                - the capacity of the request queues, QmaxH and QmaxL
- one of several alternative power managers
- the parameters associated with the selected power manager
            
            such that the following QoS requirements are satisfied:
            
            
                
                    | ID | Description | Type | 
              
                | R1 | The steady-state utilisation of the high-priority queue should be less than 90% | Constraint | 
              
                | R2 | The steady-state utilisation of the low-priority queue should be less than 90% | Constraint | 
              
                | R3 | The system should operate with minimum steady-state power utilisation | Objective | 
              
                | R4 | The number of messages lost at the steady state should be minimised | Objective | 
              
                | R5 | The capacity of both queues should minimised | Objective | 
            
            
            
            We consider 
 three different variants of the DPM system. We provide the 
DPM probabilistic model 
            templates and the 
CSL formulae associated with the QoS requirements for these DPM system variants.
            
            
            
            
            
            Foreign Exchange Trading System
            
            
            An FX customer (called a trader) can use the system in two operation modes. In the 
expert  mode, FX executes a loop that analyses market 
            activity, identifies patterns that satisfy the trader's objectives, and automatically carries out trades. Thus, the 
Market watch service 
            extracts real-time exchange rates (bid/ask price) of selected currency pairs. This data is used by a 
Technical analysisi> service that 
            evaluates the current trading conditions, predicts future price movement, and decides if the trader's objectives are: (i) "satisfied" (causing the 
            invocation of an 
Order service to carry out a trade); (ii) "unsatisfied" (resulting in a new 
Market watch invocation); or 
            (iii) "unsatisfied with high variance" (triggering an 
Alarm service invocation to notify the trader about discrepancies/opportunities not 
            covered by the trading objectives). In the 
normal mode, FX assesses the economic outlook of a country using a 
Fundamental analysis 
            service that collects, analyses and evaluates information such as news reports, economic data and political events, and provides an assessment on 
            the country's currency. If satisfied with this assessment, the trader can use the 
Order service to sell or buy currency, in which case a 
            
Notification service confirms the completion of the trade.
            
            The FX designer has to select third-party implementations for each of the 
n ≥ 1 services from the workflow above for which in-house 
            implementations are not available, in order to meet the following QoS requirements:
            
            
            
                
                    | ID | Description | Type | 
              
                | R1 | Workflow executions must complete successfully with probability at least 0.9 | Constraint | 
              
                | R2 | The total service response time per workflow execution should be minimised | Objective | 
              
                | R3 | The probability of a service failure during a workflow execution should be minimised | Objective | 
              
                | R4 | The total cost of the third-party services used by a workflow execution should be minimised | Objective | 
            
            
            
            
            We consider 
three different variants of the FX system. We provide the 
FX probabilistic model 
            templates and the 
PCTL formulae associated with the QoS requirements for these FX system variants.
            
            We would like to thank our anonymous industrial collaborator for providing the model and QoS requirements for the FX system.
            
            
            
    	
         
        
		
    		Evaluation
            
            Research Questions
            We evaluate the effectiveness of EvoChecker by answering the following research questions:
            
                
                    | RQ1 (Validation): | How does EvoChecker perform compared to random search? | 
                
                    | RQ2 (Comparison): | How do EvoChecker instances using different multiobjective optimisation genetic algorithms (NSGA-II, SPEA2, MOCell) perform compared to each other? | 
                
                    | RQ3 (Insights): | Can EvoChecker provide insights into the trade-offs between the QoS attributes of alternative software 
            architectures and configurations? | 
            
            
            
            
            
            System Variants
            We evaluate EvoChecker using the system variants from the table below. 
            Note: The last two columns in this table contain links to the probabilistic model template T and the probabilistic temporal logic 
            formulae for the QoS attributes of these system variants.
            
            
                
                    | Variant | Description | Size* | T ** | QoS attributes | 
              
                | DPM_Small | QmaxH,L ∈ {1, ..., 10}, m = 2 | 2E+14 | TDPMsmall | CSLDPMsmall | 
              
                | DPM_Medium | QmaxH,L ∈ {1, ..., 15}, m = 2 | 4.5E+14 | TDPMmedium | CSLDPMmedium | 
              
                | DPM_Large | QmaxH,L ∈ {1, ..., 20}, m = 2 | 8E+14 | TDPMlarge | CSLDPMlarge | 
              
                | FX_Small | n = 4, n1 = n2 = n3 = n4= 3 | 4.98E+31 | TFXsmall | PCTLFXsmall | 
              
                | FX_Medium | n = 6, n1 = n2 = ... = n6 = 4 | 1.39E+65 | TFXmedium | PCTLFXmedium | 
              
                | FX_Large | n = 8, n1 = n2 = ... = n8 = 4 | 7.222E+86 | TFXlarge | PCTLFXlarge | 
            
            *       Assuming two-decimal precision for double-valued parameters of the probabilistic model templates 
            **    Probabilistic Model Template 
            
            
            
            
            Quality Indicators
            We evaluate the multiobjective optimisation algorithms used in EvoChecker (NSGA-II, SPEA2, MOCell, Random) using the established Pareto-front 
            quality indicators below.
            
                - 
                    Unary additive epsilon (Iε),
                    which measures the minimum additive term by which the elements of the objective vectors from 
                    a Pareto front approximation must be adjusted so as to dominate the objective vectors from the reference front. Smaller values denote Pareto 
                    front approximations with better convergence.
                
- 
                    Hypervolume (IHV),
                    which measures the volume in the objective space covered by a Pareto front approximation with respect to the reference front. Larger values 
                    denote Pareto front approximations with better convergence and diversity.
                
- 
                    Inverted Generational Distance (IIGD),
                    which provides an "error measure" as the Euclidean distance in the objective space between the reference front and the Pareto front 
                    approximation. Smaller values denote Pareto front approximations with better convergence and diversity.
                
Results
            DPM results
            The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the DPM system variants, and the 
            Figure (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.
            
            
                
                    
                        
                            | Variant | NSGA-II | SPEA2 | MOCell | Random |  | 
                        
                            |  | Iε (Epsilon) | 
                        
                            | DPM_Small | 0.0209 | 0.0130 | 0.0242 | 0.1403 | + | 
                        
                            | DPM_Medium | 0.0225 | 0.0123 | 0.0489 | 0.1996 | + | 
                        
                            | DPM_Large | 0.0229 | 0.0147 | 0.0884 | 0.2497 | + | 
 
                        |  | 
                        
                            |  | IHV (Hypervolume) | 
                        
                            | DPM_Small | 0.4455 | 0.4458 | 0.4396 | 0.4022 | + | 
                        
                            | DPM_Medium | 0.4487 | 0.4499 | 0.4386 | 0.3946 | + | 
                        
                            | DPM_Large | 0.4528 | 0.4549 | 0.4395 | 0.3947 | + | 
                        |  | 
                        
                            | IIGD (Inverted Generational Distance) | 
                        
                            | DPM_Small | 23E-4 | 18E-4 | 16E-4 | 62E-5 | + | 
                        
                            | DPM_Medium | 24E-5 | 19E-5 | 28E-5 | 91E-5 | + | 
                        
                            | DPM_Large | 24E-5 | 20E-5 | 38E-5 | 11E-5 | + | 
 
                    
                 
                
             
            
             
            
             
            
            
            
            FX results (with fixed services characteristics)
            The Table (left) reports the mean quality indicator values, the Figure (right) depicts the boxplots for the FX system variants, and the Figure 
            (bottom) presents a visualisation of the Pareto front approximations achieved by EvoChecker.
            
            
                
                    
                        
                            | Variant | NSGA-II | SPEA2 | MOCell | Random |  | 
                        
                            |  | Iε (Epsilon) | 
                        
                            | FX_Small | 0.6258 | 0.5083 | 0.6745 | 2.2274 | + | 
                        
                            | FX_Medium | 1.6379 | 2.0105 | 2.0486 | 6.1529 | + | 
                        
                            | FX_Large | 3.8528 | 5.2777 | 4.6366 | 13.023 | + | 
 
                        |  | 
                        
                            |  | IHV (Hypervolume) | 
                        
                            | FX_Small | 0.611 | 0.628 | 0.608 | 0.593 | + | 
                        
                            | FX_Medium | 0.719 | 0.725 | 0.702 | 0.606 | + | 
                        
                            | FX_Large | 0.657 | 0.675 | 0.633 | 0.555 | + | 
                        |  | 
                        
                            | IIGD (Inverted Generational Distance) | 
                        
                            | FX_Small | 0.00123 | 0.00129 | 0.00125 | 0.00145 | + | 
                        
                            | FX_Medium | 0.00192 | 0.00207 | 0.00200 | 0.00316 | + | 
                        
                            | FX_Large | 0.00244 | 0.00255 | 0.00272 | 0.00395 | + | 
 
                    
                 
                
             
             
            
             
            
            
                
            
            
            
            FX results (with random services characteristics)
            
            We carried out a second study for the FX system, where the characteristics of each concrete service (i.e., cost, success probability, response time) 
            per 
            experiment were randomly generated. With this study, we examine whether 
EvoChecker is capable of synthesising good probabilistic 
            model sets irrespective of the characteristics of the considered problem. We generated 30 random experiments per system variant and performed 10 
            independent runs per experiment per system variant (i.e., a total of 300 runs per system variant and 900 runs in total). To allow for a fair 
            comparison across the experiments and to avoid undesired scaling effects, we normalise the results obtained for each quality indicator per 
            experiment within the range [0,1]. 
            
            The Table (left) reports the mean quality indicator values, and the Figure (right) depicts the boxplots for the FX system variants.
            
            
                
                    
                        
                            | Variant | NSGA-II | SPEA2 | MOCell | Random |  | 
                        
                            |  | Iε (Epsilon) | 
                        
                            | FX_Small | 0.2212 | 0.2009 | 0.2272 | 0.6200 | + | 
                        
                            | FX_Medium | 0.3393 | 0.3664 | 0.3645 | 0.7568 | + | 
                        
                            | FX_Large | 0.3396 | 0.3764 | 0.3625 | 0.7970 | + | 
 
                        |  | 
                        
                            |  | IHV (Hypervolume) | 
                        
                            | FX_Small | 0.9374 | 0.9914 | 0.9337 | 0.9016 | + | 
                        
                            | FX_Medium | 0.9514 | 0.9848 | 0.9219 | 0.8138 | + | 
                        
                            | FX_Large | 0.9467 | 0.9804 | 0.8962 | 0.7868 | + | 
                        |  | 
                        
                            | IIGD (Inverted Generational Distance) | 
                        
                            | FX_Small | 0.6365 | 0.5348 | 0.6390 | 0.8000 | + | 
                        
                            | FX_Medium | 0.5919 | 0.5790 | 0.6114 | 0.7957 | + | 
                        
                            | FX_Large | 0.5887 | 0.5622 | 0.6561 | 0.8884 | + | 
 
                    
                 
                
             
            
            
            
            
            Comparison with Model Repair
            Network Virus Infection
            Paper: 
Model Repair for Markov Decision Processes.
            T. Chen, E. Hahn, T. Han, M. Kwiatkowska, H. Qu, and L. Zhang. In 7th Intl. Symposium on Theoretical Aspects of Software Engineering (TASE'13), 
            pages 85–92, 2013.
            
            In this case study we established that EvoChecker can replicate/outperform the result obtained in  the paper 
Model Repair for Markov Decision Processes by
            Chen et al. This work performed model repair on the 
 Network Virus Infection case study which considers a parametric 
            model of a computer virus infecting a network. 
            The network is a grid of N by N nodes, with each node connected to four neighbours (the nodes above, below, to the left and to the right), 
            except for the nodes on the border, for which some of the neighbours are not present. The model is an MDP that represents the situation where the 
            virus spawns/multiplies. More concretely, once a node is infected, the virus remains at that node and repeatedly tries to infect uninfected 
            neighbouring nodes. The goal is to repair the model such that the minimal expected number of attacks required by the virus until an infection of the 
            node (1, 1) starting at (N,N). The repairing action should have a minimum cost (i.e., a minimal modification to the original model).
            
            The cost for the model repair solution obtained by Chen et al., and reported in their paper, is 0.02044.
            
            EvoChecker generated a model repair solution with cost 0.0201473 in less than 3000 evaluations.
            
            We provide the 
probabilistic model template for this case study and the 
PCTL 
            formulae associated with its QoS requirements.
            
            
            IPv4 Zeroconf Protocol
            Paper: 
Model Repair for Probabilistic Systems. 
            E. Bartocci, R. Grosu, P. Katsaros, C.R. Ramakrishnan, and S. A. Smolka. In 7th Intl. Conf. on Tools and Algorithms for the Construction and 
            Analysis of Systems (TACAS'11), pages 326-340, 2011.
            
            The 
Zeroconf protocol is used for assigning IP addresses in 
            devices joining a network. When 
            a new host joins the network it randomly selects an IP address among K possible ones. With m hosts in the network, the collision probability is 
            
q = m/K. A new host asks the other hosts whether the randomly selected IP address is already used and waits for an answer. The probability 
            that the new host does not get any answer is 
p, in which case it repeats the query. If after 
n tries there is still no answer, the 
            host will erroneously consider the chosen address as valid.
            
            In paper 
Model repair for probabilistic systems, Bartocci et al. use a Max 
            Profit algorithm to repair the model given an initial 
q. For n = 3, p = 0.1, and initial q = 0.6, they reduced the expected number of 
            steps to termination from 6.15 to 5.1 setting q = 0.5002.
            
            
            To replicate this in EvoChecker, we define 
q as an evolvable parameter (double) as follows:
            
            
                evolve double q [0.5 .. 0.7]
            
            
            EvoChecker found the optimal q = 0.50000964 and determined that the expected steps to termination are 5.105. EvoChecher obtained these results in 
            less than 1000 evaluations.
            
            We provide the 
probabilistic model template for this case study and the 
            
PCTL formula for the QoS requirement.