In the context of black-box testing, generating test cases through model mutation is known to produce powerful test suites but usually has the drawback of being prohibitively expensive. This paper presents a new version of the tool MoMuT::UML, which implements a scalable version of mutation-driven test case generation (MDTCG). It is able to handle industrial-sized UML models comprising networks of, e.g., 2800 interacting state machines. To achieve the required scalability, the implemented algorithm exploits the concurrency in MDTCG and combines it with a search based generation strategy. For evaluation, we use seven case studies of different application domains with an increasing level of difficulty, stopping at a model of a railway station in Austria’s national rail network.
For reactive systems, especially in the safety critical domain and in contexts where updates are expensive, proper testing is mandatory. In the former domain, safety standards actually require certain forms of testing to be conducted and some standards highly recommend the use of formal methods and model-based testing. Mutation-driven test case generation (MDTCG), which is a form of fault-based testing , could help in satisfying the requirements posed by the standards – if it worked well enough with industrial sized models.
In this work, we address pushing out the boundaries of what can be done with MDTCG by addressing performance issues caused by the specific nature of the model-based mutation testing problem. To do so, we take advantage of the locality of mutations for the exploration itself and use mutation based heuristics for guiding a search based approach over mutant schemata. As a benchmark we use seven industrial sized UML models of different domains. Notice that we could not handle most of these with our previous versions of MoMuT::UML 1 . The models were built either by industry engineers at partner companies or in close cooperation with them. The smallest model describes the behaviour of a typical car alarm system, while the largest is a model of a mid-sized railway station in the national rail network of Austria. The models have different characteristics: some favour symbolic approaches, some favour enumerative ones, some are highly concurrent, some are serialized, some use discrete time, while others do not.
Table 1 provides an overview of the three MoMuT::UML generations and their differences. The first generation appeared around 2010 and was based on a Prolog-based enumerative input-output conformance (ioco) checker, called Ulysses, that was paired with a separate UML mutation engine. Generation one proved powerful enough to handle simple to slightly complex models . The second MoMuT::UML generation changed the exploration approach from interpreted enumeration to SMT-solver-based exploration. Although the SMTbased back end proved to be more efficient than the first generation engine and made a big performance impact with a particular type of model, its lack of list-support and object variables meant it could not handle the models we were ultimately aiming for. That said, we applied it successfully to a use-case that is also re-presented in this paper. The article  provides the following summary of our findings when applying the second generation MoMuT::UML:
The figures on the computation time prove that we spend more than 60% of the total TCG time checking equivalent model mutants. . . . The data also demonstrates that our mutation engine needs to be improved to generate more meaningful mutants as . . . 80% of the model mutants are covered by test cases generated for 2% of them.
Back then, the total TCG time was in excess of 44 hours for test cases with a depth of 25 interactions generated from an UML model comprising one instance, and 64 attributes. Our goal was to handle models in excess of 2000 instances, 3000 attributes, and to produce vastly deeper test cases in less time. So, for MoMuT::UML 3.0 we switched from formal conformance checking to a searchbased exploration strategy with short-lived mutants as announced in :
. . . the authors attempt to combine directed random exploration and mutation based test-case generation on-the-fly. The idea is that during the exploration of the system, mutations which are close to the current
state are dynamically inserted and conformance is checked. . . .
The contributions of this paper are threefold. First, we describe the techniques used in the latest version of MoMuT::UML that allow us to scale up MDTCG to industry relevant model sizes. Second we present four case studies taken from industry the first time and, third, we evaluate our approach according to the following research questions. RQ1: Can we apply MDTCG to industrialsized models at a cost (time, resources) the user would be willing to accept? RQ2: Do we retain enough fault coverage for the tool to be useful? RQ3: Given our two different guidance heuristics, are both equally suitable?
The paper is organized as follows. Section 2 introduces the mutation-based test case generation problem, mentions the main difficulties associated with it, provides a list of mutation operators used for the presented experiments, and gives a brief introduction of the Object-Oriented Action Systems modelling language. Section 3 presents MoMuT::UML’s main system architecture, discusses the mutation engine in detail, and introduces the short-lived-mutant algorithm. Section 4 then presents the seven case studies and is followed by an evaluation in Section 5. The paper closes with a discussion of related work in Section 6 and concludes in Section 7.
2 Generating Tests from Model Mutants
Mutation-driven test case generation strives to automatically produce test cases that are able to detect faulty (”mutated”) versions of a given specification (”model”). Research has shown that this methodology is powerful and can subsume other coverage criteria, like condition coverage , given the right set of mutation operators. Besides requiring a model, the main drawback of MDTCG is the computational complexity involved in finding test data that is able to reveal a faulty model (”mutant”). This challenge is easily cast as a model-checking problem, however, using model-checkers for test generation has its own issues , e.g., with large and non-deterministic models.
Besides the general computational complexity of the underlying problem, another issue that further increases the computational cost is that of equivalent mutants. Equivalent mutants are mutants that do not show any difference in behaviour. In other words, they are observationally equivalent to the original model. These mutants, which cannot lead to a test case, carry the maximum penalty for TCG approaches based on exhaustive exploration. For example, in our previous MoMuT::UML version, we spent 60% of the overall computation time in checking equivalent mutants.