Abstract
The abundance of event data in current software configuration management (SCM) systems makes it possible to discover software process models automatically from actually observed behavior. However, traditional process mining algorithms cannot be applied to the event logs recorded in SCM systems such as SVN, because those logs lack activity attributes. To address this problem, a software process activity classifier is proposed that builds event-activity mappings from software development event streams, revealing activity attributes and associating each activity with the original SVN log. The proposed approach extracts activities from the SVN log based on semantic features and introduces a novel technique, based on naive Bayes, to associate events with activities dynamically. The approach has been applied to two real-world software development process logs, ArgoUML and jEdit, consisting of more than 80,000 events and covering development information from 1998 to 2015. Applied to these data, the approach extracts activities from the event logs and constructs a classifier that adds activity attributes to new events. The classification results are evaluated in terms of precision, recall, and the F-measure. The experimental results show that the approach can mine software process activities from SVN log events automatically and in real time.
Introduction
It is now widely accepted that the quality of software is related not only to the product but also to the organization and to the production process that is carried out [1]–[3]. Software process modeling helps to create process descriptions that correspond to the processes actually performed during software development or maintenance. Process models can be used to visualize tacit knowledge, roles, and information flows in the processes, and thereby to identify points for improvement and optimization [4], [5]. However, as research on software development has deepened, problems with traditional subjective modeling methods have become apparent. Designing a software process model is complex and error-prone; the life cycle of a model is short; individuals are not sensitive to differences between the actual process and the process model; and the demands on process engineers keep increasing, all while the software process itself is still evolving [6]–[8]. As software systems become more and more complex, establishing a sound process model is becoming more like "an art rather than a science" [9].