OpenTrace: a Workbench for Automatic Software Traceability Link Generation and Evaluation
The OpenTrace workbench provides automatic traceability link recovery between various types of textual artifacts in a software project. It includes a collection of customizable GATE pipelines and configurable components for generating, evaluating, and visualizing traceability information. OpenTrace facilitates reproducibility: it provides out-of-the-box traceability as well as all-in-one packaging of tools, datasets, and configurations, ready for remote distribution, download, and installation.
Software traceability involves finding relationships between pairs of artifacts, such as requirements, source code, and test cases. While the artifacts contribute to the planning, creation, and documentation of a project, traceability links between them support activities like reverse engineering, impact analysis, and compliance testing.
The OpenTrace workbench allows experimenting with traceability strategies to automatically discover relationships between artifacts. It additionally supports evaluating generated links against a gold standard and visualizing trace experiments to help calibrate them. OpenTrace also provides a process for packaging a complete trace experiment (the tool, its configuration, and the analyzed data) for easy remote download and installation, facilitating the reproduction of traceability results.
OpenTrace consists of a collection of GATE pipelines. These pipelines are customizable and extensible because they are component-based. They include standard off-the-shelf components as well as custom-built traceability components that can be assembled sequentially and configured.
All traceability information is represented through mark-up annotations on the analyzed documents, which are read by subsequent components of a pipeline. Each annotation has a specific start and end offset and holds arbitrary trace information. Artifact annotations are assigned a unique name and type so that they can act as the logical units for linking. The artifact content, on the other hand, is represented through several ArtifactToken annotations, which give flexibility in how content is filtered, purified, emphasized, grouped or transformed. Link annotations represent uni-directional traceability links. They reference the source and destination artifacts and carry a type and a weight indicating the calculated degree of similarity between the related artifacts.
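The annotation model described above can be pictured with a small sketch. This is not the GATE or OpenTrace API: the class and field names below are hypothetical stand-ins chosen to mirror the description (offsets, unique artifact names, token annotations, weighted directed links).

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Hypothetical artifact annotation: a named, typed span of a document."""
    name: str    # unique name, used as the linking unit
    type: str    # e.g. "requirement" or "class"
    start: int   # start offset in the document text
    end: int     # end offset (exclusive)


@dataclass
class ArtifactToken:
    """Hypothetical token annotation holding one processed term."""
    start: int
    end: int
    term: str


@dataclass
class Link:
    """Hypothetical uni-directional link between two artifacts."""
    source: str
    target: str
    type: str
    weight: float  # calculated degree of similarity


def tokens_of(artifact, tokens):
    """Return the terms of all tokens whose span lies inside the artifact."""
    return [t.term for t in tokens
            if artifact.start <= t.start and t.end <= artifact.end]


text = "The user shall log in."
req = Artifact(name="REQ1", type="requirement", start=0, end=len(text))
tokens = [ArtifactToken(4, 8, "user"), ArtifactToken(15, 18, "log")]
link = Link(source="REQ1", target="LoginForm", type="trace", weight=0.5)
```

Keeping content in separate token annotations means a later component can drop, merge or rewrite tokens without touching the artifact span itself.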
3. Configurable Features
The Trace pipeline already comes pre-configured with default settings to run out-of-the-box, but in most cases an analyst will want to calibrate traceability for their specific needs. To do so, OpenTrace provides fine-tuning of the following:
3.1. Artifact Processing
These configurations specify how artifacts are handled. An analyst can customize, through an ontology, the granularity (e.g., class, method, paragraph, sentence) at which documents are partitioned into artifacts; pre-process the textual content to remove stop-words, split identifiers, inject synonyms or lemmatize words to their root forms; and specify the N-gram size at which words are grouped into the terms used for analysis.
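As an illustration of these pre-processing steps, here is a minimal sketch covering stop-word removal, identifier splitting and N-gram grouping. It is not OpenTrace's implementation: the function names and the tiny stop-word list are assumptions made for the example, and synonym injection and lemmatization are left out.

```python
import re

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and", "in"}


def split_identifier(identifier):
    """Split camelCase and snake_case identifiers into lowercase words."""
    # Insert a space at lower-to-upper boundaries: getUser -> get User
    parts = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)
    # Separate an acronym from a following word: XMLFile -> XML File
    parts = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", parts)
    return [p.lower() for p in re.split(r"[^A-Za-z0-9]+", parts) if p]


def preprocess(text, n=1):
    """Turn raw artifact text into a list of n-gram terms."""
    words = []
    for raw in re.findall(r"[A-Za-z0-9_]+", text):
        words.extend(split_identifier(raw))
    words = [w for w in words if w not in STOP_WORDS]
    # Slide a window of size n over the remaining words.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


unigrams = preprocess("The parser calls getUserName")
bigrams = preprocess("The parser calls getUserName", n=2)
```

With n=1 each term is a single word; with n=2 neighbouring words are paired, which preserves some word order at the cost of a sparser vocabulary.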
3.2. Link Recovery
These options specify how links are discovered. An analyst will not always need to trace between every pair of artifacts, so they can narrow or broaden the trace scope, change the link directionality to perform horizontal or vertical tracing, and replace the underlying trace engine (e.g., VSM, LSA, or others) used for computing similarities between artifact pairs. A trace often generates many candidate links, not all of which are equally important, so irrelevant links can also be filtered out by quantity and similarity thresholds.
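To make the idea of a trace engine with threshold filtering concrete, the following is a minimal VSM-style sketch: TF-IDF term vectors compared by cosine similarity, then filtered by a similarity threshold and an optional per-source link count. The function names, the smoothed IDF formula and the default threshold are assumptions for illustration, not OpenTrace's actual engine or defaults. Artifact names are assumed to be unique across sources and targets.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs maps artifact name -> list of terms; returns name -> {term: weight}."""
    n = len(docs)
    # Document frequency: in how many artifacts each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        # Smoothed IDF (an assumption) so that no weight collapses to zero.
        vectors[name] = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                         for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def recover_links(sources, targets, threshold=0.1, top_k=None):
    """Return (source, target, weight) links directed from sources to targets."""
    vectors = tfidf_vectors({**sources, **targets})
    links = []
    for s in sources:
        scored = [(s, t, cosine(vectors[s], vectors[t])) for t in targets]
        scored = [link for link in scored if link[2] >= threshold]
        scored.sort(key=lambda link: link[2], reverse=True)
        links.extend(scored[:top_k])  # top_k=None keeps every remaining link
    return links


sources = {"REQ1": ["user", "login", "password"]}
targets = {"LoginForm": ["login", "password", "form"],
           "Report": ["report", "export", "pdf"]}
links = recover_links(sources, targets, threshold=0.1)
```

Here only the REQ1 to LoginForm pair shares terms, so the Report pair scores zero and is dropped by the threshold. Swapping the arguments reverses the link direction.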
3.3. Trace Evaluation
This component can automatically evaluate link correctness against a reference traceability matrix. It computes standard information retrieval and traceability metrics, such as selectivity, precision and recall, so that traceability experiments can be benchmarked against your own or public datasets.
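The metrics named above can be sketched as follows. This is a minimal illustration rather than the Trace Evaluator's code. Precision and recall follow their usual information retrieval definitions. Selectivity is assumed here to mean the fraction of all possible source-target pairs that were returned as candidate links, which is a common definition in the traceability literature but is not spelled out in this document.

```python
def evaluate(candidate_links, gold_links, n_sources, n_targets):
    """Compare candidate (source, target) pairs against a gold standard."""
    candidates = set(candidate_links)
    gold = set(gold_links)
    true_positives = len(candidates & gold)

    # Precision: share of candidate links that are correct.
    precision = true_positives / len(candidates) if candidates else 0.0
    # Recall: share of gold-standard links that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # Selectivity (assumed definition): candidates over all possible pairs.
    total_pairs = n_sources * n_targets
    selectivity = len(candidates) / total_pairs if total_pairs else 0.0

    return {"precision": precision, "recall": recall,
            "selectivity": selectivity}


candidates = [("R1", "A"), ("R1", "B"), ("R2", "C")]
gold = [("R1", "A"), ("R2", "C"), ("R2", "D")]
metrics = evaluate(candidates, gold, n_sources=2, n_targets=4)
```

In this example two of the three candidates are correct and two of the three gold links are found, so precision and recall are both 2/3, while 3 of the 8 possible pairs were returned, giving a selectivity of 0.375.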
3.4. Trace Visualization
This component collects global trace information and generates various types of visualizations to help users analyze and optimize their traceability experiments. This component includes visualizations such as configuration tables, precision-recall plots and artifact-linked maps.
4. Download & Installation
- Download, install and configure GATE (v7 or later).
- Start GATE Developer and open the CREOLE Plugin Manager.
- Go to the "Configuration" tab and enable the "Semantic Software Lab" repository. Click "Apply All". (You might also need to specify your "User Plugin Directory", where the downloads will be stored.)
- Go to the "Available to install" tab and select the OpenTrace item. Click "Apply All" to download it.
- Go to the "Installed Plugins" tab and either check "Load Now" or "Load Always". Click "Apply All".
- Load the Linker and Benchmark pipelines as shown in the screenshot.
You can also download the install package creole.zip manually (but the recommended method of installation is to use the GATE Plugin Manager as described above).
For more details, please refer to the OpenTrace documentation or the tutorial demos.
6. Change Log
Update v1.0 release (15.10.2012) includes:
- Trace Visualization PR
- Java 1.5 source-code document processing support
- Custom artifact granularity support
- Import dataset conversion scripts
Initial v0.1 release (31.07.2012) includes:
- Ready-made Trace & Benchmark pipelines
- Sample Mini dataset
- OpenTrace Plug-in with Artifact Recogniser, Artifact Linker & Trace Evaluator PRs
The OpenTrace system, including its components and resources, is published under the GNU Affero General Public License v3 (AGPL3).
For questions, comments, etc., please use our Forum.