Semantic Software Lab
Concordia University
Montréal, Canada

OpenTrace: a Workbench for Automatic Software Traceability Link Generation and Evaluation


The OpenTrace workbench provides automatic traceability link recovery between various types of textual artifacts of a software project. It includes a collection of customizable GATE pipelines and configurable components that allow users to generate, evaluate, and visualize traceability information. OpenTrace facilitates reproducibility, as it provides out-of-the-box traceability as well as support for all-in-one packaging of tools, datasets, and configurations, ready for remote distribution, download, and installation [1].

1. Overview

Software traceability involves finding relationships between pairs of artifacts, such as requirements, source code, and test cases. While the artifacts contribute to the planning, creation, and documentation of a project, the traceability links between them support activities such as reverse engineering, impact analysis, and compliance testing.

The OpenTrace workbench allows analysts to experiment with traceability strategies that automatically discover the relationships between artifacts. It also supports evaluating generated links against a gold standard and visualizing trace experiments to help calibrate traces optimally. Finally, OpenTrace provides a process for packaging a complete trace experiment, including the tool, its configuration, and the analysed data, for easy remote download and installation, which facilitates the reproduction of traceability results.

2. Design

OpenTrace consists of a collection of GATE pipelines. Because they are component-based, these pipelines are customizable and extensible. They include standard off-the-shelf components as well as custom-built traceability components that can be assembled sequentially and configured.

Figure: Trace Generator Pipeline
Figure: Benchmark Evaluation Pipeline
The basic Trace pipeline is responsible for analyzing software documents, recognizing artifacts and generating links. The first components perform standard pre-processing tasks, including tokenization, grouping words into sentences and assigning part-of-speech tags to words, according to their grammatical usage. The pipeline then tags stop-words and performs lemmatization, while the last three custom-built components drive the traceability process.
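The ordering of these pre-processing stages can be illustrated with a minimal plain-Java sketch. It does not use GATE: sentence splitting and part-of-speech tagging are omitted, the stop-word list is a small hypothetical one, and the suffix-stripping `lemmatize` method is only a crude stand-in for a real lemmatizer. The class and method names are illustrative assumptions, not OpenTrace's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

final class PreprocessingSketch {
    // Hypothetical stop-word list; a real pipeline would use a much larger list.
    static final Set<String> STOP_WORDS =
        Set.of("the", "a", "an", "of", "to", "is", "are", "and", "shall");

    // Tokenization: lower-case the text and split on any non-alphanumeric run.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Stop-word tagging, simplified here to removal.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    // Crude plural stripping standing in for lemmatization (assumption, not a real lemmatizer).
    static List<String> lemmatize(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.endsWith("ies") && t.length() > 4) {
                out.add(t.substring(0, t.length() - 3) + "y");
            } else if (t.endsWith("s") && !t.endsWith("ss") && t.length() > 3) {
                out.add(t.substring(0, t.length() - 1));
            } else {
                out.add(t);
            }
        }
        return out;
    }

    // Apply the stages in order, like components in a sequential pipeline.
    static List<String> process(String text) {
        List<UnaryOperator<List<String>>> stages = List.of(
            PreprocessingSketch::removeStopWords,
            PreprocessingSketch::lemmatize);
        List<String> current = tokenize(text);
        for (UnaryOperator<List<String>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(process("The system shall store the user queries.")); // prints [system, store, user, query]
    }
}
```

Modelling each stage as a `UnaryOperator` mirrors the way pipeline components are assembled sequentially: a stage can be removed, reordered, or swapped without touching the others.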

The Benchmark pipeline is responsible for consuming traceability links: it visualizes a collection of traceability runs and compares generated links against a given answer set. Because this functionality operates on links that have already been generated, it is provided as a separate pipeline consisting of two custom-built processing resources.

Figure: Traceability Annotations

All traceability information is represented through mark-up annotations on the analyzed documents, which are read by subsequent components of a pipeline. Each annotation has specific start and end offsets and holds arbitrary trace information. Artifact annotations are assigned a unique name and type so that they can act as the logical units for linking. The artifact content, on the other hand, is represented through several ArtifactToken annotations, which provide flexibility in how content is filtered, purified, emphasized, grouped, or transformed. Link annotations represent uni-directional traceability links. They reference the source and destination artifacts and carry a type and a weight indicating the calculated degree of similarity between the related artifacts.
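As an illustration of this annotation model, the following sketch represents the three annotation kinds as plain Java records with character offsets. In OpenTrace these are GATE annotations with feature maps; the record fields, the `tokensOf` helper, and the example values here are hypothetical and only show how offsets tie tokens to their enclosing artifact.

```java
import java.util.ArrayList;
import java.util.List;

final class TraceAnnotations {
    // A named, typed span acting as the logical unit for linking.
    record Artifact(long start, long end, String name, String type) {}

    // A piece of artifact content; 'value' holds the (possibly transformed) term.
    record ArtifactToken(long start, long end, String value) {}

    // A uni-directional link from a source to a destination artifact, with a similarity weight.
    record Link(String source, String destination, String type, double weight) {}

    // Collect the tokens whose span lies completely within the artifact's span.
    static List<ArtifactToken> tokensOf(Artifact artifact, List<ArtifactToken> tokens) {
        List<ArtifactToken> inside = new ArrayList<>();
        for (ArtifactToken t : tokens) {
            if (t.start() >= artifact.start() && t.end() <= artifact.end()) inside.add(t);
        }
        return inside;
    }

    public static void main(String[] args) {
        Artifact req = new Artifact(0, 30, "REQ-1", "Requirement");
        List<ArtifactToken> tokens = List.of(
            new ArtifactToken(4, 10, "system"),
            new ArtifactToken(17, 22, "store"),
            new ArtifactToken(40, 45, "class")); // lies outside REQ-1
        for (ArtifactToken t : tokensOf(req, tokens)) {
            System.out.println(req.name() + " contains " + t.value());
        }
        Link link = new Link("REQ-1", "UserStore.java", "requirement-to-code", 0.42);
        System.out.println(link);
    }
}
```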

3. Configurable Features

The Trace pipeline already comes pre-configured with default settings to run out-of-the-box, but in most cases an analyst will want to calibrate traceability for their specific needs. To do so, OpenTrace provides fine-tuning of the following:

3.1. Artifact Processing

These configurations specify how artifacts are handled. Through an ontology, an analyst can customize the granularity (e.g., class, method, paragraph, sentence) at which documents are partitioned into artifacts. The analyst can also pre-process the textual content to remove stop-words, split identifiers, inject synonyms, or lemmatize words to their root forms, and can specify the n-gram size at which words are grouped into the terms used for analysis.
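Identifier splitting and n-gram grouping might be sketched as below. The splitting rules (camelCase, acronym boundaries, and underscores) and the space-joined n-gram format are assumptions made for this example, not a description of OpenTrace's own components.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ArtifactTerms {
    // Split camelCase, PascalCase and snake_case identifiers into lower-cased words.
    static List<String> splitIdentifier(String identifier) {
        String spaced = identifier
            // boundary between a lower-case letter or digit and an upper-case letter: "getUser" -> "get User"
            .replaceAll("([a-z0-9])([A-Z])", "$1 $2")
            // boundary between an acronym and a capitalised word: "XMLFile" -> "XML File"
            .replaceAll("([A-Z]+)([A-Z][a-z])", "$1 $2")
            .replace('_', ' ');
        List<String> words = new ArrayList<>();
        for (String w : spaced.trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Group consecutive words into n-gram terms joined by a single space.
    static List<String> nGrams(List<String> words, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            grams.add(String.join(" ", words.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(splitIdentifier("parse_XMLFile"));           // prints [parse, xml, file]
        System.out.println(nGrams(splitIdentifier("getUserName"), 2));  // prints [get user, user name]
    }
}
```

Larger values of `n` yield fewer but more specific terms; `n = 1` reduces to plain word-level terms.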

3.2. Link Discovery

These options specify how links are discovered. An analyst will not always need to trace between every pair of artifacts, so the trace scope can be narrowed or broadened; the link directionality can be changed to perform horizontal or vertical tracing; and the underlying trace engine (e.g., VSM, LSA, or others) used to compute similarities between artifact pairs can be replaced. Traces often generate many candidate links, not all of which carry the same importance, so irrelevant links can also be filtered out by quantity and similarity thresholds.
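To make the trace-engine and filtering options concrete, the following sketch implements one possible engine, a vector space model (VSM) with TF-IDF weighting and cosine similarity, followed by the two filters mentioned above: a minimum similarity threshold and a maximum number of links per source artifact. It assumes artifacts have already been reduced to term lists and that artifact names are unique across sources and targets; the class name, method names, and the natural-log IDF are illustrative assumptions rather than OpenTrace's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class VsmTracer {
    // A uni-directional candidate link from a source artifact to a target artifact.
    record Link(String source, String target, double weight) {}

    // TF-IDF vector per artifact, with idf = ln(N / df) over the whole corpus.
    static Map<String, Map<String, Double>> tfIdf(Map<String, List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> terms : corpus.values()) {
            for (String t : new HashSet<>(terms)) df.merge(t, 1, Integer::sum);
        }
        int n = corpus.size();
        Map<String, Map<String, Double>> vectors = new HashMap<>();
        for (Map.Entry<String, List<String>> e : corpus.entrySet()) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : e.getValue()) tf.merge(t, 1, Integer::sum);
            Map<String, Double> v = new HashMap<>();
            tf.forEach((t, f) -> v.put(t, f * Math.log((double) n / df.get(t))));
            vectors.put(e.getKey(), v);
        }
        return vectors;
    }

    // Cosine similarity of two sparse vectors; 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double x : b.values()) normB += x * x;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Trace every source against every target (one direction only), drop links below
    // minWeight, and keep at most maxLinks per source, strongest first.
    static List<Link> trace(Map<String, List<String>> sources, Map<String, List<String>> targets,
                            double minWeight, int maxLinks) {
        Map<String, List<String>> corpus = new HashMap<>(sources);
        corpus.putAll(targets); // assumes artifact names are unique across both maps
        Map<String, Map<String, Double>> vectors = tfIdf(corpus);
        List<Link> result = new ArrayList<>();
        for (String s : new TreeSet<>(sources.keySet())) {
            List<Link> candidates = new ArrayList<>();
            for (String t : targets.keySet()) {
                double w = cosine(vectors.get(s), vectors.get(t));
                if (w >= minWeight) candidates.add(new Link(s, t, w));
            }
            candidates.sort(Comparator.comparingDouble(Link::weight).reversed());
            int keep = Math.max(0, Math.min(maxLinks, candidates.size()));
            result.addAll(candidates.subList(0, keep));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> requirements = Map.of(
            "REQ-1", List.of("user", "login", "password"),
            "REQ-2", List.of("report", "export", "pdf"));
        Map<String, List<String>> code = Map.of(
            "LoginController", List.of("login", "password", "user", "session"),
            "ReportExporter", List.of("report", "pdf", "export", "file"),
            "StringHelper", List.of("string", "helper"));
        trace(requirements, code, 0.1, 1).forEach(System.out::println);
    }
}
```

Swapping the engine would amount to replacing `tfIdf` and `cosine` (for example with an LSA projection) while leaving the scope and threshold filtering in `trace` unchanged.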

3.3. Trace Evaluation

Figure: Traceability Matrix
This component automatically evaluates link correctness against a reference traceability matrix. It computes standard information retrieval and traceability metrics, such as selectivity, precision, and recall, to benchmark traceability experiments against your own or public datasets.
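The three metrics can be sketched as follows, assuming their common definitions: precision is the fraction of retrieved links that appear in the answer set, recall is the fraction of answer-set links that were retrieved, and selectivity is the fraction of all possible source-target pairs that were retrieved (lower values leave fewer candidate links for an analyst to vet). OpenTrace's exact formulas may differ, and the `LinkId` and `Metrics` types are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class TraceEvaluation {
    // A link identified by its source and destination artifact names (hypothetical key type).
    record LinkId(String source, String destination) {}

    record Metrics(double precision, double recall, double selectivity) {}

    // Compare the retrieved links with the answer set (reference traceability matrix).
    static Metrics evaluate(Set<LinkId> retrieved, Set<LinkId> answerSet,
                            int sourceCount, int targetCount) {
        if (sourceCount <= 0 || targetCount <= 0) {
            throw new IllegalArgumentException("artifact counts must be positive");
        }
        Set<LinkId> correct = new HashSet<>(retrieved);
        correct.retainAll(answerSet); // retrieved links that are also in the answer set
        double precision = retrieved.isEmpty() ? 0.0 : (double) correct.size() / retrieved.size();
        double recall = answerSet.isEmpty() ? 0.0 : (double) correct.size() / answerSet.size();
        // Fraction of all possible source-target pairs that were proposed as links.
        double selectivity = (double) retrieved.size() / ((long) sourceCount * targetCount);
        return new Metrics(precision, recall, selectivity);
    }

    public static void main(String[] args) {
        Set<LinkId> retrieved = Set.of(
            new LinkId("R1", "A"), new LinkId("R1", "B"),
            new LinkId("R2", "C"), new LinkId("R2", "D"));
        Set<LinkId> answerSet = Set.of(
            new LinkId("R1", "A"), new LinkId("R2", "C"), new LinkId("R2", "E"));
        // 2 requirements traced against 5 code artifacts gives 10 possible links.
        System.out.println(evaluate(retrieved, answerSet, 2, 5));
    }
}
```

In this example two of the four retrieved links are correct, giving a precision of 0.5, a recall of 2/3, and a selectivity of 0.4.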

3.4. Trace Visualization

This component collects global trace information and generates various types of visualizations, such as configuration tables, precision-recall plots, and artifact-link maps, to help users analyze and optimize their traceability experiments.
Figure: Precision-Recall Plot
Figure: Artifact-Link Map
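The data behind a precision-recall plot can be obtained by sweeping a similarity threshold over a set of scored links and recording precision and recall at each step. The sketch below assumes links are identified by hypothetical string keys such as `"REQ-1->LoginController"` and that the thresholds are supplied by the caller; it is not the code of OpenTrace's Trace Visualization PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrecisionRecallCurve {
    // One point of the curve: the threshold used and the resulting precision and recall.
    record Point(double threshold, double precision, double recall) {}

    // scored: link key -> similarity weight; answerSet: keys of the correct links.
    static List<Point> curve(Map<String, Double> scored, Set<String> answerSet, double[] thresholds) {
        List<Point> points = new ArrayList<>();
        for (double threshold : thresholds) {
            int retrieved = 0;
            int correct = 0;
            for (Map.Entry<String, Double> e : scored.entrySet()) {
                if (e.getValue() >= threshold) {
                    retrieved++;
                    if (answerSet.contains(e.getKey())) correct++;
                }
            }
            // Precision is taken as 0 when nothing is retrieved (a convention chosen here).
            double precision = retrieved == 0 ? 0.0 : (double) correct / retrieved;
            double recall = answerSet.isEmpty() ? 0.0 : (double) correct / answerSet.size();
            points.add(new Point(threshold, precision, recall));
        }
        return points;
    }

    public static void main(String[] args) {
        Map<String, Double> scored = Map.of(
            "REQ-1->LoginController", 0.9,
            "REQ-1->ReportExporter", 0.7,
            "REQ-2->ReportExporter", 0.4,
            "REQ-2->StringHelper", 0.2);
        Set<String> answerSet = Set.of("REQ-1->LoginController", "REQ-2->ReportExporter");
        for (Point p : curve(scored, answerSet, new double[] {0.8, 0.5, 0.1})) {
            System.out.printf("t=%.1f precision=%.2f recall=%.2f%n",
                p.threshold(), p.precision(), p.recall());
        }
    }
}
```

Lowering the threshold here raises recall from 0.5 to 1.0 while precision drops from 1.0 to 0.5, which is the trade-off such a plot is meant to expose.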

4. Download & Installation

Figure: OpenTrace pipelines in GATE

To automatically download and install the OpenTrace pipelines and processing resources:

  1. Download, install, and configure GATE (version 7 or later).
  2. Start GATE Developer and open the CREOLE Plugin Manager.
  3. Go to the "Configuration" tab and enable the "Semantic Software Lab" repository. Click "Apply All". (You might also need to specify your "User Plugin Directory", where the downloads will be stored.)
  4. Go to the "Available to install" tab and select the OpenTrace item. Click "Apply All" to download it.
  5. Go to the "Installed Plugins" tab and either check "Load Now" or "Load Always". Click "Apply All".
  6. Load the Linker and Benchmark pipelines as shown in the screenshot.

You can also download the installation package manually, but the recommended installation method is the GATE Plugin Manager, as described above.

5. Demo

For more details, please refer to the OpenTrace documentation or view the following tutorial demos:

6. Change Log

Update v1.0 release (15.10.2012) includes:

  • Trace Visualization PR
  • Java 1.5 source code document processing support
  • Custom artifact granularity support
  • Import dataset conversion scripts

Initial v0.1 release (31.07.2012) includes:

  • Ready-made Trace & Benchmark pipelines
  • Sample Mini dataset
  • OpenTrace Plug-in with Artifact Recogniser, Artifact Linker & Trace Evaluator PRs

7. License

The OpenTrace system, including its components and resources, is published under the GNU Affero General Public License v3 (AGPL3).

8. Feedback

For questions, comments, etc., please use our Forum.


  1. Angius, E., and R. Witte, "OpenTrace: an Open Source Workbench for Automatic Software Traceability Link Recovery", 19th Working Conference on Reverse Engineering (WCRE 2012), Kingston, Ontario, Canada: IEEE Computer Society, pp. 507–508, October 15–18, 2012.