Testing Terminology

Defects or faults are flaws in software that may potentially cause failures.
Failures are the observed events during program execution that represent unacceptable deviations from required behaviours.
Latent defects are defects that remain in delivered software for a substantial period, often because testing has been unable to expose these defects as observable failures.

The mechanism for deciding whether the execution of a given test case produces the expected behaviour or not is called the test oracle.

When the expected behaviour is to produce a precisely specified output file, then the test oracle can be very simple.
- In unit testing, the developer of an individual module can often create a test driver that performs simple tests and is able to function as a test oracle as well.
- In regression testing, one compares outputs of a program under test with the previously recorded outputs of a prior version of that program. In this case, a file difference utility may serve well as a test oracle.
- Python's doctest provides a particularly elegant example of how simple unit test examples can serve both as program documentation and as a tool for automated regression testing.
Test oracles may more complicated when certain aspects of output are left unspecified.
- White space and formatting of output files may not be significant.
- The exact nature of warning or other informational messages may be unspecified.
- The order in which outputs are produced and the time to produce output may be unspecified.
In some cases, the test oracle is a human being.
- This makes for a an expensive and difficult-to-repeat testing process.
- Often a necessary step in validating that software is truly acceptable to the end user (acceptance testing or validation).

How do you know when to stop testing?

In other words, when is a particular test suite (set of test cases) adequate:

The test suite contains sufficiently many test cases to address each of the desired program properties.
Absence of failures in applying the test suite is considered to mean acceptance of the software for delivery.

Two common answers in industry are both inadequate from the point of view of a reliable and predictable software process:

White-box adequacy criteria might refer to percentage coverage of program statements, branches or paths.

Unfortunately, good defensive programming often yields branches that are difficult to cover.

Black box adequancy criteria might refer to thoroughness of coverage of input specifications.

Mutation-adequacy criteria can provide an independent assessment of test data adequacy.

Create N source code mutants of the program under test, each differing from that program by a single insertion, deletion or replacement of code (mutation).
Apply the test suite to each of the mutants.
Determine the number M of mutants that fail at least one of the test cases.
The ratio M/N can be used as a measure of test data adequacy.
Note that there may be benign mutations that do not create faults that lead to observable failures.

Updated Sun Sept. 10 2023, 10:41 by cameron.

Simon Fraser University
Engaging the World