Software Testing and Reliability¶

Xiaoyuan Xie 谢晓园

xxie@whu.edu.cn

计算机学院 E301

Reference Course: CS 6340: Software Analysis and Testing

I Preparation¶

Regulations
References
- The Art of Software Testing, Second Edition
- Software Reliability: Principles & Practices
课程内容：
- 阅读论文、做报告、代码复现
- 没有考试
- 成绩：论文报告（40%） + 大报告

II Basic Concept¶

graph LR
A(Fault) --> B(Error) --> C(Failure)

1 Bug Classification:

Software Fault: A static defect in the software
Software Failure: External, incorrect behavior with respect to the requirements or other description of the expected behavior (e.g., crash failure)
Software Error: internal, An incorrect internal state that is the manifestation of some fault

2 Static Testing -> fault:

Verification only
No execution of code
Find defects
Static analysis (automated)
Review (manual examinationm tool supprts)
Analyze code (e.g., control flow, data flow), and generated output
Typical use cases for static analysis:
- syntax checking
- code smell detection
- coding standards compliance
Typical defects found by static analysis:
- syntax violations
- unreachable code
- overly complicated constructs

3 Dynamic Testing -> failure:

Execution of code
Find failures

4 Continuous Integration:

Automated activities
- execute static code analysis
- execute unit tests, and check code coverage
- deploy to test environment
- execute integration tests
Benefits
- earlier defection and analysis of conflicting changes
- regular feedback on whether the code is working
- no big-bang integration
- reduces repetitive manual testing activities
- ...

III Control Flow Coverage¶

1 Control-flow Coverage

Statement coverage
Branch coverage
- Imply statement coverage
Condition coverage
Decision-condition coverage
- Branch coverage + Condition coverage
Multiple-condition coverage
Path coverage
- Execute every path at least once
- Imply all the coverage criteria
- Problem
  - When the software contains loops, it has infinite number of paths - then 100% path coverage becomes infeasible
  - Solution:
    - Group paths in to equivalence classes 路径等价类划分
    - 独立路径测试

2 Equivalence classes of paths 路径等价类

Two paths are considered equivalent if they differ only in the number of iterations
Two classes
- Zero
- At least one loop

3 Primary path coverage

主路径：包含不同（distinct）判定节点最多的路径，不唯一

4 独立路径测试

思路：寻找代表性路径集合，抽取出一组线性无关的独立路径（向量基）
流程：
- 确定主路径
- 以主路径为起点，逐条加入独立路径
  - 至少能引入一条新语句或新判定分支的路径
- 上限是 CFG NcCable 圈复杂度（CC）
独立路径集合中的路径满足线性无关
- \(Path_i=<x_1, x_2, ..., x_n>\)，路径 \(P_i\) 对边 \(e_j\) 的访问次数构成一个向量，向量间是线性无关的

5 How to generate a test case that executes an item?

It is a search problem
- Normally, there are more that one paths that will reach the item
The task is to solve ONE of the path conditions related to the item
- Random approach
- Search algorithms, such as the genetic algorithm
- Constraint solver

6 How do we know that an item has been executed?

By program instrumentation 动态、自动地收集程序执行的信息
- Insert some printing statements to trace the program execution
- Application:
  - To measure the thoroughness of a test
  - To control test case execution
  - To symbolic execution
  - To program analysis
Can be done either after or in parallel with compilation

7 Subdomain and Function

Canonical Representation of a program \({(D_1, f_1), (D_2, f_2), \dots, (D_i, f_i), \dots}\), where \(D_i\) is the i-th subdomain; \(f_i\) is the computation function of \(D_i\)
Example:
- INPUT A; IF A > 25 THEN B = 3A + 1; ELSE B = 4A - 3; OUTPUT B;
- D1: A>25; f1: 3A+1; D2: A<=25; f2: 4A-3

IV Data Flow Coverage¶

1 Data-flow Coverage

Generate test cases according to the pattern of data usage inside the software
Three types of data usage
- Define: def
  - The variable is given a value, for example, X=5
- Predicate use: p-use
  - The variable's value is used to decide the TRUE or FALSE outcome of a predicate
- Computational use: c-use
  - The variable's value is used for computation
One pattern of data usage
- Define a variable, and the use its value (最近的一个 defination)

2 Def-Use pair coverage critgerion

Aims to cover all pairs of def-use for all variables
Steps:
- Find all the def-use pairs for all variables
- Generate test cases to execute each of these pairs at least once

3 More data-flow coverage criteria

rarely used

V Program Analysis (Data Flow Analysis)¶

1 Data Flow Analysis

Not the same as the data flow coverage testing
A analysis method for detecting improper usage of variables in program
Three possible actions
- Define: d
  - The variable is assigned a value
- Reference: r
  - The variable's value is referred
- Undefined: u
  - The variable is declared but not yet assigned any value
  - The variable's value is destroyed
  - The variable'svalue goes out of the scope
Data Flow Anomalies
- undefine-reference (ur anomaly)
- define-define (dd anomaly)
- define-undefine (du anomaly)
  - A variable's value has been defined but not used before the value is destroyed

Anomaly: Not fault, failure, or error. Maybe it just a typo.

coverage tools: gcov, llvm-cov

2 Approaches

Static analysis
- Scan throuth and analyze
- Tools: DVAE (FORTRAN), Lint ©
- Automation != Execution of the program under analysis
- The program size is small but the execution time is long
- Many false alarm
Dynamic analysis
- executing an instrumented version of the program
  - Gather some run time information
  - Keep track of the actions on each variable
  - Change the state of the right variable at the right place
- Identify anomalies by keeping track of state transitions
- Finite State Automata
  - 用状态记录上一步的动作

3 Types of State:

U: Undefined
D: Defined
A: Anomaly
R: Reference

4 State Variable

For each variable under analysis, we need to create another variable to store the state of it

VI Program Analysis (Program Slicing)¶

Types of slices

Backward static slice
Executable slice
Forward static slice
Execution slice
Generic algorithm for static slice

A static program slice S consists of all statements in program P that may affect the value of variable v in a statement x.

A dynamic slice of a program with respect to an input value of a variable v at a program point p for a particular execution e of the program is the set of all statements in the program that affect the value of v at p.

Levels of slices

Intraprocedural
Interprocedural

[!quote] Slices are "The mental abstraction people make when they are debugging a program." [Weiser]

Applications

Dubugging
Program Comprehensiion
Reerse Engineering
Program Testing
Measuring Program - metrics
- Coverage, Overlap, Clustering
Refactoring

Backward static slice

A program with respect to a program point p and set of program variables V consists of all statements and predicates in the program that may affect the value of variables in V at p.

Execution Slice

An execution slice of a program with respect to an input value of a variable v is the set of statements in the program that are executed with input v.

Methods for Computing Slices

Data flow on the flow graph
Reachability in a dependency graph

VII Black Box Testing¶

等价类划分（Equivalence Partitioning Testing）
- 等价类覆盖
- 有效等价类覆盖：尽可能多地覆盖所有尚未覆盖的有效等价类
- 无效等价类覆盖：设计测试用例，使其只覆盖一个无效等价类
- 组合测试
- Pairwise Testing
边界值分析法（Boundary Value Analysis）
- 边界组合方式
- 强边界法
- 弱边界法
- 全边界法

VIII Mutation Testing¶

[!definition] Mutation Testing is a method of inserting faults into programs to test whether the tests pick them up, thereby validating or invalidating the tests.

PIT: A state of art mutation testing system.

Keywords: syntactic errors, mutants, quality of tests

[!tip] We can use mutation testing to evaluate the test suite generation techniques.

Mutant Equivalence:

There may surviving mutants that cannot be killed. -> indistinguishable
They have to be checked 'by hand'

IX Reliability¶

[!definition] It is the probability of failure-free operation of a computer program for a sepcified time in a specified environment.

e.g. A system has a reliability of 0.96 for 12 hours when used by the average user.

Error Seeding Model:

Assumption - original indigenous and seeded errors are of the same change of being detected.

X Advances in Software Testing (Part I)¶

This section includes software testing and fault localization.

Software Testing¶

Challenge 1 - Input Problem¶

Random Testing - the most basic solution, Efficient but not effective.
1. What: Randomly sample in the input domain
2. Tools: Randoop, Jcheck, and Yeti-TEST
3. Problem:
  1. Complex data structure?
  2. Memory/Time constrains? Not effective enough, and the probability of reaching an error can be astronomically less.
  3. How to check the output?
Symbolic Execution - Static, Effective but not efficient
1. What: Execute the program in symbolic domain, and generate test input based by solving the constraints
2. Problem:
  1. Can't solve all constraints.
Symbolic Execution - Concolic Testing
1. What: Combine concrete and symbolic execution
2. Tools: DART, CUTE
3. DART steps:
  1. interface extraction
  2. generation of test driver for random testing
  3. directed search
4. Problems:
  1. Path explosion
  2. Environmental interaction
Search-based Methods -> Search-based Software Engineering
1. Apply search techiniques to search large search spaces, guided by a fitness function that captures properties of the acceptable software artefacts we seek.
2. Metaheuristic search algorithms: Genetic Algorithms, Hill climbing, Simulated Annealing, Random, Tabu Search, Estimation of Distribution Algorithms, Particle Swarm Optimization
3. Genetic Algorithm -> evolutionary testing
4. 编码：
  1. 数值型输入：二进制编码，input vectors
  2. 非数值型输入：coding and encoding
5. Problems:
  1. Flag variable problem -> testability transformation
  2. Prematurity problem: population may prematurely converge on one local optimal solution -> DOMP (Dynamic Optimisation of Mutation Possibility)
  3. 论文：演化测试技术的研究
6. Article：Evolutionary testing of classes
Adaptive Random Testing
1. An enhanced form of random testing.
2. Adaptive random testing seeks to distribute test cases more evenly within the input space. It is based on the intuition that for non-point types of failure patterns, an even spread of test cases is more likely to detect failures using fewer test cases than oridinary random testing.
3. Article: Adaptive Random Testing
Regression Testing
1. Run old test cases on the new version of software.
2. It will cost a lot if we run the whole suite each time.
3. Try to save time and cost for new rounds of testing
  1. Test prioritization
    1. How to?
      1. To discover bugs sooner
      2. Or approximation: to achieve higher coverage sooner
    2. Average Percentage of Fault Detected (APFD): A number of faults are detected after each test case.
    3. Approaches:
      1. Coverage-based approach: The assumption is that the maximization of structural coverage will increase the change of the maximization of fault detection.
      2. Distribution-based approach: It prioritizes test cases based on the distribution of the profiles of test cases. Clustering test cases show that similar test cases are redundant and isolated clusters may cause failures.
      3. Human-based approach: A human tester is used in this kind of approach. Prioritization is based on the comparisions made by the human tester.
  2. Test relevant code
    1. Change impact analysis
  3. Record and replay

[!citation] Due to non-linearity of software, test problems are converted to complex, discontinuous, non-linear search spaces. - Baresel

Challenge 2 - Oracle Problem (谕言缺失问题)¶

What is test oracle? An "Inspector" of executions.
How to solve?
1. application domains
2. development environments
3. development phases
Solution:
1. Precise Documentation
2. Invariant
  1. A constraint over a variable's value
  2. A relationship between multiple variable values.
  3. Defined as mathematical predicates.
3. Metamorphic Testing (蜕变测试)
  2. Purpose: To make passing test cases useful. Generate test cases from passing test cases.
  3. Applications:
    1. 面向 MRC (Machine Reading Comprehension) 的测试和验证
    2. 面向 QA 软件的测试与验证
    3. 面向 POS 标注的测试与验证
    4. 面向多目标跟踪系统的测试与验证
  4. 基于多样性的测试优化
    1. MT is not just to find specific failure-inducing inputs.
    2. Cost Effectiveness (测试优化方法)
    3. Which tests have higher fault detection ability?
      1. MP which leads to diverse executions is more likely to trigger violations.
      2. Diversity - white box (statement coverage, path coverage, etc.)

Software Fault Analysis¶

SBFL (Spectrum-based Fault Localization)¶

Reference: Essential Spectrum-based Fault Localization

Program spectrum: runtime-profile

Associated with each testing result
Granularity: statements, branches, etc.
Coverage information: binary (execution slice), execution trace, etc.

Risk evaluation formula:

\[Tarantula=\frac{\frac{e_f}{e_f+n_f}}{\frac{e_p}{e_p+n_p} + \frac{e_f}{e_f+n_f}}\]

\(e_f\): the number of failure test cases that cover this statement
\(e_p\): the number of passing test cases that cover this statement
\(n_f\): the number of failure test cases that don't cover this statement
\(n_p\): the number of passing test cases that don't cover this statement

	passing test cases	failure test cases
cover	\(e_p\)	\(e_f\)
not cover	\(n_p\)	\(n_f\)

Expanse metric: 须被检查语句的比例

最优公式判定理论框架：A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization

SBFL: 适用于大规模（排除大量无关语句/方法）、粗粒度（如方法级别）的缺陷定位

MSlice-based Fault Localization (蜕变切片)¶

Tackling the oracle problem in SBFL.

[!abstract] Currently, all existing SBFL techniques have assumed the existence of a test oracle; otherwise, the program spectrum will not be associated with the testing result of failed or passed. As a consequence, a program with no test oracle will have no sufficient information to perform SBFL. However, in many real-world applications, it is very common that test oracles do not exist, and hence SBFL cannot be applied in such situations. In this chapter, we will introduce a technique proposed by us Xie et al. (Inf Softw Technol 55(5):866–879, 2013), namely, metamorphic slice to alleviate this problem. Metamorphic slice is resulted from the integration of metamorphic testing and program slicing. Instead of using the program slice and the testing result of failed or passed for an individual test case, metamorphic slice and the testing result of violation or non-violation of a metamorphic relation are used. Then, the existence of test oracle is no longer a prerequisite to SBFL, and hence the application domain of SBFL can be significantly extended.

ref: Essential Spectrum-based Fault Localization, chapter 7.

Bug-report Analysis¶

Tool: iTAPE