System complexity

System complexity SYSC, also known as design complexity (defined by Card & Agresti), is a composite measure of complexity inside procedures and between them. It measures the complexity of a system design in terms of procedure calls, parameter passing and data use.

System complexity is originally a design-time metric. It can be used to evaluate the difficulty of producing a designed system even before the actual implementation exists. System complexity can also be calculated from the source code to evaluate the same thing. System complexity is not suitable for the evaluation of how difficult it is to change an existing system.

System complexity consists of two elements: SC structural complexity (external) and DC data complexity (internal). So how do we calculate SC and DC? We start with the following definitions:

SFOUT = structural fan-out = number of other procedures called (see SFOUT)
IOvars = number of input/output variables for a procedure (see IOvars)

SC Structural complexity

Structural complexity (the external complexity) for a procedure equals its fan-out squared:

SC = SFOUT^2

As you can see, a procedure that calls a large number of other procedures has a relatively high structural complexity. This interaction with other procedures is why SC is thought of as the external complexity.
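As a minimal sketch of the definition above, the structural complexity of a single procedure can be computed like this. The function name structural_complexity and the sample fan-out values are illustrative assumptions, not part of Project Analyzer.

```python
def structural_complexity(sfout: int) -> int:
    """Return SC for one procedure: its structural fan-out squared."""
    if sfout < 0:
        raise ValueError("fan-out cannot be negative")
    return sfout ** 2


# Hypothetical fan-out values: SC grows quadratically with the number
# of other procedures called.
for sfout in (0, 1, 3, 6):
    print(f"SFOUT = {sfout}: SC = {structural_complexity(sfout)}")
```

Doubling the fan-out from 3 to 6 quadruples SC from 9 to 36, which is why a few procedures with a very high fan-out dominate the total.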

DC Data complexity

Data complexity (the local or internal complexity) for a procedure is defined by the following equation:

DC = IOvars / (SFOUT + 1)

The more data (IOvars) the procedure reads and writes, the higher its data complexity. On the other hand, the more other procedures it calls (SFOUT), the lower its data complexity, as part of the complex data processing is likely to have been delegated to the other procedures.
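The effect of both factors can be illustrated with a short sketch. The function name data_complexity and the sample numbers are assumptions chosen for illustration only.

```python
def data_complexity(io_vars: int, sfout: int) -> float:
    """Return DC for one procedure: IOvars / (SFOUT + 1)."""
    if io_vars < 0 or sfout < 0:
        raise ValueError("counts cannot be negative")
    return io_vars / (sfout + 1)


# Hypothetical procedure with 6 input/output variables.
print(data_complexity(6, 0))  # leaf procedure, no calls: 6.0
print(data_complexity(6, 2))  # same data, two calls:     2.0
```

With the same six variables, calling two other procedures divides DC by three, which reflects the assumption that some of the data processing has been delegated.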

SYSC and RSYSC: Total and relative system complexity

Now that we can calculate SC and DC for one procedure, let's calculate the complexity of the entire system.

Total system complexity SYSC = sum(SC + DC) over all procedures
Relative system complexity RSYSC = average(SC + DC) over all procedures

The relative system complexity is the more interesting measure. It measures the average complexity of procedures. It is a normalized measure for the entire system and it does not depend on the system size. It thus allows for design complexity evaluation among different systems.
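The following sketch computes SYSC and RSYSC for a small, invented system. The procedure names and their SFOUT and IOvars counts are hypothetical; a real analysis would take these counts from the design or the source code.

```python
# Hypothetical system: procedure name -> (SFOUT, IOvars).
procedures = {
    "Main":   (3, 2),
    "Load":   (1, 4),
    "Parse":  (0, 6),
    "Report": (0, 3),
}


def procedure_complexity(sfout: int, io_vars: int) -> float:
    """Return SC + DC for one procedure."""
    sc = sfout ** 2
    dc = io_vars / (sfout + 1)
    return sc + dc


def system_complexity(procs: dict) -> tuple:
    """Return (SYSC, RSYSC): the sum and the average of SC + DC."""
    values = [procedure_complexity(f, v) for f, v in procs.values()]
    sysc = sum(values)
    return sysc, sysc / len(values)


sysc, rsysc = system_complexity(procedures)
print(f"SYSC = {sysc}, RSYSC = {rsysc}")
```

For this sample the per-procedure values are 9.5, 3.0, 6.0 and 3.0, giving SYSC = 21.5 and RSYSC = 5.375. Adding more procedures of similar complexity would raise SYSC but leave RSYSC roughly unchanged.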

Minimizing the relative system complexity RSYSC

RSYSC has been found to be a good predictor of the error rate in a system. A high RSYSC predicts many errors per line of code. That's why a system design should produce a relatively low RSYSC. Reaching a low value requires a good balance between procedure division, the structure of the call tree and techniques of data read/write.

To minimize the structural complexity SC, you try to distribute the fan-out evenly among procedures. A procedure with a high fan-out has a high contribution to the complexity, and minimizing the average complexity requires an even distribution. The exception is leaf procedures that don't call other procedures.
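To see why an even distribution helps, compare two hypothetical designs of seven procedures with the same total number of calls. The fan-out lists below are invented for illustration.

```python
def mean_sc(fanouts: list) -> float:
    """Return the average structural complexity (SFOUT squared)."""
    return sum(f ** 2 for f in fanouts) / len(fanouts)


# Design A: Main calls all six leaf procedures directly.
flat = [6, 0, 0, 0, 0, 0, 0]

# Design B: Main calls two procedures, each of which calls two leaves.
balanced = [2, 2, 2, 0, 0, 0, 0]

print(f"flat:     {mean_sc(flat):.2f}")      # 36 / 7
print(f"balanced: {mean_sc(balanced):.2f}")  # 12 / 7
```

Both designs contain six calls in total, but because fan-out is squared, concentrating all calls in one procedure gives three times the average SC of the balanced design.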

To minimize the data complexity DC, several techniques exist. First, try to minimize variable repetition. Variable repetition occurs when the same variable appears as parameter in several procedures, or when a non-local variable is read or written by several procedures. You should remove variables from where they are not needed (by dead parameter elimination, for example). You should hide or encapsulate data in classes and modules to prevent too wide use.

Should you pass data via parameters or use non-local variables instead (globals, class variables, module-level variables)? It depends on the call tree. In a simple call (A->B), parameters produce lower data complexity. Conversely, when data is passed through a large call tree, non-local variables give lower data complexity. You should not pass parameters through many procedures without using their values. If you need to use the data in several places, storing the value in a non-local variable results in a lower DC than passing the data up and down the call tree.
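The trade-off can be checked with a small calculation. This sketch rests on two counting assumptions that are not spelled out above: a parameter that is merely passed on to the next procedure counts as read by the procedure that passes it, and a value held in a caller's local variable does not count toward the caller's IOvars. The call structures are hypothetical.

```python
def total_dc(procs: list) -> float:
    """Sum DC = IOvars / (SFOUT + 1) over (IOvars, SFOUT) pairs."""
    return sum(io_vars / (sfout + 1) for io_vars, sfout in procs)


# Simple call A -> B, where B needs one value produced by A.
simple_param = [(0, 1), (1, 0)]      # B reads a parameter
simple_nonlocal = [(1, 1), (1, 0)]   # A writes, B reads a non-local

# Chain A -> B -> C -> D, where only D uses the value produced by A.
chain_param = [(0, 1), (1, 1), (1, 1), (1, 0)]     # B and C pass it on
chain_nonlocal = [(1, 1), (0, 1), (0, 1), (1, 0)]  # A writes, D reads

print(total_dc(simple_param), total_dc(simple_nonlocal))
print(total_dc(chain_param), total_dc(chain_nonlocal))
```

Under these assumptions the parameter wins in the simple call (1.0 against 1.5), while the non-local variable wins in the chain (1.5 against 2.0), because B and C no longer carry a variable they do not use.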

Note that minimizing DC may result in smaller procedures but more calls between them, leading to an increase in SC.

Reference values

Originally, Card & Agresti investigated 8 old systems (the newest one was from 1981) and found the following values:

Measure    Value range
SC/proc    11.8 ... 24.6
DC/proc    4.9 ... 12.1
RSYSC      22.6 ... 32.8    Good RSYSC <= 25.3, poor RSYSC >= 26.5

They also compared RSYSC against the quality of the systems and found that 4 "poor quality" systems had RSYSC >= 26.5 while 4 "good quality" systems had RSYSC <= 25.3. One can question how suitable this old data is for Visual Basic systems. We suggest that you don't use this data as a reference as such, but establish your own threshold values.

Implementation details. Each input and output variable is counted once in IOvars, regardless of how many times it is used and whether it is used as an input, an output, or both. By definition, an array is counted just like a single variable. An input variable is any non-local variable or parameter that is read by the procedure. An output variable is any non-local variable or ByRef parameter that is written to by the procedure.

As a deviation from the original definition, the function return value is counted as an additional output variable. Assigning a return value is an alternative to writing the output to a ByRef parameter or a non-local variable. Counting the return value in IOvars gives a more realistic value for DC.
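These counting rules can be sketched as follows. The function count_io_vars and the variable names are hypothetical; the sketch assumes the sets of variables read and written by a procedure have already been determined.

```python
def count_io_vars(inputs, outputs, returns_value: bool = False) -> int:
    """Count the input/output variables of one procedure.

    inputs:  names of non-local variables and parameters that are read
    outputs: names of non-local variables and ByRef parameters written
    Each name counts once, even if it is both read and written.
    A function return value counts as one additional output.
    """
    distinct = set(inputs) | set(outputs)
    return len(distinct) + (1 if returns_value else 0)


# Hypothetical function: reads "rate" and "total", writes "total",
# and returns a value.
print(count_io_vars(["rate", "total"], ["total"], returns_value=True))
```

Here "total" is both read and written but counts only once, so the result is 2 distinct variables plus 1 for the return value, giving 3.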

Alternative symbols. SYSC is also known as C~ (total system complexity) and RSYSC as C (relative system complexity). Similarly, SC is also known as S, and DC is also known as L (local complexity).

©Aivosto Oy - Project Analyzer Help Contents