Metric charts and analyses

Project Metrics offers a selection of charts to choose from. You can find them in the Chart menu. The charts visualize the values currently on the grid. Most charts visualize the values of the currently selected column (current metric). Thus, before making a chart, put the cursor on the metric column you are interested in.

You can save the charts to a file. The available file formats are metafile (EMF and WMF), bitmap and GIF. Resize the picture to the desired size before pressing the Save button. You can also copy & paste the charts into your favorite word processor for your project documentation.

More charts? If you need to build other types of charts, you can copy & paste or export the data to your favorite spreadsheet program.

Regression analysis in the XY chart

The XY chart includes a simple linear regression analysis of 2 metrics (x and y). It shows how the y metric depends on the x metric, provided that there is a statistically significant dependency.

If x and y correlate (at the 95% probability level), a regression line is drawn and its equation is displayed at the bottom of the graph. If there is no statistically significant correlation (the values are unrelated or there are too few data points), neither the line nor the equation is shown. Here is an example of the equation:

CC = -0.68 + 0.16 x LLOC (R2=76%)

This means that in this project, LLOC explained 76% of the variation in a procedure's cyclomatic complexity (CC). CC increased by 0.16 for each logical line of code. In other words, complexity increases as procedure size increases. This is no surprise, as CC is the number of decision statements + 1, and a large procedure is likely to contain more decision statements. Notice that this is not a universal equation; the coefficients are likely to vary by coding style.
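As a worked example of reading the equation, the short Python sketch below plugs procedure sizes into it. The function name `predicted_cc` is hypothetical, and the coefficients are only those of the example project above.

```python
# Coefficients of the example equation CC = -0.68 + 0.16 x LLOC.
# They describe one project only; other projects give other values.
INTERCEPT = -0.68
SLOPE = 0.16


def predicted_cc(lloc: float) -> float:
    """Cyclomatic complexity that the regression line predicts for `lloc` logical lines."""
    return INTERCEPT + SLOPE * lloc


print(round(predicted_cc(50), 2))                     # 7.32
print(round(predicted_cc(60) - predicted_cc(50), 2))  # 1.6: 10 more lines add about 1.6 to CC
```

Because R2 is 76% rather than 100%, an individual 50-line procedure can still have a CC well above or below 7.32; the line predicts the typical value, not every procedure.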

The R2 value tells how well the x value explains the variation of the y value. R2=100% means perfect correlation: all data points lie on the regression line. You can get values close to 100% from metric pairs that are closely related, such as LINES and LLINES (physical and logical lines). A low R2 means that although the values may be related, there is no clear linear relationship. Another good way to evaluate the fit is to look at how closely the data points follow the regression line.

For another project, we tried the XY chart with x=PARAMS and y=LLOC' (parameters vs. comment lines). We found that R2=3%. Thus, the number of parameters and the number of comment lines were not related. One would probably expect the amount of commenting to increase with the number of parameters, because the use of each parameter should be commented. Thus, the parameters were not properly commented in this project. In fact, half of the procedures were totally uncommented. In a related test, we tried x=CC and y=LLOC'. In this case, R2=39%. We were happy to find a clear positive correlation between complexity and commenting. This was a good sign: even though full commenting was lacking, the more complex procedures were commented to a certain extent.

Correlation and regression analysis

Select Correlation analysis in the Report menu to run a correlation and regression analysis on the data currently displayed in the grid.

This feature calculates the linear correlation coefficients (r) and regression line equations (y=a+bx) for each pair of metrics.

Select the metrics to correlate before running the correlation analysis. You can do this in the View combobox. Selecting a large number of metrics (such as <All>) leads to longer analysis times and a large correlation table that is hard to read.

In the correlation table, r values are given if they are statistically significant (at the 95% probability level). A correlation value is omitted if it's not statistically significant (low correlation or small amount of data).

Regression equations and R2 values are given for each pair of metrics that are statistically correlated. Pay attention to the R2 value. Even though 2 metrics may be statistically correlated, the relationship may be very weak (a low R2). The regression equation is more meaningful when the R2 value is high.

Notice that a metric that is defined in terms of another metric correlates with that metric. This happens for a number of metric pairs. An example is IFIO=IFIN*IFOUT. IFIO correlates with IFIN and IFOUT because of its definition.

The more data you have, the better the correlation analysis. It's most useful on procedure-level data in a large project. If you have fewer than 10 lines of data (say, project-level data), the results are probably not that interesting.
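One way to see why small data sets give few significant results is to compute the smallest |r| that passes a 95% test for a given number of data lines. The Python sketch below assumes the Fisher z approximation (significant when |atanh(r)| x sqrt(n-3) > 1.96); Project Metrics may use a different or exact test, and `critical_r` is a hypothetical name.

```python
import math


def critical_r(n: int, z_crit: float = 1.96) -> float:
    """Approximate smallest |r| that is significant at the 95% level with n data lines.

    Assumes the Fisher z approximation: significant when |atanh(r)| * sqrt(n - 3) > z_crit.
    """
    if n < 4:
        raise ValueError("the approximation needs at least 4 data lines")
    return math.tanh(z_crit / math.sqrt(n - 3))


for n in (5, 10, 30, 100, 1000):
    print(f"{n:5d} lines: |r| > {critical_r(n):.2f}")
```

With 10 lines of data, |r| must exceed roughly 0.63, so only strong relationships get reported. With 1000 lines, an r of 0.07 already passes, even though R2 = 0.07 x 0.07, about 0.5%. This is how a statistically significant correlation can still come with a very low R2.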

Regression analysis on page "Project in time"

On this page you get a simple changes/day regression analysis. When you select DATE as the X metric, you can project the development of another metric (such as LLOC) over time.

Example. The results below mean that in this project, the developers have historically written about 24 lines of code and 5 lines of comments per day.

LLOC: 23.82 / day (R2=95%)
LLOC': 5.06 / day (R2=87%)

The equations show how your project has developed over time (assuming 7 days/week).

Use these historical time analyses with care. In particular, don't require your developers to write more code per day based solely on the above result. This is only a historical average, and it depends a lot on which DATE values you have. If you saved the metrics often during the test phase, for example, and less often during the coding phase, the test phase gets more weight in the analysis. The analysis gives equal weight to each DATE regardless of how many days passed between them. Thus, you get a more reliable average by saving metrics at fixed time intervals. What is more, LLOC is not a perfect measure of programmer output, as it doesn't take changed or deleted lines into account, nor bug fixing, meetings or planning efforts.

You might want to analyze historical project versions with the same version of Project Analyzer, so that differences between Project Analyzer versions don't affect the data.

Correlations on page "All projects"

You cannot run a correlation analysis on the All projects page. This page may display several versions of one project. In this setting, the data lines are strongly related to each other and the correlations would be exaggerated.

©Aivosto Oy - Project Analyzer Help Contents