Spontaneous, authentic comments by users of Project Analyzer:

"It really solved a big problem we had. We have had a team of 3 people working on the same program for the past 3 years, it is very large. We were able to immediately shave 300k off the size of our executable!" Lance Keene

"I purged about 20 dead source files and over 50 dead routines, and also found a number of bugs thanks to your option explicit checking. In our 150,000 line application, it only takes about a minute and a half to perform the initial analysis." Justin H.

"The dollars spent have already paid themselves back. What your program did in less than 17 minutes would have taken me well over 200 hours. I used the time saved to add comments and do further optimizing of the code (original exe was about 1.2 meg, now it is only 700k)." Rich Ciszek - ciszek@sprynet.com
Dead code detection and removal

Increase quality, decrease file size and save money by removing unused, dead code.

Dead code: what's that?

Dead code means unnecessary, inoperative code that can be, and should be, removed. The opposite of dead code is live, operational code.

You can significantly improve a program's quality by removing dead code. While the functionality won't change, the internal quality, that is, the quality of the source code, will improve. This helps maintenance by decreasing the amount of code to maintain, making the program easier to understand and preventing bugs from being introduced.

If you're working with Visual Basic (or VB.NET or VBA), you can use our Project Analyzer to detect and even remove dead code. Project Analyzer shows how much dead and semi-dead code there is and where. Utilizing advanced analysis techniques and an extensive rule base tuned for Visual Basic and VB.NET, Project Analyzer can reliably tell dead code from live code. The analysis also works reliably with the most complex code that utilizes object-oriented techniques and late binding.

Types of dead code

Dead procedure or function

A dead procedure is not called by any other procedure. It is thus never executed and not required for any purpose.

Dead procedures make up the most important part of dead code. They bloat the executables, make systems harder to understand and might even introduce new errors later in the life-cycle of the program.

Bloating. Many compilers include all procedures in the executables regardless of whether they are used or not. This bloats the file size, and the bloating effect may be really high.

Even if you don't care about the executable size, dead procedures have another adverse effect: they make your program harder to understand. A person trying to make sense of your program spends time reading the dead procedures.

If a dead procedure is later reused in the program, it may introduce new bugs in it. Why was the procedure dead in the first place?
Did it function incorrectly? Was it superseded by a better function? Did it perform useless or wrong actions? The dead code may not have been tested thoroughly and it may contain bugs that are hard to detect. Even if the dead procedure was good and well tested 5 years ago when it was last used, it might not work in today's program, in the current environment, on modern hardware or according to the current requirements and business rules.

A special case of a dead procedure is one that has callers, but the callers never execute. In this case you need the procedure to compile the program, but not to execute it. The way to solve this is to delete the callers along with this procedure.

Dead variable

A dead variable is neither read nor written. It is completely useless, only taking up memory and a line or two of code. Not a big problem, right? Wrong. You might later write code that reads the dead variable without noticing that there is really nothing to read. You would be building your code against missing data. Alternatively, you might assign the dead variable a value expecting some other part of the program to pick it up and take action. Again, the program would not perform as you expected.

A written-only variable is one that is given a value (or several values) but the value is never read. The program may be doing some serious calculation to set the value, yet it's all in vain. This may indicate a flaw in the program. Maybe the function or class that should read the value was not implemented properly. Maybe the variable is an important status or error code and the developers just forgot to take the appropriate action on it. Checking out these written-only variables may reveal hidden bugs.

Dead parameter

A dead parameter is passed to a procedure but not used by it. Passing parameters takes a little bit of time during each call. Dead parameters can make your program slower, especially if they are in the public interface and require marshaling.
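As a rough illustration of the dead parameter definition above, the sketch below flags parameters that a function body never mentions. It is written in Python rather than Visual Basic purely for illustration, it uses Python's standard `ast` module, and the sample functions `write_log` and `add` as well as the helper `dead_parameters` are hypothetical names, not part of any tool described here. A real analyzer would also have to consider overrides, interface implementations and event handlers, where a parameter may be required even if it is unused.

```python
import ast

# Hypothetical sample source: "verbose" is passed in but never used.
SOURCE = '''
def write_log(message, verbose):
    print(message)

def add(a, b):
    return a + b
'''

def dead_parameters(source):
    """Return {function name: [parameters never referenced in the body]}."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            params = [arg.arg for arg in node.args.args]
            # Every identifier that is read or written inside the function.
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            unused = [p for p in params if p not in used]
            if unused:
                result[node.name] = unused
    return result

print(dead_parameters(SOURCE))  # {'write_log': ['verbose']}
```

The check is deliberately simple: it only compares parameter names against the identifiers found in the function body.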
An unused parameter also gives callers a wrong idea of what your code does. Suppose you have a function called WriteFile with a dead Boolean parameter called Overwrite. A person writing a call to WriteFile expects the parameter to have an effect. It's probably for controlling whether an old file will be overwritten or not: a True value would overwrite, a False value would ensure preserving the old file. Well, as this is a dead parameter, it has no effect whatsoever. The function might be overwriting the old file or failing, and the caller would have the wrong idea of what it's doing. Thus, a dead parameter may indicate a flaw in the function or at least give the wrong indication, leading to mistakes in the future.

Dead return value

A dead return value of a function is not stored or used by any of the callers. This indicates one of two alternatives:
Sometimes some of the callers use the return value, but the other callers ignore it. This indicates a potential flaw in the ignoring callers.

Dead event declaration

A dead event does not fire. A truly dead event is neither handled nor fired. A semi-dead event just does not fire and its handlers are not executed.

Publishing dead events in your classes or controls may cause headaches for the users of these classes/controls. The users write code to handle the event, yet the event never fires. Where's the problem? It's hard to tell when the class/control doesn't work "as advertised".

How about an event that fires but that's not handled? Is that a dead event? No, that's not dead. Most events are designed (or should be) so that handling them is optional. The class or control should function correctly even if no events are handled.

Dead enumeration or constant

A dead enumeration or constant value is not required by the program. You should remove such unused data. Depending on the program, the data might be compiled in the executable, thus bloating its size. In Visual Basic, dead numeric constants do not end up in the executable, but string constants do.

Dead user-defined type

A dead user-defined type (structure, record) is one that is not used anywhere. It might not do much harm, but if it's useless, there is no point keeping it available. It will only make the program harder to read.

Dead class

A dead class is not used anywhere. It may still be compiled in the executable and even published as a part of the library interface. This bloats the executable and makes the library unnecessarily complex.

And what if the class should become alive again? Would it still be functioning correctly? The class may contain untested or old code. A developer using the class might not notice this. By reintroducing the class in the program he could be adding errors.

A semi-dead class may be in use as a data type but not instantiated at run-time. Again, it's a useless class that should probably be removed.
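To make the difference between a live, a semi-dead and a dead class concrete, here is a minimal sketch in Python (chosen for illustration only; the article's subject is Visual Basic). The class names `Invoice`, `LegacyInvoice` and `Receipt` and the helper `classify_classes` are hypothetical. The sketch assumes that a class counts as live only when it is instantiated by a direct call. A real tool would also need to account for inheritance, shared members and instantiation through late binding.

```python
import ast

# Hypothetical sample source with one live, one semi-dead and one dead class.
SOURCE = '''
class Invoice:
    pass

class LegacyInvoice:
    pass

class Receipt:
    pass

def total(invoice: LegacyInvoice) -> int:
    return 0

current = Invoice()
'''

def classify_classes(source):
    """Label each class as 'live', 'semi-dead' or 'dead'."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    classes = {n.name for n in nodes if isinstance(n, ast.ClassDef)}
    # Classes created by a direct call such as Invoice().
    instantiated = {
        n.func.id for n in nodes
        if isinstance(n, ast.Call)
        and isinstance(n.func, ast.Name)
        and n.func.id in classes
    }
    # Classes mentioned anywhere, including as a data type in annotations.
    referenced = {
        n.id for n in nodes
        if isinstance(n, ast.Name) and n.id in classes
    }
    status = {}
    for name in sorted(classes):
        if name in instantiated:
            status[name] = "live"
        elif name in referenced:
            status[name] = "semi-dead"
        else:
            status[name] = "dead"
    return status

print(classify_classes(SOURCE))
# {'Invoice': 'live', 'LegacyInvoice': 'semi-dead', 'Receipt': 'dead'}
```

`LegacyInvoice` is needed to compile `total`, yet no object of that type is ever created, which is the semi-dead situation described above.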
A special case of a semi-dead class is an abstract base class that is not inherited. The base class is useless as it cannot be instantiated. Child classes should be added or the base class should be removed.

Dead interface

A dead interface is not used anywhere. It's only making the program more complex and harder to understand.

A semi-dead interface may be in use as a data type but not implemented by any class. In this case, the users of the interface are written against non-existing code. When the users execute, they may fail due to a missing object reference (NULL pointer).

Alternatively, the semi-dead interface may be implemented by a class (or two or more), but never really used for making calls. This has the same drawbacks as a dead class has.

Dead module, dead file

A dead module or file is one whose contents are not used for any purpose. They are only making the program more complex, more bloated and harder to understand. Such modules and files should be excluded from the program and stored at a special location for potential reuse.

Dead control

A control on a form may be invisible, outside of form borders or disabled during its lifetime. Such a control is possibly a leftover from earlier UI changes. Carrying invisible or disabled controls will bloat the executable size and consume extra system resources at run-time.

A needless control might also cause an extra run-time file dependency. Suppose the needless control requires an additional OCX. You need to include the OCX in the setup package and distribute it to the users. They need to have the correct version installed or your program might crash. The OCX needs to be loaded and initialized when your form shows up. All this for no use. If you can remove the control and get rid of the OCX, your program will be easier to distribute and faster to run.

Sometimes an invisible or disabled control is required, though. The code might access the control's methods or properties for some purpose. Remove dead controls with care.
Dead external declaration

Several languages allow external declarations. For example, Visual Basic provides the Declare statement for calling API functions. If such a declaration is not used, it is dead. Dead declarations may bloat the executable size. They also make it harder to understand what external calls the program actually makes. If a dead declaration has a bug, such as an incorrect parameter data type, a developer using that declaration later could introduce a bug that is hard to detect. (Yes, this happened to us too.)

How does dead code get injected in programs?

So who writes the dead code? Bad coders, perhaps? Believe it or not, everyone writes dead code. It happens because of code changes. An old function becomes obsolete, the program is changed to use another class, a constant is replaced by another one etc. These changes occur over time as the software is maintained, fixed and developed further.

Another source of dead code is code reuse. If you copy or link existing modules into a new program, you often include useless parts as well. The new program may not be reusing all of the old functionality, just a part of it.

What's so nasty about it is that the dead code accumulates without anyone really noticing it. When you discard a function or a class, you should either delete it, move it to a backup location, comment it out or exclude it by a conditional compilation directive. How many of us really do that? Even if we notice a function is now dead, do we actually do something about it?

It gets worse as we don't always notice that our code just became out of use. We can't take action if we don't notice what's happening. That's why dead code keeps accumulating. Dead code may account for 30-40% of the size of an old program.

When to be concerned about dead code?

Dead code removal is a good idea when your program is stable. When you're working hard on it, the "deadness" is constantly changing. We suggest you finish your work first, then clean it up.
There are certain good times to clean up your code.
How do I detect and remove dead code?

Manual code review

You can conduct a manual code review of your system. For each file, class, module and interface, list where it is being used. Each part for which a use location is not found is a candidate for removal. As a by-product, you get a useful list of dependencies.

You can also do the same for each function, variable, constant and so on. This can get tiresome in a large program. Fortunately, there are automated tools to help you out.

Automated code review tools

An automated dead code detection tool inspects your program to find unused parts. It works by recording each use of each programming object. After recording, it produces a listing of the parts for which use was not found. If the tool is good enough, it provides an option to delete or comment out the code automatically.

It is important for such a tool to be reliable. As the output of the tool is used to delete source code, it needs to produce accurate results. A false alarm may lead to the deletion of good code. Because of this, the tool needs to read the entire source code and properly interpret every single part of it. Missing just one use location can result in declaring half of the program as unused.

An example shows us what the tool needs to be aware of. Suppose two classes, A and B, both having a method named M. The tool must be sophisticated enough to tell when syntax such as "p.M" refers to M in class A, when it's M in class B and when it can be either. Thus, it needs to know p's data type. In fact, it needs to know all possible data types of p for polymorphic calls. Simply looking for all occurrences of the name M is not enough.

If you're working with Visual Basic (or VB.NET or VBA), you can use our Project Analyzer to detect and even remove dead code. Project Analyzer shows how much dead and semi-dead code there is and where.
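The class A / class B situation described above can be sketched in a few lines. This is a simplified illustration in Python, not a description of how Project Analyzer or any other product works internally. The tables `DECLARED` and `VAR_TYPES`, the extra method `Unused` and both function names are assumptions made up for the example. The possible types of each variable are simply given here, whereas a real tool would have to infer them from the source code.

```python
# Hypothetical model of a program: which classes declare which methods.
DECLARED = {"A": {"M", "Unused"}, "B": {"M"}}

# Possible run-time classes of each variable (assumed to be known here).
# "q" is polymorphic: it can hold an A or a B.
VAR_TYPES = {"p": {"A"}, "q": {"A", "B"}}

def name_based_dead(declared, calls):
    """Naive approach: a method is live if its name is called anywhere."""
    called_names = {method for _, method in calls}
    return sorted(
        (cls, method)
        for cls, methods in declared.items()
        for method in methods
        if method not in called_names
    )

def type_aware_dead(declared, var_types, calls):
    """Record a use only for the classes the receiver can actually be."""
    used = set()
    for var, method in calls:
        for cls in var_types[var]:
            if method in declared.get(cls, ()):
                used.add((cls, method))
    everything = {
        (cls, method)
        for cls, methods in declared.items()
        for method in methods
    }
    return sorted(everything - used)

# The only call site in the program is "p.M", and p is always an A.
calls = [("p", "M")]
print(name_based_dead(DECLARED, calls))             # [('A', 'Unused')]
print(type_aware_dead(DECLARED, VAR_TYPES, calls))  # [('A', 'Unused'), ('B', 'M')]

# With a polymorphic receiver, "q.M" keeps M alive in both classes.
print(type_aware_dead(DECLARED, VAR_TYPES, [("q", "M")]))  # [('A', 'Unused')]
```

In the first scenario the name-based search treats M in class B as live because the name M occurs somewhere, while the type-aware version reports it as unused. In the second scenario the polymorphic call correctly keeps both versions of M alive.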
Utilizing advanced analysis techniques and an extensive rule base tuned for Visual Basic, VB.NET and VBA, Project Analyzer can reliably tell dead code from live code. The analysis also works reliably with the most complex code that utilizes object-oriented techniques and late binding.

Beware of bad tools

A bad approach to dead code detection is a run-time analysis that logs all calls when the program executes. After execution, a list of the unused parts is built. While this appears to produce a dead code list, it only produces a list of potentially dead code. The program may not have executed all the lines under all possible conditions. Thus, some live code was most probably not executed and the dead code list includes code that executes during another run. What is more, compile-time requirements are not detected. Such requirements can be the use of constants, user-defined data types, class data types and interfaces. Run-time analysis has its uses in performance optimization and testing, but it's not perfect for dead code detection.

Another approach to automated dead code detection, almost as bad, is programmatic reading of the compiled files: the .exe file or the intermediate code (byte code, MSIL). Again, compile-time requirements are not detected. What is more, compiled files contain less information than the source code. Only the source code provides enough detail for reliable and complete dead code detection.

Since dead code detection is a delicate process that requires absolute certainty of the deadness status, developers should only use a tool specifically designed to read source code to find the unused parts. Project Analyzer is such a tool.

©Aivosto Oy