Dead code detection and removal
Increase quality, reduce file size and save money by removing unused, dead code.
Dead code, what's that?
Dead code means unnecessary, inoperative code that can—and should—be removed. The opposite of dead code is live, operational code.
You can significantly improve a program's quality by removing dead code. While the functionality won't change, it will improve the internal quality—the quality of the source code. This will help in maintenance by decreasing the maintained code size, making it easier to understand the program and preventing bugs from being introduced.
If you're working with Visual Basic (or VB.NET or VBA), you can use our Project Analyzer to detect and even remove dead code. Project Analyzer shows how much dead and semi-dead code there is and where. Utilizing advanced analysis techniques and an extensive rule base tuned for Visual Basic and VB.NET, Project Analyzer can reliably tell dead code from live code. The analysis also works reliably with the most complex code that utilizes object-oriented techniques and late binding.
Types of dead code
Dead procedure or function
A dead procedure is not called by any other procedure. It is thus never executed and not required for any purpose.
Dead procedures make up the most important part of dead code. They bloat the executables, make systems harder to understand and might even introduce new errors later in the life-cycle of the program.
Many compilers include all procedures in the executables regardless of whether they are used or not. This bloats the file size—and the bloating effect may be really high.
Even if you don't care about the executable size, dead procedures have another adverse effect: they make your program harder to understand. A person trying to make sense of your program spends time reading the dead procedures.
If a dead procedure is later reused in the program, it may introduce new bugs in it. Why was the procedure dead in the first place? Did it function incorrectly? Was it superseded by a better function? Did it perform useless or wrong actions? The dead code may not have been tested thoroughly and it may contain bugs that are hard to detect. Even if the dead procedure was good and well tested 5 years ago when it was last used, it might not work in today's program, in the current environment, on modern hardware or according to the current requirements and business rules.
A special case of a dead procedure is one that has callers, but the callers never execute. In this case you need the procedure to compile the program, but not to execute it. The way to solve this is to delete the callers along with this procedure.
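The idea of a procedure that has callers but is still dead can be sketched as a reachability check over a call graph. This is a minimal illustration in Python rather than Visual Basic, and it is not a description of how any particular tool works. The call graph, the procedure names and the choice of "main" as the only entry point are all hypothetical.

```python
from collections import deque

# Hypothetical call graph: procedure name -> procedures it calls.
CALLS = {
    "main": ["load", "report"],
    "load": [],
    "report": ["format_row"],
    "format_row": [],
    "old_export": ["old_format"],  # no callers at all
    "old_format": [],              # has a caller, but that caller is dead
    "unused_helper": [],
}

def dead_procedures(calls, entry_points):
    """Return the procedures that cannot be reached from any entry point."""
    live = set()
    queue = deque(entry_points)
    while queue:
        name = queue.popleft()
        if name in live:
            continue
        live.add(name)
        queue.extend(calls.get(name, []))
    return sorted(set(calls) - live)

DEAD = dead_procedures(CALLS, ["main"])
print(DEAD)
```

Here old_format is reported along with old_export even though it has a caller, because that caller never executes. Simply counting callers would have missed it.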
Dead variable
A dead variable is neither read nor written. It is completely useless, only taking up memory and a line or two of code. Not a big problem, right? Wrong.
You might later write code that reads the dead variable without noticing there's nothing to read, really. You might be building your code against missing data. Alternatively, you might assign the dead variable a value expecting some other part of the program to pick it up and take action. Again, the program would not perform as you expected.
Truly dead variables are one problem, but a more significant problem lies in partially used variables. A variable should be both written to and read from. If one side is missing, the variable is read-only or written-only. These variables indicate hidden bugs in the program.
A written-only variable is one that is given a value (or several values) but the value is never read. The program may be doing some serious calculation to set the value, yet it's all in vain. This may indicate a flaw in the program. Maybe the function or class that should read the value was not implemented properly. Maybe the variable is an important status or error code and the developers just forgot to take the appropriate action on it.
Similarly, a read-only variable is one that is being read, but never assigned a value. Even harder to detect, it may be assigned an empty value only, keeping the variable empty at all times. This may indicate missing logic in the program. The function or class responsible for storing the value was not written or it was deleted by accident. Read instructions don't yield a proper value and reader procedures may not behave the expected way.
A variable may also be referenced in the code, but not actually used at run-time. In such a case, the reader and writer procedures are all dead. The variable is left effectively dead too. What is more, some readers may be alive but all the writers dead (or vice versa). This means the variable is in partial use only, and a potential flaw exists in the code. This type of dead code is very hard to detect without a proper code analysis tool.
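The read-only and written-only categories can be illustrated with a small sketch that counts reads and writes per variable name. It uses Python and its standard ast module purely for illustration. The sample code and its variable names are made up, and the check is deliberately naive: it treats all names as one scope, ignores object fields, and does not consider whether the reading or writing code is itself alive.

```python
import ast

# Hypothetical code to analyze.
SAMPLE = """
subtotal = 10
error_code = 3
total = subtotal + discount
receipt = total
"""

def classify_variables(source):
    """Label each name as read-only, written-only, or read and written."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                reads.add(node.id)
    result = {}
    for name in sorted(reads | writes):
        if name in reads and name in writes:
            result[name] = "read and written"
        elif name in writes:
            result[name] = "written-only"
        else:
            result[name] = "read-only"
    return result

for name, status in classify_variables(SAMPLE).items():
    print(name, "->", status)
```

In this sample, error_code is written-only (a status that nobody checks) and discount is read-only (a value that nobody sets), which are the two suspicious patterns described above.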
Dead parameter
A dead parameter is passed to a procedure but not used by it. Passing parameters takes a little bit of time during each call. Dead parameters can make your program slower, especially if they are in the public interface and require marshaling.
An unused parameter also gives callers a wrong idea of what your code does. Suppose you have a function called WriteFile with a dead Boolean parameter called Overwrite. A person writing a call to WriteFile expects the parameter to have an effect. It's probably for controlling whether an old file will be overwritten or not: a True value would overwrite, a False value would ensure preserving the old file. Well, as this is a dead parameter, it has no effect whatsoever. The function might be overwriting the old file or failing, and the caller would have the wrong idea of what it's doing. Thus, a dead parameter may indicate a flaw in the function or at least give the wrong indication leading to mistakes in the future.
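The WriteFile example might look as follows. This is a sketch in Python, with the hypothetical function written as write_file and a small check that lists parameters a function body never mentions. The check only looks at ordinary positional parameters and is an assumption about how one could spot the problem, not a complete analysis.

```python
import ast

# Hypothetical function with a dead 'overwrite' parameter.
SAMPLE = '''
def write_file(path, data, overwrite):
    # 'overwrite' is never consulted: the file is always replaced.
    with open(path, "w") as handle:
        handle.write(data)
'''

def unused_parameters(source):
    """Map each function name to the parameters its body never references."""
    report = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            params = [arg.arg for arg in node.args.args]
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            dead = [p for p in params if p not in used]
            if dead:
                report[node.name] = dead
    return report

print(unused_parameters(SAMPLE))
```

A caller passing False for overwrite would reasonably expect an existing file to be preserved, yet this function replaces it regardless.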
Dead return value
A dead return value of a function is not stored or used by any of the callers. This indicates one of two alternatives:
- The callers are not written correctly. The callers are ignoring the return value, which may be a status code or other important data. The callers may be expecting that the return value never changes. What if the value is different sometimes, due to an error perhaps? Are the callers still functioning correctly?
- The callers are written correctly. The return value is probably useless. The function itself could be rewritten as a sub-program or a void function, depending on the language. By removing the return value altogether you ensure that developers writing new calls to the function are not taking any action based on such a useless return value.
Sometimes a part of the callers use the return value, but the other callers ignore it. This indicates a potential flaw in the ignoring callers.
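To make the distinction concrete, the sketch below counts, for each function, how many calls use the return value and how many discard it. It is written in Python for illustration. The sample functions save and log are hypothetical, and only direct calls by plain name are counted.

```python
import ast

# Hypothetical code to analyze.
SAMPLE = """
def save(record):
    return 0

def log(message):
    return len(message)

save(1)
status = save(2)
log("a")
log("b")
"""

def return_value_usage(source):
    """Count calls that use or ignore the return value of each defined function."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    # A call that forms a whole statement by itself discards its result.
    ignored_calls = {
        id(n.value)
        for n in ast.walk(tree)
        if isinstance(n, ast.Expr) and isinstance(n.value, ast.Call)
    }
    usage = {name: {"used": 0, "ignored": 0} for name in defined}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in defined):
            kind = "ignored" if id(node) in ignored_calls else "used"
            usage[node.func.id][kind] += 1
    return usage

print(return_value_usage(SAMPLE))
```

In this sample, log has a dead return value because every caller ignores it, while save is used by one caller and ignored by another, which points at the ignoring caller as the place to look.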
Dead event declaration
A dead event does not fire. A truly dead event is neither handled nor fired. A semi-dead event just does not fire and its handlers are not executed.
Publishing dead events in your classes or controls may cause headaches for the users of these classes/controls. The users write code to handle the event, yet the event never fires. Where's the problem? It's hard to tell when the class/control doesn't work "as advertised".
How about an event that fires but that's not handled? Is that a dead event? No, that's not dead. Most events are designed (or should be) so that handling them is optional. The class or control should function correctly even if no events are handled.
Dead enumeration or constant
A dead enumeration or a dead constant is not required by the program. You should remove such unused data. Depending on the program, the data might be compiled in the executable, thus bloating its size.
In Visual Basic, dead numeric constants do not end up in the executable, but string constants do.
Dead user-defined type
A dead user-defined type (structure, record) is one that is not used anywhere. It might not do much harm, but if it's useless, there is no point keeping it available. It will only make the program harder to read.
Fields of a user-defined type are like variables. They can be dead too.
Dead class
A dead class is not used anywhere. It may still be compiled in the executable and even published as a part of the library interface. This bloats the executable and makes the library unnecessarily complex. And what if the class should become alive again? Would it still be functioning correctly? The class may contain untested or old code. A developer using the class might not notice this. By reintroducing the class in the program he could be adding errors.
A semi-dead class may be in use as a data type but not instantiated at run-time. Again, it's a useless class that should probably be removed.
A special case of a semi-dead class is an abstract base class that is not inherited. The base class is useless as it cannot be instantiated. Child classes should be added or the base class should be removed.
Dead interface
A dead interface is not used anywhere. It's only making the program more complex and harder to understand.
A semi-dead interface may be in use as a data type but not implemented by any class. In this case, the users of the interface are written against non-existing code. When the users execute, they may fail due to a missing object reference (NULL pointer).
Alternatively, the semi-dead interface may be implemented by a class (or two or more), but never really used for making calls. This has the same drawbacks as a dead class has.
Dead module, dead file
A dead module or dead file is one whose contents are not used for any purpose. It only makes the program more complex, more bloated and harder to understand. Such modules and files should be excluded from the program and stored at a special location for potential reuse.
Dead control
A control on a form may be invisible, outside of form borders or disabled during its lifetime. Such a control is possibly a leftover from earlier UI changes.
Carrying invisible or disabled controls will bloat the executable size and consume extra system resources at run-time. A needless control might also cause an extra run-time file dependency. Suppose the needless control requires an additional OCX. You need to include the OCX in the setup package and distribute it to the users. They need to have the correct version installed or your program might crash. The OCX needs to be loaded and initialized when your form shows up. All this for no use. If you can remove the control and get rid of the OCX, your program will be easier to distribute and faster to run.
Sometimes an invisible or disabled control is required, though. The code might access the control's methods or properties for some purpose. Remove dead controls with care.
Dead external declaration
Several languages allow external declarations. For example, Visual Basic provides the Declare statement for calling API functions. If such a declaration is not used, it is a dead declaration.
Dead declarations may bloat the executable size. They also make it harder to understand what external calls the program actually makes. If a dead declaration should have a bug, such as an incorrect parameter data type, a developer using that declaration later could introduce a bug that is hard to detect. (Yes, this happened to us too.)
Unreachable code
An unreachable statement is one that can never execute. There exists no control flow path to the statement. Unreachable code is typically found after an unconditional jump or in a branch that cannot happen. It can also exist in a loop that doesn't get iterated, or after a forever loop where execution never flows out of the loop.
Unreachable code may indicate an error in the program: important code doesn't execute. Unreachable code can also be a remnant of previous functionality that has been disabled, either deliberately or accidentally.
A block of code can be unreachable due to either compile-time or run-time reasons. Compile-time unreachability means that the code will never execute unless it is changed. Such a change can involve moving some lines, deleting a jump or rewriting a conditional statement or loop.
Run-time unreachability means that the code could execute, but the value of one or more variables prevents it from executing. Such an issue can happen if a flag variable always contains the value False, for example. If the code is later changed so that a new value is being assigned to the variable(s), the code can become reachable again.
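The two kinds of unreachability can be contrasted in a short sketch. It is written in Python for illustration, and the sample functions and the DEBUG flag are hypothetical. The checker only finds statements that follow an unconditional jump within the same block, which is one simple form of compile-time unreachability.

```python
import ast

# Hypothetical code to analyze. Line numbers start at 1 on the 'def price' line.
SAMPLE = '''\
def price(amount):
    return amount * 2
    print("never runs")

DEBUG = False

def report():
    if DEBUG:
        print("runs only if DEBUG is ever set to True")
'''

JUMPS = (ast.Return, ast.Raise, ast.Break, ast.Continue)

def statements_after_jump(source):
    """Return line numbers of statements that follow a jump in the same block."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue
            for index, statement in enumerate(block):
                if isinstance(statement, JUMPS):
                    lines.extend(later.lineno for later in block[index + 1:])
                    break
    return sorted(lines)

print(statements_after_jump(SAMPLE))
```

The print call on line 3 is reported because no path leads to it. The print call inside report is not reported: it is unreachable only because DEBUG happens to stay False, so finding it requires reasoning about the values a variable can take rather than about the shape of the code.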
How does dead code get injected in programs?
So who writes the dead code? Bad coders, perhaps?
Believe it or not, everyone writes dead code. It happens because of code changes. An old function becomes obsolete, the program is changed to use another class, a constant is replaced by another one, etc. These changes occur over time as the software is maintained, fixed and developed further.
Another source of dead code is code reuse. If you copy or link existing modules into a new program, you often include useless parts as well. The new program may not be reusing all of the old functionality, just a part of it.
What's so nasty about it is that the dead code accumulates without anyone really noticing it. When you discard a function or a class, you should either delete it, move it to a backup location, comment it out or exclude it by a conditional compilation directive. How many of us really do that? Even if we notice a function is now dead, do we actually do something about it? It gets worse as we don't always notice that our code just became out of use. We can't take action if we don't notice what's happening.
That's why dead code keeps accumulating. Dead code may account for 30-40% of the size of an old program. A typical value is 15%.
When to be concerned about dead code?
Dead code removal is a good idea when your program is stable. When you're working hard on it, the "deadness" is constantly changing. We suggest you finish your work first, then clean it up.
There are certain good times to clean up your code.
- Before release. This is the ideal timing for dead code removal. The program is almost ready and the team has fresh knowledge of how the system works. Now is the time to use that knowledge for something productive. Dead code removal will be most efficient in this case. Besides, as dead code can indicate a flaw in the program, the flaws will get revealed before the users find them. When the dead code is cleaned out, the released program will be small and efficient. Sadly, dead code removal is seldom done at this phase as the team hurries to get the program released under a tight schedule.
- When a new person takes over old code. The new developer needs to learn the system before making changes to it. Dead code detection is a great way to learn old code. After cleaning, the new developer has some knowledge of the system and can then focus on the live parts without worrying about the dead parts any more.
- Before team members change. Again, the new members need to learn old code. By removing dead code before the new members come in you can help their efforts and prevent them from reintroducing hidden bugs.
- Before migration or conversion. When you're planning to convert old code to a new system or a new language, removing the dead code can really pay off. It doesn't make sense to migrate the dead parts.
- Before reuse. When you reuse old modules or classes in a new program, it can result in doubled dead code: both in the old program and in the new one. Clean it up first, then reuse.
- Before writing a new version. A good way to start writing a new version is to clean up the old version first. The old version is probably cluttered with dead bits resulting from the numerous changes and fixes done. This way you can also learn about problems with the old code, find unimplemented functionality etc.
- When purchasing someone's code. If you're paying by the number of lines, make sure you're not paying for needless lines.
How do I detect and remove dead code?
Manual code review
You can conduct a manual code review of your system. For each file, class, module and interface, list where it is being used. Each part for which a use location is not found is a candidate for removal. As a by-product, you get a useful list of dependencies.
You can also do the same for each function, variable, constant and so on. This can get tiresome in a large program. Fortunately, there are automated tools to help you out.
Automated code review tools
An automated dead code detection tool inspects your program to find unused parts. It works by recording each use of each programming object. After recording, it produces a listing of the parts for which use was not found. If the tool is good enough, it provides an option to delete or comment out the code automatically.
It is important for such a tool to be reliable. As the output of the tool is used to delete source code, it needs to produce accurate results. A false alarm may lead to the deletion of good code. Because of this, the tool needs to read the entire source code and properly interpret every single part of it. Missing just one use location can result in declaring half of the program as unused.
An example shows us what the tool needs to be aware of. Suppose two classes, A and B, both having a method named M. The tool must be sophisticated enough to tell when syntax such as "p.M" refers to M in class A, when it's M in class B and when it can be either. Thus, it needs to know p's data type. In fact, it needs to know all possible data types of p for polymorphic calls. Simply looking for all occurrences of the name M is not enough.
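The A/B example can be sketched with a toy model. This is Python used for illustration only. The tables of classes and possible receiver types are hypothetical inputs that a real tool would have to derive from the source code, and the second variable q is an added assumption to show the polymorphic case.

```python
# Hypothetical model: each class and the methods it declares.
METHODS = {"A": {"M"}, "B": {"M"}}

# Hypothetical result of type analysis: every type a variable may hold.
POSSIBLE_TYPES = {"p": {"A"}, "q": {"A", "B"}}

def resolve(receiver, method):
    """Return every method a call such as p.M may reach."""
    return {
        f"{cls}.{method}"
        for cls in POSSIBLE_TYPES[receiver]
        if method in METHODS[cls]
    }

def dead_methods(calls):
    """Return declared methods that no recorded call can reach."""
    live = set()
    for receiver, method in calls:
        live |= resolve(receiver, method)
    declared = {f"{cls}.{m}" for cls, names in METHODS.items() for m in names}
    return sorted(declared - live)

print(dead_methods([("p", "M")]))  # p can only be an A
print(dead_methods([("q", "M")]))  # q may be an A or a B
```

With only the call p.M in the program, B.M is dead, yet a plain text search for the name M would have counted it as used. With q.M, both methods must be kept alive because either one may be the target.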
If you're working with Visual Basic (or VB.NET or VBA), you can use our Project Analyzer to detect and even remove dead code. Project Analyzer shows how much dead and semi-dead code there is and where. Utilizing advanced analysis techniques and an extensive rule base tuned for Visual Basic, VB.NET and VBA, Project Analyzer can reliably tell dead code from live code. The analysis also works reliably with the most complex code that utilizes object-oriented techniques and late binding.
Beware of bad tools
A bad approach to dead code detection is a run-time analysis that logs all calls when the program executes. After execution, a list of the unused parts is built. While this appears to produce a dead code list, it only produces a list of potentially dead code. The program may not have executed all the lines under all possible conditions. Thus, some live code was most probably not executed and the dead code list includes code that executes during another run. What is more, compile-time requirements are not detected. Such requirements can be the use of constants, user-defined data types, class data types and interfaces. Run-time analysis has its uses in performance optimization and testing, but it's not perfect for dead code detection.
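The weakness of call logging shows up even in a tiny program. The sketch below, in Python with hypothetical function names, records which functions run during a single execution and then lists the ones that never ran.

```python
import functools

TRACKED = set()  # every function under observation
CALLED = set()   # functions that actually ran during this execution

def tracked(func):
    """Decorator that logs each call, imitating a run-time call logger."""
    TRACKED.add(func.__name__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLED.add(func.__name__)
        return func(*args, **kwargs)

    return wrapper

@tracked
def save(path):
    return f"saved {path}"

@tracked
def handle_disk_full():
    return "cleaned up"

@tracked
def store(path, disk_full=False):
    if disk_full:
        return handle_disk_full()
    return save(path)

# One ordinary run in which the disk never fills up.
store("a.txt")
APPARENTLY_DEAD = sorted(TRACKED - CALLED)
print(APPARENTLY_DEAD)
```

The log lists handle_disk_full as unused, although it is live code that simply was not needed in this run. Deleting it based on the log would remove the error handling.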
Another approach to automated dead code detection, almost as bad, is programmatic reading of the compiled files: the .exe file or the intermediate code (byte code, MSIL). Again, compile-time requirements are not detected. What is more, compiled files contain less information than the source code. Only the source code provides enough detail for reliable and complete dead code detection.
Since dead code detection is a delicate process that requires absolute certainty of the deadness status, developers should only use a tool specifically designed to read source code to find the unused parts. Project Analyzer is such a tool.