Duplicate Finder, Part of ReSharper Command Line Tools
September 3rd, 2013 by Dmitry Matveev
Along with ReSharper 8 EAP earlier this year, we made ReSharper Command Line Tools available for you to download and try. We have already written about one of the tools included in this package, InspectCode, which analyzes your code outside of Visual Studio using hundreds of ReSharper code inspections. But the package also includes another tool, dupFinder, and we’ll take a closer look at it in this post.
As its name suggests, dupFinder finds duplicates in C# and Visual Basic .NET code. Being a JetBrains tool, dupFinder does it in a smart way. By default, it considers code fragments as duplicates not only if they are identical, but also if they are structurally similar, even if they contain different variables, fields, methods, types or literals. Of course, you can configure the allowed similarity level as well as the minimum relative size of duplicated fragments.
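To make "structurally similar" a little more concrete, here is a minimal Python sketch of the idea for literals only. It is an illustration, not dupFinder's actual algorithm: the `normalize` and `is_duplicate` helpers are hypothetical, and the regular expression is a crude stand-in for a real C# parser. The `discard_literals` parameter is named after the dupFinder option of the same name and follows the same logic: when it is false, fragments that differ only in literals still count as duplicates.

```python
import re

# Matches double-quoted string literals and plain numbers.
# A deliberately crude tokenizer for illustration, not a C# parser.
LITERAL = re.compile(r'"(?:\\.|[^"\\])*"|\b\d+(?:\.\d+)?\b')


def normalize(fragment: str, ignore_literals: bool = True) -> str:
    """Collapse whitespace and optionally replace every literal with a placeholder."""
    text = " ".join(fragment.split())
    if ignore_literals:
        text = LITERAL.sub("<LIT>", text)
    return text


def is_duplicate(a: str, b: str, discard_literals: bool = False) -> bool:
    """Hypothetical helper: with discard_literals=False, literal differences are ignored."""
    ignore = not discard_literals
    return normalize(a, ignore) == normalize(b, ignore)


first = 'myStatusBar.SetText("Logging In...");'
second = 'myStatusBar.SetText("Not Logged In");'

print(is_duplicate(first, second))                         # True
print(is_duplicate(first, second, discard_literals=True))  # False
```

A real implementation would compare syntax trees rather than strings, and would treat variables, fields, methods and types in the same way as literals are treated here.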
Running Duplicate Analysis
dupFinder is not exactly a new kid on the block. For quite a while, JetBrains TeamCity has included it out of the box, and this is probably the easiest and the most efficient way to make use of dupFinder. However, from now on you can get it running with your custom CI, version control, quality control or any other server and here is how:
- Download and unzip ReSharper Command Line Tools
- Run the following command:
dupFinder [OPTIONS] source
One way to define the target sources is to specify a solution file: dupFinder understands solution files of Visual Studio 2003, 2005, 2008, 2010, and 2012. Alternatively, you can provide a specific list of source files as a set of newline-delimited wildcards.
Using optional parameters, you can configure how dupFinder should analyze your source code. To explore the full list of options, run dupFinder /help. Below are some of the options that you might be interested in:
/exclude allows excluding files from duplicate code search. The value is a set of newline-delimited wildcards (for example, **Generated*.cs). Note that the paths should be either absolute or relative to the working directory.
/exclude-code-regions allows excluding files by substrings of opening comments and regions. The value is a set of newline-delimited keywords (e.g. ‘generated code’ will exclude regions containing ‘Windows Form Designer generated code’).
/discard-fields, /discard-literals, /discard-local-vars, /discard-types specify whether to filter out similar fragments as non-duplicates if they have different variables, fields, methods, types or literals. The default value for all of them is ‘false’. To illustrate the way it works, consider the following example. There are two code fragments that are otherwise identical: one contains myStatusBar.SetText("Logging In...");, the other contains myStatusBar.SetText("Not Logged In");. If ‘discard-literals’ is set to ‘false’, these fragments are considered duplicates.
/discard-cost allows setting a threshold for code complexity of duplicated fragments. The fragments with lower complexity are discarded as non-duplicates. The value for this option is provided in relative units.
Using this option, you can filter out equal code fragments that present no semantic duplication. For example, tests often contain statements like Assert.AreEqual(gold, result);. If the ‘discard-cost’ value is less than 10, statements like that will appear as duplicates, which is obviously unhelpful. You’ll need to play a bit with this value to find a balance between avoiding false positives and missing real duplicates. The proper values will differ for different codebases.
/show-text: if this parameter is used, detected duplicate fragments will be embedded into the report.
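If you drive dupFinder from a script, the options above can be assembled into a command line along these lines. This is a minimal Python sketch under stated assumptions: the `/name=value` syntax and the `/output` option are not described in this post, so check them against `dupFinder /help`; `build_dupfinder_command` is a hypothetical helper, and the file names are only examples.

```python
from typing import List, Optional


def build_dupfinder_command(
    source: str,
    output: Optional[str] = None,
    discard_cost: Optional[int] = None,
    discard_literals: bool = False,
    show_text: bool = False,
    executable: str = "dupFinder",
) -> List[str]:
    """Build an argument list of the form: dupFinder [OPTIONS] source."""
    args = [executable]
    if output is not None:
        # Assumption: the report path is passed as /output=<file>.
        args.append(f"/output={output}")
    if discard_cost is not None:
        # Assumption: valued options use the /name=value form.
        args.append(f"/discard-cost={discard_cost}")
    if discard_literals:
        args.append("/discard-literals=true")
    if show_text:
        args.append("/show-text")
    args.append(source)  # the source (solution file or wildcards) goes last
    return args


cmd = build_dupfinder_command(
    "SolutionWithDuplicates.sln",
    output="dupReport.xml",
    discard_cost=10,
    show_text=True,
)
print(" ".join(cmd))
# dupFinder /output=dupReport.xml /discard-cost=10 /show-text SolutionWithDuplicates.sln
```

The resulting list can be passed to `subprocess.run(cmd, cwd=solution_dir, check=True)`, where `solution_dir` is whatever directory your relative paths are meant to resolve against.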
The resulting output is a single XML file that presents the following information:
- The Statistics node is an overview of analyzed code, where CodeBaseCost is the relative size of target source code, TotalFragmentsCost is the relative size of the code for analysis after applying filters (‘discard-cost’, ‘discard-literals’, etc.), and TotalDuplicatesCost is the relative size of detected duplicates.
- Duplicate nodes, which in turn contain two or more Fragment nodes.
- Each Duplicate node has a Cost attribute: duplicates with greater cost are the most important ones as they potentially present greater problems.
- Each Fragment element contains the file name as well as the duplicated piece presented in two alternative ways: as a file offset range and as a line range. If the /show-text option was enabled for analysis, then a Text node with the duplicated code is added to each fragment.
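A report with this shape is easy to post-process. The Python sketch below reads the statistics and lists duplicates in descending order of cost. The element names `Statistics`, `CodeBaseCost`, `TotalFragmentsCost`, `TotalDuplicatesCost`, `Duplicate`, `Fragment` and the `Cost` attribute come from the description above; the root element, the `Duplicates` wrapper, `FileName`, and `LineRange` with `Start`/`End` attributes are assumptions, as is the made-up sample data, so adjust them to what your report actually contains.

```python
import xml.etree.ElementTree as ET

# Made-up sample; names not mentioned in the post are assumptions.
SAMPLE_REPORT = """\
<DuplicatesReport>
  <Statistics>
    <CodeBaseCost>1000</CodeBaseCost>
    <TotalFragmentsCost>400</TotalFragmentsCost>
    <TotalDuplicatesCost>120</TotalDuplicatesCost>
  </Statistics>
  <Duplicates>
    <Duplicate Cost="40">
      <Fragment><FileName>C.cs</FileName><LineRange Start="1" End="6"/></Fragment>
      <Fragment><FileName>D.cs</FileName><LineRange Start="30" End="35"/></Fragment>
    </Duplicate>
    <Duplicate Cost="80">
      <Fragment><FileName>A.cs</FileName><LineRange Start="10" End="20"/></Fragment>
      <Fragment><FileName>B.cs</FileName><LineRange Start="5" End="15"/></Fragment>
    </Duplicate>
  </Duplicates>
</DuplicatesReport>
"""


def summarize(report_xml: str):
    """Return (statistics dict, duplicates sorted by descending cost)."""
    root = ET.fromstring(report_xml)
    stats = {child.tag: int(child.text) for child in root.find("Statistics")}
    duplicates = []
    for dup in root.iter("Duplicate"):
        fragments = []
        for frag in dup.iter("Fragment"):
            lines = frag.find("LineRange")
            fragments.append(
                (frag.findtext("FileName"), int(lines.get("Start")), int(lines.get("End")))
            )
        duplicates.append((int(dup.get("Cost")), fragments))
    # Costlier duplicates are the most important ones, so list them first.
    duplicates.sort(key=lambda item: item[0], reverse=True)
    return stats, duplicates


stats, duplicates = summarize(SAMPLE_REPORT)
print(stats["TotalDuplicatesCost"])  # 120
for cost, fragments in duplicates:
    print(cost, [name for name, _, _ in fragments])
```

For a real report, read the file produced by dupFinder and pass its contents to `summarize`.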
We are now ready to have some practice with dupFinder. In the steps described below we’ll take a solution, e.g. SolutionWithDuplicates.sln, and see how to start duplicate analysis from an MSBuild target and produce a simple HTML report based on the dupFinder output.
First, we unzip ReSharper Command Line Tools somewhere, e.g. in C:\programs\CLT.
Now let’s think ahead to processing the dupFinder output. If we leverage the /show-text option, we’ll be able to build an HTML report by applying an XSL transformation with a simple stylesheet to the dupFinder XML output.
We put this XSL stylesheet with the rest of the tools into C:\programs\CLT.
The easiest way to run duplicate analysis and the ensuing transformation is to specify a new MSBuild target. Since we are now in the solution directory, we go into one of its project subdirectories and open the project file (*.csproj) with a text editor, then add the following element into the root
In this build target, which executes after the project build is finished, we move the working directory one folder up from the project directory to the solution directory, run dupFinder, and then apply an XSL transformation to the dupFinder output using our XSL stylesheet.
Finally, all we have to do is to build our solution. If everything goes right, we’ll get two new files in the solution directory: dupReport.xml and dupReport.html. If we open dupReport.html, we can look through the list of all detected duplicates right in the web browser:
This simple example can be extended and customized in many ways, but we hope it shows you that there is nothing difficult in integrating ReSharper Command Line Tools into your workflow.
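As one example of such customization, the HTML report could be produced by a small script instead of an XSL stylesheet. The Python sketch below is an alternative to the XSL approach used in this post, not a reproduction of it. `Duplicate`, `Cost`, `Fragment` and `Text` are the names described earlier; the `FileName` element, the root element and the sample data are assumptions, and `render_html` is a hypothetical helper.

```python
import html
import xml.etree.ElementTree as ET


def render_html(report_xml: str) -> str:
    """Render a dupFinder-style XML report as a very plain HTML page."""
    root = ET.fromstring(report_xml)
    parts = ["<html><body><h1>dupFinder report</h1>"]
    # Show the costliest duplicates first.
    dups = sorted(root.iter("Duplicate"), key=lambda d: int(d.get("Cost", "0")), reverse=True)
    for dup in dups:
        cost = html.escape(dup.get("Cost", "?"))
        parts.append(f"<h2>Duplicate, cost {cost}</h2>")
        for frag in dup.iter("Fragment"):
            # Assumption: the file name is stored in a FileName child element.
            name = frag.findtext("FileName", default="(unknown file)")
            parts.append(f"<h3>{html.escape(name)}</h3>")
            text = frag.findtext("Text")  # present only when /show-text was used
            if text is not None:
                parts.append(f"<pre>{html.escape(text)}</pre>")
    parts.append("</body></html>")
    return "".join(parts)


# Made-up sample data for illustration.
sample = """\
<DuplicatesReport>
  <Duplicates>
    <Duplicate Cost="40">
      <Fragment><FileName>C.cs</FileName></Fragment>
      <Fragment><FileName>D.cs</FileName></Fragment>
    </Duplicate>
    <Duplicate Cost="80">
      <Fragment><FileName>A.cs</FileName><Text>Assert.AreEqual(gold, result);</Text></Fragment>
      <Fragment><FileName>B.cs</FileName><Text>Assert.AreEqual(gold, result);</Text></Fragment>
    </Duplicate>
  </Duplicates>
</DuplicatesReport>
"""

page = render_html(sample)
print(page)
```

Writing `page` to dupReport.html next to dupReport.xml gives a browsable report similar in spirit to the one generated by the build target.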