Clean Code, Technical Debt, and Documentation in Python – Part 2

Several automatic tools to improve code quality


In the previous post we saw what clean code is and why it’s important to have high-quality code. We also described technical debt as one of the problems that arise from not choosing the right direction when solving a technical problem, and how it can negatively impact our projects.

Now it’s time to talk about the different automation tools we can use to analyze our source code and run several checks to make sure it meets the standards. This article is based on the book Clean Code in Python (2nd edition), by Mariano Anaya.

Cool tools to run checks on the source code

Let’s remember that we are the ones who must be able to understand how the code is written; therefore, only we are able to discern good code from bad code.

So we should invest time in doing code reviews and thinking about how understandable and readable it is. There are three basic questions one should ask when reviewing code:

  1. Is it easy to understand and follow if read by a colleague?
  2. Does the code explain the problem domain well?
  3. If a new person joins the project, will they be able to understand this code?


As we saw previously, code formatting, consistency, and indentation are important, but they are not enough to achieve good-quality code. With the use of automated tools, we save time checking these things and invest more in verifying that those three questions can be answered with a “yes.”

Type Consistency

What things can we check in the code? We can, for example, check type consistency. As we saw before, Python allows us to add annotations about parameter types. This serves not only to improve readability, but also works in conjunction with type-checking tools to detect misuses of parameters that could be potential bugs in our code. For example, if we define an input parameter as List but manipulate it inside the function as if it were a dictionary, this kind of tool can detect it.

There are two commonly used Python packages for this: Mypy and Pytype.

Here’s how this type of package works: you install it as a Python package and, when you run the command, it analyzes all the project’s files to check for inconsistencies in how types are used. This lets you detect possible errors early, although it can also report false positives: sometimes the use of the variable is right, but the annotation is wrongly defined.

To use it in a simple way, just install it through the Python package manager and then, from the directory where the files to check are located, run the mypy command. For example, if we have the following code:

from typing import List
import logging

logger = logging.getLogger()


def broadcast_notification(
    message: str, relevant_user_emails: List[str]
) -> None:
    for email in relevant_user_emails:
        logger.info(f"Sending {message} to {email}")


broadcast_notification("welcome", "user@domain.com")

When we run the mypy command on this file, we obtain an error result.
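A representative run, with an illustrative file name (the exact message wording varies between mypy versions):

$ mypy notifications.py
notifications.py:14: error: Argument 2 to "broadcast_notification" has incompatible type "str"; expected "List[str]"
Found 1 error in 1 file (checked 1 source file)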

This happens because we are invoking the function with a string argument when the expected type is a list of strings. If we change the second argument to match the type hint declared in the function, say by enclosing the email address in a list:

from typing import List
import logging

logger = logging.getLogger()


def broadcast_notification(
    message: str, relevant_user_emails: List[str]
) -> None:
    for email in relevant_user_emails:
        logger.info(f"Sending {message} to {email}")


broadcast_notification("welcome", ["user@domain.com"])

When running the command again in the console, we see that everything is OK, which means the type annotations are consistent with the rest of the code.
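The success message looks something like this (wording may vary between mypy versions):

$ mypy notifications.py
Success: no issues found in 1 source file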

On the other hand, Pytype is a package used in a similar way to Mypy. The big difference is that Pytype not only checks type consistency, but also tries to predict whether the code will actually work at runtime with that type definition, and reports the potential errors it finds.

Generic validations on the code

To validate other aspects of the code, there are packages such as Pylint. Pylint is the most complete and strict tool available to perform validations on Python code, and it’s highly configurable. To use it, install the package and run the pylint command followed by the name of the file or directory you want to check.

For example, suppose we run Pylint to check a Python file and it reports three issues. The first one says we did not add a docstring for the module; the second indicates we did not add a docstring for the function we defined; and the third shows that we used the % operator to format strings instead of an f-string, which is the recommended alternative in modern Python.
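A representative run over such a file (the message codes are real Pylint checks; the file name, positions, and score are illustrative):

$ pylint example.py
************* Module example
example.py:1:0: C0114: Missing module docstring (missing-module-docstring)
example.py:4:0: C0116: Missing function or method docstring (missing-function-docstring)
example.py:5:11: C0209: Formatting a regular string which could be a f-string (consider-using-f-string)

-----------------------------------
Your code has been rated at 2.50/10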

How can we configure the Pylint package so it checks the things we want? This can be done through a configuration file called pylintrc. In that file we can define which rules to enable and which to disable, and also parameterize others, such as setting a maximum line length.

For example, if we want to disable the rule that verifies that every function contains a docstring, we can add an option like the one shown below, which tells Pylint not to check that rule when running the command.
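A minimal pylintrc fragment for this (the section and message names follow Pylint’s configuration format):

[MESSAGES CONTROL]
disable=missing-function-docstring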

This possibility of configuring our own rules for code analyzers leads us to another recommended tip for improving the quality of what we do: document the coding standards the team has defined and agreed upon, and turn them into a set of rules configured in a file like this. That allows us to automate these checks and depend less on human review.

Automatic code formatting

In addition to validating the code formatting, we may also want to automatically reformat the code to fulfill a standard without having to correct it ourselves.

For that, we have tools such as the black package. Black automatically formats code in a deterministic way, without allowing any parameterization save the maximum line length. For example, black always formats strings with double quotes, and the layout of parameters always follows the same structure. This may sound a bit rigid, doesn’t it? But the advantage of this determinism and inflexibility is that it keeps differences in code formatting between all the files in a repository to a minimum. It is more restrictive than the PEP-8 standard, but it is a convenient tool: if formatting happens automatically through black, we don’t have to worry about running separate validations, as we do with the Pylint package.

That’s the reason black exists: there are many ways to make code conform to the PEP-8 standard, so two files can both respect PEP-8 and still differ in style. Black brings our code to a stricter set of rules than PEP-8 and always formats it the same way, which makes any two files in our repository follow the exact same style.

For example, if we have the following code that already respects the PEP-8 standard:

def my_function(name: str) -> str:
    """
    >>> my_function('black')
    'received Black'
    """
    return 'received {0}'.format(name.title())

When we run the black command on this file (the file name is illustrative, and the output format may vary between black versions):
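$ black example.py
reformatted example.py

All done! ✨ 🍰 ✨
1 file reformatted.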

The function will become like this:

def my_function(name: str) -> str:
    """
    >>> my_function('black')
    'received Black'
    """
    return "received {0}".format(name.title())

In this example, although only a small change was made (the single quotes were replaced by double quotes), both versions follow the PEP-8 standard; if we ran Pylint against either of them, neither would show style errors. However, black always reduces the several possible PEP-8-compliant versions of a piece of code to a single one.

The default black command automatically formats the code. If we don’t want that, and simply want to check that the code is OK and report inconsistencies without modifying anything, we must pass the --check flag to the black command.
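For example (again with an illustrative file name; the output format may vary between versions):

$ black --check example.py
would reformat example.py

Oh no! 💥 💔 💥
1 file would be reformatted.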

The problem with black is that it formats the code in its entirety; it is not possible to choose which parts of the code to format and which to leave alone. For that, we can use another package called yapf: through special comments inside the code or arguments on the command line, we can tell it which parts to format and which to skip.
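For instance, yapf honors special comments that exclude a region from formatting. A minimal sketch:

# The matrix layout below is intentional, so we exclude it from formatting.
# yapf: disable
IDENTITY = [
    [1, 0],
    [0, 1],
]
# yapf: enable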

Continuing with tools for automatic formatting, there is a curious one with a very particular purpose: the isort package. It alphabetically sorts the import statements in Python code, separating them into sections by type of import. As we develop, we keep adding imports to the code, and as the code grows they end up out of order. In these cases, it is good to have a tool that organizes them neatly without us doing it manually, file by file.
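A before/after sketch, assuming requests is installed as a third-party package:

# Before isort:
import sys
from requests import get
import logging
import os

# After isort: standard library first, then third-party packages,
# with each section sorted alphabetically.
import logging
import os
import sys

from requests import get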

Cyclomatic Complexity

There are tools that allow us to analyze code and measure what is known as Cyclomatic Complexity. 

Cyclomatic Complexity is a metric that measures the number of linearly independent paths through a method or function, and it is commonly read as an indicator of how stable and trustworthy a program is. In case you are wondering, the number of linearly independent paths is determined by the number of forks or decisions in our code.

There are several ways to calculate this metric, but a simple one is the formula: (number of decision points, such as ifs or branches) + 1.
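For instance, here is a sketch of a function with two decision points, giving a cyclomatic complexity of 2 + 1 = 3:

def classify(value: int) -> str:
    # 1 (base path) + 1 (if) + 1 (elif) = complexity of 3
    if value < 0:
        return "negative"
    elif value == 0:
        return "zero"
    return "positive"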

Now, what is this metric used for? It helps us reduce the number of ifs in each function, because the fewer ifs there are, the easier the function is to understand. It also makes the code less risky to modify: in code full of if statements nested one inside the other, fixing a bug or adding a modification can be very time-consuming.

Let’s check some tools in Python to measure cyclomatic complexity.

Radon

Radon is a widely used package that we can install as usual via the package manager (pip install radon). 

Radon has four basic commands: 

  • cc, for cyclomatic complexity.
  • raw, for raw metrics such as the number of code lines, commented lines, and blank lines.
  • hal, for Halstead metrics, another set of complexity metrics that are beyond the scope of this post.
  • mi, for the maintainability index, a score between 0 and 100 that indicates how maintainable the code is.


Let’s focus on the cc command with an example of execution.

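A representative invocation, with an illustrative file name:

$ radon cc example.py -na -s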

This is the radon command followed by cc to tell it to calculate cyclomatic complexity, then the name of the file to analyze. The -na option sets the minimum grade to report to A, which in practice shows every block from grade A down, and -s shows the numeric complexity score next to each grade.

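The output might look like this (block names, positions, and scores are illustrative):

example.py
    F 12:0 broadcast_notification - A (2)
    C 20:0 NotificationService - A (3)
    M 25:4 NotificationService.send - B (6)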

The command first shows the name of the file being analyzed, and then a list of all the methods, classes, and functions that file contains. For each one, it prints M if it is a method, F if it is a function, or C if it is a class, followed by its name, the category or grade assigned to it, and the metric score.

What is the grade? It is a way to categorize the level of cyclomatic complexity that a class, method, or function has. Radon assigns a grade from A to F, A being the best and F the worst, according to the score:

  • A (1–5): low complexity, a simple block.
  • B (6–10): low complexity, a well-structured and stable block.
  • C (11–20): moderate complexity, a slightly complex block.
  • D (21–30): more than moderate complexity, a more complex block.
  • E (31–40): high complexity, an alarming block.
  • F (41 and above): very high complexity, an error-prone and unstable block.

To give you an idea, in the project I’m currently working on at Octobot, we run this command before committing our work to make sure the complexity grade of every new method we add is an A, meaning we aim for scores between 1 and 5. That is roughly like allowing a maximum of 4 ifs within each method. Sometimes we accept a B grade, but in general we always try to make it an A.

How does Radon calculate the cyclomatic complexity? 

The calculation works in more detail as follows:

Radon analyzes how many decision structures there are within the method: every if, elif, for, while, and assert adds one to the total score. An except clause also adds one, because it introduces a conditional branch: the handler is executed only when the matching exception occurs, so it adds one more conditional decision to our code, and therefore one more point to the metric.

There are additional options that can be used with the radon command. One of them, analogous to the -na we saw a moment ago, is -nc: it reports only the methods, functions, and classes with a grade of C or worse (C, D, E, and F). If you instead indicate -nb, it shows results graded B or worse, and so on.

The -a option tells radon to compute the average complexity at the end: it adds up all the scores and divides by the number of methods, classes, and functions, reporting the average alongside the results.

Automatic checks through a Makefile

On Unix systems, you can automate format and type validation quite easily using Makefiles. We can create a Makefile in the project’s root directory with a few targets configured to run format checks and code conventions.

For example, if we create a Makefile with the following content:

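A minimal sketch of such a Makefile (the target names and options are illustrative, following the description below; recipes must be indented with a tab character):

typehint:
	mypy --ignore-missing-imports .

lint:
	pylint *.py

black:
	black -l 79 *.py

checklist: lint typehint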

We can then run this command in the terminal (using the checklist target from the sketch above):
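$ make checklist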

This runs the lint target first, which performs a format check using the Pylint package, and after that the typehint target, which runs the Mypy package for type-consistency checking. Note the flag added there (--ignore-missing-imports in the sketch above), which tells Mypy to ignore errors about imports it cannot resolve.

We could also run the command make black:

What it does, as defined in the Makefile, is automatic code formatting using the black package. If we look closely at the Makefile, we defined the -l option, which sets the maximum line length to apply when formatting; for example, a maximum of 79 characters per line. This is applied to all files in the directory with the .py extension.

Or, for example, we could run the command make typehint, to do only type checking.

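A representative run (make echoes the recipe before executing it; the file count is illustrative):

$ make typehint
mypy --ignore-missing-imports .
Success: no issues found in 12 source files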

It is also worth noting that many of these checks can be integrated into code editors, so we don’t have to define Makefiles and run them on the command line. But it is good to know that a Makefile is an option, because it can be plugged into a CI/CD pipeline that runs when pushing a commit to a repository.

What about Single Dispatch?

According to PEP-443, @singledispatch is a decorator implemented in the functools module that allows us to define generic functions. A generic function is a function that implements the same operation for different types.

For example, I could implement a function that compares two variables and, depending on their type, tells me which of the two is greater. Now, the definition of “greater than” depends on the types of the variables involved. It is simple to compare two integers, but if both variables are dictionaries, for example, we’d need a different implementation of the “greater than” operation for that particular data type.

Suppose we have the following implementation of a method that lets us know whether one variable is greater than or equal to the other:

from typing import Any


def greater_than_or_equal(arg1: Any, arg2: Any) -> bool:
    """
    'Greater than or equal to' implementation.

    Args:
        arg1: Any
        arg2: Any

    Returns:
        bool - True if arg1 is greater than or equal to arg2.

    Raises:
        TypeError - Types of the arguments do not match.
    """
    if isinstance(arg1, int) and isinstance(arg2, int):
        return arg1 >= arg2
    elif isinstance(arg1, dict) and isinstance(arg2, dict):
        return arg1["data"] >= arg2["data"]
    else:
        raise TypeError("Arguments cannot be compared because types do not match.")

Single dispatch then allows you to implement two versions of the same “greater than” operation: one for when the variables are integers and a different function for when both variables are dictionaries. This has the advantage of reducing the complexity of the code. Instead of one big function with many if-elif statements for each of the variable types, we define n separate functions, each implementing the particular case of one data type. That way we get rid of the if statements that ask what type the variable is before performing the operation. The function is invoked through a single entry point, but when it is called, the implementation used depends on the type of its first argument, which is what @singledispatch dispatches on.

from functools import singledispatch
from typing import Any


@singledispatch
def greater_than_or_equal(arg1: Any, arg2: Any) -> None:
    """
    Fallback implementation: raises a TypeError because the
    argument types do not match any registered implementation.

    Args:
        arg1: Any
        arg2: Any

    Raises:
        TypeError - Types of the arguments do not match.
    """
    raise TypeError("Arguments cannot be compared because types do not match.")
@greater_than_or_equal.register(int)
def _(arg1: int, arg2: int) -> bool:
    """
    'Greater than or equal to' implementation for integers.
    """
    return arg1 >= arg2
@greater_than_or_equal.register(dict)
def _(arg1: dict, arg2: dict) -> bool:
    """
    'Greater than or equal to' implementation for dictionaries.
    """
    return arg1["data"] >= arg2["data"]
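A quick usage sketch: the dispatcher picks the implementation from the type of the first argument, and falls back to the base function (which raises) for unregistered types:

>>> greater_than_or_equal(3, 2)
True
>>> greater_than_or_equal({"data": 1}, {"data": 5})
False
>>> greater_than_or_equal("a", "b")
Traceback (most recent call last):
    ...
TypeError: Arguments cannot be compared because types do not match.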

This separation has the advantage of reducing the number of lines per method, subdividing one method of many lines into two or more methods with fewer lines and fewer if statements.

Another important note: reducing the number of if statements within each method also reduces the cyclomatic complexity of the functions, so single dispatch can be a useful tool for simplifying code when we have many decisions that depend on the argument types. And by reducing complexity, the code becomes easier to read and understand, which is one of the fundamental pillars of clean code.

Closing Thoughts

Throughout this article and the previous one, we have seen several tools that help us validate that our code is consistent and follows the standards (packages such as pylint, black, and yapf). We have also recalled the importance of documenting the code, the difference between documenting and commenting, and the advantages and disadvantages of both approaches. Using these tools saves us time correcting repetitive details in the code.

We have also learned how useful the Cyclomatic Complexity metric is for reducing the number of independent decisions in each method, which leads us to write simpler, more maintainable code.

However, using automation tools for PEP-8 compliance and type consistency is not enough on its own to improve the quality of our code. It is also important to make time for code reviews: the code we write is meant to be read by humans, ourselves included, so we need those reviews to detect readability issues and bad coding practices that automatic tools cannot detect.


Final Quotes

“Don’t document bad code – rewrite it.”

Brian Kernighan

“If I had more time, I would have written a shorter letter.”

Blaise Pascal

“Code is like humor. When you have to explain it, it’s wrong.”

Cory House
