Clean Code, Technical Debt, and Documentation in Python

Best practices for writing clean, well-structured Python code


In this article, my goal is to review the best practices for writing clean, orderly, well-structured Python code that follows the principles of readability, understandability, and adherence to standards. Many of my recommendations and references come from the book “Clean Code in Python” by Mariano Anaya.

Many of the tools and principles presented here will probably be familiar, while other concepts may seem new. In any case, it’s good to revisit these tips from time to time so that we keep applying them day to day.

Defining Clean Code and Technical Debt

Let’s get started with a question that I’ve asked many times when conducting technical interviews in order to evaluate candidates:

“What things do you do in your daily routine as a developer to improve the quality of your work? Or, in other words, if I gave you all the time in the world to improve your code and make it 100% high quality, what things would you do?”

Throughout the many interviews I’ve led, I’ve received varied answers. Some were very interesting, while others were insufficient or unacceptable. Here are a few examples:

“I make sure the code works. I look at how it’s written, if it’s easy to read, easy to reproduce, and if it’s understood by someone else.”

“Efficiency in terms of runtime, and how understandable the code is so that tomorrow someone can use it.”

“If it solves problems, if it’s performant, clean, self-describing, mnemonic, etc. And I add comments if necessary. I also observe if it complies with the test suite. I don’t upload ‘broken stuff’, and always wait for QA feedback.”

“I make sure to come up with clean code and keep it documented and organized. It is important to evaluate architecture from time to time, how we are writing the code, make schemes or mental maps on paper to have the components ordered.”

“I also like to think about how to generalize the code in order to make it more adaptable in the future. Check if the methods’ names are descriptive, if there is documentation, if the parameters received by the methods are appropriate, and promote the use of Type Hints.”

 

What is clean code?

Among all the answers I received, one idea came up several times as a way to improve software quality: “write clean code.” But what is clean code?

It’s quite a broad concept, but if I had to adopt the best of the answers I received to this initial question, I’d use the following definition:

Clean code is code written in such a way that it is well-organized, standardized in its format, and understandable without major difficulties by other developers when they read it.

 

Why is it important to have clean code?

Clean code matters for many reasons, and most of them relate to improving maintainability, reducing technical debt, working effectively in agile development, and managing a project successfully.

There are several advantages in writing clean code:

  • Clean code is necessary for any developer, but it is particularly useful for juniors, because if they get accustomed to it right from the beginning, they will integrate good coding practices easily and the learning curve for them will be smoother.
  • Writing clean code can be more cumbersome at first, but after you get used to it, it takes less time because it becomes a habit.
  • When automated tools take care of formatting, code review comments can focus on better ways to solve a problem instead of on correcting code formatting.


The other idea worth exploring about the importance of clean code is linked to agile development and continuous delivery. If we want our project to consistently deliver features at a steady and predictable pace, having a good and maintainable code base is a must.

Imagine you are driving a car on a highway, heading to a city you want to visit. You have to estimate your arrival time so you can tell the person who is waiting for you. If the car is in good shape and the road is flat and smooth, estimating the arrival time should be easy. However, if the road is in poor condition, if there are rocks in the way, or if you have to stop every few miles to check the car because it’s not running well, it becomes difficult to estimate with any confidence when you will arrive.

In this example, the road is our code. That is, if we want to move at a steady and predictable pace, our code needs to be maintainable and readable. If it does not meet those conditions, then every time we need to implement a new feature, we will have to stop to refactor and correct our technical debt.

 

Technical debt

When we say technical debt, we refer to the problems that arise in the software as a result of a bad decision when implementing a solution. 

We can think of technical debt in two different ways: 

1) Looking back to consider the present: What if the problems we face today are the result of poorly written code in the past?

2) Thinking about the present and how it will affect us in the future: If we decide to take a shortcut today for solving a problem, instead of investing the appropriate time in implementing the best solution, what problems are we creating for ourselves down the road?

Technical debt is a “debt” because the code will be harder to change in the future than it would be to change it now. That extra cost is the interest on our debt. In other words, the technical debt we create today becomes increasingly difficult, and therefore increasingly costly, to fix over time. Every time the team cannot deliver something on time because it has to stop, fix, and refactor code, it is paying the price of technical debt.

 

Why is it important to avoid technical debt?

The worst thing about technical debt is that it is a long-term problem: it does not trigger an alarm in the short term. Instead, it grows silently in different parts of the same project and becomes a critical problem when least expected. On that day, technical debt will be a high priority, and it will be much more expensive and complex to solve than it would have been earlier.

Our desire as developers is to prevent these things from happening, perhaps by implementing tools that allow us to automate the checks on our code and detect these kinds of problems early. Not all problems can be detected automatically, but if we implement a good code review before merging, with good automated tests, we will be able to detect many potential problems in our code that otherwise would become part of that latent technical debt.

Just as it is very important to keep our code as clean as possible, it is worth mentioning that there are exceptions to this principle, that is, situations where investing in clean code is not worthwhile. For example, during a hackathon, which is usually a competition where you have little time to build a software prototype, it does not make much sense to spend time writing the best possible code. Another exception may be a script that performs a specific, single-use task.

In these and other exceptions, common sense applies: since this code will probably not change much in the future, it does not make sense to spend extra time improving its quality.

A few guidelines for code formatting

Regarding code format, the PEP-8 standard provides guidelines on how Python code should be written and formatted. However, following PEP-8 is not enough to have clean code. It’s also necessary to write robust code that remains maintainable in the face of future changes.

That said, formatting code correctly is still important for working efficiently.

 

Why should you pay attention to code formatting?

Formatting the code has several advantages; here are some of them:

Searchability: It refers to the ability to identify keywords or tokens in the code quickly. 

Consistency: If the code has a uniform format, reading it becomes easier. This is useful for onboarding, for example, because when new developers join the team, the process of understanding how the code works becomes much easier if its format is standardized.

Better error handling: One of the PEP-8 recommendations is to limit the try clause of a try/except block to the minimum amount of code necessary. This reduces the chance of catching an exception you did not expect and masking a bug.

Code quality: Well-structured code is easier to understand, which helps developers detect potential bugs more easily. If we also use tools that run automated code quality checks, even fewer bugs will slip through.
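The try/except recommendation is easier to see in code. Below is a minimal sketch modeled on the kind of example PEP-8 gives. The names `handle_value`, `lookup_broad`, and `lookup_narrow`, and the `prices` data, are all hypothetical. In the broad version, a `KeyError` caused by a bug inside `handle_value` would be reported as a missing key. The narrow version keeps only the dictionary lookup inside the `try` clause.

```python
from typing import Optional


def handle_value(value: float) -> float:
    """Hypothetical processing step: add a 21% tax and round to cents."""
    return round(value * 1.21, 2)


def lookup_broad(prices: dict, key: str) -> Optional[float]:
    # Too broad: a KeyError raised by a bug *inside* handle_value() would
    # also land in this except clause and be misreported as a missing key.
    try:
        return handle_value(prices[key])
    except KeyError:
        return None


def lookup_narrow(prices: dict, key: str) -> Optional[float]:
    # Only the dictionary access is inside the try clause, so any exception
    # raised by handle_value() propagates normally instead of being hidden.
    try:
        value = prices[key]
    except KeyError:
        return None
    else:
        return handle_value(value)


prices = {"apple": 1.0}
print(lookup_narrow(prices, "apple"))   # 1.21
print(lookup_narrow(prices, "banana"))  # None
```

Both versions behave the same on valid input. The difference only shows up when something goes wrong, which is exactly when a hidden exception hurts most.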

First tip: document well

Good code is self-explanatory code, but it must also be well-documented. And here we must make a clarification: documenting is not the same as adding comments. There are also other tools and practices to document. For example, in the case of Python there are docstrings and annotations.

Python is a dynamically typed language, which means that the same variable can take values of different types. Given these characteristics, it is very useful to document the code, because it makes our life easier when reading it. It’s easy to get lost when it comes to understanding what type a function returns or what type a variable is if we do not have it properly documented.

As for annotations, or type hints, they are also very helpful for running type checkers such as mypy or pytype. These tools verify that variables have the right types, and as we will see later, they help a lot with this kind of quality check.
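To see why a type checker helps, consider the following sketch. The function `to_seconds` is a hypothetical example. If you run a checker on the file, for instance with `mypy example.py`, it reports the bad call without executing anything. Python itself runs the same call and silently produces a wrong result.

```python
def to_seconds(minutes: int) -> int:
    """Convert a number of whole minutes into seconds."""
    return minutes * 60


print(to_seconds(3))  # 180

# Python does not enforce the annotation at runtime: this call succeeds
# because "3" * 60 is string repetition, so `wrong` is a 60-character
# string instead of 180. A static type checker such as mypy flags this
# call (a str passed where an int is expected) before the code ever runs.
wrong = to_seconds("3")
print(len(wrong))  # 60
```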

 

Regarding comments

As a general rule, try to add as few comments as possible. This is because our code should be as self-explanatory as possible in the first place. If we make an effort to use proper abstractions (divide responsibilities in the code through intuitively named functions and instances of classes), and if we name things in a clear way, then comments are not necessary.

Before writing a comment, first consider whether you could express the same idea in code alone. You can do this either by extracting a new function whose name explains what the code does, or simply by giving variables clearer names.
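As a small illustration, here is a sketch of replacing an explanatory comment with a well-named function. `User`, `is_verified_adult`, and the page names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    email_confirmed: bool


# Before: a comment is needed to explain what the condition means.
def next_page_before(user: User) -> str:
    # the user is an adult and has confirmed their email
    if user.age >= 18 and user.email_confirmed:
        return "checkout"
    return "verify-account"


# After: a well-named function carries the explanation, so the comment goes.
def is_verified_adult(user: User) -> bool:
    return user.age >= 18 and user.email_confirmed


def next_page(user: User) -> str:
    if is_verified_adult(user):
        return "checkout"
    return "verify-account"
```

The behavior is identical. The difference is that `is_verified_adult` can be reused and tested on its own, and its name cannot drift out of date the way a comment can.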

Of course, expressing everything in code is not always possible: sometimes the code performs an operation so complex that it cannot be made self-explanatory. In that case, it is advisable to add comments. Keep them as concise as possible and use them to explain why the code is there and what problem it solves, or even to include a working example.
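Here is a minimal sketch of a comment that earns its place by explaining *why* rather than *what*. The function `total_amount` is hypothetical, but the floating-point behavior it describes is standard Python.

```python
import math
from typing import Iterable


def total_amount(amounts: Iterable[float]) -> float:
    # Why math.fsum instead of sum(): adding floats one by one accumulates
    # rounding error (sum([0.1] * 10) returns 0.9999999999999999), whereas
    # math.fsum compensates for the lost precision and returns 1.0 here.
    return math.fsum(amounts)


print(total_amount([0.1] * 10))  # 1.0
```

Without the comment, a future reader might "simplify" `math.fsum` back to `sum` and reintroduce the bug.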

However, there is one kind of comment that is like a cancer for developers: commented-out code. It brings chaos and sloppiness and makes the code confusing to read. Also, version control systems like Git make commented-out code unnecessary: we don’t have to comment and uncomment code to get back to another version when Git already solves that problem for us.

In short, comments are the devil. If there is no other choice, add them, but do not abuse these resources in situations that would be solved with more explanatory code.

 

What are docstrings: advantages and disadvantages

Docstrings are pieces of documentation embedded in our source code. Technically, a docstring is a string literal that appears as the first statement of a module, class, method, or function, and it documents that component. It helps other developers understand our code.

The idea is that when other people need to use a component, they can simply read its docstring to understand what it does and when it can be used. For this reason, it is recommended that every module, class, method, and function you create has a docstring that explains this thoroughly.

Docstrings also serve to document design and architecture decisions. Documenting the expected inputs and outputs of a function is a good practice that will help readers understand how it is supposed to be used.

Another advantage of docstrings, particularly in Python, is that they are not separate from the code: they are stored in the `__doc__` attribute of the object they document. For example, given the following code:

				
```python
def my_function():
    """Run some computation."""
    return None


my_function.__doc__  # 'Run some computation.'
```
				
			

If we evaluate `my_function.__doc__` in the Python interpreter, or call `help(my_function)`, we can access the contents of our documentation.

This makes it possible to build tools that extract documentation automatically from the source code. For example, the Sphinx package, through an extension called autodoc, can collect the docstrings defined in Python code and turn them into pages that document the functions.
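To make the idea concrete, here is a toy sketch of what "extracting documentation from the source" means. It is not how Sphinx autodoc is implemented. It only uses the standard `inspect` module to read the docstrings that Python already stores on each object. The helper name `describe_module` is hypothetical.

```python
import inspect
import math


def describe_module(module) -> str:
    """Build a tiny plain-text reference page from a module's docstrings."""
    lines = [module.__name__, "=" * len(module.__name__)]
    # getmembers returns (name, object) pairs sorted by name; keep callables.
    for name, obj in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or "(no docstring)"
        lines.append(f"{name}: {doc.splitlines()[0]}")  # first line only
    return "\n".join(lines)


print(describe_module(math))
```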

Once you have the tools to build the documentation, you can publish it so that it becomes part of the project. For open source projects, you can use Read the Docs, which builds the documentation automatically for each branch or version.

But not everything is rosy with docstrings: their disadvantage is that they require constant manual maintenance. Every time we change the type of a variable, its name, or some detail that modifies the behavior of the function, we have to update the docstring by hand to match the new version of the code.

Another problem is that, for docstrings to be really useful, they need to be detailed, which means adding many lines of text to explain the function.

Taking into account these disadvantages, if the function you are writing is too simple and its operation is understood at a glance, then it is better to avoid adding a docstring since it would be redundant and require maintenance later on.
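Consider a function like the following hypothetical one. Its name and type hints already say everything. A docstring such as "Return True if number is even" would only repeat them and would be one more thing to keep in sync.

```python
def is_even(number: int) -> bool:
    return number % 2 == 0
```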

 

Annotations

PEP-3107 introduced annotations in Python. Their basic goal is to help readers understand what values are expected for function arguments. Annotations are what make type hints possible, which we will see later.

They also let you specify the expected type of variables. More generally, an annotation can be any kind of metadata that gives us a better idea of what a given variable represents.

For example, in the following code:

				
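```python
from dataclasses import dataclass


@dataclass
class Point:
    lat: float
    long: float


def locate(latitude: float, longitude: float) -> Point:
    """Find an object in the map by its coordinates."""
    # Placeholder body: a real implementation would search the map.
    return Point(latitude, longitude)
```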

				
			

This example shows that we can declare the expected types of the parameters latitude and longitude, both of type float. Note that this is purely informative: Python does not check at runtime that the values match the annotations. They serve as information for the reader and for tools such as type checkers.

We can also specify the type of value the function returns. In this case, it returns a Point, which is a class with two float attributes, lat and long. In addition, annotations can make our code more expressive. For example, consider this function, whose parameter represents a delay:

				
```python
def launch_task(delay_in_seconds) -> None:
    ...
```
				
			

We can improve it a bit by adding a type hint to the parameter:

				
```python
Seconds = float


def launch_task(delay: Seconds) -> None:
    ...
```
				
			

This version is more understandable. It first defines a type alias called Seconds for float, then uses it to annotate the parameter, which was renamed from delay_in_seconds to delay. This makes it easy to see what the function accepts and what the value is used for. Once defined, the same Seconds alias can be reused anywhere else in the code where a function takes a duration expressed in seconds.
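A minimal sketch of that reuse follows. `launch_task` comes from the text above, while `schedule_retry` is a hypothetical second function. A plain alias is just another name for `float`, so type checkers and runtime tools see an ordinary float.

```python
Seconds = float  # type alias: just another name for float


def launch_task(delay: Seconds) -> None:
    """Wait `delay` seconds, then start a task (body omitted)."""
    ...


def schedule_retry(backoff: Seconds, attempts: int) -> None:
    """Hypothetical: retry an operation, waiting `backoff` seconds between tries."""
    ...


# The alias is the very same object as float.
print(Seconds is float)  # True
print(schedule_retry.__annotations__["backoff"])  # <class 'float'>
```

Because the alias adds no new type, passing a plain float where Seconds is expected is always accepted. The benefit is purely in readability and in having a single place to change the definition.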

Also, just like docstrings, annotations are stored in an attribute of the function, `__annotations__`. If we inspect it in the interpreter, the output shows the type annotations we have defined for the function:

				
```python
>>> locate.__annotations__
{'latitude': <class 'float'>, 'longitude': <class 'float'>, 'return': <class '__main__.Point'>}
```
				
			

Just as the `__doc__` attribute makes automatic documentation tools possible, the `__annotations__` attribute lets tools validate our code and check function types, if we consider it necessary for our project.
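To give an idea of how such a tool can use annotations, here is a toy runtime checker written as a decorator. It is a minimal sketch under simplifying assumptions: only plain classes are checked, generic annotations such as `list[int]` are skipped, and the return value is not checked. `check_types` and `greet` are hypothetical names. In practice, projects usually rely on a static checker like mypy instead.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def check_types(func):
    """Toy decorator: raise TypeError when an argument doesn't match its annotation."""
    hints = get_type_hints(func)  # resolved annotations, including 'return'
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only check plain classes; skip missing hints and generics.
            if isinstance(expected, type) and get_origin(expected) is None:
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{func.__name__}(): {name}={value!r} "
                        f"is not an instance of {expected.__name__}"
                    )
        return func(*args, **kwargs)

    return wrapper


@check_types
def greet(name: str, times: int) -> str:
    return ", ".join([f"Hello {name}"] * times)


print(greet("Ada", 2))  # Hello Ada, Hello Ada
try:
    greet("Ada", "2")
except TypeError as error:
    print(error)  # greet(): times='2' is not an instance of int
```

Note that this toy is stricter than static checkers in one respect. Under PEP 484, mypy accepts an int where a float is annotated, while `isinstance(1, float)` is False, so this decorator would reject it.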

 

Do annotations replace docstrings?

This is a valid question. In older versions of Python, before annotations existed, the only way to document the types of parameters and return values was to describe them in the docstring of each method or function. Annotations improved this considerably, so one might ask whether it is still worth having docstrings as well as annotations in the code.

The answer is that annotations do not entirely replace docstrings; rather, the two complement each other when used well. For example, since the annotations already state the types a function receives and returns, the docstring can focus on showing examples of what the data looks like and how it is used in the function. That way, the reader has a better idea of what the function actually does.

				
```python
def data_from_response(response: dict) -> dict:
    if response["status"] != 200:
        raise ValueError
    return {"data": response["payload"]}
```
				
			

In this example, the function receives a dictionary and returns another dictionary. It can also raise a ValueError if the value under the “status” key is not the expected one (the status code 200). However, we have no further information about the internal structure of the response parameter. We only know that it is a dictionary with at least two keys, status and payload. But what does a typical response look like? What is its structure? The code does not answer those questions.

				
```python
def data_from_response(response: dict) -> dict:
    """If the response is OK, return its payload.

    - response: a dict like::

        {
            "status": 200,  # <int>
            "timestamp": "...",  # ISO format string of current datetime
            "payload": {...},  # dict with returned data
        }

    - returns a dictionary like::

        {"data": {...}}

    - Raises:
        - ValueError if HTTP status is != 200.
    """
    if response["status"] != 200:
        raise ValueError
    return {"data": response["payload"]}
```
				
			

If we add a docstring like this, we get a much better idea of the shape of both the dictionary that is received and the one that is returned. This is valuable documentation, for example, when building test cases for this function, because knowing the format of those dictionaries tells us which values are valid or invalid inputs for the tests.
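Here is a minimal sketch of how the documented shapes translate into test cases. The tests are plain `test_*` functions, the style pytest collects, and they are also called directly at the end so the snippet runs on its own. The timestamp and payload values are made up for illustration.

```python
def data_from_response(response: dict) -> dict:
    """If the response is OK, return its payload (full docstring omitted)."""
    if response["status"] != 200:
        raise ValueError
    return {"data": response["payload"]}


# Test cases derived from the documented shape of `response`.
def test_ok_response_returns_payload():
    response = {
        "status": 200,
        "timestamp": "2024-01-01T00:00:00",  # made-up example value
        "payload": {"id": 1},
    }
    assert data_from_response(response) == {"data": {"id": 1}}


def test_error_status_raises_value_error():
    response = {"status": 404, "timestamp": "2024-01-01T00:00:00", "payload": {}}
    try:
        data_from_response(response)
    except ValueError:
        return  # expected for any non-200 status
    raise AssertionError("expected ValueError for a non-200 status")


test_ok_response_returns_payload()
test_error_status_raises_value_error()
```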

Of course, writing this kind of documentation takes time, because you have to think of examples and write them in the docstring. But it clearly makes the code easier to understand for anyone reading it.

In the next article, I will dig into the different Python tools we can use to run checks on our source code and detect formatting and typing errors automatically. This will let us devote more time to other quality tasks, such as code reviews.
