Writing Specification by Example Requirements with Gherkin

Tips for making the most out of the SBE methodology

Gherkin is a descriptive language used to write requirements under Specification by Example (SBE). This article will discuss how to write user stories with Gherkin and the best practices for a successful project. If you are interested in better understanding what SBE is all about, I recommend our first post on the subject here.

Getting to know Gherkin

Gherkin is a Domain-Specific Language (DSL). The advantage of using a DSL to write system requirements is that every role in the software development process can understand it.

The product owner, the developer, and the QA can all understand a text written in Gherkin, because it is made up of plain English words and is free of the symbols typical of a traditional programming language. This shared understanding makes it easier for all roles to collaboratively write a system’s requirements.

The basic syntax of Gherkin

The basic syntax of a file written in Gherkin language has the following structure:

Feature: Title of the feature or requirement.

Business Definitions: Basic definitions necessary to understand the requirement (concepts and/or terms agreed upon by the different team roles).

Assumptions: What must be assumed to be able to execute the scenarios described here.

Background: Preloads the initial data for all system operation scenarios.
	Given ...
	And ...
	And ...

Scenario: Description of scenario 1 to be tested.
	Given ...
	When ...
	Then ...

Scenario Outline: Description of scenario 2 to test.
	Given ...
	When ...
	Then ...

	Examples:
	| Parameter_1 | ... | Parameter_N | ... |
	| Data_1 | ... | Data_N | ... |

Scenario: Description of scenario 3 to be tested.
	Given ...
	When ...
	Then ...


Using Gherkin: Given, When, and Then


The Given keyword lets us put the system in a known state before the user or another external agent starts interacting with it. In other words, Given sets up the preconditions needed to execute the requirement.

What is it useful for?

1) Initializing data, creating instances, and setting the initial state of the database.

2) Logging in a user.


	Given a logged in user

	Given a user with these parameters
	| Name | Age |
	| Anthony | 18 |



The When keyword describes the key action that the user performs on the system, which initiates the test. It represents the user’s interaction with the system, and it should cause a change, which is what you want to test.

When to use When:

1) Describing interaction with a web page (Requests/Twill/Selenium).

2) Interacting with another user interface element.

3) Invoking an API, or an external web service.

4) When testing a library, When can be used to describe the call to a function of that library.


	When we do the search

	When we set the user description to "description 1".



With the Then keyword we observe the results of the test. Here we must state the result that the system is expected to return after the action performed through the When keyword. This result should be related to the business value mentioned in the feature description.

When to use Then:

1) Check that something mentioned in the Given and/or the When is (or is not) within the expected output.

2) Check that some external system has received the expected message.


	Then the user "username1" state will be Logged out

	Then these user parameters will have these values
	| Name | State |
	| username1 | Logged Out | 

Using Gherkin: And and But

If you need more than one Given, When, and/or Then statement within the same test scenario, you can repeat the keyword, or connect the statements with the And word. Repeating the keyword looks like this:

Scenario: A scenario to test multiple Givens/Thens.
	Given one thing
	Given another thing
	Given yet another thing
	When I open my eyes
	Then I see something
	Then I don't see something else

An equivalent way to write the above scenario in a simpler and more semantic form is the following:

Scenario: A scenario to test multiple Givens/Thens.
	Given one thing
	And another thing
	And yet another thing
	When I open my eyes
	Then I see something
	But I don't see something else

And and But have the same effect: both are connectors that replace the repetition of Given, When, and Then. The difference is that But is used to describe a negative action or result, which makes the code more understandable to those who read it.
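
To make the equivalence concrete, here is a minimal Python sketch of how a tool could treat And and But as a repetition of the previous keyword. The function name resolve_keywords is hypothetical and invented for this illustration; it is not taken from any particular Gherkin library.

```python
PRIMARY = ("Given", "When", "Then")
CONNECTORS = ("And", "But")


def resolve_keywords(steps):
    """Replace And/But with the primary keyword they continue (hypothetical helper)."""
    resolved = []
    current = None
    for step in steps:
        keyword, _, text = step.strip().partition(" ")
        if keyword in PRIMARY:
            current = keyword
        elif keyword in CONNECTORS:
            if current is None:
                raise ValueError(f"{keyword!r} cannot be the first step")
            keyword = current  # And/But inherit the previous Given/When/Then
        else:
            raise ValueError(f"unknown keyword in step: {step!r}")
        resolved.append(f"{keyword} {text}")
    return resolved


steps = [
    "Given one thing",
    "And another thing",
    "And yet another thing",
    "When I open my eyes",
    "Then I see something",
    "But I don't see something else",
]
resolved = resolve_keywords(steps)
```

With the second scenario above as input, the result is the list of steps from the first scenario, which is why both forms describe the same test.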

The Scenario and Scenario Outline

These keywords announce the beginning of a test scenario, that is, an individual test case that is run within the feature.

The difference between the two words is as follows:

1) In the case of Scenario, only one example is tested.

2) In the case of Scenario Outline, two or more examples are tested within the same scenario. The examples are listed in an examples table (a data table introduced by the Examples keyword).

Scenario example:

Scenario: The employee's weekly pay must multiply the extra hours (more than 40) by a factor of 1.5.
   Given an employee with these parameters
   | Name    | Hours Worked | Wage |
   | Anthony | 41           | 3500 |
   When we calculate the employee weekly pay
   Then the employee weekly pay will be 145250
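
The expected value in the Then step can be checked with a few lines of Python. This is only a sketch of the business rule as the scenario title states it. It assumes that Wage is an hourly rate, which is consistent with the numbers in the example, and the function name weekly_pay is hypothetical.

```python
def weekly_pay(hours_worked, wage, regular_hours=40, overtime_factor=1.5):
    """Pay regular hours at the base wage and extra hours at 1.5 times the wage.

    Assumption: `wage` is an hourly rate.
    """
    regular = min(hours_worked, regular_hours)
    extra = max(hours_worked - regular_hours, 0)
    return regular * wage + extra * wage * overtime_factor


# 40 hours * 3500 = 140000, plus 1 extra hour * 3500 * 1.5 = 5250
anthony_pay = weekly_pay(41, 3500)
```

For Anthony this gives 140000 for the first 40 hours plus 5250 for the extra hour, which matches the 145250 in the scenario.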


Scenario Outline Example:

Scenario Outline: The employee's weekly pay must multiply the extra hours (more than 40) by a factor of 1.5.
   Given an employee with these parameters
   | Name   | Hours Worked   | Wage   |
   | <Name> | <Hours_Worked> | <Wage> |
   When we calculate the employee weekly pay
   Then the employee weekly pay will be <Weekly_Pay>

   Examples:
   | Name    | Hours_Worked | Wage | Weekly_Pay |
   | Anthony | 41           | 3500 | 145250     |
   | Alice   | 50           | 3500 | 192500     |
   | Sarah   | 60           | 3500 | 245000     |

In the previous example you can see how it is also possible to define variables. This is done by choosing a variable name and enclosing it in angle brackets (the less-than and greater-than characters), as in <Name>, <Hours_Worked>, or <Wage>.

Each variable is replaced with the value from the matching column of the examples table, one row at a time. In this example that produces 3 iterations of the same scenario, one per row: the Given, When, and Then steps are executed 3 times, each time with the values from a different row.
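
The iteration over the examples table can be sketched in Python as a simple placeholder substitution. This is an illustration of the idea, not how any specific Gherkin runner is implemented; the function name expand_outline is hypothetical.

```python
import re


def expand_outline(steps, header, rows):
    """Return one list of concrete steps per row of the examples table."""
    scenarios = []
    for row in rows:
        values = dict(zip(header, row))
        # Replace every <Variable> with the value from the current row.
        scenarios.append(
            [re.sub(r"<(\w+)>", lambda m: values[m.group(1)], step) for step in steps]
        )
    return scenarios


steps = [
    "Given an employee with these parameters",
    "| Name | Hours Worked | Wage |",
    "| <Name> | <Hours_Worked> | <Wage> |",
    "When we calculate the employee weekly pay",
    "Then the employee weekly pay will be <Weekly_Pay>",
]
header = ["Name", "Hours_Worked", "Wage", "Weekly_Pay"]
rows = [
    ["Anthony", "41", "3500", "145250"],
    ["Alice", "50", "3500", "192500"],
    ["Sarah", "60", "3500", "245000"],
]
scenarios = expand_outline(steps, header, rows)
```

Here scenarios holds three concrete versions of the scenario, one per row of the table, and the first of them is the single Scenario shown earlier.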

How to write good (or not bad) Gherkin code

We have already discussed the most important keywords in the Gherkin language, their definitions, and in which contexts these terms are usually used.

Now we are going to cover some good practices (and bad practices to avoid) that help us write understandable and reusable Gherkin code. In this way, we avoid very common mistakes that usually happen the first time we write this kind of descriptive code.

1. Avoid using Gherkin to write scripts.

Because Gherkin is used to write requirements, you should avoid using it to write scripts. What is the difference between the two? A script is a sequence of actions to be performed (which may or may not be related to each other), while a requirement has a value or a business purpose.

Scenario: Description ...
	Given we are on the login screen
	When we enter a valid username
	And we enter a valid password
	Then we will be on the welcome screen

In the above example, we have a sequence of actions: first, enter username and then enter password. This is what should be avoided when writing the code, since from a business logic point of view, what matters is not the sequence in which the data is entered, but that the user enters the correct credentials to be able to log in correctly. 

So, taking this into account, it would be better to replace the previous code with this one:

Scenario: Description ...
	Given we are on the login screen
	When we enter valid login credentials
	Then we will be on the welcome screen

In this way we simplify the code, because we convert two When statements into one, and at the same time we replace a sequence of actions with a higher-level concept that is closer to a business rule.

2. Do not mention specific interfaces in the code

Many times we must implement an API, a web user interface, and/or eventually a mobile application for our system. In this situation, we would have three possible interfaces to test on, but in our Gherkin code we should avoid mentioning them. Also, we should avoid using terms that are specific to each of the interfaces. 

Why? So as not to tie our requirements to a single interface, and to allow the same requirement to be tested against all available interfaces. By doing this we also abstract away how the system is built, focusing on what is to be tested rather than on whether the scenario is being run against the web interface, the API, or the mobile app. Again: Gherkin talks about requirements, not technical details of the system.

In the above example of the user login, the mistake is that it mentions the welcome screen and the login screen. Those are typical terms for web and mobile user interfaces, but these concepts do not exist when testing APIs. Therefore, it is necessary to modify the example in order to avoid this error. The following would be the correct way:

Scenario: Description ...
	Given we are not logged in
	When we enter valid login credentials
	Then the welcome message will be returned

As you can see, we replace the concepts of login and welcome screens with terms that are closer to business rules than to implementation details. Instead of mentioning the login screen, we emphasize the condition that our user must not be logged in before starting the test (“Given we are not logged in”).

Also, instead of talking about welcome screens, we use the concept of a “returned message”, since every interface returns a message as a result. It is not a behavior specific to one interface: a web or mobile app returns a message that the user can see, while an API or a web service returns a result message in plain text format. Either way, the message is returned to the client regardless of the interface against which the system is tested.

3. Do not include implementation details

Implementation details are things that may change from one user to another, or that may change from one environment to another (e.g. Production/Staging/Development). In short, anything that does not relate to the business purpose shouldn’t be mentioned.

4. Do not write more than one When statement in the same scenario, unless it is necessary

The problem of writing more than one When within the same scenario can be solved by writing a single When statement, and indicating a data table that allows us to set more than one parameter at a time within the same statement.

For example:

Scenario: Description .....
	Given the user is logged in
	When we set the user description to "Description 1"
	And we set the user email to "user@mail.com"
	And we set the user nickname to "user_nick"

We can replace the 3 When statements chained through Ands with a single When statement that uses a data table:

Scenario: Description .....
	Given the user is logged in
	When we set these user parameters
	| Description   | Email         | Nickname  |
	| Description 1 | user@mail.com | user_nick |


The reason for this is to improve the readability of the Gherkin code, and thus condense several When statements into one, reducing the code and allowing reuse of these statements in the future. It is also related to the fact that more than one When in the same scenario can be a sign of writing scripts instead of writing business rules, which is ultimately what matters when writing Gherkin code.
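
On the automation side, a single table-driven statement also means a single piece of code to maintain. The following Python sketch shows one way the table could be read and applied. Both function names (parse_table and set_user_parameters) and the dictionary that stands in for the user are assumptions made for this illustration.

```python
def parse_table(lines):
    """Turn Gherkin table lines into a list of dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if line.strip()
    ]
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def set_user_parameters(user, table_lines):
    """Hypothetical body for the step 'When we set these user parameters'."""
    for row in parse_table(table_lines):
        for name, value in row.items():
            user[name.lower()] = value
    return user


table = [
    "| Description   | Email         | Nickname  |",
    "| Description 1 | user@mail.com | user_nick |",
]
user = set_user_parameters({}, table)
```

Adding a fourth parameter then only requires a new column in the table, not a new statement or a new function.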

A rule to consider: add an extra When only if it is not possible to include all the information in a single statement.

The same principle applies to Given statements. That is, if we have more than one Given within the same scenario, let’s consider whether it is possible to turn them into a single Given. This can be achieved either by replacing the sentences with a parameter table, or by rewording them into a single sentence, as long as it still makes sense from a business point of view.

5. Use the same words to refer to the same concepts

Since Gherkin gives us the flexibility to invent words following Given, When and Then, it is easy to fall into the temptation of using many different words or terms, which can lead to confusion as we add more requirements to our project.

That is why it is recommended to have a dictionary defined by the team, in which it is established as a rule that every time we talk about the same business concept and want to translate it into Gherkin code, we always refer to it with the same words. This avoids the use of synonyms. This consistent practice will lead to the reuse of Gherkin code, since the same term or sentence can be reused in other future requirements, thus reducing the cost of future maintenance of our automation framework.

It also helps eliminate problems related to ambiguity: since there is only one definition for each concept, it cannot be interpreted in any other way.
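
A team dictionary can even be checked automatically. The sketch below is one possible approach, not something described in the original project: the glossary contents and the function name find_synonyms are made up for the example.

```python
import re

# Hypothetical team dictionary: agreed term -> synonyms to avoid.
GLOSSARY = {
    "logged in": ["signed in", "authenticated"],
    "user": ["customer", "account holder"],
}


def find_synonyms(step, glossary=GLOSSARY):
    """Return (synonym, agreed term) pairs for each discouraged term in a step."""
    findings = []
    for agreed, synonyms in glossary.items():
        for synonym in synonyms:
            # Match whole words only, ignoring case.
            if re.search(rf"\b{re.escape(synonym)}\b", step, re.IGNORECASE):
                findings.append((synonym, agreed))
    return findings


issues = find_synonyms("Given a signed in customer")
clean = find_synonyms("Given a logged in user")
```

Running a check like this over every feature file would flag statements that drift away from the agreed vocabulary before they reach the automation code.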

Tips for writing better Given, When and Then statements

In the case of Given statements, expressions such as “given an object with these parameters” or “given a logged in/logged out user with these parameters” are normally used. In other words, Given statements initialize the information needed to perform the test afterwards.

When statements, on the other hand, usually describe actions to be performed on the system, and therefore should use a verb in the present tense, for example: “When we do the search”, or “When we modify these parameters”, or “When we modify parameter X with value Y”.

Then statements must describe what the expected result will be after the action has been performed, so these statements are usually expressed in the future verb tense, sometimes combined with the passive voice. For example: “Then these parameters will have these values”, or “Then no result will be returned”, or “Then an error message will be returned”.

In the case of When, the first person plural is usually used, because we are several people writing the requirements together.

A tip that applies to all situations is the typical principle of less is more: the fewer words you use to write the requirements, the better. With Gherkin, each word also must have a clear intention and a reason for being in the code. This is because adding extra words makes the document difficult to read and can lead to confusion and misunderstandings, similar to the problem with using synonyms. Saying the same thing with fewer words is best.

If we need to add more words to explain things that are unclear, we should use comments inside the code, or the definitions section that appears in the header of a Gherkin file, which Gherkin parsers ignore because they treat it as explanatory text.

Typical architecture of an automation framework with Gherkin

[Figure: typical architecture of an automation framework with Gherkin]

How do we implement our automation framework so that everything works well?

First of all, on the left of the image we have a higher level of abstraction. These are the requirements. We start by writing requirements in a well-structured way, using a descriptive language like Gherkin to be able to describe how our product works.

Then we write an interpreter for the requirements, so we can run our requirements against the implementation of the system. In other words, just like running a Python program, we are running our requirements.

The automation code has to be of excellent quality, as good as the code the developers are creating. That means it has to be clean code, modularized so that each component can be reusable, structured, organized, and well thought out.

Why? Because when you write a requirement in Gherkin, you usually write 4 or 5 lines of code (let’s say 2 Givens, 1 When, and 1 Then), and each of those lines needs code behind it. Here is an example from my current project: for every statement on the Gherkin side, we have a step definition on the Python side, which is essentially a method that we implement to tell the system what it has to do in order to execute that Gherkin statement. In other words, it’s the Python code that gives the Gherkin code its behavior.
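
The relationship between Gherkin statements and step definitions can be sketched with a small registry in plain Python. This is a simplified illustration under stated assumptions, not the code from the project mentioned above: the names step, run_step, and STEP_REGISTRY are hypothetical, and a real project would normally rely on a BDD library for this machinery.

```python
import re

STEP_REGISTRY = []  # (compiled pattern, function) pairs


def step(pattern):
    """Register a function as the step definition for matching statements."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(context, statement):
    """Find the step definition that matches a statement and call it."""
    # Drop the leading keyword (Given/When/Then/And/But) before matching.
    _, _, text = statement.strip().partition(" ")
    for pattern, func in STEP_REGISTRY:
        match = pattern.fullmatch(text)
        if match:
            return func(context, *match.groups())
    raise LookupError(f"no step definition for: {statement!r}")


@step(r"the user is logged in")
def user_is_logged_in(context):
    context["user"] = {"logged_in": True}


@step(r'we set the user description to "(.*)"')
def set_description(context, description):
    context["user"]["description"] = description


@step(r'the user description will be "(.*)"')
def check_description(context, expected):
    assert context["user"]["description"] == expected


context = {}
for statement in [
    "Given the user is logged in",
    'When we set the user description to "Description 1"',
    'Then the user description will be "Description 1"',
]:
    run_step(context, statement)
```

Each Gherkin statement maps to exactly one Python function, and a statement with no matching function fails loudly instead of being skipped.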

So, the first time you write a requirement in Gherkin, you might use 4 step definitions. In turn, the step definitions invoke a couple of high-level helper methods, which call one or more low-level helper methods. Finally, the low-level helpers call the interface to invoke the system implementation.
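
The layering described here can be sketched as follows. Everything in this block is hypothetical and exists only to show the direction of the calls: the in-memory class stands in for a real interface, and the helper and step names are invented for the example.

```python
class InMemoryInterface:
    """Stand-in for the interface layer, the only code that touches the system."""

    def __init__(self):
        self._users = {}

    def send(self, action, payload):
        if action == "create_user":
            self._users[payload["name"]] = dict(payload)
            return {"ok": True}
        if action == "get_user":
            return {"ok": True, "user": self._users.get(payload["name"])}
        return {"ok": False}


# Low-level helper: one generic way to talk to the interface.
def call_system(interface, action, **payload):
    response = interface.send(action, payload)
    if not response["ok"]:
        raise RuntimeError(f"system rejected action {action!r}")
    return response


# High-level helpers: business operations built on the low-level helper.
def create_user(interface, name, age):
    call_system(interface, "create_user", name=name, age=age)


def fetch_user(interface, name):
    return call_system(interface, "get_user", name=name)["user"]


# Step definitions: thin wrappers that only call high-level helpers.
def given_a_user_with_these_parameters(context, name, age):
    create_user(context["interface"], name, age)


def then_the_user_age_will_be(context, name, expected_age):
    assert fetch_user(context["interface"], name)["age"] == expected_age


context = {"interface": InMemoryInterface()}
given_a_user_with_these_parameters(context, "Anthony", 18)
then_the_user_age_will_be(context, "Anthony", 18)
```

Because the step definitions never touch the interface directly, the helpers can be reused by new steps, and the interface can be swapped without changing the steps.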

The second time a requirement is written, if we did things correctly and modularized the code well, you can reuse things already implemented for the previous requirement. Logically, some of them are going to have to be implemented because we don’t have them, but others are going to be reused because there are step definitions that we already implemented previously.

And maybe the third or fourth time you write a requirement, you will already have almost all the step definitions implemented. You might still need to implement an auxiliary method that does something specific, but most of the work is already done, and this is where we see the best results.

At the beginning we spend a lot of time thinking about the best way to write the code and the best way to solve each particular problem so that our code is reusable and maintainable in the future. However, once that issue is resolved and as we write more Gherkin code to test the system, we see that the additional effort required to add one more requirement is minimal – so minimal that in certain situations we can write new Gherkin files without implementing any Python code.

It seems like magic, but after all that initial effort in architecting the solution, you get to a point where you are writing requirements in natural language without having to write a single line of Python code. It’s like opening a chat in WhatsApp and saying to the machine, “test this.” And it answers: “Yes, this works fine”, or “No, this gives an error.” Believe me, the feeling you get knowing that you can generate new testing scenarios using a human language is amazing!

Some common design patterns that are used


One pattern is used when defining step definitions, since these Python methods are invoked every time a Given, When, or Then appears. What we do is override the standard behavior of these methods, to modify and adapt them to our needs.

Factory Method

This is another pattern that is often used when you have more than one interface to test and want the same Gherkin code to run against all of them, or when you need to choose at run time which interface to test against: web, mobile, or an API endpoint. Factory Method allows us to instantiate the correct object depending on the name of the interface we want to test against.
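
A minimal Python sketch of this idea is shown below. It uses a simple name-based factory function, which is the parameterized variant of the idea rather than the full class-hierarchy form of the pattern. The class names, the login method, and the returned messages are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class SystemInterface(ABC):
    """Common contract that every interface under test must fulfil."""

    @abstractmethod
    def login(self, username, password):
        """Return the welcome message produced by the system."""


class ApiInterface(SystemInterface):
    def login(self, username, password):
        # A real implementation would call the API endpoint here.
        return f"Welcome, {username}! (via api)"


class WebInterface(SystemInterface):
    def login(self, username, password):
        # A real implementation would drive a browser here.
        return f"Welcome, {username}! (via web)"


class MobileInterface(SystemInterface):
    def login(self, username, password):
        # A real implementation would drive a mobile app here.
        return f"Welcome, {username}! (via mobile)"


INTERFACES = {"api": ApiInterface, "web": WebInterface, "mobile": MobileInterface}


def create_interface(name):
    """Instantiate the interface that matches the given name."""
    try:
        return INTERFACES[name.lower()]()
    except KeyError:
        raise ValueError(f"unknown interface: {name!r}") from None


interface = create_interface("api")
message = interface.login("username1", "secret")
```

The step definitions only depend on SystemInterface, so the same scenario can run against any of the three interfaces by changing the name passed to create_interface.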

In general, any other design pattern that allows us to reuse existing code or eliminate duplicate code is useful to implement the automation code architecture.
