BDD for C++ Projects – With Python and Behave


For my current project Fix, I use Python and Behave for Behavior Driven Development (BDD). Today I’ll describe my process and the tools I use.

In my initial post about Fix, I already wrote a paragraph or two about BDD: I use it mainly to have the I/O layers under test, which I cannot achieve with unit tests.

Suffice it to say that the acceptance tests I wrote this way already paid off. They threw an error I committed due to sloppy design right back in my face.

The tools

BDD usually is tightly connected to Cucumber, which is what I wanted to use for this project. I first came in touch with Cucumber about a year ago when I joined my current project. While I am not doing any C++ in that project, I learned about a lot of other technologies, many of which I use with Fix today.

There seem to be not too many Cucumber implementations for C++ available. In addition, the implementation of the test driver does not need to be extremely performant, nor is it very complex. So I exchanged two of the main advantages of C++ for faster development by using Python instead (I learned that for the current project, too).

In addition to having Behave as an easy-to-use Cucumber framework, Python also has requests. This is a package that makes it extremely easy to send HTTP queries to a service like Fix.

I have to admit, Fix is not yet truly a REST service, but it strives to be. And yes, I first came in contact with REST services in my current project. Oh, and of course, it is easy to deal with JSON (guess what…) in Python – but that is also the case in C++. On the server side, I use the great “JSON for modern C++” library by Niels Lohmann.

The process

Now I’ll walk you through the setup currently in action at Fix, and a simple example of how I use Cucumber and Behave for development. You can find the Python code and the Cucumber feature files here on GitHub.

The acceptance tests written in BDD fashion test the observable behavior of a system. In the case of Fix, that means they mainly test the responses sent back from the server, and in some cases, the files used as storage.

The basic setup is as follows: Every scenario, i.e. every Given-When-Then sequence, creates an empty temporary directory and starts Fix in it. Depending on the scenario, the test driver then adds directories and stored issues so Fix has something to work with. At the end of each scenario, the driver stops Fix again and cleans up the directory, unless a test has failed, in which case the files are left intact for debugging purposes. Information like the location of the temporary directory is stored in a context object that Behave creates for each scenario.
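A minimal sketch of such a setup, assuming it lives in Behave's environment.py, might look like the following. The command in FIX_COMMAND, the helper names start_fix and stop_fix, and the attribute names on the context object are hypothetical and not taken from the Fix repository. Comparing the scenario status with the string "failed" is also an assumption that may need adapting to the Behave version in use.

```python
import shutil
import subprocess
import tempfile

# Hypothetical command that starts the server; adjust it to your build.
FIX_COMMAND = ["fix"]


def start_fix(command=FIX_COMMAND):
    """Create an empty temporary directory and start the server in it."""
    workdir = tempfile.mkdtemp(prefix="fix-bdd-")
    process = subprocess.Popen(command, cwd=workdir)
    return workdir, process


def stop_fix(workdir, process, keep_files):
    """Stop the server; delete its directory unless the files should be kept."""
    process.terminate()
    process.wait()
    if not keep_files:
        shutil.rmtree(workdir, ignore_errors=True)


# Behave looks for these hooks in environment.py and calls them
# before and after every scenario.
def before_scenario(context, scenario):
    context.fix_dir, context.fix_process = start_fix()


def after_scenario(context, scenario):
    # Assumption: a failed scenario compares equal to the string "failed".
    failed = scenario.status == "failed"
    stop_fix(context.fix_dir, context.fix_process, keep_files=failed)
```

A real driver would probably also wait until the server accepts connections before the first step runs. That part is left out here.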

Define a new scenario

Let’s go quickly through the process of adding some functionality. BDD is like TDD when it comes to writing tests first. In this case, the new functionality is showing the details of a single issue. This will be achieved by querying <Fix-server>/issue/{id}, where {id} has to be the ID of the issue. If the issue exists, it should return a JSON object containing the issue details. If not, it should give us an HTTP 404 status.
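To make the intended contract concrete, here is a small stand-in written with Python's standard library. It is not how Fix works internally (Fix is a C++ server). The in-memory ISSUES dictionary, the respond function, and the JSON field names are assumptions made only for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory issues; the real server reads them from its storage.
ISSUES = {
    7: {"ID": 7, "summary": "A Later issue", "description": "There will be more"},
}


def respond(path, issues=ISSUES):
    """Return (status, body) for a GET request on the given path."""
    prefix = "/issue/"
    issue_id = path[len(prefix):] if path.startswith(prefix) else ""
    if issue_id.isdecimal() and int(issue_id) in issues:
        return 200, json.dumps(issues[int(issue_id)]).encode("utf-8")
    return 404, b""


class IssueHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the stand-in endpoint until interrupted."""
    HTTPServer(("localhost", port), IssueHandler).serve_forever()
```

Calling serve() would let the step definitions run against the stand-in before the real endpoint exists. The acceptance tests themselves should of course run against Fix.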

Let’s write this down quickly in Cucumber:

Feature: show details of a single issue

  Scenario: Empty repository
    Given an empty Fix repository
    When we query the issue with ID 42
    Then the response has http code 404

  Scenario: Issue does not exist
    Given a Fix repository with issues
      | ID | summary       | description        |
      | 1  | First issue   | Issue number one.  |
      | 7  | A Later issue | There will be more |
    When we query the issue with ID 4
    Then the response has http code 404

  Scenario: Issue exists
    Given a Fix repository with issues
      | ID | summary       | description        |
      | 1  | First issue   | Issue number one.  |
      | 7  | A Later issue | There will be more |
    When we query the issue with ID 7
    Then the response has http code 200
    And the response is an object
      | ID | summary       | description        |
      | 7  | A Later issue | There will be more |

We see three scenarios, each containing a sequence of Given-When-Then steps. Each table belongs to the step in the line before, and the line starting with And in the last scenario is simply a second Then step.
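To give an idea of what a Given step with a table might do behind the scenes, the sketch below writes one JSON file per table row into a directory. The file naming scheme, the JSON layout, and the helper name store_issues are assumptions made for illustration, not the storage format Fix actually uses.

```python
import json
import os


def store_issues(directory, rows):
    """Write one JSON file per issue row (hypothetical storage layout)."""
    os.makedirs(directory, exist_ok=True)
    for row in rows:
        issue = dict(row)
        # Table cells are plain strings, so the ID is converted explicitly.
        issue["ID"] = int(issue["ID"])
        path = os.path.join(directory, "{}.json".format(issue["ID"]))
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(issue, handle)
```

In a Behave step, the rows could be built from the step's table, for example with dict(zip(row.headings, row.cells)) for every row in context.table.rows.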

Write the step definitions

At first, running this will cause Behave to complain about a number of unknown steps, so we need to write step definitions for them. Since it denotes the new functionality we are about to implement, the lines When we query the issue... should be among those unknown steps. Here’s how this particular step is implemented:

from behave import when
import requests

@when('we query the issue with ID {issue_id:d}')
def step_impl(context, issue_id):
    context.rest_response = requests.get('http://localhost:8080/issue/' + str(issue_id))

It is pretty obvious that this calls the GET method on the endpoint as described above. The result is stored in Behave’s context object so it can be evaluated in the Then steps. If a step contains a table, that table is also stored in the context object, so it can be evaluated in the step definition. You can see examples of the use in the Fix GitHub repository.
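As an illustration of evaluating the stored response against a table, the helper below compares a decoded JSON object with a single table row. The name object_matches_row is hypothetical, and so is the assumption that the JSON keys match the table headings. Extra fields in the response are ignored here, which is a design choice rather than something the scenarios prescribe.

```python
def object_matches_row(obj, row):
    """Check that every table column appears in the object with an equal value.

    Table cells are strings, so the object's values are compared via str().
    """
    return all(key in obj and str(obj[key]) == cell for key, cell in row.items())


# Inside a Behave step definition this might be used as follows (not run here):
#
#   @then('the response is an object')
#   def step_impl(context):
#       row = context.table.rows[0]
#       expected = dict(zip(row.headings, row.cells))
#       assert object_matches_row(context.rest_response.json(), expected)
```

The status code steps can be checked in the same spirit by comparing context.rest_response.status_code with the number parsed from the step text.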

Implement the functionality

Having implemented the step definitions, the scenarios will fail. It is now time to switch to TDD for the implementation. Often, the unit tests used for TDD will cover behavior similar to that described in the scenarios. This is OK because they describe it with a finer granularity and in a much more technical way.

In the case of Fix, TDD will bring us the logic needed inside the server, but not everything we need to make the Cucumber tests pass. The I/O layers are not under unit tests, so additional functionality may be needed to pass the acceptance tests. In the case of querying a single issue, this will probably be reading the contents of a single file – that’s not too complicated.


Behave was very easy to set up, and Python is a flexible but powerful language that enables us to write the necessary step definitions quickly and effortlessly. That way, we can focus on the functionality itself while still having the behavior covered by our feature files.



  1. Maybe a bit late but maybe better late than never. There is a C++ Cucumber interpreter available on GitHub.

    I started this as a side project but it turned out to be pretty useful and does its job.

    If you want, check it out.

    Best Thomas


  2. Thanks for a really interesting and useful post. I’m looking to test a similar project (a C++ web service) and Behave looks like it could be a really useful tool for this.

    Can you explain a bit more about how you see the differences in application between Behave and Catch, or more specifically, why you didn’t use Catch for all of your tests? Catch allows for a BDD-style test specification (which I can see you actually use in your project), and indeed some of the Catch tests appear to test the same thing as the Behave tests (e.g. the REST API tests).

    Put simply, I can appreciate the difference between testing an external web API and testing internal functional “units”, but I don’t really understand how this is applied in practice in your project. This isn’t a criticism, I’m just curious about how you decided to use which tool and when…


    1. Hi, thanks for your question – and your interest!

      The decision about using both Catch and Behave is partly due to my interest in getting to know both tools. The other part is indeed a distinction between unit tests and integration/system/acceptance tests (the definition of those seems to vary and is a bit blurry).

      I use Catch only for unit tests, and I currently do not use its BDD style features. The test cases using Catch (here) currently only use the TEST_CASE/SECTION style. They also directly use the classes under test, i.e. no HTTP traffic etc. involved, and they use only those classes. For example, the storage (persistence layer) is mocked out, so there is no I/O at all, which is one of the requirements for actually having a real unit test.

      The Behave tests, on the other hand, use the full application (as far as we can talk about a full application here), including a file based storage and the HTTP server. They talk to the application via HTTP, which is much easier to set up with Python’s requests library. I could use C++ with Catch for that part, too, but it would involve more boilerplate, and I’d have to use Poco for the test client as well. And, testing the integration of the server business logic with the Poco HTTP server using the Poco HTTP client as the test client felt slightly wrong.

      The fact that some Catch tests appear to test the same thing as some of the Behave tests is mostly due to the fact that I currently have only a single class (RestAPI) that implements the logic of the REST calls, so that the Poco server only is an adapter that passes any requests through to that class. In addition, that is the usual connection between unit tests and acceptance tests: Acceptance tests describe use cases, and some units have to implement that functionality, leading to overlaps between acceptance and unit tests. In cases like this, where the use case is very small, a single unit can implement the functionality, and the only difference between unit test and acceptance test is the added infrastructure layers involved in the acceptance test (in this case, HTTP server and file storage). On the other hand, every functionality covered by unit tests will be touched by at least one acceptance test, if the application is developed using BDD/TDD: Functionality is only implemented if a unit test requires it. Unit tests are only written if the functionality is needed for an acceptance test to pass.


  3. Hi
    It is not quite right that there is no “Cucumber” implementation for C++. There is Cucumber-cpp, which allows you to write step definitions in C++. This is possible because Cucumber (the Ruby tool) supports the wire protocol, which allows you to implement a “wireserver” in any language. That is exactly what cucumber-cpp does. You use cucumber-cpp to implement the step definitions in C++, and the result of the compilation is a wireserver. To run the tests, you put a cucumber.wire file containing the IP and port of the wireserver (usually localhost and port 3902) in the step definitions folder and start the wireserver. When Cucumber (the Ruby implementation) is then started as usual, it forwards the steps to the wireserver.
    The advantage is that you can use all features of Cucumber (e.g. the HTML outputter); the disadvantage is that you have two tools you need to run.


    1. Hi, thanks for tuning in. I intentionally wrote, “There seem to be not too many Cucumber implementations for C++”. In retrospect that was nonsensical because I’d need only one, right? 😉 I was aware of cucumber-cpp, and there are one or two more. I have to confess, the whole process of setting up a “wireserver”, connecting to it etc. sounds a bit cumbersome to me.
      Be that as it may, there were other reasons why I decided to go with Python and Behave: It’s much quicker to set up and write the step defs in, and I simply wanted to play a bit with Python.


      1. I’m sorry. I misinterpreted your statement and answer to Aurelien. You are of course right, a single implementation is indeed ‘not many’. Just out of curiosity, what are the other frameworks you are aware of?
        Cucumber-cpp worked quite well for me, but I also use Behave.
        IMO it depends on where/what the user interface is. For an application/service/library which is ‘controlled’ through a network connection, serial port, command line, etc. (like your Fix), it is usually faster and simpler to use a framework in a language like Ruby (Cucumber) or Python (Behave). However, if the application has a GUI, I prefer to use the language the application was written in, in order to directly access the API of the business logic. But I do not recommend writing step definitions which operate on the GUI elements, since this may result in brittle tests and a huge maintenance effort.


        1. I agree. It depends on the interface the tests are targeting. If it is a C++ interface, e.g. when you develop a library, then the language of that interface is the natural language you’d want to use for your step definitions. Back in the day when I learned about Cucumber I had searched for C++ Cucumber frameworks and found one or two besides cucumber-cpp, but a quick search today did not turn up anything else.
          For GUI testing there are tools that can be used for this task, but as far as I can remember they have their own DSL to drive the tests or have dedicated test drivers written in a language that probably won’t be C++. I guess if the application does not have too many GUI elements, it could be worth writing an access layer that lies directly below the GUI itself and driving tests through that. Separating the GUI layer into model-view-control or model-view-presenter or similar patterns usually is a good idea anyway.


  4. Oh… I am in the process of evaluating BDD in C++, and I was about to try Cucumber. So if I understand correctly, there’s no good C++ library to deal with Gherkin-style files? And it’s not clear to me what exactly Cucumber itself does.

    Clearly testing the C++ code with python scripts is of great help. I am doing it with my gRPC (micro)servers, but I didn’t think about doing it BDD style. Thanks for the idea!

    However, not everything can be tested that way, and I would like a way for a C++ module to prove it is behaving correctly, via C++ code. The Catch testing lib has a BDD style interface, but it’s not designed to match an input gherkin file. Any suggestion?


    1. Hi Aurelien,
      I would not say that there is no C++ library for Gherkin files at all. There are a few implementations, with varying quality and ease of use, according to the feedback I have read. I guess in this case there is no “Cucumber itself”. If I understand the terminology right, Gherkin is the name of the Given-When-Then language, and Cucumber is a software tool (or set of tools) to automate tests in that language via step definition functions. As far as I know, the original Cucumber is a Ruby implementation, but there is at least also Cucumber for Java. People are sloppy and tend to use the term “Cucumber” for any such tool that works with Gherkin. In this case, the name of the tool is Behave.

      I would say since BDD is focused on testing behavior from the outside, it is not meant to test everything. Testing single modules is, as I understand it, not in the scope of BDD, and a classic TDD approach fits better. That’s what I do in Fix: I test the general behavior of the whole thing BDD style with Behave in Python, the single module(s) TDD style with Catch in C++.


      1. Thanks for the details.

        I agree with the scope of BDD, but when it comes to testing a regular desktop UI, it’s tricky sometimes to validate what is on screen from the outside.

        So I was looking for a way from C++ to generate an output file describing what has been done in a way that facilitates the validation with an input Gherkin spec. Something like C++ specs.

        But you make me realize that it’s probably better to expose a kind of “automation” API so that a python script can pilot the application and do the BDD validation. I think it’s smarter because such an API is not restricted to BDD testing and could probably be reused for something else. As I’m already using gRPC, I think I’ll expose another service for that purpose.


