By Alon Gouldman | June 14, 2022

This is the first installment of Claroty’s Engineering Deep Dive series, in which members of the Claroty Engineering Team will share key engineering insights and best practices in their own words. Our engineers’ expertise and perspective have been honed through ongoing development and innovation of our unparalleled platform for securing the Extended Internet of Things (XIoT) across industrial, healthcare, and commercial environments. For more insights from the Claroty Team, follow our new page on Medium.
–––
In this Engineering Deep Dive, Back-End Software Engineer Alon Gouldman shares four simple ways for fellow engineers to speed up their continuous integration/continuous deployment (CI/CD) pipeline. For those who are unfamiliar, the CI/CD pipeline refers to the full set of processes needed to bring software into production. A streamlined CI/CD pipeline can enable faster, more effective, and more reliable software deployment.
–––
We’ve all been there: You’ve just finished three days of intensive debugging of some critical bug, and it was a tough one! You came up with a neat solution, added unit tests, and pushed your code. Twenty minutes later, the CI pipeline fails due to a syntax error that fell through the cracks. You fix the error, try again, and after another costly 20 minutes, the CI pipeline indicates that your change broke a different area of the code.

That’s good—the pipeline exists to save you from these kinds of mistakes. You change your solution and push again. Oops, this syntax error again! Fix, push, wait 20 minutes…

This was how the development flow at Claroty looked a year ago. Our pipelines have protected us so many times from making expensive mistakes. But they also slowed down the development process and reduced the joy of coding at speed.

At Claroty alone, we have more than 7,000 unit tests, along with integration tests and some full system tests. This gives us amazing coverage, but back then, we had one issue: We ended up with pipelines that took more than 20 minutes on average. We decided to take action, and we were able to reduce the time of our pipelines to less than 10 minutes!

In this article, I’ll share what the Claroty Engineering Team did to speed up our CI/CD pipeline, so that you too can make your pipeline faster. Let’s jump right in…

Three things to keep in mind before you start

1. Only optimize bottlenecks

Any optimization that doesn’t address a bottleneck will not reduce the total run time. Your time is valuable, so don’t waste it on optimizing tests that won’t reduce the total run time! Measure every step of your pipeline and ask yourself: Is it the build time that’s slowing you down? Is it a specific test or job? Maybe it’s due to slow network or input/output (I/O) operations? Optimize only what makes an impact.

2. Take care of specific slow tests

Sometimes you have specific tests that are extremely slow. You can run pytest with `--durations` (for example, `pytest --durations=10`) to find the slowest tests and take care of them one by one.

3. Measure everything

During our optimization journey, some of what we did turned out to actually be slower. Make sure to profile every change before and after, and make sure your change actually helped. At Claroty, we mostly use cProfile and SQLTap.
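As a rough illustration of measuring before and after a change, the sketch below profiles a single function with the standard-library cProfile and returns a text report of the most expensive calls. `slow_sum` and `profile_call` are hypothetical names used only for this example; they are not part of our code base.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical stand-in for the code you want to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=5):
    """Run func under cProfile; return (result, text report of the top calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the biggest contributors are listed first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_sum, 100_000)
    print(report)
```

Running the same call before and after an optimization and comparing the two reports is usually enough to tell whether the change helped.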

Side note: A nice little hack with SQLTap + pytest

A lot of our tests access a DB during the test run, which can slow things down. For this reason, I wanted to use SQLTap to find where I could optimize the way we interact with the DB.

Usually, you can surround the code section that you want to test with SQLTap. But when running many tests, where should you place your profiler code? I ended up editing pytest’s source code on my local machine. You can find the place where the tests start their run here:
```
"""The pytest entry point."""

import pytest

if __name__ == "__main__":
    # ---> sqltap code goes here <---
    raise SystemExit(pytest.console_main())
```
That way, I could profile many tests and find what was slowing me down.

With that out of the way, we are ready to start speeding up your pipelines:

Run static tests first, and stop on failure

It’s frustrating to wait 20 minutes, just to find out that you failed on something like a syntax error or a mypy check. A best practice to avoid this frustration would be to:

  1. Run the static tests first—they are usually faster. This way, if your code fails, you’ll get faster feedback.
  2. There is no need to wait for a 5-minute test suite, just to find out that your second test failed. Run your tests with `pytest -x` to stop at the first test that fails.
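The two points above can be sketched as a small runner script that executes the cheap checks first and stops at the first stage that fails. This is only an illustration: the stage list in `DEFAULT_STAGES` is hypothetical, and in a real pipeline the same ordering is usually expressed as stages in the CI configuration rather than in Python.

```python
import subprocess
import sys

# Hypothetical stage list: cheap static checks first, the test suite last.
DEFAULT_STAGES = [
    ("syntax check", [sys.executable, "-m", "compileall", "-q", "."]),
    ("type check", [sys.executable, "-m", "mypy", "."]),
    ("unit tests", [sys.executable, "-m", "pytest", "-x"]),
]


def run_stages(stages):
    """Run (name, command) pairs in order and stop at the first failure.

    Returns the name of the failing stage, or None if every stage passed.
    """
    for name, command in stages:
        print(f"running {name}: {' '.join(command)}")
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return name  # later, slower stages are never started
    return None
```

A CI script could call `run_stages(DEFAULT_STAGES)` and exit with a non-zero status when it returns a stage name.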

Running database in memory

Some of our tests use a database. To make things faster, we can run the DB in memory instead of writing to files. Writing to a file is slow, while accessing the RAM is blazing fast.

In a production app, you should never run your DB in memory, because your data will be deleted on every restart. But for tests it doesn’t matter, as the data is temporary anyways.

We encountered a problem: in some of our tests, we’re using MariaDB, which doesn’t support running tables in memory. To solve this, we used RAMFS. RAMFS is a file system that runs in memory, and this is exactly what we need! This is how you set up your MariaDB to run in memory:

  1. Stop the MySQL engine: `/etc/init.d/mysql stop`
  2. Set up the RAMFS: `mkdir -p /var/ramfs; mount -t ramfs -o size=$size ramfs /var/ramfs/`
  3. Set up MySQL to use the new RAMFS: `cp -R /var/lib/mysql /var/ramfs/; chown -R mysql:mysql /var/ramfs/mysql; echo "datadir = /var/ramfs/mysql" >> /etc/my.cnf.d/icsranger.cnf`
  4. Start the MySQL engine again: `/etc/init.d/mysql start`

From the DB engine’s perspective, everything is normal. It writes and reads from files as usual. But from the operating system’s point of view, the files are actually in memory and are not permanent.
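Before trusting the timings, it can be worth confirming that the data directory really ended up on a RAM-backed mount. The helper below is a minimal sketch for Linux that looks a path up in the contents of `/proc/mounts`; the function name and the `RAM_BACKED` set are assumptions for this example, not part of our setup.

```python
import os

RAM_BACKED = {"ramfs", "tmpfs"}


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (Linux only).
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Prefer the longest matching mount point; on a tie the later
        # entry wins, because a later mount shadows an earlier one.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

For example, reading `/proc/mounts` and checking `filesystem_type("/var/ramfs/mysql", text) in RAM_BACKED` should be true after step 2 above.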

This is how much time a sample of 215 of our tests takes without the RAMFS solution:

215 passed, 1 skipped, 827 warnings in 270.76 seconds

and these are the same tests with RAMFS, at more than double the speed (!):

215 passed, 1 skipped, 827 warnings in 102.74 seconds

Similar solutions also exist for databases other than MariaDB.

Quick win: Parallel testing

The idea here is simple. Instead of running all your tests one at a time, run them in parallel!

Some unit tests are isolated and can run in parallel. They are separated from one another, and are completely independent. For those, you can run your tests with pytest-xdist (for example, `pytest -n auto`).

But other tests are not isolated, and will affect each other when run in parallel – some of the tests might share a DB, Redis, or other services. Perhaps there is another internal package that doesn’t know how to deal with isolating each test. Sometimes, you can solve the root cause of this by changing the way your services behave. But what if you can’t change it?

Here is a quick win that doesn’t require any changes in your app itself. Our tests run on Amazon Web Services (AWS), so in general it doesn’t matter whether you run one job for 20 minutes or four jobs for five minutes each: the cost will be practically identical. Instead of running the tests in parallel as part of the same job, we can split them into parallel jobs. That way, each job gets an isolated environment and won’t share resources or state.

Here is how you do it:

  1. GitLab offers a feature that splits one job into multiple jobs. In your `.gitlab-ci.yml` file, add the `parallel` keyword with the number of jobs you want to split your tests into:
       unit_tests:
         parallel: 5
         script:
           - python -m pytest
     This adds two environment variables to your jobs: `CI_NODE_INDEX` and `CI_NODE_TOTAL`. GitLab makes sure `CI_NODE_INDEX` is different for every sub-job it runs.
  2. Install pytest-test-groups in your project dev environment: `pip install pytest-test-groups`
  3. When running your tests, use: `pytest --test-group-count=$CI_NODE_TOTAL --test-group=$CI_NODE_INDEX`
  4. Done! With almost no setup and no time, you managed to split your job and run the tests in parallel.
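To make the idea behind the steps above concrete, here is a minimal sketch of how a list of tests can be divided between jobs using the two GitLab variables. It is not the actual algorithm used by pytest-test-groups; `select_group` and `group_from_environment` are hypothetical helpers written only for illustration.

```python
import os


def select_group(test_ids, group_count, group_index):
    """Return the tests for one group; group_index is 1-based like CI_NODE_INDEX."""
    if not 1 <= group_index <= group_count:
        raise ValueError("group_index must be between 1 and group_count")
    ordered = sorted(test_ids)  # every job must see the tests in the same order
    # Take every group_count-th test, starting at this job's offset.
    return ordered[group_index - 1::group_count]


def group_from_environment(test_ids, environ=os.environ):
    """Pick this job's share using the variables GitLab sets for parallel jobs."""
    total = int(environ.get("CI_NODE_TOTAL", "1"))
    index = int(environ.get("CI_NODE_INDEX", "1"))
    return select_group(test_ids, total, index)
```

Because every job sorts the tests the same way and takes a different offset, each test runs in exactly one job, and a local run without the variables still runs everything.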

We are running our pipelines on GitLab, but there are similar solutions for GitHub Actions.

Truncate > drop and create

A lot of our tests use the same table structure. To have a clean state for every test, we used to drop the database before each test and create a fresh one. After running SQLTap, we realized this operation takes time. As it turns out, it’s cheaper for the DB to truncate all the tables one by one, instead of dropping the entire DB.

Here is a sample test run when dropping and creating the DB:

215 passed, 1 skipped, 827 warnings in 1278.33 seconds

and here are the same tests, when truncating the tables instead:

215 passed, 1 skipped, 827 warnings in 227.10 seconds

That’s an 82% decrease for this single step!

Here is how you can achieve that:

  1. Move the creation of the DB to the setup of the test suite. You can do that using a pytest fixture with `scope="session"`.
  2. In the setup of each test, truncate all the tables instead of dropping them:
    from sqlalchemy import text
    for table in Base.metadata.tables.keys():
        # text() marks the string as raw SQL, which newer SQLAlchemy versions require
        session.execute(text("TRUNCATE TABLE %s" % table))

This syntax uses SQLAlchemy for getting the tables’ names, but you can plug in whatever you want.
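The same pattern, create the schema once and only empty the tables between tests, can be tried out with nothing but the standard library. The sketch below swaps in SQLite for MariaDB so it runs anywhere, and uses `DELETE FROM` because SQLite has no `TRUNCATE TABLE`; the table names and both helper functions are made up for this example. In a pytest suite, `create_schema` would be called from a fixture with `scope="session"` and `clear_tables` from a per-test fixture.

```python
import sqlite3


def create_schema(connection):
    """Expensive step: run once per test session."""
    connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    connection.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
    connection.commit()


def clear_tables(connection):
    """Cheap step: run before every test to get a clean state."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in rows:
        # SQLite has no TRUNCATE TABLE; DELETE without WHERE empties the table.
        connection.execute(f'DELETE FROM "{name}"')
    connection.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    clear_tables(conn)
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

The table structure survives between tests, so only the rows have to be removed each time.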

Where do you go from here?

What I’ve covered is just a fraction of all the things we did. There are a lot of other ways to make sure your CI pipeline is blazing fast. For us it was a very exciting journey – we managed to reduce the pipeline’s time by more than 50%! Ask yourself: What other things can you do to reduce your pipelines’ time?
It will pay off big time!