If you are working with a skilled automated testing team and a thoughtfully designed testing framework, then the time and labor invested should grow roughly logarithmically with test suite size. The idea is similar to economies of scale: adding one more test case to a highly developed testing framework and suite is a small amount of work, whilst writing the first few test cases for a new suite is much more resource-intensive. This is perhaps arguable, particularly once maintenance complexity is considered. At the very least, however, it seems clear to me that the time to develop a test case does not increase exponentially with the size of the suite. Otherwise, we wouldn’t see industry examples such as the Microsoft Office 2007 automated test suite, which had a test case count in the millions.
While human hours may not grow this way, increasing testing rigor can cause exponential growth in computational and network resource requirements. This is sometimes referred to as “combinatorial explosion.” Say we have 300 tests, the longest running 4 minutes and none shorter than 2. To have the full suite execute in 4 minutes, we would need 300 machines, Docker containers, or other execution nodes; we can’t stack the quicker tests on a single machine, since no two tests fit together inside the 4-minute window. So that we are functionally testing the software, and not performance testing the browser, let’s give each browser/system 4GB of RAM. (CPU and related computational resources can be reasoned about similarly, so we will ignore them here.) Creating 300 Docker containers may be no problem, but at 4GB per container the requirement quickly climbs past 1TB of RAM to meet that 4-minute target. Many organizations use on-demand cloud testing to run massive-scale tests like this. An EC2 instance with 96 CPU cores, 384GB of RAM, and a 25 Gbps network connection is around $1.50 per hour. However, on-demand cloud resources are not a perfect solution.
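The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope estimate only: the 300-test count and 4GB-per-container figure come from the example, and the 384GB instance size is the EC2 machine described above used purely for illustration.

```python
import math

# Back-of-the-envelope resource math for running every test in parallel.
tests = 300                 # one container per test, since no two tests fit in the window
ram_per_container_gb = 4    # per-browser/system allowance from the example

total_ram_gb = tests * ram_per_container_gb
print(total_ram_gb)         # 1200 GB, i.e. over 1 TB

# Packing containers onto hypothetical 384 GB instances (RAM-bound only,
# ignoring CPU as the text does):
instances = math.ceil(total_ram_gb / 384)
print(instances)            # 4 instances just to satisfy memory
```

The point is less the exact numbers than how quickly a modest suite outgrows a single machine once every test must run concurrently.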
On-demand cloud resources can quickly become prohibitively expensive. In the previous example, we triple the resource requirements by running our tests on two more browsers. We increase by a factor of nine if we do three browsers and three operating systems. The frequency of testing also has a multiplicative effect on testing resources. Are we running the suite every week, every night, every pull request, or every commit? It may be possible to run one million tests on all combinations of 100 browsers and 100 operating systems, for every pull request, and in just the amount of time to run the longest test. Just don’t send me the invoice.
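The multiplicative effect is easy to make concrete. A short sketch, using the factors from the paragraph above; the 20-pull-requests-per-day figure is an assumption added for illustration:

```python
# Each dimension multiplies the total number of test executions.
tests = 300
browsers = 3
operating_systems = 3

configurations = browsers * operating_systems   # the factor-of-nine multiplier
executions_per_run = tests * configurations
print(executions_per_run)                       # 2700 test executions per suite run

# Frequency multiplies again: assume 20 pull requests a day, each running the suite.
daily_executions = executions_per_run * 20
print(daily_executions)                         # 54000 executions per day
```

Every axis (browser, OS, frequency) is innocent on its own; it is the product that sends the invoice skyward.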
Use-case-based performance tests are subject to the same considerations detailed above. However, because the tests all target a single server or cluster of servers, the networking details can be especially problematic. At 50 Mbps per connection, the test nodes would need 1.5 Gbps of total bandwidth just to simulate thirty users, plus some supporting “Quality of Service” setup to ensure the bandwidth is shared appropriately. Distributed testing quickly becomes necessary in these situations.
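For completeness, the bandwidth arithmetic as a sketch, using only the figures from the paragraph above:

```python
# Aggregate bandwidth needed to simulate concurrent users at a fixed per-connection rate.
mbps_per_connection = 50
simulated_users = 30

total_gbps = mbps_per_connection * simulated_users / 1000
print(total_gbps)  # 1.5 Gbps of aggregate bandwidth toward one server
```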
Some ways that small businesses can combat the combinatorial explosion problem:
- Integrate a small subset of the full test suite into the continuous integration pipeline. Selecting a subset of tests for regression testing is the stuff of PhD dissertations, but it doesn’t have to be complicated: a few core subsets based on system component, or an always-run “smoke test” subset, can do the trick.
- Execute the full suite on a regular schedule that matches the resources you have available. A nightly full-suite run, for example, might be enough.
- Leverage usage metrics to decide which browsers and operating systems to test. Just because an additional operating system is “easy” to add doesn’t mean it should be added.
- Consider the “distributability” of the testing system early in the planning process; technology selections and design decisions should account for it.
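The first suggestion above can be sketched without any particular framework. A minimal tag-based selection scheme, where CI runs the “smoke” subset on every pull request and the whole suite nightly; the test names and tags here are hypothetical:

```python
# A minimal sketch of tag-based test selection. Real frameworks (e.g. pytest
# markers or JUnit tags) offer the same idea with more machinery.
tests = [
    {"name": "test_login",         "tags": {"smoke", "auth"}},
    {"name": "test_checkout",      "tags": {"smoke", "payments"}},
    {"name": "test_report_export", "tags": {"reports"}},  # slow, nightly-only
]

def select(suite, tag=None):
    """Return the subset carrying `tag`, or the whole suite if no tag is given."""
    if tag is None:
        return list(suite)
    return [t for t in suite if tag in t["tags"]]

smoke = select(tests, "smoke")          # per-pull-request subset
full = select(tests)                    # nightly run
print([t["name"] for t in smoke])       # ['test_login', 'test_checkout']
```

The design choice worth noting is that subset membership lives on the tests themselves, so the CI pipeline only needs to know a tag name, not a test list.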