Why my performance benchmark gives me wrong results? - Debian

Pierre
May 28, 2020
135 views
1 vote
2 Answers

There is clang-tidy option performance-faster-string-find that detect the use of the std::basic_string::find method (and related ones) with a single character string literal as argument. According to them, the use of a character literal is more efficient.

I wanted to perform a little benchmark to test that. Therefore, I made this little program:

#include <string>
#include <chrono>
#include <iostream>

int main() {
    int res = 0;
    std::string s(STRING_LITERAL);

    auto start = std::chrono::steady_clock::now();

    for(int i = 0; i < 10000000; i++) {
#ifdef CHAR_TEST
        res += s.find('A');
#else  
        res += s.find("A");
#endif
    }

    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "sn";

    return res;
}

Two macros are used in this program:

STRING_LITERAL which will be the content of the std::string on which we will call the find function. On my benchmark, this macro can have two values: a small string, let’s say "BAB" or a long string, let’s say "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB",
CHAR_TEST, if defined, run the benchmark for character literal. If not, find is called with single character string literal.

Here are the results:

> (echo "char with small string" ; g++ -DSTRING_LITERAL="BAB" -DCHAR_TEST -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "string literal with small string" ; g++ -DSTRING_LITERAL="BAB" -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "char with long string" ; g++ -DSTRING_LITERAL="BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB" -DCHAR_TEST -O3 -o toy_exe toy.cpp && ./toy_exe) ; (echo "string literal with long string" ; g++ -DSTRING_LITERAL="BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB" -O3 -o toy_exe toy.cpp && ./toy_exe)

char with small string
elapsed time: 0.0551678s
string literal with small string
elapsed time: 0.0493302s

char with long string
elapsed time: 0.0599704s
string literal with long string
elapsed time: 0.188888s

My quite ugly command runs the benchmark for the four possible combinations of the macros and I found, with a long std::string, it is indeed more efficient to use a character literal as argument to find but it is no longer true for small std::string. I repeated the experiment and I always find an increase of around 10% of the execution time for character literal with small std::string.

In parallel, one of my workmates made some benchmarks on quick-bench.com and found the following results:

Small std::string with character literal: 11 units of time
Small std::string with single character string literal: 20 units of time
Long std::string with character literal: 13 units of time
Long std::string with single character string literal: 22 units of time

These results are coherent with what claims clang-tidy (and sounds logical). So, what is wrong with my benchmark? Why have I consistent wrong results?

EDIT: This benchmark has been performed using GCC 6.3.0 on Debian. I also run it using Clang 8.0.0 for similar results.

Tags: c#microbenchmark stdstring

Answers

- YitshakYarom
- June 3, 2020 at 3:23 pm
- 0 votes
0
I am not sure something is wrong with your bench marking. I run the exact same code on repl.io platform, and I get results that are matching “quick bench”:

char with small string
.elapsed time: 0.402103s

string literal with small string elapsed time: 0.489828s

char with long string elapsed time: 0.400224s

string literal with long string elapsed time: 0.53304s

There is one thing that comes into mind, your profiling is done over the loop, I would profile just whats in the loop.

Login or Signup to reply.

I did not like the idea of controlling the program via macros, so I rewrote it into this:

#include <string>
#include <chrono>
#include <iostream>

template <typename T>
int test(std::string s, T pattern, const std::string & msg, size_t num_repeat)
{
  int res = 0;
  auto start = std::chrono::steady_clock::now();

  for(int i = 0; i < num_repeat; i++) 
  {
    s.find(pattern);
    s[0] = '.';
  }

  auto end = std::chrono::steady_clock::now();
  std::chrono::duration<double> elapsed_seconds = end-start;
  std::cout << msg << " elapsed time: " << elapsed_seconds.count() << "sn";

  return res;

}


int main(int argc, const char* argv[]) 
{
  const int N = 10'000'000;
  int res = 0;
  std::string s = (argc == 1) ? "MNBVCXZLKJHGFDSAPOIUYTREWQ" : argv[1];

  res += test(s, 'A', s + ".find(char): ", N);
  res += test(s, "A", s + ".find(string): ", N);

  return res & 1;
}

The main idea was to fool the compiler enough to make it abandon any idea of optimizing things out (that’s the purpose of s[1] = '.' and reading s from commandline). I wanted to avoid the situation where the compiler knowss both the string and the pattern searched for, as this could let it use some optimization tricks we do not want to take int account.

I compiled it using gcc 10.1.0 and clang 10.0.0, with -O3 as the only command-line option. (g++ was run with -std=c++17, I have it aliased).

The results depend on the compiler (which can be already observed in the benchmark linked in the question!)

Ok.
small strings, g++:

pA1.find(char):  elapsed time: 0.124409s
pA1.find(string):  elapsed time: 0.125372s

clang:

pA1.find(char):  elapsed time: 0.122489s
pA1.find(string):  elapsed time: 0.126854s

The difference is hardly measurable. clang yields systematically larger times for strings, but this is typically on the 3rd significant digit, hardly worth mentioning.

Now medium-size strings, g++:

00000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.139219s
00000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.137838s

clang:

00000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.13962s
00000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.153506s

The ressult for clang are systematically in favor of “char” method; as for g++, the winner fluctuates.

Now even larger strings, g++:

111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.170651s
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.177381s

clang:

111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(char):  elapsed time: 0.172215s
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000000000000000000pA1.find(string):  elapsed time: 0.206911s

For g++ the difference hardly starts to be possible to observe, it is within expected range of random fluctuations. For clang the difference is clear & systematic.

I repeated it with a string composed of approx 1000 chars. For g++ there’s no difference, for clang it’s about 10%.

So, my conclusion is that all depends on the compiler. For clang it is reasonable to follow the advice issued by clang-tidy. For g++, this need not be the case.

This answer is not complete, though, for it would be interesting to know the differences in implementation of std::string::find between clang and g++.

Please signup or login to give your own answer.

Click here to cancel reply.

Why my performance benchmark gives me wrong results? – Debian

Answers