Thursday 27 March 2014

Ubuntu thread sync performance much lower than Windows'

I was benchmarking Ubuntu 13.04 performance against Windows 7 concerning thread sync, when I found something interesting.


Here’s the code I’m using for the benchmarking:


#include <mutex>
#include <memory>
#include <time.h>
#include <iostream>
#include <vector>
#include <thread>

using namespace std;

mutex m;

unsigned long long dd = 0;

// Each thread takes and releases the lock 100,000,000 times.
void RunTest()
{
    for(int i = 0; i < 100000000; i++)
    {
        unique_lock<mutex> lck{m};
        dd++; // assumed loop body: bump the shared counter under the lock
    }
}

int main(int argc, char *argv[])
{
    clock_t tStart = clock();
    int tCount = 0;
    vector<shared_ptr<thread>> threads;
    for(int i = 0; i < 10;i++)
    {
        threads.push_back(shared_ptr<thread>{new thread(RunTest)});
    }

    // Wait for all worker threads to finish.
    for(auto t:threads)
    {
        t->join();
    }

    cout << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << endl;

    return 0;
}

If we run this on Windows and on Ubuntu, Ubuntu beats Windows 7 hands down. But I believe this is because on Windows the mutex allows for inter-process synchronization, while on Linux it does not.
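One caveat about the timing itself: on Linux, clock() reports processor time consumed by the whole process, summed over all of its threads, whereas the Microsoft C runtime's clock() reports elapsed wall-clock time. The two platforms may therefore not be measuring the same thing. Below is a minimal sketch of a wall-clock measurement with std::chrono::steady_clock. The helper name TimeLockedIncrements and its parameters are made up for illustration and are not part of the benchmark above.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original benchmark): starts `thread_count`
// threads, each doing `iterations` increments under a std::mutex, and returns
// the elapsed wall-clock time in seconds. The final counter value is written
// to `total` so the caller can check that no increment was lost.
double TimeLockedIncrements(int thread_count, long iterations,
                            unsigned long long& total)
{
    assert(thread_count > 0 && iterations >= 0);

    std::mutex m;
    unsigned long long counter = 0;

    // steady_clock measures elapsed real time and never jumps backwards.
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t)
    {
        threads.emplace_back([&] {
            for (long i = 0; i < iterations; ++i)
            {
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }

    const auto stop = std::chrono::steady_clock::now();

    total = counter;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling TimeLockedIncrements(10, 100000000, total) would mirror the thread count and loop length used above, and printing the returned value on both systems would give figures that are comparable as elapsed time.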


So I created another test, using a Windows critical section:


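#include "stdafx.h"
#include <mutex>
#include <memory>
#include <time.h>
#include <iostream>
#include <vector>
#include <thread>

#include <Windows.h>

using namespace std;

mutex m;

unsigned long long dd = 0;

// Assumed declaration: the critical section that replaces the mutex.
CRITICAL_SECTION critSec;

void RunTest()
{
    for (int i = 0; i < 100000000; i++)
    {
        //unique_lock<mutex> lck{ m };
        // Assumed loop body: same increment as before, guarded by the critical section.
        EnterCriticalSection(&critSec);
        dd++;
        LeaveCriticalSection(&critSec);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    // The critical section must be initialized before any thread uses it.
    InitializeCriticalSection(&critSec);

    clock_t tStart = clock();
    int tCount = 0;
    vector<shared_ptr<thread>> threads;
    for (int i = 0; i < 10; i++)
    {
        threads.push_back(shared_ptr<thread>{new thread(RunTest)});
    }

    // Wait for all worker threads to finish.
    for (auto t : threads)
    {
        t->join();
    }

    cout << ((double)(clock() - tStart) / CLOCKS_PER_SEC) << endl;

    DeleteCriticalSection(&critSec);
    return 0;
}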

And in this test, the Windows code executes much, much (!) faster than the Linux code:


    critical section (Windows) =  38.807
    mutex on Linux             = 453.01
    spinlock on Linux          = 974.15


Compared like that, it looks as if something in the pthread mutex code on Linux is slowing it down.
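The code behind the spinlock figure is not included in this post. For readers who want to try a comparable variant, here is a minimal sketch that assumes a spinlock built on std::atomic_flag. That is not necessarily the kind of spinlock that produced the number above, and the names SpinLock and CountWithSpinLock are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock; it provides lock()/unlock(), so it can
// be used with std::lock_guard.
class SpinLock
{
public:
    void lock()
    {
        // Busy-wait until the flag was previously clear, i.e. we acquired it.
        while (flag_.test_and_set(std::memory_order_acquire))
        {
        }
    }

    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: runs `thread_count` threads, each incrementing a shared
// counter `iterations` times under the spinlock, and returns the final count.
unsigned long long CountWithSpinLock(int thread_count, long iterations)
{
    SpinLock spin;
    unsigned long long counter = 0;

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t)
    {
        threads.emplace_back([&] {
            for (long i = 0; i < iterations; ++i)
            {
                std::lock_guard<SpinLock> guard(spin);
                ++counter;
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
    return counter;
}
```

A pure busy-wait like this tends to behave badly when there are more runnable threads than cores, because a waiting thread burns its time slice while the lock holder may not be running at all.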


I don’t think I’m competent enough to fix it, but could one of the developers take a look at this?