Thursday 27 March 2014

Ubuntu thread sync performance much lower than Windows'

I was benchmarking thread sync performance on Ubuntu 13.04 against Windows 7 when I found something interesting.

 

Here’s the code I’m using for the benchmark:

 

#include <mutex>
#include <memory>
#include <time.h>
#include <iostream>
#include <vector>
#include <thread>

using namespace std;

mutex m;

unsigned long long dd = 0;

void RunTest()
{
    // Increment the shared counter 100 million times,
    // acquiring and releasing the mutex on every iteration.
    for (int i = 0; i < 100000000; i++)
    {
        unique_lock<mutex> lck{m};
        dd++;
    }
}

int main(int argc, char *argv[])
{
    clock_t tStart = clock();
    vector<shared_ptr<thread>> threads;
    for (int i = 0; i < 10; i++)
    {
        threads.push_back(shared_ptr<thread>{new thread(RunTest)});
    }

    // The main thread runs the same loop, so 11 threads contend in total.
    RunTest();

    for (auto t : threads)
    {
        t->join();
    }

    cout << ((double)(clock() - tStart) / CLOCKS_PER_SEC) << endl;

    return 0;
}
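
For reference, on Ubuntu this builds with GCC in C++11 mode and needs the pthread flag; the file name below is just a placeholder:

g++ -std=c++11 -O2 -pthread sync_bench.cpp -o sync_bench
./sync_bench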

 

If we run this on Windows vs Ubuntu, Ubuntu beats Windows 7 hands down. But this is because on Windows, a mutex allows for between-process sync, while on Linux it does not.
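
For context, the Win32 mutex is a kernel object that can be given a name and opened from another process, which is what makes it a heavier primitive than a process-local lock. A minimal sketch of locking with it (the mutex name is made up):

#include <Windows.h>
#include <iostream>

int main()
{
    // A named mutex is visible to any process that opens the same name;
    // acquiring it always goes through the kernel.
    HANDLE h = CreateMutexW(NULL, FALSE, L"demo_mutex");
    if (h == NULL) return 1;

    WaitForSingleObject(h, INFINITE);   // acquire (can wait on another process)
    std::cout << "holding the mutex" << std::endl;
    ReleaseMutex(h);                    // release

    CloseHandle(h);
    return 0;
}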

 

So I created another test, using a Windows critical section:

 

#include "stdafx.h"
#include <mutex>
#include <memory>
#include <time.h>
#include <iostream>
#include <vector>
#include <thread>

#include <Windows.h>

using namespace std;

mutex m;

unsigned long long dd = 0;

CRITICAL_SECTION critSec;

void RunTest()
{
    // Same loop as before, but guarded by a Win32 critical section
    // (process-local) instead of std::mutex.
    for (int i = 0; i < 100000000; i++)
    {
        //unique_lock<mutex> lck{ m };
        EnterCriticalSection(&critSec);
        dd++;
        LeaveCriticalSection(&critSec);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    InitializeCriticalSection(&critSec);

    clock_t tStart = clock();
    vector<shared_ptr<thread>> threads;
    for (int i = 0; i < 10; i++)
    {
        threads.push_back(shared_ptr<thread>{new thread(RunTest)});
    }

    RunTest();

    for (auto t : threads)
    {
        t->join();
    }

    cout << ((double)(clock() - tStart) / CLOCKS_PER_SEC) << endl;
    DeleteCriticalSection(&critSec);

    return 0;
}

 

And in this test, the Windows code executes much, much (!) faster than the Linux code:

 

    critical section (Windows) =  38.807 seconds
    mutex on Linux             = 453.01 seconds
    spinlock on Linux          = 974.15 seconds
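
The spinlock variant isn't shown above; a minimal sketch of what that test could look like, assuming pthread_spinlock_t and keeping the structure of the first program:

#include <pthread.h>
#include <time.h>
#include <iostream>
#include <vector>
#include <thread>
#include <memory>

using namespace std;

pthread_spinlock_t spin;
unsigned long long dd = 0;

void RunTest()
{
    // Same loop, but contended waiters busy-wait in user space
    // instead of sleeping in the kernel.
    for (int i = 0; i < 100000000; i++)
    {
        pthread_spin_lock(&spin);
        dd++;
        pthread_spin_unlock(&spin);
    }
}

int main(int argc, char *argv[])
{
    pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);

    clock_t tStart = clock();
    vector<shared_ptr<thread>> threads;
    for (int i = 0; i < 10; i++)
    {
        threads.push_back(shared_ptr<thread>{new thread(RunTest)});
    }

    RunTest();

    for (auto t : threads)
    {
        t->join();
    }

    cout << ((double)(clock() - tStart) / CLOCKS_PER_SEC) << endl;
    pthread_spin_destroy(&spin);

    return 0;
}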

 

Compared like that, it looks like there is something in the pthread mutex code on Linux that is slowing it down.
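
One way to check whether the overhead is in the libstdc++ std::mutex wrapper or in the pthread mutex itself would be to lock the pthread mutex directly. A sketch of the changed part, assuming the rest of the first program stays the same:

#include <pthread.h>

pthread_mutex_t pm = PTHREAD_MUTEX_INITIALIZER;

unsigned long long dd = 0;

void RunTest()
{
    // Bypasses std::mutex / std::unique_lock and calls
    // the pthread primitive directly.
    for (int i = 0; i < 100000000; i++)
    {
        pthread_mutex_lock(&pm);
        dd++;
        pthread_mutex_unlock(&pm);
    }
}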

 

I don’t think I’m competent enough to fix it myself, but could one of the developers have a look at this?