Jump to content

C++: Multidimensional Arrays; Performance Using Column Major vs Row Major Assignment.

- - - - -

This topic has been archived. This means that you cannot reply to this topic.
12 replies to this topic

#1
Rabtherab

Rabtherab

    Newbie

  • Members
  • Pip
  • 6 posts
This may be obvious to some, but I see this come up from time to time in people's code where they assign values to an array in column major order. I've looked into execution times using both formats and Row Major is typically several orders of magnitude faster than Column Major assignment.

I'm using Code::Blocks w/ MinGW on Win64.

Consider the following code that investigates both methods:


#include <iostream>

#include <ctime>

#include <windows.h>


class CTimer {

	private:

		LARGE_INTEGER lFreq, lStart;


	public:

		CTimer() {

			QueryPerformanceFrequency(&lFreq);

		}


	inline void Start(void) {

		QueryPerformanceCounter(&lStart);

	}


	inline double Stop(void) {

		LARGE_INTEGER lEnd;

		QueryPerformanceCounter(&lEnd);

		return (double(lEnd.QuadPart - lStart.QuadPart)

			/ lFreq.QuadPart);

	}


};


class CAssignArray {

	private:

		long iIndex;

		long*** iTestData;

	public:

		CAssignArray() {

			iIndex = 0;

			iTestData = new long** [iIndex];

			for (long i = 0 ; i < iIndex ; ++i) {

				iTestData[i] = new long* [iIndex];

				for (long j = 0 ; j < iIndex ; ++j) {

					iTestData[i][j] = new long[iIndex];

				}

			}

		}


		CAssignArray(long index) {

			iIndex = index;

			iTestData = new long** [iIndex];

			for (long i = 0 ; i < iIndex ; ++i) {

				iTestData[i] = new long* [iIndex];

				for (long j = 0 ; j < iIndex ; ++j) {

					iTestData[i][j] = new long[iIndex];

				}

			}

		}


		~CAssignArray() {

			for (long i = 0 ; i < iIndex ; ++i) {

				for (long j = 0 ; j < iIndex ; ++j) {

					delete [] iTestData[i][j];

				}


				delete [] iTestData[i];

			}


			delete [] iTestData;

		}


		inline void ColumnOrder(void){

			for (long k = 0 ; k < iIndex ; ++k) {

				for (long j = 0 ; j < iIndex ; ++j) {

					for (long i = 0 ; i < iIndex ; ++i) {

						iTestData[i][j][k] = 0;

					}


				}


			}


		}


		inline void RowOrder(void){

			for (long i = 0 ; i < iIndex ; ++i) {

				for (long j = 0 ; j < iIndex ; ++j) {

					for (long k = 0 ; k < iIndex ; ++k) {

						iTestData[i][j][k] = 0;

					}


				}


			}


		}


};


using namespace std;


int main (int argc, char** argv){

	CTimer TotalTime;

	TotalTime.Start();

	for (long index = 0 ; index <= 640 ; index += 128) {

		CAssignArray Array(index);

		CTimer Timer;

		Timer.Start();

		Array.ColumnOrder();

		float fColTime = Timer.Stop();

		Timer.Start();

		Array.RowOrder();

		float fRowTime = Timer.Stop();

		cout << "Array Size in 4-Byte Longs:\t[" << index << "] ["

			<< index << "] [" << index << "]" << endl;

		cout << "Total Mem Usage for Array:\t" <<

			index * index * index * 4 / 1024 / 1024 << " MB" << endl;

		cout << fixed << "Column Order Assignment Time:\t"

			<< fColTime << " s" << endl;

		cout << fixed << "Row Order Assignment Time:\t" <<

			fRowTime << " s" << endl;

		float fDeltaTime = fColTime / fRowTime;

		cout << fixed << "Delta Value (Col/Row):\t\t"

                        << fDeltaTime << endl << endl << endl;

	}


	cout << fixed << "Total Running Time:\t" << TotalTime.Stop()

                << "s" << endl;

	return 0;

}



On my system this produces the following output:


Array Size in 4-Byte Longs:     [0][0] [0]

Total Mem Usage for Array:      0 MB

Column Order Assignment Time:   0.000002 s

Row Order Assignment Time:      0.000001 s

Delta Value (Col/Row):          1.200000



Array Size in 4-Byte Longs:     [128][128] [128]

Total Mem Usage for Array:      8 MB

Column Order Assignment Time:   0.027539 s

Row Order Assignment Time:      0.003824 s

Delta Value (Col/Row):          7.202236



Array Size in 4-Byte Longs:     [256][256] [256]

Total Mem Usage for Array:      64 MB

Column Order Assignment Time:   0.681002 s

Row Order Assignment Time:      0.031258 s

Delta Value (Col/Row):          21.786558



Array Size in 4-Byte Longs:     [384][384] [384]

Total Mem Usage for Array:      216 MB

Column Order Assignment Time:   2.881100 s

Row Order Assignment Time:      0.106730 s

Delta Value (Col/Row):          26.994347



Array Size in 4-Byte Longs:     [512][512] [512]

Total Mem Usage for Array:      512 MB

Column Order Assignment Time:   8.174504 s

Row Order Assignment Time:      0.252594 s

Delta Value (Col/Row):          32.362213



Array Size in 4-Byte Longs:     [640][640] [640]

Total Mem Usage for Array:      1000 MB

Column Order Assignment Time:   17.574194 s

Row Order Assignment Time:      0.495773 s

Delta Value (Col/Row):          35.448040



Total Running Time:     31.961836s


Process returned 0 (0x0)   execution time : 31.999 s

Press any key to continue.


So as you can see substantial run-time penalties can be incurred when using Column Major assignments.

If your code is time sensitive you should keep this in mind.

I hope this helps someone ^^.

Edit: Please feel free to make whatever comments you have about this.

Edited by Rabtherab, 10 July 2010 - 01:04 AM.
Spurious assumption about big-endian systems.


#2
dargueta

dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,720 posts
Interesting information! +rep.

I think this might be better off in the Tutorial section, though. Don't double-post it there until I ask one of the other mods what they think. (I'm new and still learning the ropes.)

EDIT: Have you tried this on Linux or on different processors?

Edited by dargueta, 08 July 2010 - 11:56 PM.
Grammar fail.

sudo rm -rf /

#3
JCoder

JCoder

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 245 posts

Quote

I suspect the reverse may be true for Big-Endian systems.



It has nothing to do with endianess. It is caused by how caches work. The CPU reads / writes memory in chunks (cache lines, usually several bytes long). When you read in row order, you usually you read the whole line, then move to the next line etc. In column-major order, you read just the beginning of the line and you request another line which must be downloaded from the RAM to the cache. When the structure does not fit in the cache, by the time you move to the next column, the lines that were read first have been already evicted and must be downloaded from RAM again.

If it was Java or any other higher level language without pointer arithmetic, a clever compiler could reorder the loops to make it always row-major. In C++ such optimization is impossible due to pointer aliasing.



#4
Rabtherab

Rabtherab

    Newbie

  • Members
  • Pip
  • 6 posts
Higher level languages would try to shield me from shooting myself in the foot, which I quite enjoy.

Here's an article that explains more clearly what is happening:
Locality of reference - Wikipedia, the free encyclopedia

I prefer the freedom C/C++ allows me.

Also:

Quote

If it was Java or any other higher level language without pointer arithmetic
Java and other HLL do have pointer arithmetic, they simply hide it under the covers so you don't have to worry about it.

#5
Rabtherab

Rabtherab

    Newbie

  • Members
  • Pip
  • 6 posts

dargueta said:

Interesting information! +rep.
EDIT: Have you tried this on Linux or on different processors?

Yes, I checked it out on linux on an x64 and on a power pc with similar results

#6
JCoder

JCoder

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 245 posts

Quote

Java and other HLL do have pointer arithmetic, they simply hide it under the covers so you don't have to worry about it.
No it's not only that. C++ compiler cannot do any really interesting optimizations - like this one: to reorder the loops to increase locality of reference, or even better: to parallelize loops for multicore, or even more better: to eliminate those loops completely as they do nothing (this is what actually Java would do here - just get rid of these loops). C++ can do just trivial low level things like register allocation or loop unrolling or sometimes inlining something (but now Java is better at inlining). Even common subexpression elimination is a pretty tough stuff, because it is extremely hard to determine if there are no side-effects. In HLLs, the compiler has much more room to do complex optimizations. Of course, this also makes the compiler much more complex so most HLL compilers are at the beginning of the road, while the C++ compilers cannot do much more than they do now. The lower level is the language, the less can optimize the compiler and the more the programmer. In assembly, the compiler can do nothing. In C/C++, the compiler can do very little. If you programmed in C++ with all the security checks that normally Java does for you (array range checking, cast checking, automatic memory management), your C++ program would be much slower than the Java equivalent.

#7
Rabtherab

Rabtherab

    Newbie

  • Members
  • Pip
  • 6 posts
I guess you're on some kind of crusade, which to be honest I find a dreary bore. You also presuppose I want a HLL to do things for me, which frankly, I don't. Furthermore, this discussion is rapidly turning into one about how this language is better than that, or this compiler is superior to that, which also doesn't interest me. You can program in Java all you like, and I'll program in what I like. Is that sufficient? Religious wars are lose lose.

#8
JCoder

JCoder

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 245 posts
Read the previous posts - it is you who started this offtopic. I just provided the explenation for what you observed and additionally said that some clever compilers could reorder these loops.

Quote

Higher level languages would try to shield me from shooting myself in the foot, which I quite enjoy. (...)
Java and other HLL do have pointer arithmetic, they simply hide it under the covers so you don't have to worry about it.



These are the first offtopic statements (and apparently the first one is just an opinion, and the second one is simply false).



#9
Flying Dutchman

Flying Dutchman

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 889 posts

"JCoder" said:

If it was Java or any other higher level language without pointer arithmetic, a clever compiler could reorder the loops to make it always row-major. In C++ such optimization is impossible due to pointer aliasing.
I believe offtopic started here. And here's my offtopic bit; I read your other thread, JCoder, where you state that C++ is a joke. Fine. Your opinion. Some people like fast cars while others like luxury ones and you're saying that we all should like only one kind.
A conclusion is where you got tired of thinking.
#define class struct    // All is public.

#10
JCoder

JCoder

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 245 posts
I think that some C++ programmers treat their language too religiously. Whenever I post something technical saying that C++ is limited in some way compared to some other language, they treat is as personal offence.

Quote

Some people like fast cars while others like luxury ones and you're saying that we all should like only one kind.


I've never said so. I've said that sometimes your super fast car is not as fast as other cars, often __believed__ as being slower. The facts are that there are cases, where Java code can run much faster than C++ code; especially when you do not want to trade readability of the code for its efficiency.



#11
dargueta

dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,720 posts
I have a feeling this is quickly going to turn into an off-topic flame war. This seems very much like the off-topic bit in the "OS in C++" thread. Why don't we start a "Technical Differences Between C++ and Java" thread in Programming Theory and go from there? JCoder, you can start.
sudo rm -rf /

#12
Rabtherab

Rabtherab

    Newbie

  • Members
  • Pip
  • 6 posts
I would appreciate if this thread could be trimmed of the junk as I have nothing else to say on that matter.

If someone wanted to contribute something in the original spirit of the thread that would be lovely.