Portability is becoming increasingly important with the 32-/64-bit dichotomy. Writing portable code means that what you write will compile and run correctly on a variety of different systems with no modification to the source code. Just because it compiles doesn't mean it'll run properly.
Here I will outline some common mistakes and assumptions that make code unportable, and how to avoid them.
Don't Assume Pointer Size
I've seen n00b programmers do this various times:
Code:
char *mystring = "Hello, World!";
int pointer_value = mystring;
printf("The string \"%s\" is located at %d\n", mystring, pointer_value);
This is completely wrong for multiple reasons:
1) This code assumes that an integer is the same size as a pointer. This is not the case on most 64-bit systems, where a pointer is 64 bits and integers stay at 32 bits. You're cutting off half the pointer.
2) Pointers are by nature unsigned. Assigning this to a signed integer is just plain wrong. If you're going to do this--which you shouldn't anyway--at the very least use an unsigned integer.
How to fix this:
1) Don't. Usually there are ways around this. For example:
Code:
printf("The string \"%s\" is located at %p\n", mystring, (void *)mystring);
This will work on any system and print the full pointer value, typically in hexadecimal. (%p expects a void *, hence the cast.)
2) If you must, include stdint.h and use the intptr_t or uintptr_t types. Where they exist (practically every modern system), they are guaranteed to hold any void * and convert back to the same pointer.
3) If your library doesn't include stdint.h then use unsigned long as a last resort; it's usually the same size as a pointer but not guaranteed to be so. 64-bit Windows, for example, keeps long at 32 bits while pointers are 64. (Yes, a long is typically the same size as an int on 32-bit systems.)
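To make option 2 concrete, here is a minimal sketch, assuming a C99 compiler that provides stdint.h and inttypes.h. The helper names pointer_to_integer and print_pointer_value are made up for illustration; the PRIxPTR macro supplies the right printf conversion for uintptr_t so you don't have to guess between %x, %lx and %llx.

```c
#include <inttypes.h>  /* PRIxPTR */
#include <stdint.h>    /* uintptr_t */
#include <stdio.h>

/* Hypothetical helper: convert a pointer to an integer type that is
   wide enough to hold it on the platform being compiled for. */
uintptr_t pointer_to_integer(const void *p)
{
    return (uintptr_t)p;
}

/* Hypothetical helper: print the integer value of a string's address
   using the format macro that matches uintptr_t. */
void print_pointer_value(const char *s)
{
    uintptr_t value = pointer_to_integer(s);
    printf("The string \"%s\" is located at 0x%" PRIxPTR "\n", s, value);
}
```

Calling print_pointer_value(mystring) prints the whole address on both 32-bit and 64-bit builds, with nothing cut off.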
Don't Assume Datatype Sizes
Common misconceptions:
- A char may not be eight bits. What?! I hear you say. Yes, that's right. The C standard only guarantees at least eight bits, and some systems (certain DSPs, for example) use 16 or more. Relatedly, don't assume every channel preserves all eight bits: some mail transports are only 7-bit clean, which is why attachments get converted into base64 or an equivalent encoding to avoid using that eighth bit that might get dropped. Use the CHAR_BIT macro, defined in limits.h (climits if you're using C++), instead of hard-coding eight bits.
- A short is not 16 bits by definition. It's only guaranteed to be at least 16 bits and at least as big as a char. How much bigger is up to the implementor.
- An int is not 32 bits by definition. It's only guaranteed to be at least 16 bits and at least as big as a short.
- A long is not 64 bits by definition. It's only guaranteed to be at least 32 bits and at least as big as an int.
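A quick way to stop guessing is to ask the compiler. The sketch below assumes C99 (for %zu and stdint.h); BITS_OF and print_type_widths are names invented for this example. When you need an exact width, the fixed-width types such as int32_t say so directly.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>  /* int32_t */
#include <stdio.h>

/* Hypothetical helper macro: storage width of a type in bits on the
   platform being compiled for. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Print what this particular compiler and target actually use. */
void print_type_widths(void)
{
    printf("char:    %zu bits\n", BITS_OF(char));
    printf("short:   %zu bits\n", BITS_OF(short));
    printf("int:     %zu bits\n", BITS_OF(int));
    printf("long:    %zu bits\n", BITS_OF(long));
    printf("int32_t: %zu bits\n", BITS_OF(int32_t));
}
```

The first four lines of output can differ from machine to machine; the int32_t line is the only one you can count on being 32.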
Don't Assume Endianness
Endianness refers to the order of the bytes in a multibyte integer in memory. Little-endian machines store multibyte integers from least significant to most significant; big-endian machines store multibyte integers with the bytes ordered from most significant to least significant. For example:
The number 0x01234567 is stored in the following manner (assuming addresses increase from left to right):
Big-endian: 01 23 45 67
Little-endian: 67 45 23 01
For example, let's say you're writing a simulator program for the 8086, a 16-bit little-endian processor. You may be tempted to do the following:
Code:
uint16_t ax;
uint8_t *al = (uint8_t *)&ax; //pointer to low byte of ax
This will work fine on a little-endian machine, but on a big-endian machine if you were to change *al you'd be altering the value of the high byte, not the low byte as you want.
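One endian-neutral way around this is to never look at the bytes in memory at all and get at the halves with shifts and masks, which operate on the value rather than its storage. This is only a sketch; the cpu_regs struct and the get/set helpers are hypothetical names, not part of any real simulator.

```c
#include <stdint.h>

/* Hypothetical register model: AX is kept as one 16-bit value and
   AL/AH are derived arithmetically, never through a byte pointer. */
typedef struct {
    uint16_t ax;
} cpu_regs;

/* Low byte of AX, regardless of how the host stores it. */
uint8_t get_al(const cpu_regs *r)
{
    return (uint8_t)(r->ax & 0xFFu);
}

/* High byte of AX. */
uint8_t get_ah(const cpu_regs *r)
{
    return (uint8_t)(r->ax >> 8);
}

/* Replace the low byte, keeping the high byte. */
void set_al(cpu_regs *r, uint8_t v)
{
    r->ax = (uint16_t)((r->ax & 0xFF00u) | v);
}

/* Replace the high byte, keeping the low byte. */
void set_ah(cpu_regs *r, uint8_t v)
{
    r->ax = (uint16_t)((r->ax & 0x00FFu) | ((uint16_t)v << 8));
}
```

These behave the same on little-endian and big-endian hosts because the host's byte order never enters into the arithmetic.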
Don't Assume Floating-Point Representations Are The Same
There are different ways of representing non-integer numbers, such as IEEE-754 floating point, binary fixed-point, or binary-coded decimal (BCD), and even machines that share a format may differ in byte order. When sending data from one computer to another over a network, make sure that you send floating-point numbers as text strings or some other machine-neutral format; the other machine may not have the same internal representation of a floating-point number as yours, which will result in incorrect values being used for computations.
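As a rough sketch of the text-string approach: format the number with enough digits to survive the trip, then parse it on the other side. The helper names encode_double and decode_double are made up for this example. It assumes both ends run in the "C" locale (so the decimal point is a '.') and that 17 significant digits are enough, which holds for IEEE-754 doubles but is an assumption elsewhere.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write value into buf as decimal text.
   Returns the number of characters written, or -1 if buf is too small. */
int encode_double(double value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%.17g", value);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: parse decimal text back into a double.
   Returns 0 on success, -1 if no number could be read. */
int decode_double(const char *text, double *out)
{
    char *end;
    double v = strtod(text, &end);
    if (end == text) {
        return -1;  /* nothing was converted */
    }
    *out = v;
    return 0;
}
```

The sender transmits the contents of buf; the receiver hands the received text to decode_double and ends up with its own native representation of the same value.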
Don't Assume Alignment Requirements
Take the following struct definition:
Code:
struct MyStruct {
    char a;
    long b;
};
It's clearly five or nine bytes, right? Wrong. Hardware generally doesn't like reading data from addresses that aren't multiples of the data size, i.e. the address of x modulo sizeof(x) should always be 0. This means that the compiler is going to stick either three or seven bytes of padding after a to force b to align. So this struct is actually either eight or sixteen bytes long. To avoid wasting space like this, put your fields in decreasing order of size, i.e. long longs go first, then longs, then ints, etc. If you draw yourself a diagram you'll see why. Point is, don't rely on the elements in your structs being right next to each other, because chances are they're not.
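If you'd rather not draw the diagram, sizeof and offsetof will tell you what the compiler actually did. The two structs below are hypothetical examples with a third field added so the effect of reordering shows up; the exact numbers printed depend on your compiler and target.

```c
#include <stddef.h>  /* offsetof */
#include <stdio.h>

/* Small fields wrapped around a big one: padding after a and after c. */
struct Padded {
    char a;
    long b;
    char c;
};

/* Same fields, largest first: the two chars share one padded tail. */
struct Sorted {
    long b;
    char a;
    char c;
};

/* Report the layout the compiler chose for each struct. */
void print_layouts(void)
{
    printf("Padded: size %zu, b at offset %zu, c at offset %zu\n",
           sizeof(struct Padded),
           offsetof(struct Padded, b),
           offsetof(struct Padded, c));
    printf("Sorted: size %zu, a at offset %zu, c at offset %zu\n",
           sizeof(struct Sorted),
           offsetof(struct Sorted, a),
           offsetof(struct Sorted, c));
}
```

On a typical system with an eight-byte long you would expect Padded to come out larger than Sorted, but the point is to measure rather than assume.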
Don't Assume Codepages Or Character Encodings
Let's say I want to write a program in German. German has several characters not in the ASCII set, namely ä, ö, ü, and ß. If I wanted to write "Wählen Sie eine Nummer von 1 bis 10", I can't just substitute \204 for the 'ä'. Why? Because this'll only work on Windows consoles, since they use the IBM/DOS Extended Character Set. If my client is using ISO 8859-1 (which is more than likely in Germany), it'll show up as [?]. Better to check the current locale and adapt accordingly.
Locale and C/C++
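Here is one possible sketch of "check the locale and adapt", using only standard C. The helper names adopt_user_locale and localized_prompt are invented for this example, and the ASCII fallback spelling ("ae" for "ä") is just one way to degrade. The idea is to keep the text as a wide string and let the C library convert it into whatever encoding the user's locale uses.

```c
#include <locale.h>
#include <stdlib.h>  /* wcstombs */
#include <wchar.h>

/* Hypothetical helper: switch from the default "C" locale to whatever
   the user's environment asks for; returns the resulting LC_CTYPE name. */
const char *adopt_user_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        setlocale(LC_ALL, "C");  /* environment names an unknown locale */
    }
    return setlocale(LC_CTYPE, NULL);
}

/* Hypothetical helper: convert the prompt into the locale's encoding.
   Falls back to an ASCII-only spelling if the locale cannot represent
   the umlaut or if buf is too small. */
const char *localized_prompt(char *buf, size_t size)
{
    static const wchar_t wide[] = L"W\u00e4hlen Sie eine Nummer von 1 bis 10";
    size_t n = wcstombs(buf, wide, size);
    if (n == (size_t)-1 || n >= size) {
        return "Waehlen Sie eine Nummer von 1 bis 10";
    }
    return buf;
}
```

Call adopt_user_locale() once at startup, then print whatever localized_prompt returns. No byte value for 'ä' is hard-coded anywhere, so the same source works whether the user's locale is ISO 8859-1, UTF-8, or plain ASCII.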
Don't Assume File System Properties
File Path Lengths
Restrictions on how long file names can be vary from operating system to operating system, and even file system to file system. FAT12 and FAT16 limited filenames to eight characters plus a three-character extension; NTFS limits a file name to 255 UTF-16 code units (not necessarily characters); HPFS allows 256, etc. I guess you can safely assume that you're not subject to the 8.3 restriction, simply because it's so old; but otherwise, just keep your filenames short and everyone will be happy.
Case Sensitivity
Windows regards foo.txt and fOo.TxT as the same file. *NIX systems do not.
Allowed Characters
*NIX systems are typically more permissive about filenames than Windows. ISO 9660 filesystems, in their basic form, are incredibly restrictive--they don't even allow the dash as part of a filename. Keep this in mind when creating file names; to be safe, just use letters, numbers, and underscores, and don't rely on case sensitivity.
Path Separator
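Windows uses \ to separate portions of a file path; *NIX systems use /. Be careful when hard-coding file paths into your programs: the Windows command prompt and many Windows tools treat / as the start of an option (even though most Windows API calls accept it), and a *NIX shell will interpret the \ as an escape character. In C source, a literal \ also has to be written as \\.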
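A small sketch of how to keep the separator out of your string literals: pick it once at compile time and build paths through a helper. PATH_SEPARATOR and join_path are hypothetical names for this example; _WIN32 is the macro Windows compilers predefine.

```c
#include <stdio.h>

/* Choose the separator once, based on the target platform. */
#if defined(_WIN32)
#define PATH_SEPARATOR '\\'
#else
#define PATH_SEPARATOR '/'
#endif

/* Hypothetical helper: write "dir<sep>name" into buf.
   Returns 0 on success, -1 if buf is too small. */
int join_path(char *buf, size_t size, const char *dir, const char *name)
{
    int n = snprintf(buf, size, "%s%c%s", dir, PATH_SEPARATOR, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this, join_path(buf, sizeof buf, "data", "log.txt") gives data/log.txt on *NIX and data\log.txt on Windows without either separator appearing in the calling code.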
Character Encodings
Some file systems allow Unicode names, others don't. Use system API functions to check before you try creating a filename in Chinese or Russian, or something like that.
Don't Use system() For Executing Commands
system("cls") will clear the screen on a Windows system, but will fail on *NIX systems because the command there is clear, not cls. In this case you should just print a whole bunch of blank lines instead. In general, though, you should not use system() for invoking anything on the command line, because this relies on the environment as well as the operating system. If you're writing specifically for *NIX, or specifically for Windows, then go ahead. However, there are more secure ways of invoking commands; take a look at the exec* functions on the Linux man pages.
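For the *NIX-only case, here is a minimal sketch of the exec* route using fork, execvp and waitpid. It is POSIX-specific and will not compile on plain Windows; run_command is a hypothetical helper name. Because no shell is involved, the arguments are passed through exactly as given instead of being re-parsed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): run argv[0] with the given
   NULL-terminated argument list, without going through a shell.
   Returns the child's exit status, or -1 if it could not be run
   or did not exit normally. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;              /* fork failed */
    }
    if (pid == 0) {
        execvp(argv[0], argv);  /* only returns on failure */
        _exit(127);             /* conventional "command not found" */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller builds the argument array itself, e.g. char *args[] = { "ls", "-l", NULL }; and passes it to run_command(args). Note that execvp still searches $PATH, so this does not remove the dependence on the environment entirely.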
Don't Assume Environment Properties
I spent an entire day at work fixing a utility script for Linux written in TCL. Why? It assumed that the current directory, . , was in $PATH. This is almost never the case on *NIX systems; the Windows command prompt, however, always searches the current directory. Another example: I was writing a Perl script that would execute some command that took forever, and then send the user a notification email. A mistake I made was to assume that the current username was stored in the environment variable $USER. This is not always the case. On Windows, it's %USERNAME%. On *NIX, it's $USER. So don't assume things about environment settings--you won't get far.
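The username case can be handled defensively along these lines. This is a sketch only: current_username is a made-up helper, and the "unknown" fallback is an arbitrary choice. It tries both variables and never assumes either one is set.

```c
#include <stdlib.h>  /* getenv */

/* Hypothetical helper: best-effort lookup of the current user's name
   from the environment. Never returns NULL. */
const char *current_username(void)
{
    const char *name = getenv("USER");      /* typical on *NIX */
    if (name == NULL || name[0] == '\0') {
        name = getenv("USERNAME");          /* typical on Windows */
    }
    if (name == NULL || name[0] == '\0') {
        name = "unknown";                   /* neither variable is set */
    }
    return name;
}
```

Environment variables can also be changed by the user, so treat the result as a hint for things like a notification address, not as proof of identity.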
Last edited by dargueta; 09-08-2009 at 09:17 AM.
sudo rm -rf /
Nicely written! +rep
Very nice. We had a customer whose old system broke because it choked on .xlsx extensions. Our system supports the filenames they needed with Office 2007.
The system was old enough to choke on a non 8.3 filename but new enough to run Office 2007?
Absolutely!+rep
Hey! Check out my new Toyota keyboaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
No, the program was from the days of win 3.1. It was running on an XP machine, but still suffering the limitations of the old days.
HAAAA...the 486 in my basement used to run 3.1 (before the 400MB hard drive crashed).
+rep
Don't assume... words to live by
Good Stuff dargueta!
Sorry.. I had a doubt.. But I understood it myself..
Last edited by veda87; 09-11-2009 at 12:31 AM. Reason: doubt clarified...
All right, well for those of you who still don't get it, if you have this:
Code:
struct MyStruct {
    char a; //offset: 0
    long b; //offset: 1 - hardware won't like it
};
If a long is four bytes, the compiler will change the struct to look like this:
Code:
struct MyStruct {
    char a; //offset: 0
    char __invisible_padding__[3]; //offset: 1
    long b; //offset: 4
};
If a long is eight bytes long, it will do this:
Code:
struct MyStruct {
    char a; //offset: 0
    char __invisible_padding__[7]; //offset: 1
    long b; //offset: 8
};
Of course, this padding is completely transparent to the programmer. There's no way to directly access it unless you rely on it being in between specific members, and a specific number of bytes, which is not portable. So don't do that either.