In case you didn't know, most instructions take one or two cycles to execute. (For you anal-retentive people, this is actually just their apparent latency. The number of cycles is actually much larger and pipeline-dependent.)

Anyway, I'm going to take my instruction timings from an Intel Core2 L64. Let's take a look:
Code:
add     1 cycle
mov     1 cycle
shl     1 cycle
mul     8 cycles
div     116 cycles
Clearly the speed of your program does not hinge on how few instructions you use. Do you have any idea how much time you waste by dividing? Great thing is, this problem isn't unavoidable. There's ways of getting around it to cut down the number of cycles, which is critical in loops.

Multiplying
When multiplying by powers of two, we can use SHL:
Code:
SHL     EAX, 1      ; EAX *= 2
SHL     EAX, 2      ; EAX *= 4
...
This completes in one cycle instead of eight. Downside is that MUL stores the result in EDX:EAX, meaning your result could be a huge 64-bit number. With SHL, if you multiply by too much you could very well end up with zero as your answer.

For numbers less than 10, we can also use LEA. If you ever find the need to multiply, add, and store, you can use this handy-dandy instruction to get it all done in one cycle, and probably with fewer registers too. For example:
Code:
MOV     ECX, EAX
SHL     EAX, 2
ADD     EAX, ECX
ADD     EAX, 10
This snippet multiplies EAX by five (clobbers ECX) and adds ten for a grand total of four cycles. Still better than the nine it would take with a MUL and an ADD. But what if we don't have any other free registers? We're going to have to use the stack, which'll add two more cycles onto our snippet for a total of six. Not saving too much there.

However, our friend LEA can do all that in one cycle, without touching ECX:
Code:
LEA     EAX, [EAX*4 + EAX + 10]
Note that we have a few restrictions:
- We can only multiply one register by 1, 2, 4, or 8.
- We can only add one register. (No subtracting, sorry.)
- We can only add one signed constant, either 8 or 32 bits (whatever fits).

You can get creative and use it to add two registers and store the result in another register:
Code:
LEA     EDX, [EAX + ECX]            ; EDX = EAX+ECX
LEA     EBX, [EDX*4 + EAX - 1492]   ; EBX = EDX*4 + EAX - 1492
Note that we have a problem: You can't multiply a register by anything more than 9, and you can't do 6 or 7 either. Well, we can use two instructions for 6:
Code:
; multiply EAX by 6
LEA     EAX, [EAX*2 + EAX]      ; EAX *= 3
SHL     EAX, 1
Grand total of 2 clock cycles, which is 1/4 of what it takes using MUL.

Multiplying by 7 is a little more annoying. Here's one way to do it without clobbering any other registers:
Code:
PUSH    EAX
SHL     EAX, 3      
SUB     EAX, [ESP]
ADD     ESP, 4
Even though this code looks like a lot more work, it's actually twice as fast as a single MUL.

Multiplying by larger numbers requires a series of shifts, adds and LEAs which isn't really worth it anymore. Way back when MUL used to take around fifteen cycles, it was possible. But eight? Eh...highly depends on how cunning you are. Plus you need to know your algebra.

Multiply EAX by 47 (same as EAX*32 + EAX*16 - EAX)
Code:
MOV     ECX, EAX                ; save original EAX in ECX
SHL     EAX, 4                  ; EAX *= 16
LEA     EAX, [EAX*2 + EAX]      ; EAX = EAX*16*2 + EAX*16
SUB     EAX, ECX
Four cycles! Not bad. If we want to save ECX we can use the stack:
Code:
PUSH    EAX                     ; save original eax
SHL     EAX, 4                  ; EAX *= 16
LEA     EAX, [EAX*2 + EAX]      ; EAX = EAX*16*2 + EAX*16
SUB     EAX, [ESP]
ADD     ESP, 4
Six cycles. Not that much of a benefit, but in a frequently called loop it'll be worth it.

Now something harder: multiply EAX by 19. This is the same as EAX*16 + EAX*2 + EAX. This isn't going to work, though. Better redistribute the multiples to EAX*8 + EAX*4 + EAX.
Code:
MOV     ECX, EAX
SHL     EAX, 2
LEA     EAX, [EAX*2 + EAX]
ADD     EAX, ECX
Ha! Exactly the same as multiplying by 47, except for how much we shift by.

But enough of that. Let's move on to dividing.

Dividing
Not a whole ton you can do here, but as long as you're dividing by the right kind of numbers you can speed this up A LOT. Downside: you really can only do this quickly by shifting right, which means that you don't get a remainder. DIV gives you the quotient in EAX and the remainder in EDX. But if you don't need that remainder, you can shift using SAR and SHR. You need to know just one thing: is the number you're dividing signed or unsigned? (If you don't know the difference you kinda shouldn't be here. ) Anyway, if you're dealing with signed numbers use SAR (shift arithmetic right), if you're using unsigned numbers use SHR (shift right).
These are useful for dividing by powers of two, just as SHL is useful for multiplying by powers of two. If you want to divide by something that's not a power of two...sorry, but as far as I know you're up the creek without a paddle. IF, however, you want to do fractions, hold on a sec. I'll get to that.

Let's try something easy first - dividing by eight.
Code:
; unsigned numbers
SHR     EAX, 3

; signed numbers
SAR     EAX, 3
Dividing by any power of 2 is this simple, and 1/116 of the time it'd take with a DIV. Small savings, right? (That's my humor for ya.)

Fractions
Now about those fractions...we can only divide by powers of two, but we can at the same time multiply by another number. A little algebra:
Code:
A/2 - A/8 = (8A - 2A)/16 = 6A/16 = 3A/8

A/16 + A/2 = (2A + 16A)/32 = 18A/32 = 9A/16

5A/16 + 47A = 757A/16
The last one takes nine to thirteen cycles by my count. Neat, huh?

Caveat: You're going to get an integer answer unless you decide to use fixed-point arithmetic.

Sources: "Instruction latencies and throughput for AMD and Intel x86 processors" by Torbjör Granlund. (C) 2009.