Jump to content

Concerns about the issue of collisions?

- - - - -

  • Please log in to reply
3 replies to this topic

#1
Fighter

Fighter

    Newbie

  • Members
  • PipPip
  • 28 posts
In my web application, and a few other places I am using the first five or so digits of random MD5's, as a02f2 is less less revealing to the eye than 123, 124, etc and the string needs to be fairly simple to our users/us to remember.

My main concern is that the numbers will repeat, I believe the first numbers of the MD5 will not be dependent of eachother however I have heard voiced concerns over MD5 collision in web and other security applications.

Are the first 5 or so characters smart to use? Can you give me some solid examples of when exactly it would become an issue of collision?

#2
Alexander

Alexander

    It's Science!

  • Moderators
  • 4,118 posts
  • Location:Vancouver, Eh! Cleverness: 200
You could understand your choice as the following:
  • Base 16 (hexadecimal), allows 16 unique symbols to represent itself
  • 16*16*16*16*16, or 16^5, can represent the equivalent of twenty bits of information.
Knowing that, there are roughly 1 million (2^20) possible sums available to you.

A rough formula, to avoid using extremely large numbers (i.e. 65536!) using the Taylor series expansion (of the exponential function e) can be used in finding the probability two random hashes may collide.


Posted Image





where n is 2, and k is 2^20, will equal 0.0000009 percent (or. 9/10000000 hashes)

In reverse you can also calculate n from your specified p using the natural logarithm of e. p will be 0.5 in the following (fifty percent chance is more than significant)

Posted Image






1205 hashes will theoretically contain a 50 percent chance of collision, and is significant enough and also dangerously likely if you are using it to keep track of files or records.

If you could use at least the full lower case range,with numbers, that will easily increase the amount of possible combinations (26^5 = 11 mil) and you may need to worry less of the issue.

Some references you could use, based on well known problems used to describe this:
Birthday attack - Wikipedia, the free encyclopedia
Birthday problem - Wikipedia, the free encyclopedia

Alexander.

Attached Files

  • Attached File  ;.gif   585bytes   20 downloads
  • Attached File  ;.gif   845bytes   25 downloads

Edited by Alexander, 14 September 2011 - 03:47 AM.

Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#3
Fighter

Fighter

    Newbie

  • Members
  • PipPip
  • 28 posts
I had never realized it could be calculated even by hand, and that the character range played a huge® part in the amount permutations. I see no problem in adding more of the characters, there was an option to use either five or ten length however I'll increase that a little depending on what they wish.

Thank you for the informative writeup Alexander.

A small request, and you do not have to do this: Is it possible to generate a graph (in a simple manner) to see the formula in action for myself?

Edited by Fighter, 14 September 2011 - 04:01 AM.
Typo


#4
Alexander

Alexander

    It's Science!

  • Moderators
  • 4,118 posts
  • Location:Vancouver, Eh! Cleverness: 200
You are welcome, there are a lot of little neat functions out there to calculate these things.

Fighter said:

A small request, and you do not have to do this: Is it possible to generate a graph (in a simple manner) to see the formula in action for myself?

Have you heard of Mathematica? They've an online service called Wolfram|Alpha which shares a fair amount of features, you can plug this in (excuse the encoded url) and it should give you the appropriate graph:
[url=http://www.wolframalpha.com/input/?i=Plot%5B1-e^%28-n^2%2F%282*16^5%29%29+%2C+{n%2C0%2C4000}%5D]Plot[1-e^(-n^2/(2*16^5)) , {n,0,4000}] - Wolfram|Alpha[/url]

The right hand column is for the iterator, telling it to make n go from 0 to 4000 on the x axis.

Alexander.
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.




1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users