I have a very large list of strings, and I want to make roughly 2 million checks of a given string against this list.
What is the best performance method of doing this? I have tried using an array and the find method, but I am curious if a better object/method is available for simple string comparison.
4 replies to this topic
#1
Posted 11 May 2011 - 09:19 AM
|
|
|
#2
Posted 11 May 2011 - 09:36 AM
interesting question
#3
Posted 11 May 2011 - 10:49 AM
In a list of strings an iterative method such as find would likely be the best you can do, using objects or classes would add unnecessary overhead even if they would make things a little faster.
Maybe try to tell us the speed it takes (i.e. measure the amount of strings, and the amount of time, and extract strings per second out of this) so we could see if this is slow.
Maybe try to tell us the speed it takes (i.e. measure the amount of strings, and the amount of time, and extract strings per second out of this) so we could see if this is slow.
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.
#4
Posted 12 May 2011 - 12:50 PM
Depending upon the content of strings you should be able to come up with a good Hash function. Can you explain the contents of the string? Can you make any assumptions about them?
What i am trying to say is say you have indexes between 0 - 2 million. And you come with a scheme that say string "i am a string" would result into hash 345 and you store that on index 345.
This way you would avoid doing string comparison and instead only check a specific index.
If your string is too generic, you can still come up with some partial hashing rules to improve that. Say you create a hash of 26 indexes.
a is 0, b is 2, c is 3 ..... z is 25.
Now you store pointers to all strings starting with char 'a' in index 0. So when you find a string starting with a you only do string comparisons with all in a.
Even better idea is to store all of your 2 million strings if possible in a binary search tree (parse trees are made this way in a compiler).
At each level of the tree you compare the current character with that in the string, if equal you would choose right or left child depending upon incoming character being less or more.
This would get you to required string in way much less comparisons.
However, whatever i have recommended requires building a hash or another efficient data structure using memory and run on your entire 2 million entries once. If that is not possible, your best shot is still ordinary string comparisons.
Else you need to know and tell us what sort of organization those 2 million strings are placed in at present
What i am trying to say is say you have indexes between 0 - 2 million. And you come with a scheme that say string "i am a string" would result into hash 345 and you store that on index 345.
This way you would avoid doing string comparison and instead only check a specific index.
If your string is too generic, you can still come up with some partial hashing rules to improve that. Say you create a hash of 26 indexes.
a is 0, b is 2, c is 3 ..... z is 25.
Now you store pointers to all strings starting with char 'a' in index 0. So when you find a string starting with a you only do string comparisons with all in a.
Even better idea is to store all of your 2 million strings if possible in a binary search tree (parse trees are made this way in a compiler).
At each level of the tree you compare the current character with that in the string, if equal you would choose right or left child depending upon incoming character being less or more.
This would get you to required string in way much less comparisons.
However, whatever i have recommended requires building a hash or another efficient data structure using memory and run on your entire 2 million entries once. If that is not possible, your best shot is still ordinary string comparisons.
Else you need to know and tell us what sort of organization those 2 million strings are placed in at present
#5
Posted 14 July 2011 - 07:01 AM
Today is the first day of the rest of my life
1 user(s) are reading this topic
0 members, 1 guests, 0 anonymous users


Sign In
Create Account

Back to top









