Hi, let's say I have two files:
The 1st contains: string_with_fixed_length::random_string, e.g. abcdef::12s5g9
The 2nd contains: string_with_fixed_length::random_string, e.g. abcdef::7j93097j
I want to compare the fixed-length strings (abcdef with abcdef) and, if they are the same (in this case they are), take the random strings and write them out in the format 12s5g9::7j93097j.
I've been fighting with this task for a long time and I suppose it has an easy solution, but I'm new to Python so I don't know how to deal with it.
If somebody could help me I would appreciate it ;)
5 replies to this topic
#1
Posted 18 August 2011 - 01:05 PM
#2
Posted 18 August 2011 - 02:41 PM
Well, str.split is your friend here.
By applying splitResult = "abcdef::12s5g9".split("::") you get a list containing "abcdef" in the first position and "12s5g9" in the second. From there it's a simple matter of comparing splitResult[0] of each string and then concatenating the splitResult[1] of each string.
Hope I was able to explain properly!
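To make the split-and-compare idea concrete, here is a minimal sketch. The helper name join_if_same_key and its sep parameter are made up for illustration, and it assumes each line contains the "::" separator exactly as in the question. It is written for Python 3.

```python
def join_if_same_key(line_1, line_2, sep="::"):
    """Return 'value_1::value_2' if both lines share the same key, else None.

    Hypothetical helper: assumes each line looks like 'key::value'.
    """
    # split(sep, 1) splits only on the first separator, so the value part
    # may itself contain "::" without breaking the unpacking.
    key_1, value_1 = line_1.strip().split(sep, 1)
    key_2, value_2 = line_2.strip().split(sep, 1)
    if key_1 == key_2:
        return value_1 + sep + value_2
    return None


print(join_if_same_key("abcdef::12s5g9", "abcdef::7j93097j"))  # 12s5g9::7j93097j
```

The strip() call drops the trailing newline that lines read from a file usually carry.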
#3
Posted 18 August 2011 - 09:18 PM
You sure did! Thanks a lot Revolt ;)
#4
Posted 21 August 2011 - 03:12 AM
It seems I have another problem, this time with reading the lines of those 2 files.
I use a for loop, here's my code:
#! /usr/bin/env python
import string
import sys
file_1 = open("jednicka")
file_2 = open("dvojka")
for line in file_1:
    for line_2 in file_2:
        if line != line_2:
            print "[*]Nothing."
            print " "+line+ " "+line_2
        elif line == line_2:
            print "[**]Success!"
            print " "+line+ " "+line_2
            break
    else:
        print "[---]Failed!"
And it seems that after breaking out of the second "for" loop (on a successful comparison) and advancing the first "for" loop to its second item (in this case the 2nd line), the second "for" loop doesn't start iterating from the beginning again, but holds the "\n" value.
Example:
1st file contains:
xxxx
aaaa
bbbb
2nd file contains:
aaaa
cccc
xxxx
So my comparison looks like:
xxxx -> aaaa - Nothing.
xxxx -> cccc - Nothing.
xxxx -> xxxx - Success!
and it should continue like this:
aaaa -> aaaa - Success!
and so on ...
But it doesn't; instead of comparing "aaaa" with "aaaa", it compares "aaaa" with "\n".
Thanks for the advice :)
Edited by Slammerek, 21 August 2011 - 04:26 AM.
#5
Posted 21 August 2011 - 09:20 AM
The problem is that you are iterating over the file handles. Each file handle keeps track of its current position in the file, so that successive read operations know where to start from.
When you enter the inner "for" loop for the second time, you have already read all lines from file_2, so it won't read them again.
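A small demonstration of that behaviour, as a sketch: io.StringIO stands in for an open file here (an assumption made so the example runs without any files on disk); it keeps a read position just like a real file object. Written for Python 3.

```python
import io

# Stand-in for open("dvojka"); contents taken from the example in the question.
file_2 = io.StringIO("aaaa\ncccc\nxxxx\n")

first_pass = [line for line in file_2]   # reads all three lines
second_pass = [line for line in file_2]  # position is at end of file: nothing left

print(first_pass)   # ['aaaa\n', 'cccc\n', 'xxxx\n']
print(second_pass)  # []
```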
If the files are guaranteed to be small, you could read all lines into a temporary list and loop over that in the "for" instead:
file2_lines = file_2.readlines()
for line2 in file2_lines:
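Put together with the example data from the question, the readlines() approach might look like the sketch below. The io.StringIO objects are stand-ins for the two opened files and the matches list is only there for illustration; neither is part of the original code. Written for Python 3.

```python
import io

# Stand-ins for open("jednicka") and open("dvojka").
file_1 = io.StringIO("xxxx\naaaa\nbbbb\n")
file_2 = io.StringIO("aaaa\ncccc\nxxxx\n")

# Read the second file once; a list can be looped over as many times as needed.
file2_lines = file_2.readlines()

matches = []
for line in file_1:
    for line2 in file2_lines:
        if line == line2:
            matches.append(line.strip())
            break  # stop scanning once this line has found its match

print(matches)  # ['xxxx', 'aaaa']
```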
However, if you run into gigantic files (1 GB+), reading all lines into memory may not be the best approach. The alternative is to call file_2.seek(0) to return the file cursor to the start of the file each time the second "for" is done (or just before it starts again). This may take longer to parse (since it will be constantly rereading the file), but you will be able to process 1 GB+ files even if you only have 512 MB of RAM, for instance.
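A minimal sketch of the seek(0) variant follows. The function name find_matches is made up for illustration, and io.StringIO again stands in for real files; with files on disk you would pass the objects returned by open() instead. Written for Python 3.

```python
import io


def find_matches(file_1, file_2):
    """Return the lines of file_1 that also appear in file_2 (hypothetical helper)."""
    matches = []
    for line in file_1:
        # Rewind the second file before every scan, so the inner loop
        # always starts from its first line.
        file_2.seek(0)
        for line_2 in file_2:
            if line == line_2:
                matches.append(line.strip())
                break
    return matches


result = find_matches(io.StringIO("xxxx\naaaa\nbbbb\n"),
                      io.StringIO("aaaa\ncccc\nxxxx\n"))
print(result)  # ['xxxx', 'aaaa']
```

Only one line of each file is held in memory at a time, at the cost of rereading the second file once per line of the first.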
#6
Posted 21 August 2011 - 10:16 AM
Great, it works!
Thank you so much Revolt ;)