String comparison

5 replies to this topic

#1
Slammerek

    Newbie

  • Members
  • Pip
  • 8 posts
Hi, let's say I have two files:

1st. contains: string_with_fixed_length::random string e.g. abcdef::12s5g9
2nd. contains: string_with_fixed_length::random string e.g. abcdef::7j93097j

I want to compare the fixed-length strings (abcdef with abcdef) and, if they are the same (in this case they are), take the random strings and write them out in the format 12s5g9::7j93097j.


I've been fighting with this task for a long time. I suppose it has an easy solution, but I'm new to Python, so I don't know how to deal with it.
If somebody could help me, I would appreciate it ;)

#2
Revolt

    Programmer

  • Members
  • PipPipPip
  • 99 posts
Well, str.split is your friend here.

By applying splitResult = "abcdef::12s5g9".split("::") you get a list containing "abcdef" in the first position and "12s5g9" in the second. From there it's a simple matter of comparing splitResult[0] of each string and, if they match, concatenating splitResult[1] of each string.
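
A minimal sketch of that idea, in case it helps. The function name join_if_same_key and the sep parameter are made up for illustration; they are not from the original post. It assumes each line contains the "::" separator exactly as in the example.

```python
def join_if_same_key(line_1, line_2, sep="::"):
    """Return 'value_1::value_2' if both lines start with the same key, else None."""
    # Split each line once on the separator: [fixed-length key, random string].
    key_1, value_1 = line_1.strip().split(sep, 1)
    key_2, value_2 = line_2.strip().split(sep, 1)
    if key_1 == key_2:
        return value_1 + sep + value_2
    return None


print(join_if_same_key("abcdef::12s5g9", "abcdef::7j93097j"))  # 12s5g9::7j93097j
```

The strip() call is there because lines read from a file usually end with a newline, which would otherwise end up inside the second value.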

Hope I was able to explain properly!

#3
Slammerek

    Newbie

  • Members
  • Pip
  • 8 posts
You sure did! Thanks a lot Revolt ;)

#4
Slammerek

    Newbie

  • Members
  • Pip
  • 8 posts
It seems I have another problem, this time with reading the lines of those 2 files.
I use a for loop; here's my code:
#! /usr/bin/env python

import string
import sys

file_1 = open("jednicka")
file_2 = open("dvojka")

for line in file_1:
    for line_2 in file_2:
        if line != line_2:
            print "[*]Nothing."
            print "   "+line+ "   "+line_2
        elif line == line_2:
            print "[**]Success!"
            print "    "+line+ "    "+line_2
            break
        else:
            print "[---]Failed!"

It seems that after breaking out of the second for loop (on a successful comparison) and advancing the first for loop to its second item (in this case the 2nd line), the second for loop doesn't start iterating from the beginning again, but holds a "\n" value.


Example:

1st file contains:
xxxx
aaaa
bbbb

2nd file contains:
aaaa
cccc
xxxx

So my comparison looks like:
xxxx -> aaaa - Nothing.
xxxx -> cccc - Nothing.
xxxx -> xxxx - Success!

and it should continue like this:
aaaa -> aaaa - Success!
and so on ...

But it doesn't, instead of comparing "aaaa" with "aaaa" it compares "aaaa" with "\n".

Thanks for advice :)

Edited by Slammerek, 21 August 2011 - 04:26 AM.


#5
Revolt

    Programmer

  • Members
  • PipPipPip
  • 99 posts
The problem is that you are iterating over the file handles. Each file handle keeps track of its current position in the file so that successive read operations know where to start from.

When you enter the inner for loop a second time, file_2 is still positioned where the previous pass stopped (here, at the end of the file), so it won't read those lines again.

If the files are guaranteed to be small, you could read all the lines into a temporary list and iterate over that list in the for loop:

file2_lines = file_2.readlines()

for line2 in file2_lines:
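
Here is a rough, self-contained sketch of the readlines approach. It is written for Python 3 and uses io.StringIO objects as stand-ins for the two open files so it runs without anything on disk; the function name find_matches is made up for illustration.

```python
import io


def find_matches(file_1, file_2):
    """Return every line of file_1 that also appears in file_2."""
    # Read file_2 once into a list; a list can be looped over any number of times.
    file2_lines = [line_2.rstrip("\n") for line_2 in file_2.readlines()]
    matches = []
    for line in file_1:
        line = line.rstrip("\n")
        for line_2 in file2_lines:
            if line == line_2:
                matches.append(line)
                break
    return matches


# Stand-ins for open("jednicka") and open("dvojka"), using the example contents.
file_1 = io.StringIO("xxxx\naaaa\nbbbb\n")
file_2 = io.StringIO("aaaa\ncccc\nxxxx\n")
print(find_matches(file_1, file_2))  # ['xxxx', 'aaaa']
```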

However, if you stumble upon gigantic files (1 GB+), reading all the lines into memory may not be the best approach. The alternative is calling file_2.seek(0) inside the outer loop, just before the inner for starts, to return the file cursor to the start of the file. This way it may take more time to parse (since it will be constantly re-reading the file), but you will be able to process 1 GB+ files even if you only have 512 MB of RAM, for instance.
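
A rough sketch of the seek(0) alternative, again for Python 3 with io.StringIO objects standing in for the real files (the function name find_matches_with_seek is made up for illustration). The only change from the nested loops in the question is the rewind before each inner pass.

```python
import io


def find_matches_with_seek(file_1, file_2):
    """Return every line of file_1 that also appears in file_2, rewinding file_2 each pass."""
    matches = []
    for line in file_1:
        line = line.rstrip("\n")
        # Rewind file_2 so the inner loop starts from its first line again.
        file_2.seek(0)
        for line_2 in file_2:
            if line == line_2.rstrip("\n"):
                matches.append(line)
                break
    return matches


# Stand-ins for open("jednicka") and open("dvojka"), using the example contents.
file_1 = io.StringIO("xxxx\naaaa\nbbbb\n")
file_2 = io.StringIO("aaaa\ncccc\nxxxx\n")
print(find_matches_with_seek(file_1, file_2))  # ['xxxx', 'aaaa']
```

Only one line of file_2 is held in memory at a time, at the cost of re-reading file_2 once for every line of file_1.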

#6
Slammerek

    Newbie

  • Members
  • Pip
  • 8 posts
Great, it works!
Thank you so much Revolt ;)
