[PYTHON] Replace strings in all files in directory and subdirectories

7 replies to this topic

#1 ZekeDragon

ZekeDragon

    CC Leader

  • Retired Mod
  • 1263 posts

Posted 01 September 2009 - 09:55 PM

It's probably a worthless script; I'd imagine there are plenty of tools on Linux that do what I wanted, I just didn't know about them. This is a really simple and extremely badly programmed script. While writing it I realized just how rusty I was with Python, and how much better I had gotten since I last used it. I know there are better ways to do what I did, and I'd like to hear from anyone who knows the language better than I do (Hignar?) which parts I did terribly wrong. I want to get back in gear with this language!

I know I probably could have done this with awk and sed or even Perl, but I don't know those; at least I can code in Python!
import os,sys

def getFiles(dir):
  foundFiles = []

  if dir[-1:] == "/":
    dir = dir[0:-1]

  for x in os.listdir(dir):
    if os.path.isdir(dir + "/" + x):
      foundFiles.extend(getFiles(dir + "/" + x))
    else:
      # You can replace/comment out this if you want to, it'll only work
      # with "html" files if you don't.
      if x[-4:] == "html":
        foundFiles.append(dir + "/" + x)
  return foundFiles

def fixFile(file):
  """
   I would use "r+" here, but it didn't work right, it was acting
   to append, not to write over. So since I was far too lazy to
   do it right the script ends up with this hackish open/close
   cycle that is probably really inefficient. Oh well.:P
  """
  infile = open(file, "r")
  text = infile.read()
  if text.find(sys.argv[2]) != -1:
    infile.close()
    infile = open(file, "w")
    text = text.replace(sys.argv[2], sys.argv[3])
    print("Replaced strings in " + file)
    infile.write(text)
  infile.close()

# This program doesn't have a concept of input checking, you better know how
# to use this, don't look at me!
if len(sys.argv) != 4:
  print("Usage: %s <dir> <starting_string> <replacing_string>" % sys.argv[0])
  sys.exit(1)

files = getFiles(sys.argv[1])

for file in files:
  fixFile(file)
This script takes a directory as its first argument, walks through that directory and all of its subdirectories, looks at each file whose name ends with "html" (that's hard coded, but you can change it or even comment it out if you want to), and replaces every instance of the first string (second argument) with the second string (third argument). Here's an example, which is the reason I threw it together:
python FixFiles.py /usr/local/share/jdk/docs http://java.sun.com/docs/books/tutorial "file:///home/zekedragon/Documents/Programming/Java Docs/Trails"
I used it when I downloaded both the Java Documentation and the Java Tutorials, to make them link to each other on my local drive instead of to the copies on the java.sun.com website. I thought it would be more convenient if they all linked to each other, not to the internet. That's why I wanted to go through each file in a directory and all its subdirectories. That way I'd only have to issue the command once. ^_^
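For anyone wondering what the "better way" to do the directory walk might look like, here is a minimal sketch using os.walk from the standard library. It is not the script from this post, just one possible rewrite: the function names (find_files, replace_in_file, replace_in_tree) are made up for illustration, it assumes Python 3, and it assumes the files are UTF-8 text.

```python
import os


def find_files(root, suffix=".html"):
    """Yield paths under root (recursively) whose names end with suffix."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def replace_in_file(path, old, new):
    """Rewrite path only if it contains old; return True if it was changed."""
    # Assumption: the file is UTF-8 text. Other encodings need a different value.
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
    if old not in text:
        return False
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text.replace(old, new))
    return True


def replace_in_tree(root, old, new, suffix=".html"):
    """Return the list of files under root that were modified."""
    return [p for p in find_files(root, suffix) if replace_in_file(p, old, new)]
```

Calling replace_in_tree(directory, old_string, new_string) does the same job as the getFiles/fixFile pair. os.path.join takes care of the path separator, so the manual "/" handling goes away, which should also help on Windows.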

I post it here in the hope someone might find it useful, though I don't think it will be. :P
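About the "r+" note in fixFile's docstring: the append-like behaviour is most likely because read() leaves the file position at the end of the file, so the following write() lands there. A small sketch of how a single "r+" handle can be made to overwrite instead is below. The function name replace_in_place is hypothetical, and it assumes Python 3 and UTF-8 text files.

```python
def replace_in_place(path, old, new):
    """Replace old with new through one "r+" handle; return True if changed."""
    # Assumption: the file is UTF-8 text.
    with open(path, "r+", encoding="utf-8") as handle:
        text = handle.read()      # the file position is now at the end
        if old not in text:
            return False
        handle.seek(0)            # rewind so the write starts at the beginning
        handle.write(text.replace(old, new))
        handle.truncate()         # cut off leftovers if the new text is shorter
        return True
```

Without the truncate() call, a replacement that makes the file shorter would leave the tail of the old contents behind.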

Edited by ZekeDragon, 01 September 2009 - 11:22 PM.


#2 debtboy

debtboy

    CC Devotee

  • Just Joined
  • 499 posts

Posted 02 September 2009 - 04:16 AM

Thanks for the code ZekeDragon, :thumbup1:
I'm just now getting around to learning Python.

You are right, you could have done it in sed.
I am lazy, so I try to find the easiest way to do something. ;)

assuming all the files are in your home directory...

sed -i 's/original_text/replacement_text/g' ~/*.html


#3 debtboy

debtboy

    CC Devotee

  • Just Joined
  • 499 posts

Posted 02 September 2009 - 12:27 PM

To descend through sub-directories, a few more lines are needed

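#!/usr/bin/env bash
# List every .html file under the home directory, one path per line
find ~/ -name '*.html' > holding

# Run the in-place substitution on each listed file
while IFS= read -r line
do
   sed -i 's/original_text/replacement_text/g' "$line"
done < "holding"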

Looking at your Python code now,
I'll break it down then try it out.

Thanks again for posting it :thumbup1:
I needed some examples to work through.

#4 ZekeDragon

ZekeDragon

    CC Leader

  • Retired Mod
  • 1263 posts

Posted 02 September 2009 - 12:41 PM

Yeah, I knew there was probably an easier way to do it, I just wasn't familiar with the tools. The good thing about this script is that it can be very easily retrofitted to work on Windows, so it's not entirely *nix specific, like (I believe, don't shoot me if I'm wrong) sed is. Maybe through cygwin, but I can't see any other way.

I could have cut down on my line count by using sed, but again, I'm really not familiar with sed's syntax (nor awk's). I really should learn more about those two; I've read that they're very powerful for certain needs. Like regular expressions. :P

#5 debtboy

debtboy

    CC Devotee

  • Just Joined
  • 499 posts

Posted 02 September 2009 - 12:54 PM

Yeah, I limit my bash shell scripting to LOCAL admin tasks.
Now Tcl and Perl are a different story.

Keep the Python scripts coming...
this Python newbie really appreciates it!!! :thumbup:

Thanks

#6 debtboy

debtboy

    CC Devotee

  • Just Joined
  • 499 posts

Posted 02 September 2009 - 05:06 PM

The python code works like a charm!!

One thing: it reports
"Replaced strings in" and lists every html file it finds,
even if it didn't replace any strings in that file.
(I understand that it didn't find the string it was searching for,
but it would be better if it only listed files where it actually replaced a string.)

Thanks again for posting, I'm still dissecting ;)

#7 ZekeDragon

ZekeDragon

    CC Leader

  • Retired Mod
  • 1263 posts

Posted 02 September 2009 - 05:11 PM

It should only report that for files it actually replaced strings in; that's how it worked for me. It doesn't tell you where in the file it replaced the string, but it shouldn't claim to have replaced strings in files where it didn't replace any. Did it still do that on your computer? O_o
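On the "it doesn't tell you where" point, a rough sketch of how the report could include line numbers is below. The helper name find_matches is made up for illustration and is not part of the posted script. It only looks at text that has already been read into a string.

```python
def find_matches(text, needle):
    """Return the 1-based numbers of the lines in text that contain needle."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


sample = "one test\nnothing here\nLatest test\n"
print(find_matches(sample, "test"))   # [1, 3]
print(sample.count("test"))           # 3, since "Latest" also contains "test"
```

fixFile could call something like this on the text before replacing and print the line numbers next to the file name.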

#8 debtboy

debtboy

    CC Devotee

  • Just Joined
  • 499 posts

Posted 02 September 2009 - 05:35 PM

My bad, your code is good!!

input:
python FixFiles.py ~/ "test" "testing"

output:
debtboy@Linuxserver ~ $ python FixFiles.py ~/ "test" "testing"
Replaced strings in /home/debtboy/.mozilla/firefox/hmanch9t.default/bookmarkbackups/bookmarks-2009-08-28.html
Replaced strings in /home/debtboy/.mozilla/firefox/hmanch9t.default/bookmarkbackups/bookmarks-2009-08-29.html
Replaced strings in /home/debtboy/.mozilla/firefox/hmanch9t.default/bookmarks.html

Looking at the source, I see you are correct!! :thumbup:
I was only looking for the word test becoming testing, and missed that it also matches inside other words.
My bad. As you can see from this portion of the HTML

ID="rdf:#$GvPhC3">Getting Started</A>
<DT><A HREF="http://en-US.fxfeeds.mozilla.com/en-US/firefox/livebookmarks/" LAST_MODIFIED="1251435011" FEEDURL="http://en-US.fxfeeds.mozilla.com/en-US/firefox/headlines.xml" ID="rdf:#$HvPhC3">Latestinginginging Headlines</A>


it changed Latest to Latesting, and you can also see that
I ran it 4 times (Ha! Ha! Ha!)

Your code is GOOD, me the user is the problem :D
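The "Latestinginginging" result comes from plain substring replacement: "test" also occurs inside "Latest", and every run adds another "ing". A hedged sketch of one way around that, using word boundaries from the re module, is below. The helper name replace_whole_word is hypothetical, and this is a different technique from the str.replace call in the posted script.

```python
import re


def replace_whole_word(text, old, new):
    """Replace old with new only where old stands alone as a whole word."""
    # \b matches between a word character and a non-word character, so this
    # is only meaningful when old itself starts and ends with word characters.
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function is used as the replacement so backslashes in new stay literal.
    return re.sub(pattern, lambda _match: new, text)


title = "Latest Headlines"
for _ in range(4):                      # four runs, as in the post above
    title = title.replace("test", "testing")
print(title)                            # Latestinginginging Headlines

once = replace_whole_word("Latest test", "test", "testing")
print(once)                                           # Latest testing
print(replace_whole_word(once, "test", "testing"))    # Latest testing
```

With the word-boundary version, running the replacement a second time changes nothing, because "testing" no longer contains "test" as a whole word.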




