New BASH programmer needs to rename thousands of files

5 replies to this topic

#1
JAS

    Newbie

  • Members
  • Pip
  • 5 posts
I have some programming experience and I am trying to help my friend - I was enlisted by my adviser to help my friend :) - rename literally thousands of files using BASH on Linux (probably Gentoo Linux).

We created the files doing research and have thousands of them with the following name format:

data2_<date>_<gridno>_<height>_<delta>

For example:

data2_22-Jun-2011_112_17.8571_22
data2_21-Jun-2011_86_22.9885_22

Basically, a bunch of fields separated by underscores. There are two problems with these names. First, we were - well, he was :) - supposed to use length, not height. Second, we need each field to have a fixed width (so all the filenames are the same length). In other words, I need a Bash script, possibly using regexps, that does the following:

1. Get next (or first) file (call it fname).
2. Parse fname by "_" into fields data2, date, grid, height, delta.
3. Let length = grid*height.
4. Left and right pad length with zeros so they're all the same length.
5. Left pad grid with zeros so they're all the same length.
6. Let newfilename = concatenate fields data2, date, grid, length, delta, with "_" separator.
7. Rename fname to newfilename.
8. Get next file (or end).
I don't know how to do any of this in BASH and, frankly, don't have the time to learn a whole new language right now (my thesis is due soon). So...

Can someone get me started programming this? Can someone (could be the same person :) ) point me to a good BASH programming tutorial?

Thanks much!
Jeff
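
For reference, the eight steps in the post above could be sketched in shell roughly as follows. This is only a sketch under assumptions that are not in the post: `new_name` is a hypothetical helper, grid is padded to 4 digits, and length is printed with 6 decimals in 11 characters. awk does the splitting, the multiplication and the padding, and the loop only prints the `mv` commands rather than running them.

```shell
# Sketch only. new_name prints the new name for one file called
# data2_<date>_<gridno>_<height>_<delta>.
new_name() {
    # -F_ splits the name on "_": $1=data2 $2=date $3=grid $4=height $5=delta.
    # %04d left-pads grid with zeros to 4 digits (assumed width).
    # %011.6f prints grid*height with 6 decimals, zero-padded to 11
    # characters in total (assumed width and precision).
    printf '%s\n' "$1" |
        LC_ALL=C awk -F_ '{ printf "%s_%s_%04d_%011.6f_%s\n", $1, $2, $3, $3 * $4, $5 }'
}

# Dry run over the current directory: print each mv command instead of running it.
for fname in data2_*; do
    [ -e "$fname" ] || continue      # skip the unexpanded pattern if nothing matches
    echo mv -- "$fname" "$(new_name "$fname")"
done
```

Once the printed commands look right, removing the `echo` would perform the renames. It is safest to try this on a copy of the data, and to run it only once, since a second run would multiply the already converted field by grid again.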

#2
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,705 posts
  • Programming Language:C, Java, C++, PHP, Python, Perl, Assembly, Bash, Others
  • Learning:JavaScript
Multiplying the floating-point numbers is going to be a real problem for bash. I can do it relatively quickly in C if you like.
sudo rm -rf /
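
The reason this is awkward is that the shell's own arithmetic, `$(( ))`, only works on integers. One common workaround, shown here as a sketch, is to hand the multiplication to awk; `fmul` is a hypothetical helper name, and the 6 decimals are an assumption.

```shell
# $(( )) handles integers only:
echo $(( 112 * 17 ))          # prints 1904

# fmul (hypothetical helper): multiply two decimal numbers with awk
# and print the product with 6 decimals.
fmul() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.6f\n", a * b }'
}

length=$(fmul 112 17.8571)
echo "$length"                # prints 1999.995200
```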

#3
JAS

    Newbie

  • Members
  • Pip
  • 5 posts

dargueta said:

Multiplying the floating-point numbers is going to be a real problem for bash. I can do it relatively quickly in C if you like.

You mean just the multiplication or the whole script?

Since I need to learn C on BASH for my thesis, I'll take it in C. It will be good sample code for me to model.

#4
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,705 posts
  • Programming Language:C, Java, C++, PHP, Python, Perl, Assembly, Bash, Others
  • Learning:JavaScript
Just the floating-point numbers. A couple of questions:
1) What's the largest number of digits for the grid number you have? Is 4 okay?
2) What precision do you want for the calculated length?

EDIT: Never mind, just change the constants in the format string.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
WARNING:
This code makes a few assumptions:
 - Only the file name contains underscores, not any other component of the path.
 - The entire path is short enough to fit into FILENAME_MAX as defined on the
        system this is compiled for. Longer names are skipped.
 - The format of the file names passed in is exactly as specified in your post.
        Names that don't split into five fields are skipped.

I am not responsible for loss of data or any other damage caused by this.
*/

int main(int argc, char **argv)
{
    char *copy, *data2, *date, *gridstr, *heightstr, *delta;
    char new_name[FILENAME_MAX];
    unsigned long grid;
    double height;
    int file, len;

    /* Rename each file */
    for( file = 1; file < argc; ++file )
    {
        /* Copy the file name so we can modify it */
        copy = strdup(argv[file]);
        if( copy == NULL )
        {
            perror("strdup");
            return 1;
        }

        /* Split the file name into constituent parts. */
        data2 = strtok(copy, "_");
        date = strtok(NULL, "_");
        gridstr = strtok(NULL, "_");
        heightstr = strtok(NULL, "_");
        delta = strtok(NULL, "_");

        /* Skip names that don't have all five fields */
        if( delta == NULL )
        {
            fprintf(stderr, "Skipping %s: unexpected format\n", argv[file]);
            free(copy);
            continue;
        }

        /* Convert strings to integers/floats. Base 10, so a grid number
           with leading zeros isn't read as octal. */
        grid = strtoul(gridstr, NULL, 10);
        height = strtod(heightstr, NULL);

        /* Create the file name. The length is zero-padded to 11 characters
           in total (4 integer digits, the point, 6 decimals). These widths
           are a guess; change the constants to suit your data. */
        len = snprintf(new_name, sizeof new_name, "%s_%s_%04lu_%011.6f_%s",
                data2, date, grid, grid * height, delta);

        /* Move, unless the new name didn't fit in the buffer */
        if( len < 0 || (size_t)len >= sizeof new_name )
            fprintf(stderr, "Skipping %s: new name too long\n", argv[file]);
        else if( rename(argv[file], new_name) != 0 )
            perror(argv[file]);

        /* Free the memory we allocated */
        free(copy);
    }

    return 0;
}


Edited by dargueta, 14 July 2011 - 09:10 PM.
Added code
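
To get a feel for what the width and precision constants in the format string do before compiling anything, the shell's `printf` can be used, since it accepts the same `%d` and `%f` conversions. The values and variable names below are illustrative only, and a locale that uses "." as the decimal point is assumed.

```shell
# %04d: at least 4 characters, padded on the left with zeros.
grid_padded=$(printf '%04d' 86)
# %.6f: exactly 6 digits after the point (pads on the right with zeros).
six_decimals=$(printf '%.6f' 1977.011)
# %011.6f: 6 decimals and at least 11 characters in total, zero-padded on the left.
both=$(printf '%011.6f' 5.5)
# A total width smaller than the printed number has no effect.
too_narrow=$(printf '%04.6f' 5.5)

echo "$grid_padded $six_decimals $both $too_narrow"
# prints: 0086 1977.011000 0005.500000 5.500000
```

The last line shows why a width of 4 combined with 6 decimals never pads anything: the number already takes at least 8 characters.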


#5
JAS

    Newbie

  • Members
  • Pip
  • 5 posts
Thanks dargueta. Can this be compiled using GCC (the GNU Compiler Collection, which is included with Gentoo)? And when executing it, how do I pass the file names in? Sorry for so many basic questions; I'm very unfamiliar with this platform.

#6
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,705 posts
  • Programming Language:C, Java, C++, PHP, Python, Perl, Assembly, Bash, Others
  • Learning:JavaScript
Yes, compile it with GCC. You can use filename globbing to pick the files you want.
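
A minimal sketch of the compile step, assuming the code from the earlier post was saved under the hypothetical name rename_data.c. With no `-o` option gcc writes the program to `a.out`, which is the name used in the commands that follow.

```shell
src=rename_data.c    # hypothetical: wherever you saved the code from the post
if [ -f "$src" ]; then
    # -Wall -Wextra turn on the usual compiler warnings.
    # No -o option, so the program ends up in ./a.out
    gcc -Wall -Wextra "$src"
else
    echo "save the code as $src first" >&2
fi
```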

If you're in a folder containing only your target files then you can do this:

./a.out *


Or maybe you just want files that start with data2_:

./a.out data2_*


If you have subdirectories as well whose names do not contain underscores then you can do one of these...

find . -type f -print0 | xargs -0 ./a.out

find . -type f -name 'data2_*' -print0 | xargs -0 ./a.out

...and it'll run the program on every reachable file (or, with -name, every matching file) in the current directory and below.
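
Since the program renames whatever it is given, it may be worth previewing the command line first. A small sketch, with `preview_run` as a hypothetical helper name: putting `echo` in front of `./a.out` makes xargs print the command it would have run instead of running it.

```shell
# preview_run [dir]: show the ./a.out command that find and xargs would build
# for the data2_* files under dir (default: the current directory).
preview_run() {
    find "${1:-.}" -type f -name 'data2_*' -print0 | xargs -0 echo ./a.out
}

preview_run .
```

If the printed list contains only the files you expect, run the same pipeline without the `echo`.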