How to do regexes in C.

5 replies to this topic

#1
rocketboy9000

    Learning Programmer

  • Members
  • 79 posts
Regexes are an extremely useful feature found in many higher-level interpreted languages, like Python, Perl, Ruby, and PHP. But with the standard POSIX header "regex.h", you can have them in regular ol' C! How lovely. I'll assume you know how to write them, so let's get down to business. Say we want to match big words (at least ten letters) that end in "tion". We write the following regex into a string. We're using extended regular expressions, which have the fancier syntax we've come to know and love.
#include <regex.h>
const char *pattern="[a-zA-Z]{6,}tion"; // 6+ letters followed by "tion" = 10+ letters
Now, how do we use it? Well, it turns out that first we have to compile it into a structure of type regex_t, using the function regcomp.
regex_t rx;
int err=regcomp(&rx,pattern,REG_EXTENDED); // 0 on success, a nonzero error code otherwise
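regcomp returns 0 on success and a nonzero error code if the pattern is malformed, and regerror turns that code into a readable message. A rough sketch of checking it (compile_or_report is a hypothetical name, and the 256-byte message buffer is an arbitrary choice):

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compiles pattern into *rx using extended syntax.
 * Returns 0 on success. On failure it prints regerror's message to stderr
 * and returns regcomp's nonzero error code; *rx is then not a usable
 * compiled regex and must not be passed to regexec. */
int compile_or_report(regex_t *rx, const char *pattern)
{
    int err = regcomp(rx, pattern, REG_EXTENDED);

    if (err != 0) {
        char msg[256];
        regerror(err, rx, msg, sizeof msg); /* message is truncated to fit */
        fprintf(stderr, "bad regex \"%s\": %s\n", pattern, msg);
    }
    return err;
}
```

An unterminated bracket such as "[a-z" is a quick way to see the error path.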
We also need a variable of type regmatch_t to hold the position of the match that regexec reports. Then we call regexec to match the pattern against a string:
char s[100];
fgets(s,sizeof s,stdin); // keeps the trailing newline, if there is one
regmatch_t res;
int matched=regexec(&rx,s,1,&res,0); // 1 = fill in at most one regmatch_t
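Since regexec reports success as 0 and "no match" as REG_NOMATCH, the return value should be tested before looking at res. A small sketch (report_match is a hypothetical helper; it takes an already compiled regex and a fixed string instead of reading stdin):

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: runs an already compiled rx against s and reports
 * the outcome. Returns regexec's result: 0 on a match, REG_NOMATCH when
 * nothing matched. res is only meaningful when the result is 0. */
int report_match(const regex_t *rx, const char *s)
{
    regmatch_t res;
    int rc = regexec(rx, s, 1, &res, 0);

    if (rc == 0)
        printf("match at [%ld, %ld)\n", (long)res.rm_so, (long)res.rm_eo);
    else if (rc == REG_NOMATCH)
        printf("no match\n");
    return rc;
}
```

With the tutorial's pattern compiled into rx, report_match(&rx, "good information here") should print match at [5, 16), and "a short nation" gives no match, since "na" is too short a prefix.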
Now, matched will be 0 if the pattern matched, or REG_NOMATCH if it didn't. On a match, res.rm_so holds the offset of the first character of the matching substring, and res.rm_eo holds the offset just past its last character, so the length of the match is res.rm_eo-res.rm_so.
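Because rm_eo points one past the end of the match, getting the matched text out is a plain copy of rm_eo - rm_so bytes starting at rm_so. A minimal sketch, assuming the caller supplies the output buffer (match_text is a hypothetical name):

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given a string s and a regmatch_t m from a
 * successful regexec on it, copies the matched text into buf as a
 * NUL-terminated string, truncating if buf is too small (bufsize must
 * be at least 1). Returns the full length of the match. */
size_t match_text(const char *s, regmatch_t m, char *buf, size_t bufsize)
{
    size_t len = (size_t)(m.rm_eo - m.rm_so); /* rm_eo is one past the end */
    size_t n = len < bufsize - 1 ? len : bufsize - 1;

    memcpy(buf, s + m.rm_so, n);
    buf[n] = '\0';
    return len;
}
```

If you only want to print the match, printf("%.*s", (int)(m.rm_eo - m.rm_so), s + m.rm_so) does the same job without a buffer.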
You can call regexec over and over on different strings. Eventually, though, you'll want to stop using the regex. Then you should call regfree to free the memory used by the compiled regex.
regfree(&rx);
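Putting compile, repeated regexec calls and regfree together, here is a rough sketch that counts how many strings in an array match a pattern (count_matching is a hypothetical name; REG_NOSUB is used because only yes/no answers are needed):

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles pattern once, runs it against each of the
 * n strings in lines, frees it, and returns how many strings matched,
 * or -1 if the pattern does not compile. */
int count_matching(const char *pattern, const char *const lines[], size_t n)
{
    regex_t rx;
    int count = 0;

    if (regcomp(&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (regexec(&rx, lines[i], 0, NULL, 0) == 0)
            count++;
    regfree(&rx); /* done with the pattern: release its memory */
    return count;
}
```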
To demonstrate, here is a function which matches a string against a pattern and returns the regmatch_t describing the first match.
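// Returns the offsets of the first match of pat in s.
// If pat fails to compile or nothing matches, rm_so and rm_eo are both -1.
regmatch_t rxmatch(const char *s,const char *pat){
    regex_t rx;
    regmatch_t res;
    res.rm_so=res.rm_eo=-1;
    if(regcomp(&rx,pat,REG_EXTENDED)!=0)
        return res;                  // bad pattern: nothing was compiled
    if(regexec(&rx,s,1,&res,0)!=0)
        res.rm_so=res.rm_eo=-1;      // no match: res contents are unspecified
    regfree(&rx);
    return res;
}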
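If what you actually want is a copy of the matching text rather than its offsets, here is a minimal sketch of such a variant (rxmatch_copy is a made-up name; the result is allocated with malloc, so the caller has to free it):

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: returns a malloc'd, NUL-terminated copy of the
 * first substring of s that matches pat, or NULL if pat does not compile,
 * nothing matches, or malloc fails. The caller must free() the result. */
char *rxmatch_copy(const char *s, const char *pat)
{
    regex_t rx;
    regmatch_t res;
    char *copy = NULL;

    if (regcomp(&rx, pat, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&rx, s, 1, &res, 0) == 0) {
        size_t len = (size_t)(res.rm_eo - res.rm_so);
        copy = malloc(len + 1);
        if (copy != NULL) {
            memcpy(copy, s + res.rm_so, len);
            copy[len] = '\0';
        }
    }
    regfree(&rx);
    return copy;
}
```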
Generally, though, you should compile each regex only once, especially if you apply a single regex to many strings.
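One compiled regex can also be reused to find every match inside a single string: restart regexec just past the previous match and pass REG_NOTBOL so that ^ is not treated as matching at the new starting point. A rough sketch (print_all_matches is a hypothetical name; the empty-match check only matters for patterns that can match zero characters):

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical sketch: prints every non-overlapping match of an already
 * compiled rx in s, one per line, and returns how many were found.
 * After each match the search resumes at rm_eo; REG_NOTBOL tells regexec
 * that the remaining text is not the beginning of a line. */
int print_all_matches(const regex_t *rx, const char *s)
{
    regmatch_t m;
    int count = 0;
    int flags = 0;

    while (regexec(rx, s, 1, &m, flags) == 0) {
        printf("%.*s\n", (int)(m.rm_eo - m.rm_so), s + m.rm_so);
        count++;
        if (m.rm_eo == m.rm_so) { /* empty match: step one byte to avoid looping forever */
            if (s[m.rm_eo] == '\0')
                break;
            s += m.rm_eo + 1;
        } else {
            s += m.rm_eo;
        }
        flags = REG_NOTBOL;
    }
    return count;
}
```

With the tutorial's pattern, "information about the celebration and a nation" should yield two matches, information and celebration.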

Edited by Alexander, 03 December 2011 - 01:56 AM.


#2
Alexander

    It's Science!

  • Moderators
  • 4,118 posts
  • Location:Vancouver, Eh! Cleverness: 200
Ah, a taste of regular expressions for POSIX - a great starter. Thank you.

An ironic thing: I had used some regular expressions to translate the [tt] tags in your post into the [inline] tags that we have available instead!

Since this tutorial is about "regexes in C", do you know which headers provide regular expressions on other operating systems (e.g. Windows)?
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#3
TheCompBoy

    Programming Professional

  • Members
  • 272 posts
Great tutorial! Very useful.
Think my post was useful? Please take the time to press the Like button on my post. Big thanks!
For great C# & Android tutorials visit my blog: http://www.thecompboy.com/

#4
DarkLordofthePenguins

    Programming Expert

  • Members
  • 409 posts
I thought regex.h was deprecated.
Programming is a journey, not a destination.

#5
rocketboy9000

    Learning Programmer

  • Members
  • 79 posts
Nope, that's regexp.h.

#6
fread

    Programming God

  • Members
  • 787 posts
Very handy, gave me a couple of ideas.
Perfection of means and confusion of ends seem to characterize our age. Albert Einstein



