So I'm trying to write a program that reads in a text file and creates a histogram of all the words in that text (how many times each word appears). I'm basically supposed to use a two-dimensional array (where the first element contains the word and the second element contains the number of occurrences of that word). The output should look like this:
this: * 10%
is: ***** 50%
word: ** 20%
hello: *** 30%
I'm not even sure how to begin, as I don't know how to separate words from a text file (search for white space?). I also don't have enough experience with the character functions to know which shortcuts could help analyze the words. I would really appreciate any advice on this. Thanks so much.
Let me give you my way of solving this problem...
Since the number of words in the file is not known in advance, a static array is not possible, so a dynamic structure is needed. I would go for a linked list. For example,
Code:
struct array {
    char *text;
    int occurance;
    struct array *next;
};
The words are separated by white space, full stops, commas, end of line and end of file. Store each word in a temporary variable (say char *tempWord).
For each word, do the following steps:
1. Find the next word.
2. Check whether the list is empty.
2.a. If yes, create an entry, add the word to the entry, set occurance to 1 and make next point to NULL.
Now search for the word in the rest of the text using strstr (strstr(haystack, needle) returns a pointer to the first occurrence of the string needle within the string haystack, or a null pointer if needle is not present. It works on strings in memory, not on a file directly, so the text has to be read into a buffer first. It also matches inside longer words, so "is" is found in "this"; check the characters on both sides of each match.)
Each time the word is found, increment occurance and continue searching just past the match.
Do this until the end of the text is reached, then go back to step 1.
2.b. If no, search for the word in the list.
2.b.(i). If it is present in the list, go to step 1.
2.b.(ii). If it is not present, follow step 2.a and update the list.
Follow these steps until EOF is reached.
----------------------
This is just an idea; you will have to choose your variables accordingly.
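A minimal sketch of the list idea, simplified to a single pass: instead of re-scanning the text with strstr for every new word, each word is looked up in the list and either counted or added. That is a swapped-in simplification, not the exact steps above. The struct is the one from the post; count_word and free_list are made-up names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same node layout as in the post above. */
struct array {
    char *text;
    int occurance;
    struct array *next;
};

/* Hypothetical helper: count one word, adding a node if it is new.
   Returns the word's updated count, or -1 if malloc fails. */
int count_word(struct array **head, const char *word)
{
    struct array *node;

    assert(head != NULL && word != NULL);

    /* Step 2.b: search the list for the word. */
    for (node = *head; node != NULL; node = node->next) {
        if (strcmp(node->text, word) == 0)
            return ++node->occurance;
    }

    /* Step 2.a: not found, so create an entry with a count of 1. */
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->text = malloc(strlen(word) + 1);
    if (node->text == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->text, word);
    node->occurance = 1;
    node->next = *head;   /* pushed at the front; order does not matter here */
    *head = node;
    return 1;
}

/* Hypothetical helper: release every node and its copied text. */
void free_list(struct array *head)
{
    while (head != NULL) {
        struct array *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Start with struct array *head = NULL; and call count_word(&head, tempWord) once per word. New nodes go at the front rather than the end of the list, only because that avoids walking the list a second time.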
I agree with linked lists, but in case you don't understand the concept, you might instead allocate a new "page" dynamically each time you find x new words.
What I mean is that you start with an empty array of 32 char* (allocated with malloc so that it can grow), which you extend with realloc when you find the 33rd word (by another 32 char*s).
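Here is a rough sketch of that paging idea, assuming realloc is allowed in the assignment. struct word_table, table_push and PAGE are names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE 32   /* grow by this many slots at a time, as suggested above */

/* Hypothetical container: a growable array of word pointers. */
struct word_table {
    char **words;
    size_t used;
    size_t capacity;
};

/* Append a pointer, extending the array by one page when it is full.
   Returns 1 on success, 0 if realloc fails (the table is left unchanged). */
int table_push(struct word_table *t, char *word)
{
    assert(t != NULL);
    if (t->used == t->capacity) {
        char **bigger = realloc(t->words, (t->capacity + PAGE) * sizeof *bigger);
        if (bigger == NULL)
            return 0;
        t->words = bigger;
        t->capacity += PAGE;
    }
    t->words[t->used++] = word;
    return 1;
}
```

Initialise the table as struct word_table t = {NULL, 0, 0}; since realloc on a NULL pointer behaves like malloc, the first push allocates the first page. Call free(t.words) when you are done.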
As for finding words, you simply need to keep two pointers around:
- 1st, a pointer to the end of the previous word
- 2nd, a pointer to the end of the present word.
You need to increment the 2nd one until you find a word separator (space, dot, comma).
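The two-pointer scan could be sketched like this, assuming the text is already in a string. The separator set and the names is_separator and next_word are my own choices, not anything standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed separator set: white space plus common punctuation. */
static int is_separator(char c)
{
    return c != '\0' && strchr(" \t\r\n.,;:!?", c) != NULL;
}

/* Hypothetical helper: copy the next word from *cursor into out and advance
   *cursor past it. Returns the word length, or 0 when no words remain.
   Words longer than outsize - 1 characters are truncated. */
size_t next_word(const char **cursor, char *out, size_t outsize)
{
    const char *start, *end;
    size_t len;

    assert(cursor != NULL && *cursor != NULL && out != NULL && outsize > 0);

    start = *cursor;
    while (is_separator(*start))                  /* skip the gap after the last word */
        start++;

    end = start;
    while (*end != '\0' && !is_separator(*end))   /* advance to the next separator */
        end++;

    len = (size_t)(end - start);
    if (len > outsize - 1)
        len = outsize - 1;
    memcpy(out, start, len);
    out[len] = '\0';

    *cursor = end;
    return len;
}
```

Call it in a loop such as while (next_word(&p, word, sizeof word) > 0) with p pointing at the start of the text.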
If you want to compare two strings, you can use strcmp (declared in string.h), or you could easily write your own function (which I'd recommend if you're in a learning context).
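If you do write your own comparison, something along these lines should behave like strcmp. my_strcmp is just a name picked for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for strcmp: returns 0 if the strings are equal,
   a negative value if a sorts before b, and a positive value otherwise. */
int my_strcmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* Walk both strings while the characters match and a has not ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Compare as unsigned char, which is how the standard strcmp compares. */
    return (unsigned char)*a - (unsigned char)*b;
}
```

Two words are the same exactly when my_strcmp(a, b) == 0; the sign of a non-zero result only matters if you want to sort the words.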
The first thing I would suggest is making sure you have a list of useful functions: strcmp() comes to mind. I'm guessing this is homework, so you can probably assume words won't be overly long: 30 characters should be good, but be sure to test for it. Also, make sure you understand how scanf() interprets a word. If you can make scanf() do most of the work, that will help a LOT.
Thanks for the responses! The professor doesn't want us to delve too deep into pointers for this. He claims it can be done without. I have been able to read in the number of words in the text, and want to store each word in a row of a two-dimensional array (I was hoping to then loop through each row and use strcmp() to compare), but I can't put the array together. I'm looking at scanf and sscanf but am confused as to how they work to separate words.
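For what it's worth, here is one way the two-dimensional array version might look. It is only a sketch under several assumptions: a C array cannot hold both strings and numbers, so the counts live in a second array with matching row numbers; MAX_WORDS and MAX_LEN are guessed limits; the percentage is taken as the word's share of all words read, with one star per full 10%; and tally, read_words and print_histogram are invented names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 1000   /* assumed upper limit on distinct words */
#define MAX_LEN   30     /* assumed longest word, including the '\0' */

/* Record one word: bump its count if already stored, else add a new row.
   Returns the row index, or -1 if the table is full. */
int tally(char words[][MAX_LEN], int counts[], int *n, const char *word)
{
    int i;

    assert(n != NULL && word != NULL);
    for (i = 0; i < *n; i++) {
        if (strcmp(words[i], word) == 0) {
            counts[i]++;
            return i;
        }
    }
    if (*n == MAX_WORDS)
        return -1;
    strncpy(words[*n], word, MAX_LEN - 1);
    words[*n][MAX_LEN - 1] = '\0';
    counts[*n] = 1;
    return (*n)++;
}

/* Read white-space-separated words from in; returns how many were counted. */
int read_words(FILE *in, char words[][MAX_LEN], int counts[], int *n)
{
    char word[MAX_LEN];
    int total = 0;

    /* %29s stops at white space and never writes more than MAX_LEN bytes. */
    while (fscanf(in, "%29s", word) == 1) {
        if (tally(words, counts, n, word) >= 0)
            total++;
    }
    return total;
}

/* Print "word: *** 30%" lines, one star per full 10% (an assumed scale). */
void print_histogram(char words[][MAX_LEN], const int counts[], int n, int total)
{
    int i, s;

    for (i = 0; i < n; i++) {
        int percent = total > 0 ? counts[i] * 100 / total : 0;
        printf("%s: ", words[i]);
        for (s = 0; s < percent / 10; s++)
            putchar('*');
        printf(" %d%%\n", percent);
    }
}
```

In main you would declare char words[MAX_WORDS][MAX_LEN]; int counts[MAX_WORDS]; int n = 0; open the file with fopen, call read_words, then print_histogram with the total it returned. The only pointer the caller touches is &n.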
Thanks for all the responses so far..
Take this small example:
Code:
#include <stdio.h>

int main()
{
    char word[50];

    scanf("%s", word);   /* reads up to the next white space */
    printf("%s", word);
    return 0;
}
Compile that program and run it. Type in two words and press enter. The program should only output one word. scanf stops putting data into the word variable when it hits white space (a space, tab or newline), so that should be a convenient way to get separate words. The only problem is that scanf will include things like periods and commas, so you will have to filter those out.
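The filtering step could look something like this. It is a sketch that assumes only letters and digits should survive, so "don't" becomes "dont"; strip_punct is a made-up name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove punctuation from word in place, keeping only
   letters and digits. Returns the new length. */
size_t strip_punct(char *word)
{
    char *src, *dst;

    assert(word != NULL);
    for (src = dst = word; *src != '\0'; src++) {
        /* The cast avoids undefined behaviour for negative char values. */
        if (isalnum((unsigned char)*src))
            *dst++ = *src;
    }
    *dst = '\0';
    return (size_t)(dst - word);
}
```

Call strip_punct(word) right after the scanf and skip the word if the result is 0, which happens when the input was nothing but punctuation.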
I disagree with Guest. Used like that, scanf() has the same problem gets() does: a plain %s gives scanf() no idea how big the buffer is, which will very easily lead to buffer overflows (a field width such as %49s would bound it). You are better off using something like fgets() to read a line into a buffer, checking that the word isn't too large, and then using sscanf() on that line and doing any necessary strncmp()s; however, I'd still use a custom function. So, yeah, that's not the only problem with scanf.
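A sketch of the sscanf half of that approach, assuming a 50-byte word buffer like the earlier example. split_line and WORD_MAX are names made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WORD_MAX 50   /* assumed buffer size, matching char word[50] */

/* Hypothetical helper: split one line into words with sscanf.
   Stores up to max_words words in out and returns how many were stored. */
int split_line(const char *line, char out[][WORD_MAX], int max_words)
{
    int count = 0;
    int used = 0;

    assert(line != NULL && out != NULL);

    /* %49s bounds each read to the buffer; %n reports how many characters
       of line were consumed, so the next call resumes after them. */
    while (count < max_words &&
           sscanf(line, "%49s%n", out[count], &used) == 1) {
        line += used;
        count++;
    }
    return count;
}
```

Feed it lines from a loop like while (fgets(line, sizeof line, fp) != NULL). Note that a word longer than 49 characters is split into pieces rather than rejected, so add a length check if that matters for your input.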