Be sure you have read (or at least skimmed) the assigned readings from chapter 7.
Do the following programming problems. You will end up with at least one code file per problem. Submit your program source (and any other needed files) by sending mail to bmassing@cs.trinity.edu, with each file as an attachment. Please use a subject line that mentions the course and the assignment (e.g., ``csci 1312 homework 6'' or ``CS1 hw6''). You can develop your programs on any system that provides the needed functionality, but I will test them on one of the department's Linux machines, so you should probably make sure they work in that environment before turning them in.
Yes, this writeup is long. But I think the code you write need not be, and it's an interesting problem!
You may have heard claims that E is the most frequently-used character in English text, followed by T, and so forth. Your mission for this assignment is to write two programs that together will allow you to find out how well this claim holds up in practice (and, okay, to give you some practice working with files in C):
    sort -n -r filename
to display the results in a way that shows the most-often-used letter first, etc. One place to look for interesting (or at least non-trivial) text files is Project Gutenberg, though you should be careful to get the plain-text version of whatever book(s) you select (really-plain-text, not UTF8).
Note: For pedagogical reasons I do want you to write two programs; I don't think you'll learn quite as much writing only one. Also for pedagogical reasons -- and because I think it works out better for you and for me -- please go along with the part of the writeup that says that these programs get filenames from command-line arguments rather than by prompting the user as you've done in previous programs.
    testing 1 2 3 4? TESTING 4 3 2 1!
the output file should look like this:
    24 total text characters
    2 e
    2 g
    2 i
    2 n
    2 s
    4 t
I recommend that you use an array of 26 counters, one for each character of the Roman alphabet. To help(?) you, the file alpha_index.c contains a function that examines a character read from the input file and either returns its index into the alphabet (0 for ``a'' or ``A'', 1 for ``b'' or ``B'', etc.), or -1 if the character is not alphabetic. You can either copy and paste the function from this file into your program, or you can put the file in your directory and use the line #include "alpha_index.c" in your program to have the compiler include it with your code.
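A minimal sketch of the counting idea, under some loudly stated assumptions: letter_index is a hypothetical stand-in for the function in alpha_index.c (whose actual name and code are not shown here), count_text and NUM_LETTERS are names made up for illustration, and ``text character'' is taken to mean any non-whitespace character, which is only an inference from the sample output (24 for the sample line above).

```c
#include <ctype.h>
#include <stdio.h>

#define NUM_LETTERS 26

/* Hypothetical stand-in for the function in alpha_index.c:
 * returns 0..25 for a letter of either case, -1 otherwise.
 * Assumes the letters a..z are contiguous, as in ASCII. */
int letter_index(int ch) {
    if (!isalpha(ch)) {
        return -1;
    }
    return tolower(ch) - 'a';
}

/* Reads every character from in, fills counts[0..25] with per-letter
 * totals, and returns the number of non-whitespace characters seen
 * (an assumed meaning of "text characters"). */
long count_text(FILE *in, long counts[NUM_LETTERS]) {
    long total = 0;
    int ch; /* int rather than char, so EOF can be told apart from data */

    for (int i = 0; i < NUM_LETTERS; ++i) {
        counts[i] = 0;
    }
    while ((ch = fgetc(in)) != EOF) {
        if (isspace(ch)) {
            continue; /* whitespace is not counted at all */
        }
        ++total;
        int idx = letter_index(ch);
        if (idx >= 0) {
            ++counts[idx]; /* upper and lower case share one counter */
        }
    }
    return total;
}
```

Because the index is the same for both cases of a letter, one array of 26 counters is enough, and writing the output is then a loop over the array that skips counters that are still zero.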
Of course(?), the program should check that the user supplied two command-line arguments and that the input and output files could be opened.
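One way these checks might look, as a sketch rather than the required structure: open_or_complain and open_files are hypothetical helper names, and the wording of the messages is arbitrary.

```c
#include <stdio.h>

/* Opens a file; on failure prints a message naming the file and
 * returns NULL so the caller can decide what to do next. */
FILE *open_or_complain(const char *name, const char *mode) {
    FILE *fp = fopen(name, mode);
    if (fp == NULL) {
        fprintf(stderr, "cannot open %s\n", name);
    }
    return fp;
}

/* Checks for exactly two command-line arguments (input and output
 * filenames) and opens both files. Returns 0 on success with *in and
 * *out set, nonzero on any failure. */
int open_files(int argc, char *argv[], FILE **in, FILE **out) {
    if (argc != 3) { /* argv[0] is the program name itself */
        fprintf(stderr, "usage: %s infile outfile\n",
                argc > 0 ? argv[0] : "program");
        return 1;
    }
    *in = open_or_complain(argv[1], "r");
    if (*in == NULL) {
        return 1;
    }
    *out = open_or_complain(argv[2], "w");
    if (*out == NULL) {
        fclose(*in); /* do not leak the input file */
        return 1;
    }
    return 0;
}
```

In main this could be called as open_files(argc, argv, &in, &out), returning EXIT_FAILURE right away if the result is nonzero. Opening the input before the output means a mistyped input filename does not leave an empty output file behind.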
Hints:
    24 total text characters
    2 e
    2 g
    2 i
    2 n
    2 s
    4 t

    56 total text characters
    3 a
    1 c
    2 d
    6 e
    2 f
    1 g
    3 h
    4 i
    2 l
    2 m
    2 n
    9 o
    2 p
    4 r
    3 s
    7 t
    1 w
    1 y
which, for the curious, is the output from processing this input:
Now is the time for all good persons to come to the aid of their party!
    processing input file sample1-step1-out.txt
    24 total text characters
    processing input file sample2-step1-out.txt
    56 total text characters
    summary:
    3 a (3.7500%)
    1 c (1.2500%)
    2 d (2.5000%)
    8 e (10.0000%)
    2 f (2.5000%)
    3 g (3.7500%)
    3 h (3.7500%)
    6 i (7.5000%)
    2 l (2.5000%)
    2 m (2.5000%)
    4 n (5.0000%)
    9 o (11.2500%)
    2 p (2.5000%)
    4 r (5.0000%)
    5 s (6.2500%)
    11 t (13.7500%)
    1 w (1.2500%)
    1 y (1.2500%)
    11 non-alphabetic text characters (13.7500%)
Of course(?), the program should check that the user supplied at least one command-line argument and that all of the input files could be opened. It probably should also check, as it reads through the input files, that each one is in the right format (output of the first program), since otherwise the program might easily crash. It doesn't have to do this all at once: it's probably simpler to process the input files one at a time, and it's okay if the program starts processing and producing output and then bails out if it encounters an error.
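A sketch of how one input file might be read and checked, and how a percentage like those in the summary can be computed. The names add_counts and percent are hypothetical, fscanf is only one of several reasonable ways to do the parsing, and indexing with letter - 'a' assumes contiguous lowercase letters (ASCII) in place of the alphabet-index function.

```c
#include <stdio.h>

#define NUM_LETTERS 26

/* Reads one file in the first program's output format and adds its
 * numbers into counts[] and *total. Returns 0 on success, -1 if the
 * file does not look like output of the first program. On error the
 * totals may already be partly updated. */
int add_counts(FILE *in, long counts[NUM_LETTERS], long *total) {
    long n;
    char letter;
    int r;

    /* First line: "<n> total text characters". */
    if (fscanf(in, "%ld total text characters", &n) != 1 || n < 0) {
        return -1;
    }
    *total += n;

    /* Remaining lines: "<count> <letter>", one per letter that occurs. */
    while ((r = fscanf(in, "%ld %c", &n, &letter)) == 2) {
        if (n < 0 || letter < 'a' || letter > 'z') {
            return -1;
        }
        counts[letter - 'a'] += n;
    }
    /* A clean end of file gives EOF; anything else is a malformed line. */
    return (r == EOF) ? 0 : -1;
}

/* Share of the total as a percentage, e.g. for "%ld %c (%.4f%%)". */
double percent(long count, long total) {
    return 100.0 * (double)count / (double)total;
}
```

With the two sample files, the grand total is 24 + 56 = 80 text characters, so the 3 occurrences of a come out as 100 * 3 / 80 = 3.75 percent, matching the summary. The non-alphabetic count is the grand total minus the sum of the 26 letter counters.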
Hints:
    while (fgetc(infile) != '\n');
Notice that in processing the single letter read from the ``how many occurrences of this letter'' lines you could once again use the function from my alpha_index.c file.
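The one-line loop in the hints assumes every line ends with a newline; if the file ends without one, fgetc keeps returning EOF and the loop never stops. A slightly more defensive variant, with skip_rest_of_line as a hypothetical name:

```c
#include <stdio.h>

/* Discards characters up to and including the next newline.
 * Returns '\n' if a newline was consumed, or EOF if the input
 * ended (or a read error occurred) before one was found. */
int skip_rest_of_line(FILE *in) {
    int ch;
    while ((ch = fgetc(in)) != '\n' && ch != EOF) {
        /* nothing to do: the character is simply thrown away */
    }
    return ch;
}
```

The return value lets the caller tell a normal end of line from a file that stops partway through a line, which may be useful for the format checking described above.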