ECS 110: Data Structures and Programming
Discussion Section Notes -- Week 7
Janine Taylor (taylorj@cs.ucdavis.edu)

FORWARD RADIX SORTING
=====================

This week is all about programming assignment 3. The first half of the
section will be devoted to demonstrating the required algorithm, and the
second half will be about using the supplied input/output classes and
answering questions.

1. Forward Radix Sorting
------------------------

Many students are having trouble understanding the original paper that
assignment 3 is based upon, so the first half of section will be devoted
to demonstrating the algorithm. Too many pictures are required to
demonstrate it, so no notes will be offered. Come to section to see how it
works. (It might be useful to try the algorithm on your own before
discussion, either on an example of your choosing or on the example we will
do in discussion, to find out what questions you have about the algorithm.)

The example we will be going over sorts a list of character strings that
have length <= 3 and contain only the characters a, b, c, d. The list of
character strings is:

    cdb, dca, cba, dcb, dbb, acd, bdb, aab, cab, cbd

2. I/O Libraries
----------------

We have written some C++ classes to help you with the file input and output
for assignment 3. The files can be found at:

    ~krovetz/ecs110/sortio.h         -- Header for the classes
    ~krovetz/ecs110/sortio.C         -- Source code for the classes
    ~krovetz/ecs110/sortio_sample.C  -- Sample use of the classes

You are welcome to copy any of these files. Sortio.C may undergo some
change to speed it up or to fix any bugs. So, if you include sortio.C in
your makefile but leave the file where it is, your program will
automatically benefit from any improvement.

2.1 Sortio.h
------------

This file contains the class interface to the supplied classes. You will
need to include this file anywhere that you use any of the classes.
There are three classes declared here:

    class Timer;
    class AutoTimer;
    class File;

There is also one struct defined:

    struct Line;

2.1.1 class Timer
-----------------

The Timer class keeps track of time between certain events. One of Timer's
private data members is a variable which can be set to the current time.
Upon construction, a variable of type Timer will automatically set its time
to the time of instantiation. Any call to Timer::reset() will reset the
variable to the current time. The member function Timer::get_seconds() will
return the number of seconds since construction or reset(), whichever is
more recent. The declaration looks like this:

    class Timer {
    private:
      clock_t tick_count;
    public:
      Timer();              // (constructor) Calls reset()
      void reset();         // Stores current clock()
      float get_seconds();  // Seconds since last reset()
    };

NOTE: The SGI timing function (clock()) returns information about CPU time
used, whereas the DEC timing function returns wall time used. Thus, when
running these timing functions, the SGI results will be just a few seconds
and the DEC results could be up to a minute.

2.1.2 class AutoTimer
---------------------

An alternate way to output times is to instantiate a variable of type
AutoTimer. Upon construction it notes the current time. Later, upon
destruction, it notes the current time again and outputs the difference
since construction to cerr. Since a locally declared variable is
constructed at its declaration and destroyed at the closing of its scope,
declaring an AutoTimer at the top of a function has the effect of timing
the duration of that function. The declaration looks like this:

    class AutoTimer {
    private:
      clock_t tick_count;
    public:
      AutoTimer();   // Constructor automatically calls clock()
      ~AutoTimer();  // Destructor writes to cerr secs since construction
    };

2.1.3 struct Line
-----------------

What you are going to need to sort lines of text is an array of pointers to
the data to be sorted.
That is exactly what these routines will provide to you: an array of the
following struct. Each struct contains a pointer to the string of
characters and a length variable which tells you how many characters are
in the string (excluding the NEWLINE character). The declaration looks like
this:

    struct Line {
      char* str;  // Pointer to string of characters
      int len;    // Number of characters (excluding NEWLINE termination)
    };

2.1.4 class File
----------------

This class does all the work. The constructor for the class requires a file
name. The constructor then opens the file, reads it into memory, and
creates an array of Line after parsing for NEWLINE characters. If
successful, you can then get the base of the array using
File::get_array_of_line(), and the length of the array using
File::get_num_lines(). Sorting can then be done directly on the array.

Some additional utility has been included in the class. Operator<< has been
overloaded to allow the writing of the strings to a stream. The strings are
output in the current order of the array of Line. Also, operator[] has been
overloaded, allowing range-checked but slightly slower access to the array.

For example, if f is a variable of type File, and a text file has already
been read into f's array of Line, you could access the length of the first
string in the array in one of two ways:

    Line* array_of_line = f.get_array_of_line();
    int len = array_of_line[0].len;  // direct access to the array

or

    int len = f[0].len;  // [] has been overloaded to allow this.
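
As a rough illustration of why sorting can be done directly on the array:
each Line holds only a pointer and a length, so exchanging two entries
changes the output order without moving any characters. The sketch below
builds a Line array by hand over a small in-memory buffer, standing in for
what File does when it parses a file. make_lines and swap_lines are
hypothetical helpers written for this example; they are not part of sortio.

```cpp
#include <cassert>

// Same layout as struct Line in sortio.h.
struct Line {
    char* str;  // start of the line inside the buffer (no '\0' terminator)
    int   len;  // characters in the line, excluding the '\n'
};

// Hypothetical stand-in for File's parsing: fill 'out' with one Line per
// '\n'-terminated line found in buf[0..size). Returns the number of lines.
long make_lines(char* buf, long size, Line* out, long max_lines) {
    long n = 0;
    char* start = buf;
    for (long i = 0; i < size && n < max_lines; ++i) {
        if (buf[i] == '\n') {
            out[n].str = start;
            out[n].len = static_cast<int>(&buf[i] - start);
            ++n;
            start = &buf[i] + 1;  // next line begins after the '\n'
        }
    }
    return n;
}

// Hypothetical helper: exchange two entries of the array. Only the pointer
// and the length move; the characters stay where they are in the buffer.
void swap_lines(Line* lines, long i, long j) {
    Line tmp = lines[i];
    lines[i] = lines[j];
    lines[j] = tmp;
}
```

With the real classes, the array passed to swap_lines would be the one
returned by get_array_of_line(), and a later "cout << file" would write the
lines in their new order.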
The declaration of class File looks like this:

    class File {
    private:
      Line* array_of_line;
      long num_lines;
      char* base_write_pos;
      char* max_write_pos;
      long cur_line_pos;
      long max_line_pos;
      void set_next_line(char* next_loc, int num_read);
    public:
      File(char*);                // Constructor reads and parses named file
      ~File();                    // Destructor releases memory
      Line* get_array_of_line();  // Return pointer to array of Line
      long get_num_lines();       // Return number of entries in array
      Line& operator[] (long i);  // Return reference to ith entry in array
      friend ostream& operator<< (ostream& os, File& f);  // Write file
    };

2.2 Sortio_Sample.C
-------------------

This file uses the supplied I/O classes to read a file and write it to
cout. You may want to copy it into your directory and use it as a starting
point.

    #include <iostream.h>   // cerr, cout, endl
    #include "sortio.h"

    //------------------------------------------------------------------
    //  main -
    //
    //  Sample program reads a file and then writes it to cout.
    //
    //------------------------------------------------------------------
    int main(int argc, char* argv[])
    {
      if (argc != 2) {
        cerr << "Usage: " << argv[0] << " file_name" << endl;
        return 1;
      }

      AutoTimer auto_timer;  // Cause automatic timing

      File file(argv[1]);    // Open and process file into array
      Line* lines = file.get_array_of_line();  // Get array of Line
      long num_lines = file.get_num_lines();   // Get length of array

      // MySort(lines, num_lines);  // Call your sort here

      cout << file;

      return 0;
    }

2.3 Special considerations
--------------------------

Note that the strings that the array of Line points to are _NOT_ terminated
by '\0'. This means that the usual C or C++ string functions that rely on a
terminating '\0' will not work on them. This is okay. Associated with each
string pointer is the length of the string pointed to. This should be
enough for the sort. The length is the number of bytes in the string
_minus_one_ (the terminating '\n' is not included in the length).
Also, if the file you are sorting does not end with a '\n', then one will silently be added for you. If this is the case, your output file will be one byte longer than your input.
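
Because the strings carry a length instead of a '\0', any comparison or
output you write has to be driven by len. The following is a minimal sketch
of one way to do that, assuming a compiler with the standard <cstring> and
<iostream> headers. compare_lines and write_line are hypothetical helpers
for illustration only; they are not part of sortio and are not the
assignment's required algorithm.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>

// Same layout as struct Line in sortio.h.
struct Line {
    char* str;  // not '\0'-terminated
    int   len;  // excludes the trailing '\n'
};

// Hypothetical helper: three-way comparison of two lines.
// Returns < 0 if a sorts before b, 0 if equal, > 0 if a sorts after b.
// Bytes are compared as unsigned chars (that is what memcmp does).
int compare_lines(const Line& a, const Line& b) {
    int n = (a.len < b.len) ? a.len : b.len;  // compare the common prefix
    int c = std::memcmp(a.str, b.str, n);
    if (c != 0) return c;
    return a.len - b.len;  // if one is a prefix, the shorter sorts first
}

// Hypothetical helper: write exactly len characters, then a newline.
// Using operator<< on str directly would run past the end of the line.
void write_line(std::ostream& os, const Line& l) {
    os.write(l.str, l.len);
    os.put('\n');
}
```

Note that str[len] is the '\n' that ends the line, so code that reads one
character past the end of a string sees a newline rather than garbage, but
it should not rely on finding a '\0' anywhere.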