ECS 110: Data Structures and Programming
Discussion Section Notes -- Week 6
Ted Krovetz (krovetz@cs.ucdavis.edu)
========================================

This week:
  o Reading from the command line
  o Reading from files
  o Skip Lists

1. READING FROM THE COMMAND LINE
--------------------------------

In the current assignment, you must get some information from the
command line. C and C++ pass this information to the main program in
the same way. The parameter argc holds the number of terms in the
command, and argv is an array of char* holding the terms themselves.
argv[0] is the name of your program as it was invoked, and argv[argc]
is guaranteed to be NULL. argv[1]..argv[argc-1] are the terms the user
typed into the command interpreter after the program's name. Thus, the
command line

    prog1 7.5 -d

would fill argc and argv as follows:

    argc    = 3
    argv[0] = "prog1"
    argv[1] = "7.5"
    argv[2] = "-d"
    argv[3] = NULL

Notice that argv[1] is a string of characters, and that we need to
convert it to a floating-point type before it is useful in our
program. Here are two examples of reading the command line and
converting the command strings into useful types.

Our first example relies on the C libraries. It is really an example
of mixing C library functions with C++ code. This kind of mixing is
often necessary because C++ frequently has no equivalent function.

    #include <iostream.h>    // cout, cerr, endl
    #include <stdlib.h>      // atof

    int    verbose;          // nonzero when a third term (-d) is given
    double mai;              // mean arrival interval

    int main(int argc, char* argv[])
    {
        if ((argc != 2) && (argc != 3)) {
            cerr << "Usage: " << argv[0]
                 << " mean_arrival_interval [-d]" << endl;
            return(1);
        }
        mai = atof(argv[1]);         // C library string-to-double
        verbose = (argc == 3);
        cout << "Verbose = " << (verbose ? "TRUE" : "FALSE") << endl;
        cout << "MAI = " << mai << endl;
        return 0;
    }

Our second example doesn't change very much, but the change that we
make eliminates the need for the C libraries. Instead, we include a
C++ system header called strstream.h, which allows you to define a
custom I/O stream initialized by a string.
Because C++'s I/O streams have powerful type conversions built in, we
can easily use the stream to convert our string into a floating-point
number. [Note: this level of C++ knowledge is not part of the class.
It is only given here to demonstrate C++'s power and flexibility.]

    #include <iostream.h>    // cout, cerr, endl
    #include <strstream.h>   // istrstream

    int    verbose;          // nonzero when a third term (-d) is given
    double mai;              // mean arrival interval

    int main(int argc, char* argv[])
    {
        if ((argc != 2) && (argc != 3)) {
            cerr << "Usage: " << argv[0]
                 << " mean_arrival_interval [-d]" << endl;
            return(1);
        }
        istrstream ist(argv[1]);     // stream that reads from the string
        ist >> mai;                  // extraction does the conversion
        verbose = (argc == 3);
        cout << "Verbose = " << (verbose ? "TRUE" : "FALSE") << endl;
        cout << "MAI = " << mai << endl;
        return 0;
    }

2. READING FROM FILES
---------------------

File I/O is handled by including fstream.h. This header contains the
classes ofstream and ifstream for creating and manipulating output and
input file streams. To open and manage an ifstream or ofstream tied to
a system file, you first declare it with an appropriate constructor.

    ifstream();
    ifstream(const char* name, int mode = in);
    ofstream();
    ofstream(const char* name, int mode = out | trunc);

The constructor of no arguments creates a variable that will later be
associated with a file. The constructor of two arguments takes the
name of the file as its first argument. The second argument specifies
the file mode. On newer systems, there is a third argument for file
protection.

The default file mode arguments are usually appropriate. For
ifstreams, the default is to open the file and start reading at the
beginning. For ofstreams, the default is to create the named file and
begin writing from the beginning, and if the file already exists then
erase its contents first. The file mode can be set explicitly by a
bitwise-or of any of the following flags.

    in        - input mode
    out       - output mode
    app       - append mode
    ate       - open and seek to end of file
    trunc     - discard contents and open
    nocreate  - open but do not create
    noreplace - if file exists open fails

If the opening fails, the stream is put into a bad state.
It can be tested with the operator !.

If you wish to explicitly open and close the files, the following
member functions are available.

    void ifstream::open(const char* name, int mode = in);
    void ifstream::close();
    void ofstream::open(const char* name, int mode = out | trunc);
    void ofstream::close();

For example, here are two ways to open the file "foo.txt".

    #include <fstream.h>

    void my_func(void)
    {
        ofstream os("foo.txt");      // open through the constructor
        ifstream is;                 // declare first ...
        is.open("foo.txt");          // ... then open explicitly
        if ( ! is || ! os)
            cerr << "Opening failed" << endl;
    }

Once the file stream has been opened, you may use it just as you have
been using the predefined cin and cout. For example,

    is >> x;
    os << "Hey buddy! Look at me." << endl;

3. SKIP LISTS
-------------

You have seen binary search trees and hash tables as ways to store and
retrieve information. Both are common ways to implement a data
structure known as a dictionary (or map), which supports operations
such as insert, delete, and find. Today we introduce an alternative
implementation of a dictionary.

We could implement a dictionary as nothing more than a sorted linked
list. To insert, we would first traverse the list until we came to the
point where we expect the item to reside, and then do whatever is
appropriate. Delete and find would likewise require traversing the
list to the point where we expect the element. So, all the operations
require O(n) time.

This motivates a story. Let's say you're at the beginning of a long
street in a big city. There are lots of buildings along the street.
About 1/2 of them are 1-storey buildings, about 1/4 are 2 storeys, 1/8
are 3 storeys, and in general about 1/(2^k) of them are k storeys
tall. Further, let's say you are in front of the building addressed 1
(which happens to be the tallest building in town) and you are looking
for address 500. Also, let's say the addresses increase from building
to building, but not by any predictable amount. How do you find your
address?
Well, the simple way is to walk along the street and look at the
address of each building until you find it. However, for finding an
arbitrary address, this takes O(n) time (where n is the total number
of buildings). This is analogous to searching through a linear linked
list. [How would you do it if you had the ability to teleport to an
arbitrary building number? Not address number, but building number,
i.e. the 7th building.]

Unluckily, you left your teleporter at home and grabbed your super
jumping shoes instead. So, the special ability you do have is to jump
out of a window and land in the next building of the same height (or
higher). When you land in the building, you immediately know what its
address is. How might you get to 500 now? The answer lies in the skip
list.

If we look at the number of buildings which are at least 1 storey
high, we see that all n of them are. About n/2 of them are at least 2
storeys tall, about n/4 are at least 3 storeys, etc. At some point we
expect there to be only around 1 building of a particular height. What
is that height? Well, it happens around when 2^s = n. Solving for s
gives us an expected maximum height of about log n (base 2).

How can we exploit the fact that at some level there is only a single
building of that particular height? Well, you can climb up building 1
to the height of log n and jump out of the window. You expect there to
be only a single building of that height, and if it was built at a
random location along the street, then you can expect its building
number to be roughly n/2. You check the address of the building you
landed in and you immediately know if you went too far or not. But the
best thing is, you should expect to have cut your problem in half.

The same idea can be applied to a computer. A skip list has at its
core a sorted linked list of data. However, each node in the list also
has a number of extra "forward pointers".
If node b has height x, then it has x forward pointers, where the i-th
pointer points to the next node of height at least i. So, let's say
node b has height 4. That means it has one forward pointer to the next
node of at least height 4, one to the next node of at least height 3,
etc.

[Draw example]

The key is to make each node have its height randomly selected, as
were the buildings: 1/2 height 1, 1/4 height 2, 1/8 height 3, etc.
This happens at insertion. When we insert a new node in the list, we
assign it a height in the following manner.

    int get_rand_height(void)
    {
        int lvl = 1;
        Random r;
        // get_rand(0,1) is taken to return 0 or 1 with equal
        // probability, so each extra level is a fair coin flip.
        while (r.get_rand(0,1))
            lvl++;
        return (lvl);
    }

Given a sorted list where each node has some number of forward
pointers, we can quickly find our location in the following manner
(pseudocode; next[i] is the level-i forward pointer).

    bool find(x)
    {
        cur = head;
        for i = highest_level downto 1
            // move right while the next level-i node is still below x
            while (cur.next[i] != NULL && cur.next[i]->data < x)
                cur = cur.next[i];
        // cur is now the last node below x on the bottom level
        return (cur.next[1] != NULL && cur.next[1]->data == x);
    }

We know that the maximum height is expected to be about log n, so the
for loop runs O(log n) times. We also know that each time we descend a
level we expect to have roughly halved the remaining problem, so the
while loop is expected to step past only a constant number of nodes
(or buildings) on each level. Thus we expect to find our element in
O(log n) time.

Note that we get this expected time bound no matter what the input.
The random heights are assigned independently of the input data. So,
unlike binary search trees, we expect our structure to be well
balanced no matter what the input, and without any expensive balancing
routines.

Code for a templatized skip list will be placed on the course web site
soon.
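
Until that code is posted, here is a minimal sketch of how the pieces
above might fit together for int keys. It is not the course's
templatized version. The names Node, SkipList, MAX_HEIGHT, and
random_height are made up for illustration, the height cap of 16 is an
assumption, std::rand stands in for the Random class, and levels are
numbered from 0 rather than 1. It is written against the standard
headers so that it compiles on a current compiler.

```cpp
#include <cassert>
#include <cstdlib>   // std::rand
#include <vector>

// Assumed cap on node height (hypothetical; enough for ~2^16 items).
const int MAX_HEIGHT = 16;

struct Node {
    int data;
    std::vector<Node*> next;   // next[i]: next node with height > i
    // The vector is value-initialized, so every pointer starts null.
    Node(int d, int height) : data(d), next(height) {}
};

class SkipList {
public:
    SkipList() : head(0, MAX_HEIGHT) {}

    ~SkipList() {
        // Level 0 links every node, so walking it frees them all.
        Node* cur = head.next[0];
        while (cur) {
            Node* dead = cur;
            cur = cur->next[0];
            delete dead;
        }
    }

    // Flip a coin until it comes up tails: about 1/2 of the nodes get
    // height 1, 1/4 height 2, and so on, capped at MAX_HEIGHT.
    static int random_height() {
        int h = 1;
        while (h < MAX_HEIGHT && std::rand() % 2 == 1)
            h++;
        return h;
    }

    bool find(int x) const {
        const Node* cur = &head;
        // Start at the top level; move right while the next key is
        // still below x, then drop down one level.
        for (int i = MAX_HEIGHT - 1; i >= 0; i--)
            while (cur->next[i] && cur->next[i]->data < x)
                cur = cur->next[i];
        const Node* cand = cur->next[0];
        return cand && cand->data == x;
    }

    void insert(int x) {
        Node* update[MAX_HEIGHT];   // last node before x on each level
        Node* cur = &head;
        for (int i = MAX_HEIGHT - 1; i >= 0; i--) {
            while (cur->next[i] && cur->next[i]->data < x)
                cur = cur->next[i];
            update[i] = cur;
        }
        if (cur->next[0] && cur->next[0]->data == x)
            return;                 // this sketch ignores duplicates
        int h = random_height();
        assert(h >= 1 && h <= MAX_HEIGHT);
        Node* n = new Node(x, h);
        // Splice the new node into each of its h levels.
        for (int i = 0; i < h; i++) {
            n->next[i] = update[i]->next[i];
            update[i]->next[i] = n;
        }
    }

private:
    Node head;   // full-height sentinel; its data is never compared
    SkipList(const SkipList&);              // copying disabled
    SkipList& operator=(const SkipList&);
};
```

For instance, after inserting 5, 1, 9, 3, and 7 into a SkipList,
find(3) returns true and find(4) returns false, whatever heights the
coin flips happened to produce. The full-height head sentinel plays
the role of building 1 in the story: every search starts there at the
top floor.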