Answers to the reading questions are due in class, in writing, on the day that the reading is due.

You do not have to copy the questions.

• Tu Sept 29: Get the book. Until we have the book, read Section 1 of the Blelloch/Maggs Parallel Algorithms chapter. We won't be getting into network architectures, so you can skim Section 1.1.1 and Theorem 1.2, and skip Section 1.5 entirely. We will discuss Theorem 1.1!
• Assume a circuit only includes addition gates with two inputs, as in Figure 4. Is it possible to design a circuit that adds up n numbers in O(1) time? Why or why not?
• On page 9, it says, "The latency is often modeled by considering the worst-case time assuming that the network is not heavily congested." Using this definition, what is the latency of an n-by-n two-dimensional grid (page 7)? Give a one-sentence explanation of your answer.
• For Thurs 10/1: Prefix operations. Read Sections 1.1 and 1.2, and the radix sort algorithm in Section 1.3, in Blelloch's chapter on prefix sums, which contains lots of other useful information, including a broader definition of the operations for which scan is useful. Skim Sections 1.4 and 1.5; we will cover them in class.
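For reference while reading, here is the sequential meaning of the two operations in plain Python (an illustrative sketch with my own function names, not Blelloch's notation or his parallel implementation):

```python
def scan(op, identity, xs):
    """Inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i]."""
    out, acc = [], identity
    for x in xs:
        acc = op(acc, x)
        out.append(acc)
    return out

def prescan(op, identity, xs):
    """Exclusive scan: out[i] = identity op xs[0] op ... op xs[i-1]."""
    out, acc = [], identity
    for x in xs:
        out.append(acc)
        acc = op(acc, x)
    return out
```

For example, with addition, `scan` of `[1, 2, 3, 4]` gives `[1, 3, 6, 10]` while `prescan` gives `[0, 1, 3, 6]`; the chapter shows how to compute either in O(log n) depth when `op` is associative.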
• Do exercise 1.2 on page 57. Note that by "compare" he means determine which string comes first in alphabetical (lexicographical) order. What does he mean by "prescan"?
• Express the running time of radix sort as a function of three parameters, n, p, and b, where b is the number of bits in the input numbers.
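While working this out, it may help to see the bit-by-bit structure of the algorithm in a serial Python sketch; the `split` here is an ordinary stable partition standing in for the scan-based implementation in the chapter:

```python
def split(xs, flags):
    # Stable partition: all elements whose flag is 0, in order,
    # followed by all elements whose flag is 1, in order.
    # (The chapter implements this step with scans.)
    return [x for x, f in zip(xs, flags) if f == 0] + \
           [x for x, f in zip(xs, flags) if f == 1]

def radix_sort(xs, b):
    # One split pass per bit, least significant bit first.
    for i in range(b):
        xs = split(xs, [(x >> i) & 1 for x in xs])
    return xs
```

The running time you are asked for comes from counting the passes and the parallel cost of each pass's scans.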
• For Tues 10/6: Pointer structures. Read Sections 2.2 and 2.7.1 in the book "Introduction to Parallel Algorithms". We will discuss these and then hopefully go on to Section 2.7.2 in lecture.
• Algorithm 2.4 clearly does a lot of extra work since once each processor gets to the root of the tree it keeps cycling around the self-loop. If we change the program so that each processor stops as soon as it detects the self-loop, will that improve the asymptotic work?
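For intuition about the rounds being discussed, here is a serial Python simulation of synchronous pointer jumping (an illustrative sketch, not the book's exact Algorithm 2.4):

```python
def pointer_jump(parent):
    # parent[i] is node i's parent; a root r satisfies parent[r] == r.
    # In each round every node's pointer jumps to its grandparent,
    # halving the distance to the root, so O(log n) rounds suffice.
    n = len(parent)
    for _ in range(max(1, n.bit_length())):
        parent = [parent[parent[i]] for i in range(n)]
    return parent
```

After the loop, every node points directly at the root of its tree. Note that the list comprehension recomputes every entry in every round, which is exactly the "extra work" the question asks about.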
• For Thurs 10/8: Optimal work list ranking. Read Chapter 3, up to the end of Section 3.1.1.
• Describe what happens when Algorithm 2.9, the initial coloring algorithm, is run on a tree instead of a cycle. Assume the tree is rooted and given by nodes with parent pointers, so that each node has one outgoing pointer and the root node points to itself.
• For Tue 10/13 and Thurs 10/15: Professor John Owens on parallel programming on the GPU. No questions.
• For Tues 10/20: Merge-Path. This material is in GPU merge path - a GPU merging algorithm, by Green, McColl and Bader, and An optimal parallel algorithm for merging using multiselection, by Deo, Jain and Medidi. You may look at these, but the lecture will be self-contained.
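If you want a feel for the partitioning idea before lecture, here is a Python sketch of the merge-path co-rank search: for a cross-diagonal d it finds how many elements of each sorted array fall in the first d positions of the merged output (names and details are my own, not taken from either paper):

```python
def merge_path_split(A, B, d):
    # Find (a, b) with a + b == d such that A[:a] and B[:b] are exactly
    # the first d elements of merge(A, B).  Binary search for the
    # smallest a with A[a] >= B[d - a - 1].
    lo, hi = max(0, d - len(B)), min(d, len(A))
    while lo < hi:
        a = (lo + hi) // 2
        b = d - a
        if A[a] < B[b - 1]:   # A[a] belongs among the first d; a is too small
            lo = a + 1
        else:
            hi = a
    return lo, d - lo
```

In the parallel setting, processor i runs this search on diagonal d = i * (len(A) + len(B)) // p, and the p processors then merge their slices independently.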
• For Thurs, 10/22: The BSP model. Read Leslie Valiant's A bridging model for parallel computation, up to the giant "W" on page 106. We will also discuss page 108, from the giant "A" to the end of the last paragraph that ends on the page.
Lecture Notes
• What does he mean on page 106 when he says, "Keeping g low or fixed as the machine size p increases incurs, of course, extra costs."?
• In the PRAM model, we can describe algorithms in terms of work and depth so that our analysis is independent of the number of processors p. But in BSP, results are always stated in terms of p and n. Why?
• For Tu 10/27, we'll read the first five pages of Randomized parallel list ranking, by Dehne and Song, up until the end of Section 3. Don't worry about the proofs in Section 2, just understand what Theorem 1 and Lemma 1 mean.
Lecture notes
• We are given an unsorted array of numbers, which we plan to sort. We want to pick a random sample and use Lemma 1 to argue that the random sample partitions the sorted output list into parts of size at most O(n/p) with probability >= 1 - 1/p. How big should the random sample be? The answer is not p. An asymptotic answer is OK. You will probably want to use the fact that (1-1/k)^k <= 1/e, where k > 1 and e is the base of the natural logarithm.
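If you want to build intuition empirically before doing the analysis, here is a small Python experiment (the splitter rule below is one plausible choice for illustration, not necessarily the exact rule in Dehne and Song):

```python
import random

def largest_part(n, p, s, rng):
    # Take a random sample of size s from n distinct keys (here just 0..n-1),
    # use every (s // p)-th sample element as a splitter, and report the size
    # of the largest of the p parts the splitters induce on the sorted input.
    sample = sorted(rng.sample(range(n), s))
    k = s // p
    splitters = sample[k - 1 :: k][: p - 1]
    bounds = [0] + [x + 1 for x in splitters] + [n]
    return max(b - a for a, b in zip(bounds, bounds[1:]))
```

Try fixing n and p, growing the sample size s over many random trials, and watching how often the largest part exceeds a small constant times n/p.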
• For Tu 11/3: Read Section 3.2 on the Euler Tour technique.
• Do Exercise 3.13.
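For orientation, here is a serial Python sketch that lists a rooted tree's directed edges in Euler-tour order (a stand-in for the section's parallel construction, which builds the tour as a linked list without any traversal):

```python
def euler_tour(children, root=0):
    # children[u] lists u's children; each tree edge {u, v} appears twice
    # in the tour, once as (u, v) going down and once as (v, u) coming up.
    tour = []
    def visit(u):
        for v in children[u]:
            tour.append((u, v))
            visit(v)
            tour.append((v, u))
    visit(root)
    return tour
```

Once the edges are in tour order, many tree problems (subtree sizes, depths, numbering) reduce to a scan over the tour, which is the point of the technique.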
• For Th 11/5: Read Section 3.3 on rake and compress.
• Section 3.3.3 describes how to make an arbitrary tree binary. Is it possible to do this in O(1) depth? Explain how, if possible, and why not, if impossible. Clarification: given an array of pointers containing only the d leaves of an internal node, can you construct an array of pointers representing a binary tree on those d leaves, in O(1) depth? The final order of the leaves in the binary tree does not matter.
• For Tu, 11/10: Read the intro to Chapter 5, Section 5.1.1, and Section 5.1.3 on connected components.
• The first part of exercise 5.8: suppose Step 2 was omitted from Algorithm 5.2. Give an example in which the modified algorithm would require Omega(n) depth.
• For Th, 11/12: Section 5.1.4 on Minimum Spanning Tree.