Concepts of Parallel and Distributed Processing 
Fall 2005

Charles E. Hughes
Computer Science Department
University of Central Florida

Structure: TR 15:00-16:15, CSB 221; 29 class periods, each 75 minutes long.

Instructor: Charles Hughes; CSB 206; 823-2762;
Office Hours (for class): TR 13:15-14:30

TA: None

Reference Material:
Gregory R. Andrews, Multithreaded, Parallel and Distributed Programming, Addison-Wesley, 2000.
Ananth Grama et al, Introduction to Parallel Computing, Addison-Wesley, 2003.
Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2003.
Tarek El-Ghazawi, UPC: Distributed Shared Memory Programming, John Wiley and Sons, 2005.
Class notes

Prerequisites: COP3530 (CS3), COT3100 (Discrete Structures), CDA3103 (Computer Organization), COP3402 (Computer System Concepts).

Implementation Environments: You will regularly use Java and C. When in Java, we will be platform independent. When in C, we will use Linux and perhaps Windows. We will use the MPI library for cluster computing. We may also use UPC, a relatively new system associated with SPMD programming.

Assignments: 4 to 6 small to moderate programming assignments (some are multi-part) using a variety of parallel and distributed programming paradigms. 4 to 6 non-programming assignments. One project using various distributed paradigms.

Exams: Two midterms and final.

Important Dates (Quiz Dates are Subject to Change): MidTerm1 – October 6; MidTerm2 – November 3; Thanksgiving – November 24; Withdraw Deadline – October 14; Final – December 8, 13:00-15:50.

Evaluation (Tentative):
Midterms -- 150 points (90 for the better score; 60 for the other)
Final Exam -- 150 Points
Assignments -- 200 Points
Total Available: 500
Grading will be A: 90%+, B+: 87%+, B: 80%+, C+: 77%+, C: 70%+, D: 50%+, F: below 50%

Tools: Java – Eclipse; MPI; UPC.


Week#1: (8/23, 8/25)
  1. Concurrent Programming Concepts
  2. Introduction, even-odd transposition algorithm (SPMD), analysis
  3. Concepts of analysis of parallel algorithms
  4. SIMD solution to even/odd transposition sort
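
To make the even-odd transposition idea concrete, here is a minimal sequential Java sketch (illustrative only, not the class's SPMD code; class and method names are mine). Each of the N phases corresponds to one parallel step in which all of that phase's compare-exchanges could run simultaneously, giving O(N) parallel time on N processors:

```java
import java.util.Arrays;

public class EvenOddSort {
    // Even-odd transposition sort: N phases. Even phases compare-exchange
    // pairs (0,1), (2,3), ...; odd phases compare-exchange (1,2), (3,4), ...
    public static void sort(int[] a) {
        int n = a.length;
        for (int phase = 0; phase < n; phase++) {
            int start = phase % 2; // even phase starts at index 0, odd at 1
            for (int i = start; i + 1 < n; i += 2) {
                if (a[i] > a[i + 1]) {      // compare-exchange
                    int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                }
            }
        }
    }
}
```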

    Assignment #1:
    1. Consider the Max Tree we described earlier, but now use p processors to sort N values, where N may be greater than p. In this case, each processor uses a sequential algorithm to find its local max and then proceeds with the standard tree approach. Analyze the Time, Cost and Work for
    a. p = lg N
    b. p = lg lg N
    c. p = N / lg N
    d. p = N / lg lg N
    2. Consider the problem of sorting a deck of cards into Spades, Hearts, Diamonds, Clubs, and in order Ace down to 2 within each suit.
    a. What is the best way for an individual to do this? Describe the approach and analyze the number of comparisons and inspections (e.g., what suit is this) done.
    b. Redo this, but this time with five people. One of the five starts with all 52 cards. You will need to analyze the additional number of times a card is passed from one person to another. We assume random access; that is, any person can pass a card to any other at the same cost. The ending state of the system is that all 52 cards must be in one person's hand, sorted.
    Due: 9/1

Week#2: (8/30, 9/1)
  1. Taxonomies (control, address space, interconnection network, granularity)
  2. Taxonomies (control)
  3. Taxonomies (address space)
  4. Taxonomies (interconnection network)
  5. More on static networks
  6. Routing in static networks
  7. Reduction (all to one), broadcast (one to all), all to all on hypercube
  8. Granularity (BSP model)
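
The all-to-one reduction pattern on a hypercube can be simulated in plain Java. This sketch (the names are mine, not from the notes) sums p values, where p is a power of two, in lg p communication rounds; in round k the active nodes send along dimension k to partners whose id differs in bit k:

```java
public class HypercubeReduce {
    // Simulated all-to-one sum reduction on a hypercube of p = 2^d nodes.
    // In the round for bit `mask`, each node whose `mask` bit is set and
    // whose lower bits are clear "sends" its value to partner id ^ mask.
    // After lg p rounds the total accumulates at node 0.
    public static int reduce(int[] values) {
        int p = values.length;          // assumed to be a power of two
        int[] v = values.clone();
        for (int mask = 1; mask < p; mask <<= 1) {
            for (int id = 0; id < p; id++) {
                if ((id & mask) != 0 && (id & (mask - 1)) == 0) {
                    v[id ^ mask] += v[id]; // send along dimension `mask`
                }
            }
        }
        return v[0];
    }
}
```

Reversing the communication pattern (node 0 sending outward, highest dimension first) gives the one-to-all broadcast in the same lg p rounds.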

    Assignment #2:
        See page 35 of notes
    Due: 9/8

Week#3: (9/6, 9/8)
  1. PRAM Model
  2. Programming Styles
  3. Foster's Design Methodology
  4. State, history, properties
  5. Notation for concurrency
  6. Critical reference
  7. Locks and Barriers
  8. Axiomatic Semantics (very brief)
  9. Fairness
  10. SpinLocks

Assignment #3:
    See page 59-61 of notes
Due: 9/20

Week#4: (9/13, 9/15)
  1. Java Support for Concurrency
  2. Reflexive Transitive Closure
  3. All Shortest Path
  4. More analysis of Max algorithms + Brent's scheduling
  5. Fair Solutions
  6. Barrier Synchronization

Programming Assignment #1:

    Implement Floyd's all-shortest-pairs algorithm in Java, assigning one thread to each row of the adjacency matrix.
    I will give you a start by providing a sequential version of this (one thread).
    Floyd Implementation
Due: 9/29
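
For orientation, a sequential Floyd-Warshall in Java might look like the sketch below (a hypothetical starting point, not necessarily the handout). A threaded version would give each thread responsibility for one or more rows at each pivot k, with a barrier between pivots:

```java
public class Floyd {
    // "Infinity" chosen so that INF + INF does not overflow int.
    static final int INF = Integer.MAX_VALUE / 2;

    // Sequential Floyd-Warshall all-pairs shortest paths.
    // d[i][j] holds the edge weight (or INF); updated in place.
    public static void allPairs(int[][] d) {
        int n = d.length;
        for (int k = 0; k < n; k++)          // pivot: barrier point in a parallel version
            for (int i = 0; i < n; i++)      // rows: natural unit of work per thread
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j])
                        d[i][j] = d[i][k] + d[k][j];
    }
}
```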

Week#5: (9/20, 9/22)
  1. Data Parallel
  2. Bag of tasks
  3. SIMD computation
  4. Semaphores
  5. Monitors
      • signal and wait versus signal and continue
      • queues, priority queues, BPOTs, heaps and analysis
      • bitonic lists
      • semaphores
      • bounded buffers
      • readers/writers
      • shortest-job-next
      • timers
        • covering condition versus priority wait
      • sleeping barber
      • CSCAN disk scheduler
      • SCAN disk scheduler
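
As a concrete monitor example, here is a bounded buffer in Java (a minimal sketch, not the notes' version). Java's built-in monitors use the signal-and-continue discipline, which is why an awakened thread must re-test its condition in a while loop rather than an if:

```java
public class BoundedBuffer<T> {
    private final Object[] slots;
    private int head = 0, tail = 0, count = 0;

    public BoundedBuffer(int capacity) { slots = new Object[capacity]; }

    // Signal-and-continue: a thread woken by notifyAll re-acquires the
    // lock later, so the condition may no longer hold; hence the while.
    public synchronized void put(T item) throws InterruptedException {
        while (count == slots.length) wait();   // buffer full
        slots[tail] = item;
        tail = (tail + 1) % slots.length;
        count++;
        notifyAll();                             // wake waiting consumers
    }

    @SuppressWarnings("unchecked")
    public synchronized T get() throws InterruptedException {
        while (count == 0) wait();               // buffer empty
        T item = (T) slots[head];
        head = (head + 1) % slots.length;
        count--;
        notifyAll();                             // wake waiting producers
        return item;
    }
}
```

Using notifyAll here is a covering-condition strategy: every waiter rechecks; with separate condition queues (as in signal-and-wait monitors) a targeted signal would suffice.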

    Assignment #4:

    1. The Unix kernel provides two atomic operations similar to the following:
      sleep( ): // block the executing thread
      wakeup( ): // awaken all blocked threads
      A call of sleep always blocks the caller. A call of wakeup awakens every thread that has called sleep since the last time that wakeup was called.

      a.) A call to wakeup should awaken all tasks that entered sleep prior to the time wakeup is started, but not any that arrive later. The following “solution” has a serious problem in that it can wake up the wrong tasks. Explain how this undesirable result can happen.
      sem e = 1, delay = 0; int count = 0;
      sleep( ): P(e) ; count++; V(e); P(delay);
      wakeup( ): P(e); while (count > 0) { count--; V(delay); } V(e);

    b.) Develop and present a correct implementation of sleep and wakeup using semaphores for synchronization.
    (Hint: Consider a solution that uses baton passing.)

    2. Suppose there are m producer processes and n consumer processes. The producer processes periodically call broadcast(message) to send a copy of message to all n consumers. Each consumer receives a copy of the message by calling fetch(message, myId), where message is a result argument and myId ranges from 0 to n-1.
      Write a monitor that implements broadcast and fetch. Use the Signal and Continue discipline. The monitor should store only one message at a time, which means that after one producer calls broadcast, any future call of broadcast must delay until every consumer has received a copy of the previous message. Be sure to declare all variables used in the monitor. You may assume the message is of type String, although its type is really irrelevant.

    Due: 10/4

Week#6: (9/27, 9/29)
  1. Java Support for Monitors
      • constructor in Runnable specifies object that provides code for thread
      • constructor can also specify a string to identify thread
      • can do at granularity of method
      • can do at granularity of a block
  2. This ends the material for Midterm#1
  3. Message Passing
  4. Paths as a Declarative Means of Coordination
  5. Parallelizing Graph Algorithms
      • alternate data structures for adjacency (N^2 versus E lg N)
      • computation cost N^2/p
      • communication cost
        • Hypercube  O(N lg p); use p = N/lg N
        • Mesh  O(N p^(1/2)); use p = N^(2/3)

Week#7: (10/4, 10/6)
  1. Topics and Promises for MidTerm1
  2. MidTerm1 on 10/6


Week#8: (10/11, 10/13)
  1. MPI
  2. Distributed Computing Paradigms
  3. UDP, Multicast UDP, TCP/IP
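
A minimal Java illustration of UDP (a self-contained sketch assuming loopback networking; this is not the course's client/server code) binds a receiver on an ephemeral port, sends it one datagram, and reads it back. Note that UDP gives no delivery guarantee, hence the timeout:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpEcho {
    // One UDP round trip over the loopback interface.
    public static String roundTrip(String msg) throws Exception {
        try (DatagramSocket receiver = new DatagramSocket(0); // OS picks a free port
             DatagramSocket sender = new DatagramSocket()) {
            byte[] out = msg.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length,
                    InetAddress.getLoopbackAddress(), receiver.getLocalPort()));
            byte[] in = new byte[1024];
            DatagramPacket packet = new DatagramPacket(in, in.length);
            receiver.setSoTimeout(2000); // don't hang forever if the packet is lost
            receiver.receive(packet);
            return new String(packet.getData(), 0, packet.getLength(),
                    StandardCharsets.UTF_8);
        }
    }
}
```

Multicast UDP replaces the unicast destination with a group address and a MulticastSocket; TCP/IP replaces datagrams with a connected, reliable byte stream.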

Assignment #5:
       Redo the three questions from Exam#1. See Assignment5.doc.
Due: 10/18

Programming Assignment #2:
       See Notes (page 179)
Due: 10/20

Programming Assignment #3
       See Notes (page 180)
Due: 10/27

Note: Last Day to Withdraw is 10/14


Project:

This is intended to be the only team assignment of the term. Teams of size two are optimal, although larger teams can be used if you convince me of the added complexity. In general, I would like larger teams when two distinct approaches are taken to the same problem (e.g., an RMI and a tuple space approach, or an RMI and an MPI approach). Of course, I will expect a careful analysis of how the methods compare.

This is pretty open-ended, but here are some examples of reasonable projects.

  1. For Vision people
    1. Parallel or distributed edge detection
      1. Don't do Canny, as that was already done
    2. Parallel or distributed depth calculation
      1. Might use disparity maps from stereo
  2. For Graphics people
    1. Parallel tone or color mapping
      1. Map tone from one image to another
    2. Use background color shift to alter rendering of foreground objects
      1. Can compare various color theories
    3. Parallel or distributed compression/decompression of HDR images
      1. Could do for film clips rather than single images
      2. Can compare various representations
  3. For Systems people
    1. RMI that supports both synchronous and asynchronous calls
      1. Might experiment with a new type called "Future"
    2. Distributed Spaces implementation using multicast UDP
    3. Spaces implementation in C# or C++
      1. Might do a comparison
  4. For Algorithms people
    1. Parallel evaluation of constraints
      1. For instance, mix Gaussian elimination with non-linearity
    2. Parallel graph rewriting
      1. Perhaps synthesized attribute evaluation on a tree, e.g., constant propagation
    3. Parallel or distributed Cocke-Kasami-Younger algorithm
    4. Parallel or distributed reconstruction of phylogenies

In all cases, you must design experiments to determine the scalability of your solution. If you have a team of two, you might have a run-off of a parallel versus a distributed solution. You will need to produce a manual that goes with this project. I want this to be a professional job.

Due: 11/3 (Proposal with use cases); 11/10 (Design Document); 11/29 (Demo);
12/4 (Final Report/Analysis, Software)

Week#9: (10/18, 10/20)
  1. Concurrent Objects
  2. Remote Method Invocation
  3. Tuple Space
  4. Mixed Reality Presentation

Week#10: (10/25, 10/27)
  1. Atomicity and Transactions
  2. Jini Object Request Broker (ORB)
  3. Distributed Garbage Collection
  4. Oblivious Comparison Exchange Sorts
  5. Shear Sort and its Analysis
  6. Formal proof of ShearSort correctness and timing.
  7. Revsort (akin to ShearSort)
  8. Bitonic Sort
  9. Virtualizing sorts
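
ShearSort can be sketched sequentially in Java on an r x c mesh stored row-major (an illustrative sketch; the class analysis assumes one processor per mesh cell). It alternates snake-order row sorts with top-to-bottom column sorts for about ceil(lg r) + 1 phases, then finishes with one row pass, leaving the mesh sorted in snake order:

```java
import java.util.Arrays;

public class ShearSort {
    public static void sort(int[][] m) {
        int r = m.length, c = m[0].length;
        int phases = (int) Math.ceil(Math.log(r) / Math.log(2)) + 1;
        for (int p = 0; p < phases; p++) {
            // Row phase: even rows ascend left-to-right, odd rows descend
            // (snake order), so column sorting can merge adjacent rows.
            for (int i = 0; i < r; i++) {
                Arrays.sort(m[i]);
                if (i % 2 == 1) reverse(m[i]);
            }
            // Column phase: sort every column top-to-bottom.
            for (int j = 0; j < c; j++) {
                int[] col = new int[r];
                for (int i = 0; i < r; i++) col[i] = m[i][j];
                Arrays.sort(col);
                for (int i = 0; i < r; i++) m[i][j] = col[i];
            }
        }
        // Final row pass leaves the whole mesh sorted in snake order.
        for (int i = 0; i < r; i++) {
            Arrays.sort(m[i]);
            if (i % 2 == 1) reverse(m[i]);
        }
    }

    private static void reverse(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
}
```

On the mesh itself, each row or column phase is an even-odd transposition sort along that row or column, which is where the O(sqrt(N) lg N) step bound for an N-cell square mesh comes from.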

Week#11: (11/1, 11/3)
  1. Midterm2 review and promises
  2. Midterm2 Sample Questions
  3. Answers to Midterm2 Sample Questions
  4. MidTerm2 on 11/3 Due:  11/17

Week#12: (11/8, 11/10)
  1. RMI example
  2. Accelerated Cascading
  3. PCN (Program Composition Notation)
  4. Parallel Logic Programming (Prolog, CLP(R), Strand)
  5. Channel Paradigm
  6. CSP; reasoning through CSP; Modern CSP; Extensions

Take Home Exam Question:

Redo Question #5 (the semaphore one). You must now do a little better than I would have expected on the exam. Specifically, you must discuss all the issues of scheduling as I have done in the key for Question 4, so that you justify that your algorithm meets all the criteria laid out in #4.

Due:  11/15

Assignment #6:

  1. As regards Accelerated Cascading Max, analyze this algorithm if the binary tree reduction cutoff is:
    a.)  lg lg lg lg N
    b.)  square root(lg N)
    Determine which are fast and/or efficient. Do precise analysis.

  2. For each of (a) Bitonic Sort and (b) lg lg trees Max, operating on N values, determine if there is a magic p (similar to Brent's Scheduling), for which this algorithm is work efficient and fast ((lgN)^2 and lg lg N, resp.) when virtualized with each processor starting with N/p values. Show the derivation of this value of p and then prove it is optimal, as in Brent's choice of p = N/lg N, or argue convincingly that no such p can be found for arbitrary N.

Due:  11/17

Programming Assignment #4 (last one):

In the directory Bid, I have placed a server and clients for a bidding service. Essentially, clients can register with the server, which assigns each a unique number. Any client can offer an object (via its name) for sale at some minimum price.

Besides this modification, and finding/fixing any observed bugs, you must expand this so there are two types of bid servers. The first is registered under the service "KnightBay" and the second as "PegasusBay." In KnightBay, the seller can close out bids at any time or when a desired goal amount is reached or exceeded; in this model, the price paid is the highest bid. In PegasusBay, the seller can close out bids at any time or after some goal time has elapsed (the time in milliseconds is specified in the same parameter field as the goal amount in the first case). In this second close-out strategy, the price paid is the second-highest bid, but the winner is, as always, the highest bidder. The service chosen by a client is indicated on the command line. The service name registered by a server is also specified on its command line.

When a sale occurs, the server identifies the buyer and seller to each other. They are responsible for completing the sale in some off-line manner. As a small expansion on the current solution, your clients must show a record of all items they personally sold or bought, with the sale prices. As currently implemented, the server uses a push strategy to send a vector of items and current bid prices to all clients each time a bid price changes, a sale completes, or a new item is offered for sale. You should stay with that, except you are certainly welcome to use some other form of collection that you deem more acceptable, so long as it's serializable.

I am not enamored with the current user interface. It is also the case that this server does not handle disappearing clients worth a toot. Moreover, clients can bid on any item offered for sale, including items they themselves offered. That unacceptable behavior must be blocked. You could even choose to penalize such a person by immediately declaring that the current bid is accepted. You will have to deal with these nasty little issues.

You must obey the interfaces given here, since I can choose to use my own client with your servers, and vice versa. In fact, each of you must identify one other student in the class whose servers you have tried with your client and whose client you have tried with your servers.  The name of this student must be part of your writeup. Yes, you do have to provide a user manual which gives use-case scenarios.

Due: 11/22

Week#13: (11/15, 11/17)
  1. Program Flow Analysis
  2. Control Flow
  3. Data Flow
  4. Flow Analysis and Parallelizing Code
  5. Flow Analysis and Parallelizing Code (continued)
  6. Scheduling Problems

Week#14: (11/22)
  1. Scheduling Problems
  2. Thanksgiving

Week#15: (11/29, 12/1)
  1. Meetings with me on Project status (11/29)
  2. Sample Questions for Final
  3. Discussion and Return of Assignments
  4. Review for Final Exam (Topics)

Final Exam:

© Charles E. Hughes, -- Last Modified December 1, 2005