Concepts of Parallel
and Distributed Processing
Fall 2005
Structure: TR 15:00-16:15, CSB 221; 29 class periods, each 75 minutes long.
Instructor: Charles Hughes; CSB 206; 823-2762; ceh@cs.ucf.edu
Office Hours (for class): TR 13:15-14:30
TA: None
Reference Material:
Gregory R. Andrews, Multithreaded, Parallel and Distributed Programming, Addison-Wesley, 2000.
Ananth Grama et al., Introduction to Parallel Computing, Addison-Wesley, 2003.
Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2003.
Tarek El-Ghazawi, UPC: Distributed Shared Memory Programming, John Wiley and Sons, 2005.
Class notes
Prerequisites: COP3530 (CS3), COT3100 (Discrete Structures), CDA3103 (Computer Organization), COP3402 (Computer System Concepts).
Implementation Environments: You will be regularly using Java and C. When in Java, we will be platform independent. When in C, we will use Linux and perhaps Windows. We will use the MPI library for cluster computing. We may also use UPC, a relatively new system associated with SPMD programming.
Assignments: 4 to 6 small to moderate programming assignments (some are multi-part) using a variety of parallel and distributed programming paradigms. 4 to 6 non-programming assignments. One project using various distributed paradigms.
Exams: Two midterms and final.
Important Dates (Quiz Dates are Subject to Change): MidTerm1 – October 6; MidTerm2 – November 3; Thanksgiving – November 24; Withdraw Deadline – October 14; Final – December 8, 13:00-15:50.
Evaluation (Tentative):
Mid Terms -- 150 points (90 for the better score; 60 for the other)
Final Exam -- 150 Points
Assignments -- 200 Points
Total Available: 500
Grading will be A: 90%+, B+: 87%+, B: 80%+, C+: 77%+, C: 70%+, D: 50%+, F: below 50%
Resources:
Java – Eclipse; MPI; UPC.
TENTATIVE SCHEDULE -- THE FURTHER OUT, THE LESS RELIABLE THIS IS
Week#1: (8/23, 8/25) - Concurrent Programming Concepts
- Introduction, even-odd transposition algorithm (SPMD), analysis
- Introduce basic concepts of concurrency and some of the course's goals.
- Introduce sorting on a linear array and on a tree. In particular, present a pretty detailed description of the even-odd transposition sort on an array of processors (a sequential Java sketch follows this week's topics). Discuss the theoretical bounds on sorting with such an architecture. Discuss what happens when you have a ring, not just an array of processors. Discuss the notion of oblivious comparison-exchange sorting as a forward pointer to later proofs of correctness.
- Concepts of analysis of parallel algorithms
- Architectural considerations -- synchronous versus asynchronous; barrier synchronization; centralized control
- Issues of communication and coordination in parallel and distributed implementations
- Present notions of Time, Cost, Speedup, Work, Cost Efficiency and Work Efficiency.
- Virtualizing an algorithm -- focus on even-odd transposition.
- Look at scalability of even-odd transposition sort.
- SIMD solution to even/odd transposition sort
- Discussion of implications of this architecture
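To make the even-odd transposition idea concrete, here is a minimal sequential simulation in Java, assuming one array element per virtual processor; it is a sketch for intuition, not the SIMD implementation discussed in class.

// Even-odd transposition sort: N phases of oblivious compare-exchanges
// on alternating (even, odd) pairs; simulates one value per processor.
public class EvenOddTranspositionSort {
    public static void sort(int[] a) {
        int n = a.length;
        for (int phase = 0; phase < n; phase++) {
            // Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
            for (int i = phase % 2; i + 1 < n; i += 2) {
                if (a[i] > a[i + 1]) {      // compare-exchange; which pairs are
                    int t = a[i];           // compared never depends on the data,
                    a[i] = a[i + 1];        // so the sort is oblivious
                    a[i + 1] = t;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 3};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 4, 5]
    }
}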
Assignment #1:
1. Consider the Max Tree we described earlier, but now use p processors to sort N values, where N may be greater than p. In this case, each processor uses a sequential algorithm to find its local max and then proceeds with the standard tree approach. Analyze the Time, Cost and Work for
a. p = lg N
b. p = lg lg N
c. p = N / lg N
d. p = N / lg lg N
2. Consider the problem of sorting a deck of cards into Spades, Hearts, Diamonds, Clubs, and in order Ace down to 2 within each suit.
a. What is the best way for an individual to do this? Describe the approach and analyze the number of comparisons and inspections (e.g., what suit is this) done.
b. Redo this but this time with five people. One of the five starts with all 52 cards. You will need to analyze the additional number of times a card is passed from one person to another. We assume random access; that is, any person can pass a card to any other at the same cost. The ending state of the system is that all 52 cards must be in one person's hand, sorted.
Due: 9/1
Week#2: (8/30, 9/1) - Taxonomies (control, address space, interconnection network, granularity)
- Taxonomies (control)
- SISD, SIMD and MIMD. Data-parallel versus task parallel. Data parallel's relation to SIMD
- Taxonomies (address space)
- Regarding address-space organization, differentiate between private memory (separate address spaces), also called distributed memory, and shared address space (often called shared memory). Discuss UMA (uniform / symmetric multiprocessors (SMP)) versus NUMA (non-uniform) memory access. Discuss concepts of cache and the cache coherence (consistency) problem.
- A multicomputer is a distributed memory multiprocessor in which the nodes and network are in a single cabinet. Such a system is tightly coupled and communication is over a dedicated high speed interconnection network.
- Note that a network of workstations is a form of distributed memory multiprocessor. Such a system is loosely coupled and communication is usually through message passing.
- Beowulf is a network or cluster of workstations generally running Linux.
- Distributed, shared memory refers to a distributed implementation of the shared memory model.
- Make the connection of multicomputer to distributed memory and multiprocessor to shared address space.
- Differentiate between shared-address and shared-memory parallel computers. Note how the same picture, showing every processor with its own memory and no global memory, can represent either distributed memory or a shared address space (NUMA).
- Taxonomies (interconnection network)
- Regarding the interconnection network, we can have static or dynamic interconnections: dynamic interconnections such as bus and crossbar versus static interconnections such as linear array and completely connected.
- Dynamic interconnection networks use switches (indirect), rather than direct connects. Discuss crossbar, bus-based networks.
- Regarding the crossbar, discuss the cost of scaling (high) versus the performance degradation of scaling (low) for a crossbar. Note the cost of O(p^2) switches.
- Regarding bus-based, relate this to networks of workstations. Note its cost scalability, but the difficulty of scaling its performance
- Discuss multistage. Regarding multistage, present the notion of a butterfly.
- Introduce static interconnection networks, noting connections are direct. Discuss completely-connected, star connected, linear array, ring, 2-d mesh, torus, 3-d mesh and torus, tree, hypercube.
- Note that completely connected are static counterparts of crossbars, but they allow multiple messages to emanate from one node to several destinations.
- Note that the central node in a star is the bottleneck, just as the bus is in a bus scheme. This is also true of the root of a tree.
- Discuss trees with switches on interior nodes and note the use of fat trees to avoid the bottleneck of the root.
- Hamming distance as shortest path in a Hypercube
- Introduce the notions of diameter, connectivity, and bisection width.
- Show these for star connected, completely connected, linear array, ring, 2-d mesh, torus, and 3-d mesh and torus.
- Generalize to k-ary, d-cubes
- Note the hypercube is a 2-ary d-cube, having 2^d processors. A ring is a p-ary 1-cube. A 2-d torus of p processors is a √p-ary 2-cube. A k-ary d-cube can be created from k k-ary (d-1)-cubes by connecting identical positions.
- Embedding Other Networks in Hypercubes
- Reflected Gray Code; rings and meshes
- More on static networks
- Reflected Gray Code; trees
- Discuss reduction on a hypercube vs a linear array. Note this can be viewed as a mapping of the tree reduction onto hypercube.
- Routing in static networks
- Present XY routing and E-cube routing as examples of dimension-ordered routing.
- Describe communication costs associated with static networks. Parameters are Startup Time (ts), Per-Hop Time (th), Per-Word Transfer Time (tw).
- Switching Techniques: Store-and-forward versus cut-through. Store-and-forward cost for m words traversing l links is tcomm = ts + (m tw + th) l. Since th is usually quite small, we simplify this to tcomm = ts + m tw l. Cut-through routing advances the message as soon as it arrives at a node. Wormhole routing is a specific type of cut-through in which the message is divided into small parts called flits (flow-control digits). Flits are pipelined through the network, with an intermediate node needing to store the flit, but not the whole message. Since flits are of fixed size, the communication cost is tcomm = ts + l th + m tw. Thus, store-and-forward is O(ml), whereas cut-through is O(m + l). (A small cost calculator follows this week's topics.)
- Discuss deadlocking in wormhole routing.
- Reduction (all to one), broadcast (one to all), all to all on hypercube
- Granularity (BSP model)
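As a quick check on the two switching cost models above, here is a small Java calculator; the parameter values in main are made up purely for illustration.

// Store-and-forward vs. cut-through costs, using this week's notation:
// ts = startup time, th = per-hop time, tw = per-word time,
// m = message length in words, l = links traversed.
public class CommCost {
    static double storeAndForward(double ts, double th, double tw, int m, int l) {
        return ts + (m * tw + th) * l;   // O(ml)
    }
    static double cutThrough(double ts, double th, double tw, int m, int l) {
        return ts + l * th + m * tw;     // O(m + l)
    }
    public static void main(String[] args) {
        double ts = 50, th = 1, tw = 2;  // hypothetical parameters
        int m = 1000, l = 10;
        System.out.println(storeAndForward(ts, th, tw, m, l)); // 20060.0
        System.out.println(cutThrough(ts, th, tw, m, l));      // 2060.0
    }
}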
Assignment #2:
See page 35 of notes
Due: 9/8
Week#3: (9/6, 9/8) - PRAM Model
- Programming Styles
- Iterative parallelism: co // and process notation
- Recursive parallelism
- Producer / Consumer
- Client / Server
- Peers: worker, send and receive notation
- Foster's Design Methodology
- Partitioning
- Communication
- Agglomeration
- Mapping
- State, history, properties
- s1 -> s2 -> s3 -> ... -> sk : trace or history of states; can have many traces in a concurrent system
- states are altered by atomic action
- safety property : never enter a bad state
- liveness property : eventually enter a good state
- mutual exclusion is a safety property
- partial correctness is a safety property
- termination is a liveness property (finite histories)
- total correctness is both a safety and liveness property
- Notation for concurrency
- co s1; // s2; // ... // sn; oc : concurrency
- process name { ... } : background process
- < S; > : atomic action; critical section; mutual exclusion; granularity considerations
- < await(B) > : conditional synchronization; barrier synchronization
- < await(B) S; > : conditional atomic action
- { precondition } actions { postcondition } : basis for axiomatic proofs of correctness
- Critical reference
- Critical reference is one changed by another process
- At-Most-Once Property (x = e); appearance of atomicity
- e contains at most one critical reference and x is not read by any other process; OR
- e contains no critical references
- Examples
- Locks and Barriers
- Critical section problem (vaguely stated here)
- Simple producer/consumer
- Axiomatic Semantics (very brief)
- Triples {P} S {Q}
- Note this is a safety condition; need all histories finite for liveness
- Fairness
- Unconditional Fairness:
- every unconditional eligible atomic action is eventually executed
- Weak Fairness
- unconditionally fair; OR
- every conditional eligible atomic action is eventually executed
- assumes condition is true long enough to be observed
- Strong Fairness (an impractical consideration)
- unconditionally fair; OR
- every conditional eligible atomic action is eventually executed, assuming the condition is infinitely often true
- SpinLocks
- Critical section problem
- mutual exclusion
- absence of deadlock and livelock
- absence of unnecessary delays
- eventual entry (relates to fairness)
- We now want to consider how to implement the < ... > primitive of the text
- How do we handle code like <await (!lock) lock = true;> critical; lock = false;?
- Test-and-Set (TS) from the IBM 360/67, a 2-processor machine (a Java spin-lock sketch follows this week's topics)
- while (TS(lock)) ; // returns entry value of lock (before this set)
- one memory cycle -- basically an atomic spin lock
- no guarantee of fairness
- results in serious memory contention for shared lock
- while (lock); while (TS(lock)) { while (lock); } // Test and Test and Set
- reduces memory contention
- Various other atomic instructions like FA (fetch and add)
- To implement unconditional atomic action < S; >
- CSEnter; S; CSExit; // CSEnter is entry protocol; CSExit is exit protocol
- To implement conditional atomic action <await (B) S; >
- CSEnter; while (!B) { CSExit; delay; CSEnter; } S; CSExit;
- if B satisfies the at-most-once property, can do <await(B);> as while(!B);
- Relation to Java synchronized
- synchronized (lock) { S; } is like <S;> // every process uses same lock object
- synchronized (lock) { while (!B) try { lock.wait(); } catch (InterruptedException e) {} S; lock.notify(); } is like <await(B) S;>
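A Java analogue of the spin-lock ideas above, using java.util.concurrent.atomic rather than the course's TS notation; getAndSet plays the role of the atomic test-and-set. This is a sketch of Test-and-Test-and-Set, with the entry/exit protocol named as in the notes.

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean lock = new AtomicBoolean(false);

    public void csEnter() {                // entry protocol
        while (lock.getAndSet(true)) {     // TS: returns prior value of lock
            while (lock.get()) ;           // spin on reads to cut memory contention
        }
    }

    public void csExit() {                 // exit protocol
        lock.set(false);
    }
}

A conditional atomic action <await (B) S;> can then be built exactly as in the notes: csEnter(); while (!B) { csExit(); /* delay */ csEnter(); } S; csExit();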
Assignment #3:
See page 59-61 of notes
Due: 9/20
Week#4: (9/13, 9/15) - Java Support for Concurrency
- Threads : either inherit from Thread class or implement Runnable interface
- constructor in Runnable specifies object that provides code for thread
- constructor can also specify a string to identify thread
- Synchronize : specifies critical section using an object as lock
- can do at granularity of method
- can do at granularity of a block
- Locks are reentrant
- Locks can be temporarily given up : wait and notify (notifyall)
- Even-Odd Sort via Java threads
- Discuss race conditions in above
- Discuss event handling versus event processing and issues with running too long in event thread
- Reflexive Transitive Closure
- Warshall's Algorithm
- Parallelizing -- cannot do with pivot; assumes CREW PRAM to get O(N) with O(N^2) processors
- Need for Barrier at end of each pivot
- Barrier class in Java using wait and notifyAll for conditional atomic action (a sketch follows this week's topics)
- All Shortest Path
- Floyd's Algorithm
- Parallelizing -- cannot do with pivot
- Analysis of various options on parallelizing one or two inner loops
- Why parallelizing first inner loop beats doing so for innermost loop (barriers and granularity)
- Tradeoff between theoretical improvements versus higher contention and need for more barriers when parallelizing both inner loops
- What would need to be done to parallelize all three loops
- More analysis of Max algorithms + Brent's scheduling
- Specifically look at max and sum algorithms and try to determine when we are using an appropriate number of processors.
- Measuring the number of processors that can be used while still getting O(1) efficiency. For the Max or Sum algorithm, Tp = O(N/p) + O(lg p).
- So E = O(N) / O(N + p lg p) = 1/(1 + p lg p / N).
- This is O(1) if p = N/lg N (Brent's scheduling)
- Fair Solutions
- Tie Breaker
- Ticket Algorithm
- Barrier Synchronization
- Shared Counter
- Flags and Coordinators
- Symmetric Barriers
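A sketch of the wait/notifyAll barrier mentioned above, assuming a fixed number of threads that each call await() once per phase; tracking the phase number guards against spurious wakeups and makes the barrier reusable.

// Reusable counter barrier: the last arrival advances the phase and wakes the rest.
public class Barrier {
    private final int parties;
    private int arrived = 0;
    private int phase = 0;

    public Barrier(int parties) { this.parties = parties; }

    public synchronized void await() throws InterruptedException {
        int myPhase = phase;
        if (++arrived == parties) {   // last thread in: release everyone
            arrived = 0;
            phase++;
            notifyAll();
        } else {
            while (phase == myPhase)  // loop handles spurious wakeups
                wait();
        }
    }
}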
Programming Assignment #1:
Implement Floyd's all-shortest-pairs algorithm in Java, assigning one thread to each row of the adjacency matrix.
I will give you a start by providing a sequential version of this (one thread).
Floyd Implementation
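For reference, a minimal sequential baseline of the sort being provided, assuming an int adjacency matrix that uses a large constant for "no edge"; the assignment is to thread the middle (row) loop.

public class Floyd {
    static final int INF = Integer.MAX_VALUE / 2;   // "no edge"; halved to avoid overflow

    public static void allPairs(int[][] d) {
        int n = d.length;
        for (int k = 0; k < n; k++)          // pivot
            for (int i = 0; i < n; i++)      // one thread per row i in the assignment
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j])
                        d[i][j] = d[i][k] + d[k][j];
    }

    public static void main(String[] args) {
        int[][] d = { {0, 3, INF}, {INF, 0, 1}, {2, INF, 0} };
        allPairs(d);
        System.out.println(java.util.Arrays.deepToString(d));
        // [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
    }
}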
Due: 9/29
Week#5: (9/20, 9/22) - Data Parallel
- Examples
- Parallel Prefix
- Parallel linked list length
- Bag of tasks
- SIMD computation
- X-Net and all neighbors operations
- Semaphores
- abstraction with two services, P (wait) and V (signal) (a Java sketch follows this week's topics)
- sem s;
- P(s): <await(s>0) s--;>
- V(s): <s++;>
- internal state is
- a non-negative int value -- counting or general semaphore; or
- a binary value (0 or 1) -- binary semaphore
- fairness can be assured with proper implementation of await.
- Critical section problem and semaphores
- Java synchronized and semaphores
- Barriers and semaphores
- Producer / Consumer Problem
- Dining Philosophers Problem
- probabilistic approach
- Java version
- Reader/Writer Problems
- Single lane bridge problem
- Monitors
- monitors and conds
- wait(cv), wait(cv, rank), signal(cv), signal_all(cv), empty(cv), minrank(cv)
- signal and wait versus signal and continue
- queues, priority queues, BPOTs, heaps and analysis
- bitonic lists
- semaphores
- bounded buffers
- readers/writers
- shortest-job-next
- timers
- covering condition versus priority wait
- sleeping barber
- CSCAN disk scheduler
- SCAN disk schedule
- Single lane bridge problem using monitors
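A sketch of a general (counting) semaphore in Java, directly from the P/V definitions above; eventual entry here is only as fair as the JVM scheduler, so this is not a queue-based FIFO semaphore.

// Counting semaphore: P(s) is <await (s > 0) s--;>, V(s) is <s++;>.
public class Semaphore {
    private int value;

    public Semaphore(int initial) { value = initial; }

    public synchronized void P() throws InterruptedException {
        while (value == 0)   // await (s > 0)
            wait();
        value--;
    }

    public synchronized void V() {
        value++;
        notify();            // wake one waiter, if any
    }
}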
Assignment #4:
- The Unix kernel provides two atomic operations similar to the following:
sleep( ): // block the executing thread
wakeup( ): // awaken all blocked threads
A call of sleep always blocks the caller. A call of wakeup awakens every thread that has called sleep since the last time that wakeup was called.
a.) A call to wakeup should awaken all tasks that entered sleep prior to the time wakeup is started, but not any that arrive later. The following “solution” has a serious problem in that it can wake up the wrong tasks. Explain how this undesirable result can happen.
sem e = 1, delay = 0; int count = 0;
sleep( ): P(e) ; count++; V(e); P(delay);
wakeup( ): P(e); while (count > 0) { count--; V(delay); } V(e);
b.) Develop and present a correct implementation of sleep and wakeup using semaphores for synchronization.
(Hint: Consider a solution that uses baton passing.)
- Suppose there are m producer processes and n consumer processes. The producer processes periodically call broadcast(message) to send a copy of message to all n consumers. Each consumer receives a copy of the message by calling fetch(message, myId), where message is a result argument and myId ranges from 0 to n-1.
Write a monitor that implements broadcast and fetch. Use the Signal and Continue discipline. The monitor should store only one message at a time, which means that after one producer calls broadcast, any future call of broadcast has to delay until every consumer has received a copy of the previous message. Be sure to declare all variables that are used in the monitor. You may assume the message is of type String, although its type is really irrelevant.
Due: 10/4
Week#6: (9/27, 9/29) - Java Support for Monitors
- Threads : either inherit from Thread class or implement Runnable interface
- constructor in Runnable specifies object that provides code for thread
- constructor can also specify a string to identify thread
- Synchronize : specifies critical section using an object as lock
- can do at granularity of method
- can do at granularity of a block
- Java synchronized, wait/notify/notify_all
- Locks are reentrant
- Locks can be temporarily given up : wait and notify
- This ends the material for Midterm#1
- Message Passing
- channels: send (non-blocking); receive (blocking) (a Java channel sketch follows this week's topics)
- simple channel examples: char-to-line; sorting network
- client server examples
- one op; multi-ops; condition variables
- resource allocator
- disk server
- file server
- centralized vs symmetric vs ring reduction algorithms
- synchronous message passing and deadlock
- Paths as a Declarative Means of Coordination
- Parallelizing Graph Algorithms
- Greedy algorithms
- Spanning trees
- Minimum spanning tree (Prim's Algorithm)
- alternate data structures for adjacency (N^2 versus E lg N)
- Block Striped Partitioning
- Analysis of Prim's using p processors
- computation cost N^2/p
- communication cost
- Hypercube O(N lg p); use p = N/lg N
- Mesh O(N √p); use p = N^(2/3)
- Notion of true dependency in analysis
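A sketch of the channel abstraction above (non-blocking send, blocking receive) over java.util.concurrent; the char-to-line and sorting-network examples would sit on top of something like this.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Channel with an unbounded buffer: send never blocks; receive blocks when empty.
public class Channel<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

    public void send(T message) {
        queue.add(message);    // unbounded queue, so this cannot block
    }

    public T receive() throws InterruptedException {
        return queue.take();   // blocks until a message is available
    }
}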
Week#7: (10/4, 10/6) - Topics and Promises for MidTerm1
- MidTerm1 on 10/6
Week#8: (10/11, 10/13) - MPI
- General introduction
- Floyd's algorithm revisited
- Row versus column partitioning
- Input and output considerations (so it makes sense)
- read your own or have one process do it all and send shares to others
- picking a process to do the printing
- Analysis and benchmarking (a requirement in this course)
- Quinn's Notes (Chapter 4 & Chapter 6)
- Quinn's Floyd implementation (Floyd)
- Links to resources (MPI details; zephyr)
- Distributed Computing Paradigms
- Channels (all the way down to UDP and TCP/IP; a small UDP sketch follows this week's topics)
- Distributed Objects
- Mediated -- Spaces and Message Queues
- UDP, Multicast UDP, TCP/IP
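To ground the bottom of that stack, here is a minimal one-process UDP send/receive in Java; the port number is an arbitrary choice for the demo.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpDemo {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket receiver = new DatagramSocket(5577);
             DatagramSocket sender = new DatagramSocket()) {
            byte[] msg = "hello".getBytes();
            sender.send(new DatagramPacket(msg, msg.length,
                    InetAddress.getLoopbackAddress(), 5577));
            byte[] buf = new byte[1024];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in);   // blocks until a datagram arrives
            System.out.println(new String(in.getData(), 0, in.getLength()));
        }
    }
}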
Assignment #5:
Redo the three questions from Exam#1. See Assignment5.doc.
Due: 10/18
Programming Assignment #2:
See Notes (page 179)
Due: 10/20
Programming Assignment #3
See Notes (page 180)
Due: 10/27
Note: Last Day to Withdraw is 10/14
Project
This is intended to be the only team assignment of the term. Teams of size two are optimal, although larger teams can be used if you convince me of the complexity. In general, I would like larger teams when two distinct approaches are taken to the same problem (e.g., an RMI and a tuple space approach, or an RMI and an MPI approach). Of course, I will expect a careful analysis of how the methods compare.
This is pretty open-ended, but here are some examples of reasonable projects.
- For Vision people
- Parallel or distributed edge detection
- Don't do Canny, as that was already done
- Parallel or distributed depth calculation
- Might use disparity maps from stereo
- For Graphics people
- Parallel tone or color mapping
- Map tone from one image to another
- Use background color shift to alter rendering of foreground objects
- Can compare various color theories
- Parallel or distributed compression/decompression of HDR images
- Could do for film clips rather than single images
- Can compare various representations
- For Systems people
- RMI that supports both synchronous and asynchronous calls
- Might experiment with a new type called "Future"
- Distributed Spaces implementation using multicast UDP
- Spaces implementation in C# or C++
- Might do a comparison
- For Algorithms people
- Parallel evaluation of constraints
- For instance, mix Gaussian elimination with non-linearity
- Parallel graph rewriting
- Perhaps synthesized attribute evaluation on a tree, e.g., constant propagation
- Parallel or distributed Cocke-Kasami-Younger algorithm
- Parallel or distributed reconstruction of phylogenies
In all cases, you must design experiments to determine the scalability of your solution. If you have a team of two, you might have a run-off of a parallel versus a distributed solution. You will need to produce a manual that goes with this project. I want this to be a professional job.
Due: 11/3 (Proposal with use cases); 11/10 (Design Document); 11/29 (Demo); 12/4 (Final Report/Analysis, Software)
Week#9: (10/18, 10/20)
- Concurrent Objects
- Synchronous versus asynchronous method invocation
- Single versus multiple server threads
- Remote Method Invocation
- Tuple Space
- General concepts and C-Linda (a toy Java tuple space follows this week's topics)
out( tuple )
in( tuple ) and inp( tuple )
rd( tuple ) and rdp( tuple )
eval( tuple )
- Comments on Paradise, Rinda, JavaSpaces, TSpaces, and GigaSpaces
- Leases
- Mixed Reality Presentation
- Background
- Distributed system protocols
- Applications
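A toy Java tuple space capturing just out, blocking in, and rd; real systems (JavaSpaces, TSpaces, GigaSpaces) add typed templates, transactions, and leases. Tuples here are Object arrays, and a null field in a template acts as a wildcard.

import java.util.ArrayList;
import java.util.List;

public class TupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    public synchronized void out(Object... tuple) {
        tuples.add(tuple);
        notifyAll();                 // wake anyone blocked in in()/rd()
    }

    public synchronized Object[] in(Object... template) throws InterruptedException {
        Object[] t;
        while ((t = find(template)) == null)
            wait();                  // block until a matching tuple appears
        tuples.remove(t);            // in() removes the tuple
        return t;
    }

    public synchronized Object[] rd(Object... template) throws InterruptedException {
        Object[] t;
        while ((t = find(template)) == null)
            wait();
        return t;                    // rd() leaves the tuple in the space
    }

    private Object[] find(Object[] template) {
        for (Object[] t : tuples) {
            if (t.length != template.length) continue;
            boolean match = true;
            for (int i = 0; i < t.length && match; i++)
                if (template[i] != null && !template[i].equals(t[i]))
                    match = false;
            if (match) return t;
        }
        return null;
    }
}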
Week#10: (10/25, 10/27)
- Atomicity and Transactions
- Commit/Abort; roll forward/roll back
- ACID
- Jini Object Request Broker (ORB)
- Discovery, Join, Lookup
- Discovery process
- Packet storms on restart
- Distributed Garbage Collection
- Oblivious Comparison Exchange Sorts
- Proof of correctness for 0-1 data implies correctness for all data
- Correctness of Even-Odd Transposition Sort
- Shear Sort and its Analysis (a sequential ShearSort sketch follows this week's topics)
- Shear Sort and RevSort
- Order, Cost, Work, Cost Efficiency, Work Efficiency.
- Discuss the general principle of "getting out of the way" employed in Shearsort.
- Formal proof of ShearSort correctness and timing.
- Revsort (akin to ShearSort)
- Extend shear notion to the technique used in Revsort. Note this is not a snake sort like shear.
- Note that Revsort is not a sort. It just gets close (within 8 rows of being right).
- Revsort gets there fast. It cuts the number of dirty rows, not in half, but to the square root of the current number of dirty rows.
- Bitonic Sort
- Virtualizing sorts
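A sequential simulation of ShearSort on an n x n mesh (assuming a square matrix), sorting into snake order in lg(n^2) + 1 alternating row/column phases; odd-numbered rows are sorted in reverse so the order snakes through the mesh.

import java.util.Arrays;

public class ShearSort {
    public static void sort(int[][] m) {
        int n = m.length;
        // lg(N) + 1 phases for N = n*n elements, alternating rows and columns.
        int phases = (int) Math.ceil(Math.log(n * (double) n) / Math.log(2)) + 1;
        for (int p = 0; p < phases; p++) {
            if (p % 2 == 0) {                       // row phase
                for (int r = 0; r < n; r++) {
                    Arrays.sort(m[r]);
                    if (r % 2 == 1) reverse(m[r]);  // odd rows run right-to-left
                }
            } else {                                // column phase
                for (int c = 0; c < n; c++) {
                    int[] col = new int[n];
                    for (int r = 0; r < n; r++) col[r] = m[r][c];
                    Arrays.sort(col);
                    for (int r = 0; r < n; r++) m[r][c] = col[r];
                }
            }
        }
    }

    private static void reverse(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
}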
Week#11: (11/1, 11/3)
- Midterm2 review and promises
- Midterm2 Sample Questions
- Answers to Midterm2 Sample
Questions
- MidTerm2 on 11/3
Due: 11/17
Week#12: (11/8, 11/10)
- RMI example (Bid.com)
- Accelerated Cascading
- Review of analysis of binary tree reduction and CRCW max
- Concept of doubly log tree
- CRCW max and doubly log tree
- Analysis of doubly log tree (T=lg lg N; W=N lg lg N)
- Idea of using binary tree reduction to reduce problem size and CRCW max to pick up speed at the end
- Formal analysis
- Reduce for lg lg lg N steps (T <= lg lg N; W <= N; remaining size = N/lg lg N)
- Use CRCW to complete (T = lg lg N; W = N)
- Combined algorithm is reasonably fast (lg lg N) and work efficient (N)
- PCN (Program Composition Notation)
- Parallel Logic Programming (Prolog, CLP(R), Strand)
- Channel Paradigm
- CSP; reasoning through CSP; Modern CSP; Extensions
- synchronous communication; guarded communication
- dest ! port(expression list); source ? port(parameter list)
- blocks until match of sender and receiver
- Guarded communication (succeed, block, fail)
- if B1 ; C1 -> S1; [] B2 ; C2 -> S2; ... fi
- do B1 ; C1 -> S1; [] B2 ; C2 -> S2; ... od
- all fail (no effect; we're done)
- one or more successful (choose one non-det.)
- all block (wait for one to succeed)
- Sieve of Eratosthenes (a pipelined Java sketch follows this week's topics)
- CSP and event-based reasoning
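A CSP-flavored Sieve of Eratosthenes in Java: each prime gets a filter thread, and SynchronousQueue supplies the blocking rendezvous of synchronous message passing. The -1 sentinel that drains the pipeline is an addition so the demo terminates.

import java.util.concurrent.SynchronousQueue;

public class Sieve {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> first = new SynchronousQueue<>();
        new Thread(() -> filter(first)).start();
        for (int i = 2; i <= 50; i++)
            first.put(i);          // blocks until the filter takes it (rendezvous)
        first.put(-1);             // sentinel: shut the pipeline down
    }

    static void filter(SynchronousQueue<Integer> in) {
        try {
            int prime = in.take();                 // first arrival is prime
            if (prime < 0) return;                 // sentinel reached the end
            System.out.println(prime);
            SynchronousQueue<Integer> out = new SynchronousQueue<>();
            new Thread(() -> filter(out)).start();
            while (true) {
                int n = in.take();
                if (n < 0) { out.put(n); return; } // pass sentinel on and stop
                if (n % prime != 0)
                    out.put(n);                    // forward survivors downstream
            }
        } catch (InterruptedException e) { /* ignore in this sketch */ }
    }
}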
Take Home Exam Question:
Redo Question #5 (the semaphore one). You must now do a little better than I would have expected on the exam. Specifically, you must discuss all the issues of scheduling as I have done in the key for Question 4, so that you justify that your algorithm meets all the criteria laid out in #4.
Due: 11/15
Assignment #6:
- As regards Accelerated Cascading Max, analyze this algorithm if the binary tree reduction cutoff is:
a.) lg lg lg lg N
b.) square root(lg N)
Determine which are fast and/or efficient. Do precise analysis.
- For each of (a) Bitonic Sort and (b) lg lg trees Max, operating on N values, determine if there is a magic p (similar to Brent's Scheduling), for which this algorithm is work efficient and fast ((lg N)^2 and lg lg N, resp.) when virtualized with each processor starting with N/p values. Show the derivation of this value of p and then prove it is optimal, as in Brent's choice of p = N/lg N, or argue convincingly that no such p can be found for arbitrary N.
Due: 11/17
Programming Assignment #4 (last one):
In the directory Bid, I have placed a server and clients for a bidding service. Essentially, clients can register with the server, which assigns each a unique number. Any client can offer an object (via its name) for sale at some minimum price.
Besides this modification, and finding/fixing any observed bugs, you must expand this so there are two types of bid servers. The first is registered under the service "KnightBay" and the second as "PegasusBay." In KnightBay, the seller can close out bids at any time or when a desired goal amount is reached or exceeded. In this model, the price paid is the highest bid. In PegasusBay, the seller can close out bids at any time or after some goal time (the time in milliseconds is specified in the same parameter field as the goal amount is specified in the first case). In this second close-out strategy, the price paid is the second highest bid, but the winner is, as always, the highest bidder. The service chosen by a client is indicated on the command line. The service name registered by a server is also specified on its command line.
When a sale occurs, the server identifies the buyer and seller to each other. They are responsible for completing the sale in some off-line manner. As a small expansion on the current solution, your clients must show a record of all items they personally sold or bought, with the sale prices.
As currently implemented, the server uses a push strategy to send a vector of items and current bid prices to all clients each time a bid price changes, a sale is completed, or a new item is offered for sale. You should stay with that, except you are certainly welcome to use some other form of collection that you deem more acceptable, so long as it's serializable.
I am not enamored with the current user interface. It is also the case that this server does not handle disappearing clients worth a toot. Moreover, clients can bid on any item offered for sale, including items they offered for bid. That unacceptable behavior must be blocked. You could even choose to penalize such a person by immediately declaring that the current bid is accepted. You will have to deal with these nasty little issues.
You must obey the interfaces given here, since I can choose to use my own client with your servers, and vice versa. In fact, each of you must identify one other student in the class whose servers you have tried with your client and whose client you have tried with your servers. The name of this student must be part of your writeup. Yes, you do have to provide a user manual which gives use-case scenarios.
Due: 11/22
Week#13: (11/15, 11/17)
- Program Flow Analysis
- Control Flow
- Basic terminology
- Control vs data flow
- Inter vs intraprocedural analysis
- Program flowgraph
- Basic blocks
- Domination
- Loop extraction
- Depth first ordering (reverse postorder)
- Categorizing arcs (forward, back, cross)
- Data Flow
- Data flow analysis
- Notation including May/Must, Forward/Backward Flow
- Reaching Definitions Algorithm
- Flow Analysis and Parallelizing Code
- Scalar data dependence
- true, anti and output dependencies
- Vector data dependence
- Diophantine analysis
- GCD Test (a small Java example follows this week's topics)
- Flow Analysis and Parallelizing Code
- Transformations Used to Parallelize Code
- Scheduling Problems
- Basic definition
- Motivation for scheduling problems
- scheduling of independent tasks and bin packing
- heuristics versus perfect scheduling
- Examples with two processors, no dependencies
- Anomalous behaviors with precedence graphs
- NP problems
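A small Java rendering of the GCD test referenced above: for accesses X[a*i + b] and X[c*j + d], an integer solution to a*i + b = c*j + d (and hence a dependence) can exist only if gcd(a, c) divides d - b. The test is necessary, not sufficient.

public class GcdTest {
    static int gcd(int x, int y) {
        x = Math.abs(x); y = Math.abs(y);
        while (y != 0) { int t = x % y; x = y; y = t; }
        return x;
    }

    // May writes to X[a*i + b] and reads of X[c*j + d] ever touch the same cell?
    static boolean mayDepend(int a, int b, int c, int d) {
        return (d - b) % gcd(a, c) == 0;
    }

    public static void main(String[] args) {
        // X[2*i] vs X[2*j + 1]: gcd(2,2) = 2 does not divide 1 -> independent.
        System.out.println(mayDepend(2, 0, 2, 1));   // false
        // X[3*i] vs X[6*j + 3]: gcd(3,6) = 3 divides 3 -> possible dependence.
        System.out.println(mayDepend(3, 0, 6, 3));   // true
    }
}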
Week#14: (11/22)
- Scheduling Problems
- How bad can anomalies be?
- How bad can a bad schedule be?
- Fast scheduling algorithm for unit execution trees
- Adaptations for forests and anti-trees
- Discussion of fast DAG scheduling with 2 processors
- Union-Find algorithm (amortization)
- Thanksgiving
Week#15: (11/29, 12/1)
- Meetings with me on Project status (11/29)
- Sample Questions for Final
- Discussion and Return of Assignments
- Review for Final Exam (Topics)
Final Exam:
- Thursday December 8; 13:00 - 15:50
© Charles E. Hughes, ceh@cs.ucf.edu -- Last Modified December 1, 2005