Important Announcement
PubHTML5 Scheduled Server Maintenance on (GMT) Sunday, June 26th, 2:00 am - 8:00 am.
PubHTML5 site will be inoperative during the times indicated!

Home Explore introduction_to_parallel_computing_second_edition-ananth_grama.

introduction_to_parallel_computing_second_edition-ananth_grama.

Published by Demo 1, 2021-07-02 15:14:40

Description: introduction_to_parallel_computing_second_edition-ananth_grama.

Search

Read the Text Version

[Lub86] M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, 15(4):1036–1053, 1986. [LW66] E. L. Lawler and D. Woods. Branch-and-bound methods: A survey. Operations Research, 14, 1966. [LW84] G.-J. Li and B. W. Wah. Computational efficiency of parallel approximate branch-and- bound algorithms. In Proceedings of the 1984 International Conference on Parallel Processing, 473–480, 1984. [LW85] G.-J. Li and B. W. Wah. Parallel processing of serial dynamic programming problems. In Proceedings of COMPSAC 85, 81–89, 1985. [LW86] G.-J. Li and B. W. Wah. Coping with anomalies in parallel branch-and-bound algorithms. IEEE Transactions on Computers, C-35, June 1986. [LW95] D. Lenoski and W. D. Weber. Scalable Shared-Memory Multiprocessing. Morgan Kaufmann, San Mateo, CA, 1995. [LY86] M. Li and Y. Yesha. New lower bounds for parallel computations. In Proceedings of 18th ACM Conference on Theory of Computing, 177–187, 1986. [MC82] T. A. Marsland and M. S. Campbell. Parallel search of strongly ordered game trees. Computing Surveys, 14:533–551, 1982. [MD92] A. Mahanti and C. Daniels. SIMD parallel heuristic search. Artificial Intelligence, 1992. [MD93] N. R. Mahapatra and S. Dutt. Scalable duplicate pruning strategies for parallel A* graph search. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993. [MdV87] O. A. McBryan and E. F. V. de Velde. Hypercube algorithms and implementations. SIAM Journal on Scientific and Statistical Computing, 8(2):s227–s287, March 1987. [Mes94] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Available at http://www.mpi-forum.org. May 1994. [Mes97] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. Available at http://www.mpi-forum.org. July 1997. [MFMV90] B. Monien, R. Feldmann, P. Mysliwietz, and O. Vornberger. Parallel game tree search by dynamic tree decomposition. In V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990. [MG89] C. U. Martel and D. Q. Gusfield. A fast parallel quicksort algorithm. Information Processing Letters, 30:97–102, 1989. [Mil91] D. Miller. Exact distributed algorithms for travelling salesman problem. In Proceedings of the Workshop on Parallel Computing of Discrete Optimization Problems, 1991. [MK99] J. Magee and J. Kramer. Concurrency: State Models and Java Programs. John Wiley & Sons, 1999. [MKRS88] R. Miller, V. K. P. Kumar, D. I. Reisis, and Q. F. Stout. Meshes with reconfigurable buses. In Proceedings of MIT Conference on Advanced Research in VLSI, 163–178, 1988. [MM73] A. Martelli and U. Montanari. From dynamic programming to search algorithms with functional costs. In Proceedings of the International Joint Conference on Artifi cial Intelligence,

345–349, 1973. [MM91] P. Messina and A. Murli, editors. Practical Parallel Computing: Status and Prospects. Wiley, Chichester, UK, 1991. [MMR95] B. Mans, T. Mautor, and C. Roucairol. A parallel depth first search branch and bound for the quadratic assignment problem. European Journal of Operational Research, 81(3):617–628, 1995. [Mod88] J. J. Modi. Parallel Algorithms and Matrix Computation. Oxford University Press, Oxford, UK, 1988. [Moh83] J. Mohan. Experience with two parallel programs solving the traveling salesman problem. In Proceedings of the 1983 International Conference on Parallel Processing, 191–193, 1983. [Mol86] C. B. Moler. Matrix computation on distributed-memory multiprocessors. In M. T. Heath, editor, Hypercube Multiprocessors 1986, 181–195. SIAM, Philadelphia, PA, 1986. [Mol87] C. B. Moler. Another look at Amdahl's law. Technical Report TN-02-0587-0288, Intel Scientific Computers, 1987. [Mol93] D. I. Moldovan. Parallel Processing: From Applications to Systems. Morgan Kaufmann, San Mateo, CA, 1993. [MP85] T. A. Marsland and F. Popowich. Parallel game tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7(4):442–452, July 1985. [MP93] D. L. Miller and J. F. Pekny. The role of performance metrics for parallel mathematical programming algorithms. ORSA Journal on Computing, 5(1), 1993. [MR] D. C. Marinescu and J. R. Rice. On high level characterization of parallelism. Technical Report CSD-TR-1011, CAPO Report CER-90-32, Computer Science Department, Purdue University, West Lafayette, IN. Also published in Journal of Parallel and Distributed Computing, 1993. [MRSR92] G. P. McKeown, V. J. Rayward-Smith, and S. A. Rush. Parallel Branch-and-Bound, 111–150. Advanced Topics in Computer Science. Blackwell Scientific Publications, Oxford, UK, 1992. [MS88] Y. W. E. Ma and D. G. Shea. Downward scalability of parallel architectures. In Proceedings of the 1988 International Conference on Supercomputing, 109–120, 1988. [MS90] G. Manzini and M. Somalvico. Probabilistic performance analysis of heuristic search using parallel hash tables. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 1990. [MS96] R. Miller and Q. F. Stout. Parallel Algorithms for Regular Architectures. MIT Press, Cambridge, MA, 1996. [MV84] K. Mehlhorn and U. Vishkin. Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica, 21(4):339–374, November 1984. [MV85] B. Monien and O. Vornberger. The ring machine. Technical report, University of Paderborn, Germany, 1985. Also in Computers and Artificial Intelligence, 3(1987). [MV87] B. Monien and O. Vornberger. Parallel processing of combinatorial search trees. In

Proceedings of International Workshop on Parallel Algorithms and Architectures, 1987. [MVS86] B. Monien, O. Vornberger, and E. Spekenmeyer. Superlinear speedup for parallel backtracking. Technical Report 30, University of Paderborn, Germany, 1986. [NA91] D. Nussbaum and A. Agarwal. Scalability of parallel machines. Communications of the ACM, 34(3):57–61, 1991. [Nat90] L. Natvig. Investigating the practical value of Cole's O (log n) time crew pram merge sort algorithm. In 5th International Symposium on Computing and Information Sciences, October 1990. [NBF96] B. Nichols, B. Buttlar, and J. P. Farrell. Pthreads Programming. O'Reilly & Associates, Newton, MA 02164, 1996. [nCU90] nCUBE Corporation. nCUBE 6400 Processor Manual. Beaverton, OR, 1990. [ND96] S. J. Norton and M. D. DiPasquale. Thread time: the multithreaded programming guide. Hewlett-Packard professional books. Prentice-Hall, Englewood Cliffs, NJ 07632, 1996. [Ni91] L. M. Ni. A layered classification of parallel computers. In Proceedings of 1991 International Conference for Young Computer Scientists, 28–33, 1991. [Nic90] J. R. Nickolls. The design of the MasPar MP-1: A cost-effective massively parallel computer. In IEEE Digest of Papers—Comcom, 25–28. IEEE Computer Society Press, Los Alamitos, CA, 1990. [NM93] L. M. Ni and McKinley. A survey of wormhole routing techniques in direct connect networks. IEEE Computer, 26(2), February 1993. [NMB83] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt. Efficient VLSI networks for parallel processing based on orthogonal trees . IEEE Transactions on Computers, C–32:21–23, June 1983. [NS79] D. Nassimi and S. Sahni. Bitonic sort on a mesh connected parallel computer. IEEE Transactions on Computers, C–28(1), January 1979. [NS80] D. Nassimi and S. Sahni. Finding connected components and connected ones on a mesh-connected computer. SIAM Journal of Computing, 9(4):744–757, November 1980. [NS87] A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley- Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C- 36(5):581–591, 1987. [NS93] M. Nigam and S. Sahni. Sorting n numbers on n x n reconfigurable meshes with buses. In 7th International Parallel Processing Symposium, 174–181, 1993. [NSF91] Grand Challenges: High Performance Computing and Communications. A Report by the Committee on Physical, Mathematical and Engineering Sciences, NSF/CISE, 1800 G Street NW, Washington, DC, 20550, 1991. [Nug88] S. F. Nugent. The iPSC/2 direct-connect communications technology. In Proceedings of the Third Conference on Hypercubes, Concurrent Computers, and Applications, 51–60, 1988. [Nus82] H. J. Nussbaumer. Fast Fourier Transform and Convolution Algorithms. Springer- Verlag, New York, NY, 1982. [NW88] D. M. Nicol and F. H. Willard. Problem size, parallel architecture, and optimal speedup.

Journal of Parallel and Distributed Computing, 5:404–420, 1988. [GOV99] Funding a Revolution: Government Support for Computing Research. Committee on Innovations in Computing and Communications. National Academy Press, 1999. [OR88] J. M. Ortega and C. H. Romine. The ijk forms of factorization methods II: Parallel systems. Parallel Computing, 7:149–162, 1988. [Ort88] J. M. Ortega. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, NY, 1988. [OS85] D. P. O'Leary and G. W. Stewart. Data-flow algorithms for parallel matrix computations. Communications of the ACM, 28:840–853, 1985. [OS86] D. P. O'Leary and G. W. Stewart. Assignment and scheduling in parallel matrix factorization. Linear Algebra and its Applications, 77:275–299, 1986. [Pac98] P. Pacheco. Parallel Programming with MPI. Morgan Kaufmann, 1998. [PBG+85] G. F. Pfister, W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A. Melton, V. A. Norlton, and J. Weiss. The IBM research parallel processor prototype (RP3): Introduction and architecture. In Proceedings of 1985 International Conference on Parallel Processing, 764– 771, 1985. [PC89] P. M. Pardalos and J. Crouse. A parallel algorithm for the quadratic assignment problem. In Supercomputing '89 Proceedings, 351–360. ACM Press, New York, NY, 1989. [PD89] K. H. Park and L. W. Dowdy. Dynamic partitioning of multiprocessor systems. International Journal of Parallel Processing, 18(2):91–120, 1989. [Pea84] J. Pearl. Heuristics—Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, MA, 1984. [Per87] R. Perrott. Parallel Programming. Addison-Wesley, Reading, MA, 1987. [Pfi98] G. F. Pfister. In Search of Clusters. Prentice Hall, Englewood Cliffs, NJ, 1998. 2nd Edition. [PFK90] C. Powley, C. Ferguson, and R. Korf. Parallel heuristic search: Two approaches. In V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal, editors, Parallel Algorithms for Machine Intelligence and Vision. Springer-Verlag, New York, NY, 1990. [PG98] T. Q. Pham and P. K. Garg. Multithreaded Programming with Win32. Prentice Hall, 1998. [PH90] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, 1990. [PH96] D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach, 2nd edition. Morgan Kaufmann, San Mateo, CA, 1996. [PK89] R. C. Paige and C. P. Kruskal. Parallel algorithms for shortest path problems. In Proceedings of 1989 International Conference on Parallel Processing, 14– 19, 1989. [PK95] I. Pramanick and J. G. Kuhl. An inherently parallel method for heuristic problem-solving: Part I – general framework. IEEE Transactions on Parallel and Distributed Systems, 6(10), October 1995. [PKF92] C. Powley, R. Korf, and C. Ferguson. IDA* on the connection machine. Artificial

Intelligence, 1992. [Pla89] C. C. Plaxton. Load balancing, selection and sorting on the hypercube. In Proceedings of the 1989 ACM Symposium on Parallel Algorithms and Architectures, 64–73, 1989. [PLRR94] P. M. Pardalos, Y. Li, K. Ramakrishna, and M. Resende. Lower bounds for the quadratic assignment problem. Annals of Operations Research, 50:387–411, 1994. Special Volume on Applications of Combinatorial Optimization. [PR85] V. Pan and J. H. Reif. Efficient parallel solution of linear systems. In 17th Annual ACM Symposium on Theory of Computing, 143–152, 1985. [PR89] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for unconstrainted quadratic zero-one programming. In R. Sharda et al., editors, Impacts of Recent Computer Advances on Operations Research, 131–143. North-Holland, Amsterdam, The Netherlands, 1989. [PR90] P. M. Pardalos and G. P. Rodgers. Parallel branch-and-bound algorithms for quadratic zero-one programming on a hypercube architecture. Annals of Operations Research, 22:271–292, 1990. [PR91] M. Padberg and G. Rinaldi. A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, 33:60– 100, 1991. [Pri57] R. C. Prim. Shortest connection network and some generalizations. Bell Systems Technical Journal, 36:1389–1401, 1957. [PRV88] G. Plateau, C. Roucairol, and I. Valabregue. Algorithm PR2 for the parallel size reduction of the 0/1 multiknapsack problem. In INRIA Rapports de Recherche, number 811, 1988. [PS82] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982. [PV80] F. P. Preparata and J. Vuillemin. Area-time optimal VLSI networks for matrix multiplication. In Proceedings of the 14th Princeton Conference on Information Science and Systems, 300–309, 1980. [PY88] C. H. Papadimitriou and M. Yannakakis. Towards an architecture independent analysis of parallel algorithms. In Proceedings of 20th ACM Symposium on Theory of Computing, 510–513, 1988. [QD86] M. J. Quinn and N. Deo. An upper bound for the speedup of parallel branch-and-bound algorithms. BIT, 26(1), March 1986. [Qui88] M. J. Quinn. Parallel sorting algorithms for tightly coupled multiprocessors. Parallel Computing, 6:349–357, 1988. [Qui89] M. J. Quinn. Analysis and implementation of branch-and-bound algorithms on a hypercube multicomputer. IEEE Transactions on Computers, 1989. [Qui94] M. J. Quinn. Parallel Computing: Theory and Practice. McGraw-Hill, New York, NY, 1994. [Ram97] V. Ramachandran. Qsm: A general purpose shared-memory model for parallel computation. In Foundations of Software Technology and Theoretical Computer Science, 1–5, 1997.

[Ran89] A. G. Ranade. Fluent Parallel Computation. Ph.D. Thesis, Department of Computer Science, Yale University, New Haven, CT, 1989. [Ran91] A. G. Ranade. Optimal speedup for backtrack search on a butterfly network. In Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, 1991. [Rao90] V. N. Rao. Parallel Processing of Heuristic Search. Ph.D. Thesis, University of Texas, Austin, TX, 1990. [Ras78] L. Raskin. Performance Evaluation of Multiple Processor Systems. Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, PA, 1978. [RB76] C. M. Rader and N. M. Brenner. A new principle for Fast fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 24:264–265, 1976. [RDK89] C. Renolet, M. Diamond, and J. Kimbel. Analytical and heuristic modeling of distributed algorithms. Technical Report E3646, FMC Corporation, Advanced Systems Center, Minneapolis, MN, 1989. [Rei81] R. Reischuk. Probabilistic algorithms for sorting and selection. SIAM Journal of Computing, 396–409, 1981. [RF89] D. A. Reed and R. M. Fujimoto. Multicomputer Networks: Message-Based Parallel Processing. MIT Press, Cambridge, MA, 1989. [RICN88] K. Rokusawa, N. Ichiyoshi, T. Chikayama, and H. Nakashima. An efficient termination detection and abortion algorithm for distributed processing systems. In Proceedings of 1988 International Conference on Parallel Processing: Vol. I, 18–22, 1988. [RK87] V. N. Rao and V. Kumar. Parallel depth-first search, part I: Implementation. International Journal of Parallel Programming, 16(6):479–499, 1987. [RK88a] V. N. Rao and V. Kumar. Concurrent access of priority queues. IEEE Transactions on Computers, C–37 (12), 1988. [RK88b] V. N. Rao and V. Kumar. Superlinear speedup in state-space search. In Proceedings of the 1988 Foundation of Software Technology and Theoretical Computer Science, number 338, 161–174. Springer-Verlag Series Lecture Notes in Computer Science, 1988. [RK93] V. N. Rao and V. Kumar. On the efficicency of parallel backtracking. IEEE Transactions on Parallel and Distributed Systems, 4(4):427–437, April 1993. available as a technical report TR 90-55, Computer Science Department, University of Minnesota. [RKR87] V. N. Rao, V. Kumar, and K. Ramesh. A parallel implementation of iterative- deepening-A*. In Proceedings of the National Conference on Artificial Intelligence (AAAI-87), 878–882, 1987. [RND77] E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1977. [RO88] C. H. Romine and J. M. Ortega. Parallel solution of triangular systems of equations. Parallel Computing, 6:109–114, 1988. [Rob75] M. O. Robin. Probabilistic algorithms. In J. Traub, editor, Algorithms and Complexity: New Directions and Recent Results, 21–39. Academic Press, San Diego, CA, 1975. [Rob90] Y. Robert. The Impact of Vector and Parallel Architectures on Gaussian Elimination. John Wiley and Sons, New York, NY, 1990.

[Rom87] C. H. Romine. The parallel solution of triangular systems on a hypercube. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 552–559. SIAM, Philadelphia, PA, 1987. [Rou87] C. Roucairol. A parallel branch-and-bound algorithm for the quadratic assignment problem. Discrete Applied Mathematics, 18:211–225, 1987. [Rou91] C. Roucairol. Parallel branch-and-bound on shared-memory multiprocessors. In Proceedings of the Workshop On Parallel Computing of Discrete Optimization Problems, 1991. [RRRR96] K. A. Robbins, S. Robbins, K. R. Robbins, and S. Robbins. Practical UNIX Programming: A Guide to Concurrency, Communication, and Multithreading. Prentice Hall, 1996. [RS90a] A. P. R. Alverson, D. Callahan, D. Cummings, B. Koblenz and B. Smith. The tera computer system. In International Conference on Supercomputing, 1–6, 1990. [RS90b] S. Ranka and S. Sahni. Hypercube Algorithms for Image Processing and Pattern Recognition. Springer-Verlag, New York, NY, 1990. [RV87] J. H. Reif and L. G. Valiant. A logarithmic time sort for linear size networks. Journal of the ACM, 34(1):60–76, January 1987. [Ryt88] W. Rytter. Efficient parallel computations for dynamic programming. Theoretical Computer Science, 59:297–307, 1988. [RZ89] M. Reeve and S. E. Zenith, editors. Parallel Processing and Artificial Intelligence. Wiley, Chichester, UK, 1989. [Saa86] Y. Saad. Communication complexity of the Gaussian elimination algorithm on multiprocessors. Linear Algebra and its Applications, 77:315–340, 1986. [SB77] H. Sullivan and T. R. Bashkow. A large scale, homogeneous, fully distributed parallel machine. In Proceedings of Fourth Symposium on Computer Architecture, 105–124, March 1977. [SBCV90] R. H. Saavedra-Barrera, D. E. Culler, and T. Von Eiken. Analysis of multithreaded architectures for parallel computing. Report UCB/CSD 90/569, University of California, Berkeley, Computer Science Division, Berkeley, CA, April 1990. [Sch80] J. T. Schwartz. Ultracomputers. ACM Transactions on Programming Languages and Systems, 2:484–521, October 1980. [Sed78] R. Sedgewick. Implementing quicksort programs. Communications of the ACM, 21(10):847–857, 1978. [Sei85] C. L. Seitz. The cosmic cube. Communications of the ACM, 28(1):22–33, 1985. [Sei89] S. R. Seidel. Circuit-switched vs. store-and-forward solutions to symmetric communication problems. In Proceedings of the Fourth Conference on Hypercubes, Concurrent Computers, and Applications, 253–255, 1989. [Sei92] C. L. Seitz. Mosaic C: An experimental fine-grain multicomputer. Technical report, California Institute of Technology, Pasadena, CA, 1992. [SG88] S. R. Seidel and W. L. George. Binsorting on hypercube with d-port communication. In Proceedings of the Third Conference on Hypercube Concurrent Computers, 1455–1461, January 1988.

[SG91] X.-H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17:1093–1109, December 1991. Also available as Technical Report IS-5053, UC- 32, Ames Laboratory, Iowa State University, Ames, IA. [Sha85] J. A. Sharp. Data-Flow Computing. Ellis Horwood, Chichester, UK, 1985. [She59] D. L. Shell. A high-speed sorting procedure. Communications of the ACM, 2(7):30–32, July 1959. [SHG93] J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42–50, 1993. [Sie77] H. J. Siegel. The universality of various types of SIMD machine interconnection networks. In Proceedings of the 4th Annual Symposium on Computer Architecture , 23–25, 1977. [Sie85] H. J. Siegel. Interconnection Networks for Large-Scale Parallel Processing. D. C. Heath, Lexington, MA, 1985. [SJ81] C. Savage and J. Jaja. Fast, efficient parallel algorithms for some graph problems. SIAM Journal of Computing, 10(4):682–690, November 1981. [SK89] W. Shu and L. V. Kale. A dynamic scheduling strategy for the chare-kernel system. In Proceedings of Supercomputing Conference, 389–398, 1989. [SK90] V. Saletore and L. V. Kale. Consistent linear speedup to a first solution in parallel state- space search. In Proceedings of the 1990 National Conference on Artificial Intelligence, 227–233, August 1990. [SKAT91a] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Efficient algorithms for parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2):95–131, 1991. [SKAT91b] V. Singh, V. Kumar, G. Agha, and C. Tomlinson. Scalability of parallel sorting on mesh multicomputers. International Journal of Parallel Programming, 20(2), 1991. [SM86] S. J. Stolfo and D. P. Miranker. The DADO production system machine. Journal of Parallel and Distributed Computing, 3:269–296, June 1986. [Smi84] D. R. Smith. Random trees and the analysis of branch and bound procedures. Journal of the ACM, 31(1), 1984. [SN90] X.-H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing '90 Proceedings, 324–333, 1990. [SN93] X.-H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19:27–37, September 1993. [Sni82] M. Snir. On parallel search. In Proceedings of Principles of Distributed Computing, 242–253, 1982. [Sni85] M. Snir. On parallel searching. SIAM Journal of Computing, 14(3):688–708, August 1985. [Sny86] L. Snyder. Type architectures, shared-memory and the corollary of modest potential. Annual Review of Computer Science, 1:289–317, 1986. [SOHL+96] M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The

Complete Reference. MIT Press, Cambridge, MA, 1996. [Sol77] M. Sollin. An algorithm attributed to Sollin. In S. Goodman and S. Hedetniemi, editors, Introduction to The Design and Analysis of Algorithms. McGraw-Hill, Cambridge, MA, 1977. [SR91] X.-H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. Technical Report IS-5057, Ames Laboratory, Iowa State University, Ames, IA, 1991. Also published in IEEE Transactions on Parallel and Distributed Systems. [SS88] Y. Saad and M. H. Schultz. Topological properties of hypercubes. IEEE Transactions on Computers, 37:867–872, 1988. [SS89a] Y. Saad and M. H. Schultz. Data communication in hypercubes. Journal of Parallel and Distributed Computing, 6:115–135, 1989. Also available as Technical Report YALEU/DCS/RR- 428 from the Department of Computer Science, Yale University, New Haven, CT. [SS89b] Y. Saad and M. H. Schultz. Data communication in parallel architectures. Parallel Computing, 11:131–150, 1989. [SS90] H. Shi and J. Schaeffer. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14:361–372, 1990. [ST95] B. M. S. Tschvke, R. Lling. Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network. In Proceedings of the 9th International Parallel Processing Symposium, 182– 189, Santa Barbara, CA, April 1995. [Sto71] H. S. Stone. Parallel processing with the perfect shuffle. IEEE Transactions on Computers, C-20(2):153–161, 1971. [Sto93] H. S. Stone. High-Performance Computer Architecture: Third Edition. Addison-Wesley, Reading, MA, 1993. [Sun95] SunSoft. Solaris multithreaded programming guide. SunSoft Press, Mountainview, CA, 1995. [Sup91] Supercomputer Systems Division, Intel Corporation. Paragon XP/S Product Overview. Beaverton, OR, 1991. [SV81] Y. Shiloach and U. Vishkin. Finding the maximum, merging and sorting in a parallel computation model. Journal of Algorithms, 88–102, 1981. [SV82] Y. Shiloach and U. Vishkin. An O (n2 log n) parallel max-flow algorithm. Journal of Algorithms, 3:128–146, 1982. [SW87] Q. F. Stout and B. A. Wagar. Passing messages in link-bound hypercubes. In M. T. Heath, editor, Hypercube Multiprocessors 1987, 251–257. SIAM, Philadelphia, PA, 1987. [Swa87] P. N. Swarztrauber. Multiprocessor FFTs. Parallel Computing, 5:197–210, 1987. [SZ96] X.-H. Sun and J. P. Zhu. Performance considerations of shared virtual memory machines. IEEE Trans. on Parallel and Distributed Systems, 6(11):1185– 1194, 1996. [Tab90] D. Tabak. Multiprocessors. Prentice-Hall, Englewood Cliffs, NJ, 1990. [Tab91] D. Tabak. Advanced Multiprocessors. McGraw-Hill, New York, NY, 1991. [Tak87] R. Take. An optimal routing method of all-to-all communication on hypercube networks. In 35th Information Processing Society of Japan, 1987.

[TEL95] D. M. Tullsen, S. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on- chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995. [Ten90] S. Teng. Adaptive parallel algorithms for integral knapsack problems. Journal of Parallel and Distributed Computing, 8:400–406, 1990. [Thi90] Thinking Machines Corporation. The CM-2 Technical Summary. Cambridge, MA. 1990. [Thi91] Thinking Machines Corporation. The CM-5 Technical Summary. Cambridge, MA. 1991. [Tho83] C. D. Thompson. Fourier transforms in VLSI. IBM Journal of Research and Development, C-32(11):1047–1057, 1983. [Thr99] J. Throop. Standards: OpenMP: Shared-memory parallelism from the ashes. Computer, 32(5):108–109, May 1999. [Tic88] W. F. Tichy. Parallel matrix multiplication on the connection machine. Technical Report RIACS TR 88.41, Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffet Field, CA, 1988. [TK77] C. D. Thompson and H. T. Kung. Sorting on a mesh-connected parallel computer. Communications of the ACM, 21:263–271, 1977. [TL90] Z. Tang and G.-J. Li. Optimal granularity of grid iteration problems. In Proceedings of the 1990 International Conference on Parallel Processing, I111–I118, 1990. [Tul96] D. M. Tullsen. Simultaneous multithreading. Ph.D. Thesis, University of Washington, Seattle, WA, 1996. [TV85] R. E. Tarjan and U. Vishkin. An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, 14(4):862–874, November 1985. [TY96] J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run- time data dependence checking and control speculation. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 35–46, 1996. [Upf84] E. Upfal. A probabilistic relation between desirable and feasible models of parallel computation. In Proceedings of the 16th ACM Conference on Theory of Computing, 258–265, 1984. [UW84] E. Upfal and A. Widgerson. How to share memory in a distributed system. In Proceedings of the 25th Annual Symposium on the Foundation of Computer Science, 171–180, 1984. [Val75] L. G. Valiant. Parallelism in comparison problems. SIAM Journal of Computing, 4(3):348–355, September 1975. [Val82] L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11:350–361, 1982. [Val90a] L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8), 1990. [Val90b] L. G. Valiant. General purpose parallel architectures. Handbook of Theoretical Computer Science, 1990. [Vav89] S. A. Vavasis. Gaussian elimination with pivoting is P-complete. SIAM Journal on

Discrete Mathematics, 2:413–423, 1989. [VB81] L. G. Valiant and G. J. Brebner. Universal schemes for parallel communication. In Proceedings of the 13th ACM Symposium on Theory of Computation, 263–277, 1981. [VC89] F. A. Van-Catledge. Towards a general model for evaluating the relative performance of computer systems. International Journal of Supercomputer Applications, 3(2):100–108, 1989. [Vor86] O. Vornberger. Implementing branch-and-bound in a ring of processors. Technical Report 29, University of Paderborn, Germany, 1986. [Vor87a] O. Vornberger. The personal supercomputer: A network of transputers. In Proceedings of the 1987 International Conference on Supercomputing, 1987. [Vor87b] O. Vornberger. Load balancing in a network of transputers. In Proceedings of the Second International Workshop on Distributed Parallel Algorithms, 1987. [VS86] J. S. Vitter and R. A. Simmons. New classes for parallel complexity: A study of unification and other complete problems for P. IEEE Transactions on Computers, May 1986. [VSBR83] L. G. Valiant, S. Skyum, S. Berkowitz, and C. Rackoff. Fast parallel computation of polynomials using few processors. SIAM Journal of Computing, 12(4):641–644, 1983. [WA98] B. Wilkinson and C. M. Allen. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall, 1998. [WA99] B. Wilkinson and M. Allen. Parallel Programming. Prentice-Hall, NJ, 1999. [Wag87] B. A. Wagar. Hyperquicksort: A fast sorting algorithm for hypercubes. In Proceedings of the Second Conference on Hypercube Multiprocessors, 292– 299, 1987. [Wal91] Y. Wallach. Parallel Processing and Ada. Prentice-Hall, Englewood Cliffs, NJ, 1991. [WB72] W. A. Wulf and C. G. Bell. C.mmp—a multimicroprocessor. In Proceedings of AFIPS Conference, 765–777, 1972. [Wei97] B. Weissman. Active threads manual. Technical Report TR-97-037, International Computer Science Institute, Berkeley, CA 94704, 1997. [WF80] C. L. Wu and T. Y. Feng. On a class of multistage interconnection networks. IEEE Transactions on Computers, 669–702, August 1980. [WF84] C. L. Wu and T. Y. Feng. Interconnection Networks for Parallel and Distributed Processing. IEEE Computer Society Press, Washington, DC, 1984. [WI89] K. Wada and N. Ichiyoshi. A distributed shortest path algorithm and its mapping on the Multi-PSI. In Proceedings of International Conference of Parallel Processing, 1989. [Wil86] H. S. Wilf. Algorithms and Complexity. Prentice-Hall, NJ, 1986. [Wil95] G. V. Wilson. Practical Parallel Programming. MIT Press, Cambridge, MA, 1995. [Wil00] A. Williams. Windows 2000 Systems Programming Black Book. The Coriolis Group, 2000. [Win77] S. Winograd. A new method for computing DFT. In IEEE International Conference on Acoustics, Speech and Signal Processing, 366–368, 1977. [WL88] B. W. Wah and G.-J. Li. Systolic processing for dynamic programming problems.

Circuits, Systems, and Signal Processing, 7(2):119–149, 1988. [WLY84] B. W. Wah, G.-J. Li, and C. F. Yu. The status of MANIP—a multicomputer architecture for solving combinatorial extremum-search problems. In Proceedings of 11th Annual International Symposium on Computer Architecture, 56–63, 1984. [WM84] B. W. Wah and Y. W. E. Ma. MANIP—a multicomputer architecture for solving combinatorial extremum-search problems. IEEE Transactions on Computers, C–33, May 1984. [Woo86] J. V. Woods, editor. Fifth Generation Computer Architectures. North-Holland, Amsterdam, The Netherlands, 1986. [Wor88] P. H. Worley. Information Requirements and the Implications for Parallel Computation. Ph.D. Thesis, Stanford University, Department of Computer Science, Palo Alto, CA, 1988. [Wor90] P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838–858, 1990. [Wor91] P. H. Worley. Limits on parallelism in the numerical solution of linear PDEs. SIAM Journal on Scientific and Statistical Computing, 12:1–35, January 1991. [WS88] Y. Won and S. Sahni. A balanced bin sort for hypercube multiprocessors. Journal of Supercomputing, 2:435–448, 1988. [WS89] J. Woo and S. Sahni. Hypercube computing: Connected components. Journal of Supercomputing, 3:209–234, 1989. [WS91] J. Woo and S. Sahni. Computing biconnected components on a hypercube. Journal of Supercomputing, June 1991. Also available as Technical Report TR 89-7 from the Department of Computer Science, University of Minnesota, Minneapolis, MN. [WY85] B. W. Wah and C. F. Yu. Stochastic modeling of branch-and-bound algorithms with best-first search. IEEE Transactions on Software Engineering, SE-11, September 1985. [Zho89] X. Zhou. Bridging the gap between Amdahl's law and Sandia laboratory's result. Communications of the ACM, 32(8):1014–5, 1989. [Zom96] A. Zomaya, editor. Parallel and Distributed Computing Handbook. McGraw-Hill, 1996. [ZRV89] J. R. Zorbas, D. J. Reble, and R. E. VanKooten. Measuring the scalability of parallel computer systems. In Supercomputing '89 Proceedings, 832–841, 1989. [ Team LiB ]


Like this book? You can publish your book online for free in a few minutes!
Create your own flipbook