Computing on Large Clusters

Lecturers: Dr. Grzegorz Czajkowski and Dr. Grzegorz Malewicz (Google).


About the lecturers:
Grzegorz Czajkowski received an MSc from AGH in Krakow in 1994 and a PhD from Cornell University in 1999, both in computer science. He then joined Sun Microsystems Laboratories, where he investigated ways of fusing safe languages and operating systems. Six years, several technology transfers, a dozen papers and 20+ patents later, Grzegorz joined Google, where he leads several systems infrastructure teams. He actively maintains connections with the Polish IT and academic communities: he was a member of the technical advisory boards of several Polish start-ups and taught a course on virtual machines at AGH in the fall of 2005.

Grzegorz Malewicz received BA degrees in computer science and in applied mathematics (studia jednoczesne) in 1996 and 1998, respectively, and an MS degree in computer science in 1998, all from the University of Warsaw. He received a PhD in computer science from the University of Connecticut in 2003, spending his final year at the Massachusetts Institute of Technology. He is an engineer at Google. He held internships at the AT&T Shannon Laboratory (summer 2001) and Microsoft Corp. (summer 2000 and fall 2001). He was a visiting scientist at the University of Massachusetts, Amherst (summer 2004) and Argonne National Laboratory (summer 2005), and an assistant professor at the University of Alabama, where he taught computer science from 2003 to 2005. His research focuses on high-performance parallel and distributed computing, experimental and theoretical algorithmics, combinatorial optimization, and scheduling. His work appears in top journals and conferences and includes a single-author SIAM Journal on Computing paper that solves a decade-old problem in distributed computing.

Course summary:
In September 2007 alone, Google processed over 400 PB of data in datacenters composed of thousands of machines. What challenges emerge when computing at such a scale? Users need simple and expressive parallel programming models. On the systems side, these models need to allow for scalable and fault-tolerant implementation on commodity computers. This lecture will describe ways in which these challenges can be addressed.
Prerequisites: Operating Systems, Networks, Algorithms, at an undergraduate level.

Assignment:
Information on the assignment is here.