Glimpses of peer-to-peer and data-center technologies for large-scale data storage and management

Lecturer: Prof. Anwitaman Datta (NTU Singapore).

About the lecturer | Course Summary | Slides: All | Assignments (due 24 April 2012): Assignment

About the lecturer: Anwitaman Datta did his PhD at EPFL Lausanne before moving to NTU Singapore in 2006, where he is currently an Assistant Professor in the School of Computer Engineering. His interests include large-scale networked distributed information systems and social collaboration networks, the self-organization and algorithmic issues of these systems and networks, and their scalability, resilience, security and performance. He leads the SANDS (Self-* and Algorithmic aspects of Networked Distributed Systems) group at NTU. He won best paper awards at IWSOS 2006, ICDCS 2007 and ICDCN 2011, and was one of the recipients of the HP Labs Innovation Research Program award in 2008.

Course summary: This course has three parts. The first two cover peer-to-peer (P2P) technologies, namely distributed hash tables (DHTs) and P2P storage systems. After covering some core concepts of such P2P systems, which are typically distributed around the world, we will follow the journey of these technologies into the data-centers. Specifically, for DHTs we will look at the routing and topological aspects, including graph-theoretic and small-world interpretations, and at issues related to the maintenance and bootstrapping of such overlay networks. For P2P storage systems we will explore the design space, including the different kinds of redundancy, maintenance and placement choices that affect system performance. Finally, for data-centers, we will look at two very distinct architectures, the Google File System (older version) and Amazon's Dynamo, and connect them to some of the concepts and technologies that emerged from the P2P literature. The course will conclude with an introduction to the concept of NoSQL systems.