Applications of machine learning to Big Data

Lecturer: Stanisław Matwin (Dalhousie University, Canada and IPI PAN).


About the lecturer: Stanisław Matwin is a Professor and Canada Research Chair at the Faculty of Computer Science, Dalhousie University, Halifax, Canada. He is the Director of the Institute for Big Data Analytics, the first research institute of its kind in Canada. He is a former chairman of the Canadian Artificial Intelligence Society. He is a member of the Scientific Council of the Polish Artificial Intelligence Society and, for the past five years, has chaired the yearly competition for the best Polish PhD thesis in the field. He is an ECCAI Fellow. His research interests include textual data analysis, data mining and data privacy.

Course summary:
  1. Introduction (definitions, challenges, NoSQL examples, etc.)
  2. Linear algorithms
  3. Bayesian algorithms
  4. Stream data: VFDTs (Very Fast Decision Trees)
  5. Visualization: examples
  6. Privacy and Big Data (scripts, Bloom filters); the ABAC (attribute-based access control) approach (Accumulo)
  7. Applications: mobility, ocean-related data.
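Topic 4 refers to VFDTs, the stream learners of Domingos and Hulten (reference 3 below). Their key idea is to decide when a leaf has seen enough examples to split by using the Hoeffding bound: with probability 1 − δ, the true mean of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of its mean over n samples. The sketch below is only an illustration of that split test, not the lecture's material; the function names and the default tie threshold of 0.05 are assumptions chosen for the example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` is within epsilon of the mean observed
    over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test at a leaf that has seen n examples.

    `tie_threshold` (tau) is an assumed default for illustration only.
    """
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute beats the runner-up by more than epsilon,
    # or when epsilon has shrunk below tau, i.e. the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Two classes, information gain measured in bits, so R = log2(2) = 1.
    for n in (200, 1000, 5000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Because ε shrinks as 1/sqrt(n), a leaf only needs to buffer sufficient statistics until the gap between the two best attributes exceeds ε; each example is read once, which is what makes the method suitable for high-speed streams.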
Literature:
  1. K. Cukier, V. Mayer-Schönberger: Big Data. Rewolucja, która zmieni nasze myślenie, pracę i życie (Polish edition of Big Data: A Revolution That Will Transform How We Live, Work, and Think). MT, 2013.
  2. A. Ng: Linear regression. Lecture notes, 2013.
  3. P. Domingos, G. Hulten: Mining High-Speed Data Streams. KDD 2000.
  4. N. Yau: Data Points: Visualization That Means Something. Wiley, 2013.
  5. Ch. Renso: Human Mobility Data. CUP, 2013.
  6. M. T. Ribeiro, S. Singh, C. Guestrin: "Why Should I Trust You?": Explaining the Predictions of Any Classifier. KDD 2016.
  7. L. von Ahn, M. Blum, J. Langford: Telling Humans and Computers Apart Automatically. CACM 47(2), 2004.

Assignment: You can find the problem statement here. It comes with a map and a data set, packaged as a .tar.gz file. Please send your solutions to the address given in the problem statement by Monday, July 18.