Data Analytics

CSC3005

Big data now appears in a very large number of environments and application fields, including education, entertainment, health, public governance, and enterprise. This module gives students an understanding of the new challenges that big data introduces, particularly in the area of IoT, and of the currently available solutions. It covers (i) the challenges of modelling, accessing, and storing big data, (ii) the fundamentals of systems designed to store and access big data, (iii) programming paradigms for efficient, scalable access to big data, and (iv) data processing methodology to facilitate big data analytics.

The module places particular emphasis on how the requirements of scalability and efficiency shape big data infrastructures. It exposes students to a number of different cloud-based NoSQL systems and their design and implementation details, showing how they achieve efficiency and scalability. Topics to be covered include Google FS, HDFS, the MapReduce/Spark programming paradigm (including an overview of computational statistics and machine learning in the Hadoop/Spark universe), distributed NoSQL data stores (BigTable/HBase), Cassandra, and Hive.