Big Data IT Tools
- Course type
- COMPUTER SCIENCE
- Course coordinator
- Ikko YAMANE
- Unit
-
Module 2-08: Computational methods
- Number of ECTS
- 2
- Course code
- 2AINF06
- Distribution of courses
-
Lecture hours: 18
Practical work (TP) hours: 6
- Language of teaching
- French
Objectives
Find your way around the most common "big data" technologies
Identify bottlenecks in data processing execution and adapt processing to remedy them
Choose and implement the right architecture for a given processing task, in particular CPU vs. GPU, local vs. cloud, batch vs. streaming, high-level vs. low-level, etc.
Produce simple statistical analyses with Spark
Provision a simple infrastructure on AWS
Course outline
The term "big data" is being used more and more, both in business and in the general media. Unfortunately, it is often used as a catch-all term. This course begins with a deconstruction of the notion of big data, presenting the V’s of big data and introducing the notion of high-performance data processing.
It then presents an overview of the technologies labelled big data and the associated computing architectures, comparing them with traditional solutions:
General architecture of local (processor, RAM, storage) and distributed computing (centralized vs. peer-to-peer; advantages and disadvantages of distributed systems)
Storage architectures (file systems vs. databases, local vs. distributed)
Focus on distributed storage with HDFS
Focus on distributed computing with Spark and MapReduce
Introduction to cloud computing with Amazon Web Services
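To give a flavour of the MapReduce model mentioned in the outline, here is a minimal single-machine sketch in plain Python. It is an illustration only, not course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are hypothetical, and a real framework such as Hadoop or Spark would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit one (word, 1) pair per word, as a mapper would."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all values by key, mimicking the shuffle between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Chain the three phases sequentially on one machine."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical input: each string stands for one document (or one file block).
documents = ["big data is big", "data processing with Spark"]
counts = word_count(documents)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle be distributed; the shuffle is the step that moves data between machines.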
Prerequisites
Basic knowledge of Python