IT Tools 2 (NoSQL, Big Data Processing with Spark)
- Course type
- COMPUTER SCIENCE
- Contact
- François PORTIER
- Unit
- UE-MSD05: IT Tools
- Number of ECTS
- 3
- Course code
- MSD 05-2
- Distribution of courses
- Lecture hours: 24
- Language of teaching
- English
Objectives
NoSQL:
Understand the fundamentals of NoSQL databases, the features they offer, and the specific challenges they address compared with classic SQL databases. Evaluate and select appropriate NoSQL technologies for particular situations. Gain hands-on experience in deploying and using NoSQL databases, such as MongoDB or Neo4j.
Big Data Processing with Spark:
Understand the challenges of distributed computing through the Apache Spark architecture. Discover how to use Apache Spark and the platforms and tools available for it. Practice PySpark coding to learn Apache Spark features, from data management to machine learning.
Course outline
NoSQL:
– NoSQL origins (history & players)
– NoSQL / SQL comparison
– Key concepts of NoSQL databases:
– Data models
– Distribution models
– Query languages
– Consistency
– NoSQL database types
– NoSQL database technologies & comparisons (MongoDB, Cassandra, Neo4j, Redis, ElasticSearch…)
– Neo4j introduction + lab
– ElasticSearch introduction + lab
Big Data Processing with Spark:
– Distributed computing introduction
– Apache Spark origins & history, links to Apache Hadoop
– Apache Spark architecture and main concepts:
– Apache Spark “modules”
– Architecture: driver & executors
– Transformations vs. actions
– Lazy evaluation
– Data structures: RDDs, DataFrames & Datasets
– Using Apache Spark:
– Create sessions and connect to clusters
– Use data management functions
– Leverage SQL with Spark SQL
– Train & test machine learning models
– Use Spark Web UI
Prerequisites
NoSQL:
Basic knowledge of SQL, databases, and computer systems
Big Data Processing with Spark:
Basic knowledge of computer systems and architecture; working practice of Python and SQL