IT Tools 2 (NoSQL, Big Data Processing with Spark)
- Teacher(s)
- Hervé MIGNOT, Nikolaos PARLAVANTZAS
- Course type
- COMPUTER SCIENCE
- Contact person
- François PORTIER
- Unit
- UE-MSD05: IT Tools
- Number of ECTS
- 3
- Course code
- MSD 05-2
- Distribution of courses
- Lecture hours: 24
- Language of teaching
- English
Objectives
NoSQL:
Understand the fundamentals of NoSQL databases, their features, and the specific challenges they address compared with classic SQL databases. Evaluate and select appropriate NoSQL technologies for particular situations. Gain hands-on experience in deploying and using NoSQL databases, such as MongoDB or Neo4j.

Big Data Processing with Spark:
Understand the stakes of distributed computing through the Apache Spark architecture. Discover how to use Apache Spark and the platforms and tools available for it. Practice PySpark coding to learn Apache Spark features, from data management to machine learning.
Course outline
NoSQL:
- NoSQL origins (history & players)
- NoSQL / SQL comparison
- Key concepts of NoSQL databases:
  – Data models
  – Distribution models
  – Query languages
  – Consistency
- NoSQL database types
- NoSQL database technologies & comparisons (MongoDB, Cassandra, Neo4j, Redis, ElasticSearch…)
- Neo4j introduction + lab
- Cassandra introduction + lab

Big Data Processing with Spark:
- Distributed computing introduction
- Apache Spark origins & history, links to Apache Hadoop
- Apache Spark architecture and main concepts:
  – Apache Spark “modules”
  – Architecture: driver & executors
  – Transformations vs. actions
  – Lazy evaluation
  – Data structures: RDDs, DataFrames & Datasets
- Using Apache Spark:
  – Create sessions and connect to clusters
  – Use data management functions
  – Leverage SQL with Spark SQL
- Train & test machine learning models
- Use Spark Web UI
Prerequisites
NoSQL:
Basic knowledge of SQL, databases, and computer systems.

Big Data Processing with Spark:
Basic knowledge of computer systems and architecture; practical experience with Python and SQL.