First semester

IT Tools 2 (NoSQL, Big Data Processing with Spark)

Objectives

NoSQL:
Understand the fundamentals of NoSQL databases, and the features and specific challenges they address compared to classic SQL databases. Evaluate and select appropriate NoSQL technologies for particular situations. Gain hands-on experience in deploying and using NoSQL databases, such as MongoDB or Neo4j.

Big Data Processing with Spark:
Understand the stakes of distributed computing through the Apache Spark architecture. Discover how to use Apache Spark and the platforms and tools available for it. Practice PySpark coding to learn Apache Spark features, from data management to machine learning.

Course outline

NoSQL:
- NoSQL origins (history & players)
- NoSQL / SQL comparison
- Key concepts of NoSQL databases:
  – Data models
  – Distribution models
  – Query languages
  – Consistency
- NoSQL database types
- NoSQL database technologies & comparisons (MongoDB, Cassandra, Neo4j, Redis, Elasticsearch…)
- Neo4j introduction + lab
- Cassandra introduction + lab

Big Data Processing with Spark:
- Distributed computing introduction
- Apache Spark origins & history, links to Apache Hadoop
- Apache Spark architecture and main concepts:
  – Apache Spark “modules”
  – Architecture: driver & executors
  – Transformations vs. actions
  – Lazy evaluation
  – Data structures: RDDs, DataFrames & Datasets
- Using Apache Spark:
  – Create sessions and connect to clusters
  – Use data management functions
  – Leverage SQL with Spark SQL
- Train & test machine learning models
- Use Spark Web UI

Prerequisites

NoSQL:
Basic knowledge of SQL, databases, and computer systems

Big Data Processing with Spark:
Basic knowledge of computer systems and architecture; practical experience with Python and SQL