First semester

Parallel Computing with R and Python

Objectives

- Detecting the slow parts of a script using graphical code-profiling tools. Students will be able to identify where the code should be improved and where memory allocations should be reduced.
- Knowing the various ways of implementing parallel computations.
- Understanding the futureverse ecosystem of packages, a unifying parallelization framework in R that can parallelize locally or on clusters.
- If time permits, learning the basics of C++ coding, interfacing C++ with R, and parallelizing C++ code within R.
- Improving code performance using CPU parallel computation. Students will be able to use both the forking and socket methods of parallel computation.

Course outline

First, an introduction to code profiling is given (micro and macro profiling, memory monitoring). Then, the two standard methods for CPU parallel computation are presented (forking and socket).

In the R section, we will learn how to profile code to find slow or memory-heavy parts. We will then learn a few tricks to make sure the basic R code is optimized before thinking about parallelization. Next, we will introduce various ways of implementing parallel computations in R, with their pros and cons. Finally, we will go into depth on the futureverse framework, a unifying framework for parallel computing in R. We will learn through various examples, such as simulations. If time permits, we will show how to write R functions that actually run C++ code and how this code can be easily parallelized as well.

With Python, we will first review low-level explicit parallelism using the multiprocessing library. Then, we will focus on the Dask library, which supports parallel CPU-based processing of large collections of data such as arrays or CSV files.
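
As a rough illustration of the explicit parallelism mentioned for the Python part, here is a minimal sketch using the standard multiprocessing module. The names square and parallel_squares are hypothetical and chosen only for this example. The "fork" start method copies the parent process (Unix only), while "spawn" starts fresh interpreters; the comparison of "spawn" with socket-based workers in R is only a loose analogy, not a statement about the course material.

```python
import multiprocessing as mp


def square(x):
    """Toy CPU-bound task (hypothetical stand-in for a simulation step)."""
    return x * x


def parallel_squares(values, start_method="spawn", workers=2):
    """Map square() over values using a pool of worker processes.

    start_method="fork" clones the parent process (Unix only);
    start_method="spawn" launches fresh Python interpreters.
    """
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=workers) as pool:
        # pool.map preserves input order and returns a list
        return pool.map(square, values)


if __name__ == "__main__":
    # Try every start method available on this platform
    for method in mp.get_all_start_methods():
        print(method, parallel_squares(range(5), start_method=method))
```

The `if __name__ == "__main__":` guard matters with "spawn", because each worker re-imports the main module and would otherwise try to start its own pool.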

Prerequisites

Knowledge of R and Python