Course Outline
Lecture Notes
### Lecture 1. Introduction

### Lecture 2. Prerequisites: Computer Architecture and Introduction to C Programming

### Lecture 3. Working with HPC

Introduction to HPC environments, Linux command line, compilers, remote login, job submission, and using cluster resources effectively.

### Lecture 4. Parallel Programming Basics

### Lecture 5. Introduction to OpenMP

### Lectures 6. OpenMP Basics

### Lecture 9. Introduction to MPI

### Lecture 10. Point-to-Point Communication in MPI

### Lecture 11. Collective Communication in MPI

### Lecture 12. Derived Datatypes in MPI

### Lecture 13. Hybrid OpenMP and MPI

### Lecture 14. Introduction to GPU

### Lecture 15. Introduction to CUDA

### Lecture 16. Basic CUDA Programming

### Lecture 17. NVIDIA GPU Architectures

### Lecture 18. CUDA Performance: From Warp Execution to Memory Access

### Lecture 19. Functions in CUDA

### Lecture 20. Deep Learning Essentials

### Lecture 21. Softmax and Flash Attention in CUDA

### Lecture 22. CUDA Support: Makefile and CUDA Toolchain

### Lecture 23. Parallel Deep Neural Network Training