CSci 8980 - University of Minnesota

CSci 8980: Machine Learning in Computer Systems

Jon Weissman

Announcements

Welcome to 8980 -- Machine Learning in Computer Systems

Course Description

Instructor: Jon Weissman
Office Hours: 1-2 PM Friday Keller Hall 4-225F
Lectures: 11:15 AM-12:30 PM T/Th Kolthoff Hall 139
Section: 002

In this course, we will examine how ML techniques are being applied to computer systems, interpreted broadly, in areas such as databases, networks, OS, data centers, IOT/mobile, HPC, and others. The big question: is there a tangible impact? The course is suitable for any graduate student who has taken at least one 5xxx systems course (interpreted broadly), e.g., CSci 5103, 5708, or 5211, and ideally a class on statistics, data mining, or ML. The latter is not required, as we will assume little knowledge of it. Students unsure of their background should check with the instructor. The course will be run as a seminar with paper readings, critiques, and discussions. A student-defined final project that combines computer systems and ML will also be required. This course may be eligible for Plan C credit.
The tentative list of topics includes:

Machine Learning Introduction
Databases
Networking
Power Management
Scheduling and resource management
Storage
Compilers/Architecture
Fault tolerance
IOT/mobile

The course will consist of paper readings and critiques/blogs, presentations, as well as a final project. It is intended primarily for graduate students (or budding graduate students) with research interests in one or more of the following areas: machine learning, distributed computing, operating systems, IOT/mobile computing, networking, databases, HPC, and others.
This class will survey the state of the art in applying machine learning to the broad area of computer systems. Readings will be drawn from recent publications in the top systems and ML conferences (OSDI, NSDI, SIGCOMM, SIGMOD, SC, NIPS, etc.), spanning areas such as cloud computing, operating systems, mobile computing, networks, databases, and distributed systems.
This course is intended for graduate students at all levels and for some advanced undergraduates (by permission) who intend to go on to graduate school.

Grading

Presentation(s): 20% (10% for each of them)
~~Take-home mid-term: 20%~~
Final project: 40%
Written critiques (blogs): 10%
Discussions: 10%

Assignments

This course will involve paper readings, paper critiques, presentations, and a final project. You are expected to read the papers for each lecture and engage in discussion. Because the class is (relatively) small, it will be informal and discussion-oriented. The presenter is also required to ask the audience several questions to spur discussion, and vice versa. The questions can be open-ended (all the better!). The goal isn't to stump anyone with tough questions or to show off, but to have fun and generate interesting exchanges.
Paper Critiques: You will write paper critiques in the form of blogs for some of the papers we will be reading (only the long, conference-style papers). A critique is NOT a summary of the paper's content. Rather, it is a brief analysis of the key ideas in the paper and your critical opinion of them. You are encouraged to point out flaws, limitations, or interesting applications of the ideas that go beyond what was said in the paper. You are also encouraged to connect and contrast the paper with papers we have already read. Most importantly, it should help the presenter stimulate discussion. Here is an example blog. Here are the kinds of things you may want to put in your critique.
Lecture/discussion preparation: You will also be responsible for making two presentations during the term: one on a longer paper (conference length) and one on a shorter paper (workshop length). The short paper is typically in the same area as the long one, but occasionally it is on a completely different topic (marked by *). As noted above, the goal of your presentations is to stimulate discussion about the key ideas in the paper, not simply to list its gory details. As with the critiques, a strong presentation will go beyond what is in the paper and place its main contributions in context, relating the paper to others we have seen. A top presentation will engage the class in discussion, so you should ask us questions during the talk. Here is an example presentation template. You may use PowerPoint or the blackboard if you wish (either way, you should prepare notes that we can post later). Your presentation may need to include background material and possibly other reading: you must briefly explain any ML background needed for the paper, and you may need to read other papers to prepare. NEVER present concepts that you do not understand. You may also bring up points that others raised in their blogs on the paper to help stimulate discussion. Leave enough time for discussion. Some papers are marked optional: they are helpful to read, but not necessarily discussed. You may find slides online as a starting point, but you must modify them as needed and you must understand everything you present!
~~Midterm:~~ There will be an essay-style take-home exam that tests your knowledge of the key concepts in the course. Success on this exam depends critically on attending class, reading all of the assigned papers, and participating in class discussions.
Finally, you will complete a final project of your own choosing. It may be done by a group of any size, depending on the scope and scale of the project. The project must be in the broad area of ML applied to computer systems; a typical project would be implementation-based. Available infrastructure is TBD. Traditional cloud infrastructures could be leveraged, including Microsoft Azure, Amazon EC2 (http://aws.amazon.com/free/), and Google Cloud Platform (https://cloud.google.com/free-trial/). If you are interested in one of these clouds, I recommend you get an account (you may need my help for this) and start to poke around. Some "risk" is also encouraged (and rewarded) in the project. Possible project ideas will be discussed in class. You will present your project ideas and final project to the class. All team members will receive the same score for the project. Your final project may build upon your research, but if it leverages existing work, you must ensure that the project offers something new. You are encouraged (and expected) to read additional papers in support of your project as needed.

CLASS BLOG

Syllabus and Schedule

Each class period will feature presentations of one long paper (30 minutes) and one short paper (15 minutes). You must scale the presentation to the paper type. Your job is to make the presentation lively by presenting the most important and thought-provoking parts of the paper, not to regurgitate every detail. Sometimes the schedule will slip and your presentation will shift; if this is a major problem, let me know ahead of time. If you really, really want a paper I have picked, you can request it. I'm also open to paper swaps: you may independently locate a different paper that you prefer or think is better than an existing one, but it must be in a similar area and you must give us enough notice. The lecture notes may appear ahead of time or shortly after the lecture. This schedule is VERY tentative (some papers could change as well).

Date	Topic	Papers	Presenter	Blogger
========	Introduction	================================	==============	==============
Tu 01/22/19	Course introduction Lecture notes (Intro)		Jon
Th 01/24/19	Machine Learning Basics-1 Lecture notes (ML Basics)	Deep Learning Book (Chapter 1)	Jon
Th 01/31/19	Machine Learning Basics-2 Lecture notes (DL Basics)	Deep Learning Book (Chap. 5), Pattern Recognition and Machine Learning (Chap. 6), Reinforcement Learning	Jon
========	Databases	================================	==============	==============
Tu 02/05/19	Database Indices Lecture notes (Case for Learned ... , Lifting the Curse ... )	The Case for Learned Index Structures, Lifting the Curse ...	Sequeira, Sequeira	Kayala
Th 02/07/19	Database Entities/TLBs Lecture notes (Deep Learning for Entity Matching ... , Virtual Address Translation ... )	Deep Learning for Entity Matching ..., *Virtual Address Translation ...*	Monteiro, Y Wang	Biswas/Bhat
Tu 02/12/19	Database Tuning/Caching Lecture notes (Auto DBMS Tuning... , PeCC ... )	Auto DBMS Tuning... , *PeCC ...*	Y Wang, Monteiro	Unnikrishnan/Gupta
========	Scheduling	================================	==============	==============
Th 02/14/19	CMP Scheduling/Caching Lecture notes (Coordinated ... , Cache Miss Rate ... )	Coordinated ..., *Cache Miss Rate ...*	Hegde, Bhat	Li/Amudapuram
Tu 02/19/19	Mapping/Cluster Scheduling Lecture notes (... Mapping Streaming ..., Learning ... Clusters ... (first one with RL) )	... Mapping Streaming ..., Learning ... Clusters ...	Nimkar, Shaheen	S Wang/Gupta
Th 02/21/19	Placement Lecture notes (Device Placement ..., Learning ... Tensor ... )	Device Placement ..., Learning ... Tensor ...	Li, Biswas	Sequeira/Monteiro
========	Power	================================	==============	==============
Tu 02/26/19	Voltage Scaling/DC Power Lecture notes (Integrated ..., ... Data Center Optimization )	Integrated ..., ... Data Center Optimization	Wu, Hu	S Wang/Shaheen
Th 02/28/19	CMP Power/Quantization Lecture notes (ReLeQ ..., *DeepCache ...*)	ReLeQ ..., DeepCache ...	Hegde, Sadeghi	Kulkarni/Wu
========	Compilers	================================	==============	==============
Tu 03/05/19	Compiler Opt Lecture notes (Project Discussion, ... Phase-Ordering ... )	... Phase-Ordering ... (only 1 paper)	Weissman, Hu	Sadeghi/Hegde
Th 03/07/19	Midterm Replacement Lecture notes (Eureka ..., Placeto ... )	*Eureka ...* (two related papers), *Placeto ...*	Weissman, Li
Tu 03/12/19	Runtime Lecture notes (CrystalBall ..., ... Performance Profiling )	CrystalBall ..., ... Performance Profiling	Gupta, Wu	Hu/Kayala
========	Fault Tolerance	================================	==============	==============
Th 03/14/19	Link Failure/Cloud Debugging Lecture notes (NetBouncer ..., Seer ... )	NetBouncer ..., *Seer ...*	Kulkarni, Amudapuram	Bhat/Unnikrishnan
	Written 1-page project proposals due
Tu 03/19/19	Spring Break
Th 03/21/19	Spring Break
Tu 03/26/19	Node Failure/Anomaly Detection Lecture notes (Doomsday ..., ... Anomaly Detection )	Doomsday ..., ... Anomaly Detection	Shaheen, Kulkarni	Amudapuram/Biswas
========	Networking	================================	==============	==============
Th 03/28/19	Networking Lecture notes (... Traffic-Driven ..., Iroko ... )	... Traffic-Driven ..., Iroko ...	Unnikrishnan, Nimkar	Li/Nimkar
Tu 04/02/19	Traffic Optimization Lecture notes (AuTO ..., Statistical ... )	AuTO ..., Statistical ...	Sadeghi, S Wang	Monteiro/Sequeira
	1-page project proposal progress reports due next Tues (4/9)
========	Miscellaneous	================================	==============	==============
Th 04/04/19	Software Configuration Lecture notes (REX ..., ... API Functions ... )	REX ..., ... API Functions ...	Gupta, Unnikrishnan	Y Wang/Wu
Tu 04/09/19	Storage/IO Lecture notes (CAPES ..., Chasing ... )	CAPES ..., Chasing ...	Kayala, Biswas	Kulkarni/Hegde
	1-page project proposal progress reports due in class
Th 04/11/19	Migration/Containers Lecture notes (... Live Migration ..., RACC ... )	... Live Migration ..., RACC ...	S Wang, Gupta	Sadeghi/Wu
========	Going Small	================================	==============	==============
Tu 04/16/19	IOT Lecture notes (ApDeepSense ..., Deep Learning ... IOT )	ApDeepSense ..., Deep Learning ... IOT	Bhat, Li	Hu/Shaheen
Th 04/18/19	Mobile/IOT Lecture notes (StormDroid ..., QualityDeepSense ... )	StormDroid ..., QualityDeepSense ...	Amudapuram, Kayala	Nimkar/Y Wang
========	Projects	================================	==============	==============
Tu 04/23/19	No class: work on projects!
Th 04/25/19	Project presentations
Tu 04/30/19	Project presentations
Th 05/02/19	Project presentations (last class)

Papers

Introduction

Deep Learning
Ian Goodfellow, Yoshua Bengio, and Aaron Courville
MIT Press 2016
Pattern Recognition and Machine Learning
Christopher Bishop
Springer 2006
Introduction to Reinforcement Learning
Shipra Agrawal

Databases

The Case for Learned Index Structures
Tim Kraska et al
SIGMOD 2018
Lifting the Curse of Multidimensional Data with Learned Existence Indexes
Stephen Macke et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
Deep Learning for Entity Matching: A Design Space Exploration
Sidharth Mudgal et al
SIGMOD 2018
Virtual Address Translation via Learned Page Table Indexes
Artemiy Margaritov
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
Automatic Database Management System Tuning Through Large-scale Machine Learning
Dana Van Aken et al
SIGMOD 2017
PeCC: Prediction-error Correcting Cache
Adit Bhardwaj et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software

Scheduling

Coordinated Management of Multiple Interacting Resources in Chip Multiprocessors: A Machine Learning Approach
Ramazan Bitirgen et al
MICRO 2008
Cache Miss Rate Predictability via Neural Networks
Rishikesh Jha et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
A Machine Learning Approach to Mapping Streaming Workloads to Dynamic Multicore Processors
Paul-Jules Micolet et al
LCTES 2016
Learning Scheduling Algorithms for Data Processing Clusters
Hongzi Mao et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
Device Placement Optimization with Reinforcement Learning
Azalia Mirhoseini et al
ICML 2017
Learning to Optimize Tensor Programs
Tianqi Chen et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software

Power

Integrated CPU and L2 Cache Voltage Scaling using Machine Learning
Nevine AbouGhazaleh et al
LCTES 2007
Machine Learning Applications for Data Center Optimization
Jim Gao, Google
Google AI White Paper, 2014
Up By Their Bootstraps: Online Learning in Artificial Neural Networks for CMP Uncore Power Management
Jae-Yeon Won et al
HPCA 2014
ReLeQ: An Automatic Reinforcement Learning Approach for Deep Quantization of Neural Networks
Amir Yazdanbakhsh et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
DeepCache: A Deep Learning Based Framework For Content Caching
Arvind Narayanan et al
NetAI 2018

Compilers

Mitigating the Compiler Optimization Phase-Ordering Problem using Machine Learning
Sameer Kulkarni et al
OOPSLA 2012
Automated Testing of Graphics Units by Deep-Learning Detection of Visual Anomalies
Lev Faivishevsky et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
CrystalBall: Statically Analyzing Runtime Behavior via Deep Sequence Learning
Stephen Zekany et al
MICRO 2016
Exploring the Use of Learning Algorithms for Efficient Performance Profiling
Shoumik Palkar
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software

Fault Tolerance

NetBouncer: Active Device and Link Failure Localization in Data Center Networks
Cheng Tan et al
NSDI 2019
Seer: Leveraging Big Data to Navigate The Complexity of Cloud Debugging
Yu Gan et al
HotCloud 2018
Doomsday: Predicting Which Node Will Fail When on Supercomputers
Anwesha Das et al
SC 2018
Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection
Andy Brown et al
Workshop on Machine Learning for Computer Systems 2018

Networking

Neural Network Meets DCN: Traffic-driven Topology Adaptation with Deep Learning
Mowei Wang et al
Proc. ACM Meas. Anal. Comput. Syst. 2018
DeepConf: Automating Data Center Network Topologies Management with Machine Learning
Saim Salman et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
PCC Vivace: Online-Learning Congestion Control
Mo Dong et al
NSDI 2018
Iroko: A Framework to Prototype Reinforcement Learning for Data Center Traffic Control
Fabian Ruffy et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Automatic Traffic Optimization
Li Chen et al
SIGCOMM 2018
Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters
Peter Bodik et al
HotCloud 2009

Miscellaneous

REX: A Development Platform and Online Learning Approach for Runtime Emergent Software Systems
Barry Porter et al
OSDI 2016
Neural Inference of API Functions from Input Output Examples
Rohan Bavishi et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning
Yan Li et al
SC 2017
Chasing the Signal: Statistically Separating Multi-Tenant I/O Workloads
Si Chen et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
A Machine Learning Approach to Live Migration Modeling
Changyeon Jo et al
SoCC 2017
RACC: Resource-Aware Container Consolidation using a Deep Learning Approach
Saurav Nanda et al
Workshop on Machine Learning for Computer Systems 2018

Going Small

ApDeepSense: Deep Learning Uncertainty Estimation Without the Pain for IoT Applications
Shuochao Yao et al
ICDCS 2018
Deep Learning for the Internet of Things
Shuochao Yao et al
IEEE Computer 2018
StormDroid: A Streaminglized Machine Learning-Based System for Detecting Android Malware
Sen Chen et al
ASIA CCS 2016
QualityDeepSense: Quality-Aware Deep Learning Framework for Internet of Things Applications with Sensor-Temporal Attention
Shuochao Yao et al
EMDL 2018

Some extras

Eureka: Edge-based Discovery of Training Data for Machine Learning
Ziqiang Feng et al
IEEE Internet Computing 2018
Edge-based Discovery of Training Data for Machine Learning
Ziqiang Feng et al
IEEE/ACM Symposium on Edge Computing (SEC) 2018
Placeto: Efficient Progressive Device Placement Optimization
Ravichandra Addanki et al
Workshop on Machine Learning for Computer Systems 2018
