CSci 8980: Machine Learning in Computer Systems
Announcements
Welcome to 8980 -- Machine Learning in Computer Systems
Course Description
Instructor: Jon Weissman
Office Hours: 1-2 PM Friday Keller Hall 4-225F
Lectures: 11:15 AM-12:30 PM T/Th Kolthoff Hall 139
Section: 002
In this course, we will examine how ML techniques are being applied to computer systems, interpreted broadly, in areas such as databases, networks, OS, data centers, IOT/mobile, HPC, and others. The big question: is there a tangible impact? The course is suitable for any graduate student who has taken at least one 5xxx systems course (interpreted broadly), e.g. CSci 5103, 5708, or 5211, and ideally a class on statistics, data mining, or ML, though that is not required, as we will assume little knowledge of the latter. Students unsure of their background should check with the instructor. The course will be run as a seminar with paper readings, critiques, and discussions. A student-defined final project that combines computer systems and ML will also be required. This course may be eligible for plan C credit.
The tentative list of topics includes:
- Machine Learning Introduction
- Databases
- Networking
- Power Management
- Scheduling and resource management
- Storage
- Compilers/Architecture
- Fault tolerance
- IOT/mobile
The course will consist of paper readings and critiques/blogs, presentations, as well as a final project. It is intended primarily for graduate students (or budding graduate students) with research interests in one or more of the following areas: machine learning, distributed computing, operating systems, IOT/mobile computing, networking, databases, HPC, and others.
This class will survey the state of the art in the applications of machine learning to the broad area of computer systems. Readings will be drawn from recent publications in the top systems and ML conferences (OSDI, NSDI, SIGCOMM, SIGMOD, SC, NIPS, etc.) spanning areas including cloud computing, operating systems, mobile computing, networks, databases, distributed systems, among others.
This course is intended for graduate students at all levels, and for some advanced undergraduates (by permission) who intend to go on to graduate school.
Grading
- Presentation(s): 20% (10% for each of them)
- Take-home mid-term: 20%
- Final project: 40%
- Written critiques (blogs): 10%
- Discussions: 10%
Assignments
This course will involve paper readings, paper critiques, presentations, and a final project. You are expected to read the papers for each lecture and engage in discussion. Due to its (relatively) small size, the class will be informal and discussion-oriented. The presenter is also required to ask several questions of the audience to spur discussion, and vice versa. The questions can be open-ended (all the better!). The goal isn't to stump anyone on tough questions or to show off, but to have fun and generate interesting exchange.
Paper Critiques: You will be providing paper critiques in the form of blogs for some of the papers that we will be reading (only for the long conference style papers). The critique is NOT a summary of the paper's content. Rather it is a brief analysis of the key ideas in the paper and your critical opinion of them. You are encouraged to point out flaws, limitations, or interesting applications of the ideas that go beyond what was said in the paper. You are also encouraged to connect and contrast the paper to papers we have already read. Most importantly, it should help stimulate discussion via the presenter. Here is an example blog. Here are the kinds of things you may want to put in your critique.
Lecture/discussion preparation: You will also be responsible for making two presentations during the class term: a longer paper (conference length) and a shorter paper (workshop length). The short paper will typically be in the same area as the long one, but occasionally may be on a completely different topic (marked by *). As already said, the goal of your presentation(s) is to stimulate discussion about the key ideas in the paper, not to simply list the gory details of the paper. As with the critiques, a strong presentation will go beyond what is in the paper and place its main contributions in context, relating the paper to others we have seen. A top presentation will engage the class in discussion, so you should ask questions of us during the talk. Here is an example presentation template. You may use ppt or the blackboard if you wish (either way, you should prepare notes that we can post later). Your paper presentation may need to include background material and possibly other reading. NEVER present concepts that you do not understand. You may also bring up the blog points raised by others on the paper to help stimulate discussion. Presentations should allow enough time for discussion. Some papers are marked optional: helpful to read, but not necessarily discussed. You must briefly explain any ML background needed for the paper. You may need to read other papers to prepare for your presentation. You may find slides online as a starting point, but you must modify the slides as needed and you must understand everything you present!
Midterm: There will be an essay-style take-home exam that will test your knowledge of the key concepts in the course. Success on this exam depends critically on your class attendance, reading all of the assigned papers, and participating in class discussions.
Final project: Finally, you will complete a final project. This project is of your own choice and must be done in a group of any size, depending on the scope and scale of the project. The project must be in the broad area of ML applied to any area of computer systems; a typical project would be implementation-based. Available infrastructures TBD. Traditional cloud infrastructures could be leveraged, and these include: Microsoft Azure, Amazon EC2 (http://aws.amazon.com/free/), Google Compute Cloud (https://cloud.google.com/free-trial/). If you are interested in one of these clouds, I recommend you get an account (for this you may need my help) and start to poke around. Some "risk" is also encouraged (and rewarded) in the project. Possible project ideas will be discussed in class. You will present your project ideas and final project to the class. All team members will receive the same score for the project. Your final project may build upon your research, and if it leverages some existing work, you must ensure that the project offers something new. You are encouraged (and expected) to read additional papers in support of your project (as needed).
Syllabus and Schedule
Each class period will contain the presentation of one long (30 minutes) and one short (15 minutes) paper. You must scale the presentation to the paper type. Your job is to make the presentation lively, presenting the most important and thought-provoking parts of the paper, not to regurgitate every detail. Sometimes the schedule will slip and your presentation will shift; if this is a major problem, you need to let me know ahead of time. If you really, really want a paper I have picked, then you can request it. I'm also open to paper swapping, where you independently locate a different paper that you prefer or think is better than an existing paper, but it must be in a similar area and you must give us enough notice. The lecture notes may appear ahead of time or shortly after the lecture. This schedule is VERY tentative (some papers could change as well).
Date | Topic | Papers | Presenter | Blogger
======== Introduction ========
Tu 01/22/19 | Course introduction; Lecture notes (Intro) | | Jon |
Th 01/24/19 | Machine Learning Basics-1; Lecture notes (ML Basics) | Deep Learning Book (Chapter 1) | Jon |
Th 01/31/19 | Machine Learning Basics-2; Lecture notes (DL Basics) | Deep Learning Book (Chap. 5), Pattern Recognition and Machine Learning (Chap. 6), Reinforcement Learning | Jon |
======== Databases ========
Tu 02/05/19 | Database Indices; Lecture notes (Case for Learned ..., Lifting the Curse ...) | The Case for Learned Index Structures, Lifting the Curse ... | Sequeira, Sequeira | Kayala
Th 02/07/19 | Database Entities/TLBs; Lecture notes (Deep Learning for Entity Matching ..., Virtual Address Translation ...) | Deep Learning for Entity Matching ..., *Virtual Address Translation ...* | Monteiro, Y Wang | Biswas/Bhat
Tu 02/12/19 | Database Tuning/Caching; Lecture notes (Auto DBMS Tuning..., PeCC ...) | Auto DBMS Tuning..., *PeCC ...* | Y Wang, Monteiro | Unnikrishnan/Gupta
======== Scheduling ========
Th 02/14/19 | CMP scheduling/Caching; Lecture notes (Coordinated ..., Cache Miss Rate ...) | Coordinated ..., *Cache Miss Rate ...* | Hegde, Bhat | Li/Amudapuram
Tu 02/19/19 | Mapping/Cluster Scheduling; Lecture notes (... Mapping Streaming ..., Learning ... Clusters ... (first one with RL)) | ... Mapping Streaming ..., Learning ... Clusters ... | Nimkar, Shaheen | S Wang/Gupta
Th 02/21/19 | Placement; Lecture notes (Device Placement ..., Learning ... Tensor ...) | Device Placement ..., Learning ... Tensor ... | Li, Biswas | Sequeira/Monteiro
======== Power ========
Tu 02/26/19 | Voltage Scaling/DC Power; Lecture notes (Integrated ..., ... Data Center Optimization) | Integrated ..., ... Data Center Optimization | Wu, Hu | S Wang/Shaheen
Th 02/28/19 | CMP Power/Quantization; Lecture notes (ReLeQ ..., *DeepCache ...*) | ReLeQ ..., DeepCache ... | Hegde, Sadeghi | Kulkarni/Wu
======== Compilers ========
Tu 03/05/19 | Compiler Opt; Lecture notes (Project Discussion, ... Phase-Ordering ...) | ... Phase-Ordering ... (only 1 paper) | Weissman, Hu | Sadeghi/Hegde
Th 03/07/19 | Midterm Replacement; Lecture notes (Eureka ..., Placeto ...) | *Eureka ...* (two related papers), *Placeto ...* | Weissman, Li |
Tu 03/12/19 | Runtime; Lecture notes (CrystalBall ..., ... Performance Profiling) | CrystalBall ..., ... Performance Profiling | Gupta, Wu | Hu/Kayala
======== Fault Tolerance ========
Th 03/14/19 | Link Failure/Cloud Debugging; Lecture notes (NetBouncer ..., Seer ...) | NetBouncer ..., *Seer ...* | Kulkarni, Amudapuram | Bhat/Unnikrishnan
Written 1 page project proposals due
Tu 03/19/19 | Spring Break
Th 03/21/19 | Spring Break
Tu 03/26/19 | Node Failure/Anomaly Detection; Lecture notes (Doomsday ..., ... Anomaly Detection) | Doomsday ..., ... Anomaly Detection | Shaheen, Kulkarni | Amudapuram/Biswas
======== Networking ========
Th 03/28/19 | Networking; Lecture notes (... Traffic-Driven ..., Iroko ...) | ... Traffic-Driven ..., Iroko ... | Unnikrishnan, Nimkar | Li/Nimkar
Tu 04/02/19 | Traffic Optimization; Lecture notes (AuTO ..., Statistical ...) | AuTO ..., Statistical ... | Sadeghi, S Wang | Monteiro/Sequeira
1 Page Project proposal progress reports due next Tues (4/9)
======== Miscellaneous ========
Th 04/04/19 | Software Configuration; Lecture notes (REx ..., ... API Functions ...) | REx ..., ... API Functions ... | Gupta, Unnikrishnan | Y Wang/Wu
Tu 04/09/19 | Storage/IO; Lecture notes (CAPES ..., Chasing ...) | CAPES ..., Chasing ... | Kayala, Biswas | Kulkarni/Hegde
1 Page Project proposal progress reports due in class
Th 04/11/19 | Migration/Containers; Lecture notes (... Live Migration ..., RACC ...) | ... Live Migration ..., RACC ... | S Wang, Gupta | Sadeghi/Wu
======== Going Small ========
Tu 04/16/19 | IOT; Lecture notes (ApDeepSense ..., Deep Learning ... IOT) | ApDeepSense ..., Deep Learning ... IOT | Bhat, Li | Hu/Shaheen
Th 04/18/19 | Mobile/IOT; Lecture notes (StormDroid ..., QualityDeepSense ...) | StormDroid ..., QualityDeepSense ... | Amudapuram, Kayala | Nimkar/Y Wang
======== Projects ========
Tu 04/23/19 | No class: work on projects!
Th 04/25/19 | Project presentations
Tu 04/30/19 | Project presentations
Th 05/02/19 | Project presentations -- last class
Papers
Introduction
- Deep Learning
Ian Goodfellow, Yoshua Bengio, and Aaron Courville
MIT Press 2016
- Pattern Recognition and Machine Learning
Christopher Bishop
Springer 2006
- Introduction to Reinforcement Learning
Shipra Agrawal
Databases
- The Case for Learned Index Structures
Tim Kraska et al
SIGMOD 2018
- Lifting the Curse of Multidimensional Data with Learned Existence Indexes
Stephen Macke et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- Deep Learning for Entity Matching: A Design Space Exploration
Sidharth Mudgal et al
SIGMOD 2018
- Virtual Address Translation via Learned Page Table Indexes
Artemiy Margaritov
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- Automatic Database Management System Tuning Through Large-scale Machine Learning
Dana Van Aken et al
SIGMOD 2017
- PeCC: Prediction-error Correcting Cache
Adit Bhardwaj et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
Scheduling
- Coordinated Management of Multiple Interacting Resources in Chip Multiprocessors: A Machine Learning Approach
Ramazan Bitirgen et al
MICRO 2008
- Cache Miss Rate Predictability via Neural Networks
Rishikesh Jha et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- A Machine Learning Approach to Mapping Streaming Workloads to Dynamic Multicore Processors
Paul-Jules Micolet et al
LCTES 2016
- Learning Scheduling Algorithms for Data Processing Clusters
Hongzi Mao et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- Device Placement Optimization with Reinforcement Learning
Azalia Mirhoseini et al
ICML 2017
- Learning to Optimize Tensor Programs
Tianqi Chen
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
Power
- Integrated CPU and L2 Cache Voltage Scaling using Machine Learning
Nevine AbouGhazaleh et al
LCTES 2007
- Machine Learning Applications for Data Center Optimization
Jim Gao, Google
Google AI White Paper, 2014
- Up By Their Bootstraps: Online Learning in Artificial Neural Networks for CMP Uncore Power Management
Jae-Yeon Won et al
HPCA 2014
- ReLeQ: An Automatic Reinforcement Learning Approach for Deep Quantization of Neural Networks
Amir Yazdanbakhsh et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- DeepCache: A Deep Learning Based Framework For Content Caching
Arvind Narayanan et al
NetAI 2018
Compilers
- Mitigating the Compiler Optimization Phase-Ordering Problem using Machine Learning
Sameer Kulkarni et al
OOPSLA 2012
- Automated Testing of Graphics Units by Deep-Learning Detection of Visual Anomalies
Lev Faivishevsky et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- CrystalBall: Statically Analyzing Runtime Behavior via Deep Sequence Learning
Stephen Zekany et al
MICRO 2016
- Exploring the Use of Learning Algorithms for Efficient Performance Profiling
Shoumik Palkar
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
Fault Tolerance
- NetBouncer: Active Device and Link Failure Localization in Data Center Networks
Cheng Tan et al
NSDI 2019
- Seer: Leveraging Big Data to Navigate The Complexity of Cloud Debugging
Yu Gan et al
HotCloud 2018
- Doomsday: Predicting Which Node Will Fail When on Supercomputers
Anwesha Das et al
SC 2018
- Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection
Andy Brown et al
Workshop On Machine Learning for Computer Systems 2018
Networking
- Neural Network Meets DCN: Traffic-driven Topology Adaptation with Deep Learning
Mowei Wang et al
ACM Meas. Anal. Comput. Syst. 2018
- DeepConf: Automating Data Center Network Topologies Management with Machine Learning
Saim Salman et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- PCC Vivace: Online-Learning Congestion Control
Mo Dong et al
NSDI 2018
- Iroko: A Framework to Prototype Reinforcement Learning for Data Center Traffic Control
Fabian Ruffy et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Automatic Traffic Optimization
Li Chen et al
SIGCOMM 2018
- Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters
Peter Bodik et al
HotCloud 2009
Miscellaneous
- REX: A Development Platform and Online Learning Approach for Runtime Emergent Software Systems
Barry Porter et al
OSDI 2016
- Neural Inference of API Functions from Input Output Examples
Rohan Bavishi et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning
Yan Li et al
SC 2017
- Chasing the Signal: Statistically Separating Multi-Tenant I/O Workloads
Si Chen et al
NIPS 2018, MLSys: Workshop on Systems for ML and Open Source Software
- A Machine Learning Approach to Live Migration Modeling
Changyeon Jo et al
SoCC 2017
- RACC: Resource-Aware Container Consolidation using a Deep Learning Approach
Saurav Nanda et al
Workshop On Machine Learning for Computer Systems 2018
Going Small
- ApDeepSense: Deep Learning Uncertainty Estimation Without the Pain for IoT Applications
Shuochao Yao et al
ICDCS 2018
- Deep Learning for the Internet of Things
Shuochao Yao et al
IEEE Computer 2018
- StormDroid: A Streaminglized Machine Learning-Based System for Detecting Android Malware
Sen Chen et al
ASIA CCS 2016
- QualityDeepSense: Quality-Aware Deep Learning Framework for Internet of Things Applications with Sensor-Temporal Attention
Shuochao Yao et al
EMDL 2018
Some extras
- Eureka: Edge-based Discovery of Training Data for Machine Learning
Ziqiang Feng et al
IEEE Internet Computing 2018
- Edge-based Discovery of Training Data for Machine Learning
Ziqiang Feng et al
IEEE/ACM Symposium on Edge Computing (SEC) 2018
- Placeto: Efficient Progressive Device Placement Optimization
Ravichandra Addanki et al
Workshop On Machine Learning for Computer Systems 2018