Main Program

Sunday Feb. 23, 2020

[18:00 – 20:00] Reception

Day 1: Monday Feb. 24, 2020

[07:00 – 08:10] Breakfast
[08:10 – 08:20] Opening: chairs’ welcome (Pavillion)
[08:20 – 08:30] SIGPLAN CARES and SIGARCH/SIGMICRO CARES intro
Peng Wu (Futurewei Technologies) and Timothy Pinkston (University of Southern California)
[08:30 – 09:30] Keynote: Interdisciplinary Research at a Time of Pervasive Changes
Josep Torrellas (University of Illinois, Urbana-Champaign)
[09:35 – 10:25] Session 1a: Machine Learning Acceleration
(Capri/Rivera)
Session 1b: Reliability and Fault Tolerance
(Monte Carlo/St. Tropez)
Session Chair: Antonio Gonzalez Session Chair: Devesh Tiwari
Deep Learning Acceleration with Neuron-to-Memory Transformation

Mohsen Imani, Mohammad Samragh, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar, and Tajana Rosing (University of California, San Diego)
ACR: Amnesic Checkpointing and Recovery

Ismail Akturk (University of Missouri, Columbia); Ulya Karpuzcu (U. Minnesota)
HyGCN: A GCN Accelerator with Hybrid Architecture
[Slide]

Mingyu Yan (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences); Lei Deng and Xing Hu (University of California, Santa Barbara); Ling Liang (University of California, Santa Barbara); Yujing Feng (Institute of Computing Technology, Chinese Academy of Sciences); Xiaochun Ye (Chinese Academy of Sciences); Zhimin Zhang (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences); Dongrui Fan (Institute of Computing Technology, Chinese Academy of Sciences); Yuan Xie (University of California, Santa Barbara)
Asymmetric Resilience: Exploiting Task-level Idempotency for Transient Error Recovery in Accelerator-based Systems

Jingwen Leng (Shanghai Jiao Tong University); Alper Buyuktosunoglu, Ramon Bertran Monfort, and Pradip Bose (IBM Research); Quan Chen and Minyi Guo (Shanghai Jiao Tong University); Vijay Janapa Reddi (University of Texas, Austin/Harvard University)
[10:25 – 10:55] Coffee Break
[10:55 – 12:50] Session 2: Best Paper Nominees + Award Session (Capri/Rivera/Monte Carlo/St. Tropez)
Session Chair: Yan Solihin
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
[Slide]

Eric Qin, Ananda Samajdar, Hyoukjun Kwon, and Vineet Nadella (Georgia Institute of Technology); Sudarshan Srinivasan, Dipankar Das, and Bharat Kaul (Intel); Tushar Krishna (Georgia Institute of Technology)
EMSim: A Microarchitecture-Level Simulation Tool for Modeling Electromagnetic Side-Channel Signals

Nader Sehatbakhsh, Baki Berkay Yilmaz, Alenka Zajic, and Milos Prvulovic (Georgia Tech)
Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching
[Slide]

Elaheh Sadredini, Reza Rahimi, Marzieh Lenjani, Mircea Stan, and Kevin Skadron (University of Virginia)
A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study
[Slide]

Ting-Ru Lin (University of Southern California); Drew Penney (Oregon State University); Massoud Pedram (University of Southern California); Lizhong Chen (Oregon State University)
[12:50 – 14:00] Lunch, Location TBD
[14:00 – 15:40] Session 3a: Security and NoC (Capri/Rivera) Session 3b: Cloud (Monte Carlo/St. Tropez)
Session Chair: Ashish Venkat Session Chair: Lizhong Chen
IRONHIDE: A Secure Multicore that Efficiently Mitigates Microarchitecture State Attacks for Interactive Applications

Hamza Omar and Omer Khan (University of Connecticut)
Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services
[Slide]

Rajiv Nishtala (Norwegian University of Science and Technology); Vinicius Petrucci (University of Pittsburgh); Paul Carpenter (Barcelona Supercomputing Center); Magnus Själander (Norwegian University of Science and Technology)
A New Side-Channel Vulnerability on Modern Computers by Exploiting Electromagnetic Emanations from the Power Management Unit

Nader Sehatbakhsh, Baki Berkay Yilmaz, Alenka Zajic, and Milos Prvulovic (Georgia Tech)
QuickNN: Memory and Performance Optimization of k-d Tree Based Nearest Neighbor Search for 3D Point Clouds
[Slide]

Reid Pinkham (University of Michigan, Ann Arbor); Shuqing Zeng (General Motors); Zhengya Zhang (University of Michigan, Ann Arbor)
Leaking Information Through Cache LRU States
[Slide]

Wenjie Xiong and Jakub Szefer (Yale University)
CLITE: Efficient and QoS-Aware Co-location of Multiple Latency-Critical Jobs for Warehouse Scale Computers

Tirthak Patel and Devesh Tiwari (Northeastern University)
Baldur: A Power-Efficient and Scalable Network Using All-Optical Switches

Mohammad Reza Jokar (University of Chicago); Junyi Qiu, John Dallesasse, Milton Feng, and Lynford Goddard (University of Illinois at Urbana–Champaign); Yanjing Li and Fred Chong (University of Chicago)
Q-Zilla: A Scheduling Framework and Core Microarchitecture for Tail-tolerant Microservices

Amirhossein Mirhosseini and Brendan L. West (University of Michigan); Geoffrey Blake (Amazon Web Services); Thomas F. Wenisch (University of Michigan)
[15:40 – 15:50] Coffee Break
[15:50 – 17:30] Session 4a: Accelerator and DSA (Capri/Rivera) Session 4b: Memory and Memory Hierarchy
(Monte Carlo/St. Tropez)
Session Chair: Adwait Jog Session Chair: Hung-Wei Tseng
PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible NPUs

Yujeong Choi and Minsoo Rhu (KAIST)
Mitigating Voltage Drop in Resistive Memories by Dynamic RESET Voltage Regulation and Partition RESET

Farzaneh Zokaee and Lei Jiang (Indiana University Bloomington)
Domain-Specialized Cache Management for Graph Analytics
[Slide]

Priyank Faldu (The University of Edinburgh); Jeff Diamond (Oracle Labs); Boris Grot (The University of Edinburgh)
DRAM-less: Hardware Acceleration of Data Processing with New Memory

Jie Zhang (KAIST); Gyuyoung Park (KAIST); David Donofrio and John Shalf (Lawrence Berkeley National Laboratory); Myoungsoo Jung (KAIST)
ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator
[Slide]

Bahar Asgari and Ramyad Hadidi (Georgia Institute of Technology); Tushar Krishna and Hyesoon Kim (Georgia Tech); Sudhakar Yalamanchili (Georgia Institute of Technology)
ELP2IM: Efficient and Low Power Bitwise Operation Processing in DRAM

Xin Xin, Youtao Zhang, and Jun Yang (University of Pittsburgh)
SpArch: Efficient Architecture for Sparse Matrix Multiplication
[Slide]

Zhekai Zhang, Hanrui Wang, and Song Han (Massachusetts Institute of Technology); William Dally (NVIDIA/Stanford)
ResiRCA: A Resilient Energy Harvesting ReRAM-based Accelerator for Intelligent Embedded Processors

Keni Qiu (Capital Normal University); Nicholas Jao (The Pennsylvania State University); Mengying Zhao (Shandong University); Cyan Subhra Mishra, Gulsum Gudukbay, Sethu Jose, Jack Sampson, Mahmut Taylan Kandemir, and Vijaykrishnan Narayanan (The Pennsylvania State University)
[18:00 – 19:00] IEEE TCCA Business Meeting (Capri/Rivera)

Day 2: Tuesday Feb. 25, 2020

[07:00 – 08:30] Breakfast
[08:30 – 09:30] Keynote: Scaling Parallel Programming Beyond Threads
Michael Garland (NVIDIA Research)
[09:35 – 10:25] Session 5a: Machine Learning Acceleration
(Capri/Rivera)
Session 5b: Fault Tolerance and Security
(Monte Carlo/St. Tropez)
Session Chair: Newsha Ardalani Session Chair: Jakub Szefer
A3: Accelerating Attention Mechanisms in Neural Networks with Approximation
[Slide]

Tae Jun Ham, Sung Jun Jung, Seonghak Kim, and Yeonhong Park (Seoul National University); Young H. Oh (Sungkyunkwan University); Yoon Ho Song, Junghoon Park, and Sanghee Lee (Seoul National University); Kyoung Park (SK Hynix); Jae W. Lee and Deog-Kyoon Jeong (Seoul National University)
FLOWER and FaME: A Low Overhead Bit-level Fault-map and Fault-tolerance Approach for Deeply Scaled Memories

Donald Kline, Jr, Rami Melhem, and Alex K. Jones (University of Pittsburgh)
AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerator Arrays
[Slide]

Linghao Song and Fan Chen (Duke University); Youwei Zhuo and Xuehai Qian (University of Southern California); Hai Li and Yiran Chen (Duke University)
Multi-range Supported Oblivious RAM for Efficient Block Data Retrieval

Yuezhi Che and Rujia Wang (Illinois Institute of Technology)
[10:25 – 10:55] Coffee Break
[10:55 – 12:35] Session 6a: Microarchitecture
(Capri/Rivera)
Session 6b: NoC
(Monte Carlo/St. Tropez)
Session Chair: Hiroshi Nakamura Session Chair: John Kim
CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows

Ipoom Jeong and Seihoon Park (Yonsei University); Changmin Lee (Samsung Electronics); Won Woo Ro (Yonsei University)
EquiNox: Equivalent NoC Injection Routers for Silicon Interposer-based Throughput Processors

Yunfan Li and Lizhong Chen (Oregon State University)
Precise Runahead Execution
[Slide]

Ajeya Naithani (Ghent University); Josue Feliu (Universitat Politècnica de València); Almutaz Adileh and Lieven Eeckhout (Ghent University)
DRAIN: Deadlock Removal for Arbitrary Irregular Networks
[Slide]

Mayank Parasar (Georgia Institute of Technology); Hossein Farrokhbakht and Natalie Enright Jerger (University of Toronto); Paul Gratz (Texas A&M University); Tushar Krishna (Georgia Tech); Joshua San Miguel (University of Wisconsin-Madison)
BBS: Micro-architecture Benchmarking Blockchain Systems through Machine Learning and Fuzzy Set
[Slide]

Liang Zhu, Chao Chen, Zihao Su, and Weiguang Chen (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Tao Li (University of Florida); Zhibin Yu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
SnackNoC: Processing in the Communication Layer
[Slide]

Karthik Sangaiah, Michael Lui, Ragh Kuttappa, and Baris Taskin (Drexel University); Mark Hempstead (Tufts University)
Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors
[Slide]

Mehdi Alipour (Uppsala University); Rakesh Kumar (Norwegian University of Science and Technology (NTNU)); Stefanos Kaxiras and David Black-Schaffer (Uppsala University)
PIXEL: Photonic Neural Network Accelerator

Kyle Shiflett, Dylan Wright, and Avinash Karanth (Ohio University); Ahmed Louri (George Washington University)
[12:35 – 14:00] Lunch, Location TBD
[14:00 – 15:15] Session 7a: Industry Session 1
(Capri/Rivera)
Session 7b: Accelerators and DSA 2
(Monte Carlo/St. Tropez)
Session Chair: Alaa Alameldeen Session Chair: Abdullah Muzahid
The Architectural Implications of Facebook’s DNN-based Personalized Recommendation

Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Mark Hempstead, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang (Facebook Inc.)
Communication Lower Bound in Convolution Accelerators

Xiaoming Chen and Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences); Yu Wang (Tsinghua University)
NVDIMM-C: A Byte-Addressable Non-Volatile Memory Module for Compatibility with Standard DDR Interfaces

Changmin Lee (Samsung Electronics), Wonjae Shin (Samsung Electronics), Dae Jeong Kim (Samsung Electronics), Yongjun Yu (Samsung Electronics), Sung-Joon Kim (Samsung Electronics), Taekyeong Ko (Samsung Electronics), Deokho Seo (Samsung Electronics), Kwanghee Lee (Samsung Electronics), Seongho Choi (Samsung Electronics), Namhyung Kim (Samsung Electronics), Vishak G (Samsung Electronics), Arun George (Samsung Electronics), Vishwas (Samsung Electronics), Donghun Lee (SAP Labs Korea), Kangwoo Choi (SAP Labs Korea), Changbin Song (SAP Labs Korea), Dohan Kim (Samsung Electronics), Insu Choi (Samsung Electronics), Ilgyu Jung (Samsung Electronics), Yong Ho Song (Samsung Electronics), Jinman Han (Samsung Electronics), Jongmin Park (Samsung Electronics)
Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design
[Slide]

Xingyao Zhang (University of Houston); Shuaiwen Leon Song (University of Sydney); Chenhao Xie (Pacific Northwest National Laboratory); Jing Wang and Weigong Zhang (Capital Normal University, Beijing, China); Xin Fu (University of Houston)
Missing the Forest for the Trees: End-to-End AI Application Performance in Edge Data Centers
[Slide]

Daniel Richins (University of Texas, Austin), Dharmisha Doshi (Intel), Ankit Patel (Intel), Matthew Blackmore (Intel), Aswathy Thulaseedharan Nair (Intel), Neha Pathapati (Intel), Brainard Daguman (Intel), Daniel Dobrijalowski (Intel), Ramesh Illikkal (Intel), Kevin Long (Intel), David Zimmerman (Intel), Vijay Janapa Reddi (University of Texas, Austin/Harvard University)
Fulcrum: a Simplified Control and Access Mechanism toward Flexible and Practical in-situ Accelerators
[Slide]

Marzieh Lenjani, Patricia Gonzalez, and Elaheh Sadredini (University of Virginia); Shuangchen Li and Yuan Xie (University of California, Santa Barbara); Ameen Akel and Sean Eilert (Micron); Mircea R. Stan and Kevin Skadron (University of Virginia)
[15:15 – 15:45] Coffee Break
[15:45 – 17:00] Session 8a: Back to the Future Vision Talks (Capri/Rivera) Session 8b: Best of CAL
(Monte Carlo/St. Tropez)
Session Chair: TBD Session Chair: Dan Sorin
ML Computation: a Reality Check and the Road Ahead

Hsien-Hsin Sean Lee (Facebook)
Orbital Edge Computing: Machine Inference in Space

Bradley Denby and Brandon Lucia (Carnegie Mellon University)
Persistent Memory and the Path to Being Comfortably NUMB

Steven Swanson (UCSD)
A Scalable and Efficient in-Memory Interconnect Architecture for Automata Processing

Elaheh Sadredini, Reza Rahimi, Vaibhav Verma, Mircea Stan, and Kevin Skadron (University of Virginia)
Exploring Shared-Memory Multi-GPU Computing

David Kaeli (Northeastern University)
Isolating Speculative Data to Prevent Transient Execution Attacks

Kristin Barber, Anys Bacha, Li Zhou, Yinqian Zhang, and Radu Teodorescu (Ohio State University, except Bacha, who is with the University of Michigan)
[17:30 – 20:30] Excursion to SeaWorld; buses start leaving at 17:30 in front of the hotel

Day 3: Wednesday Feb. 26, 2020

[07:00 – 08:15] Breakfast
[08:15 – 09:30] Keynote: MLIR Compiler Infrastructure
Chris Lattner (SiFive) and Tatiana Shpeisman (Google)
[09:35 – 10:50] Session 9a: GPUs (Capri/Rivera) Session 9b: Industry Session 2
(Monte Carlo/St. Tropez)
Session Chair: Ashutosh Pattnaik Session Chair: Kingsum Chow
BCoal: Bucketing-based Memory Coalescing for Efficient and Secure GPUs

Gurunath Kadam (College of William & Mary); Danfeng Zhang (Penn State University); Adwait Jog (College of William & Mary)
EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform

Jianbo Dong (Alibaba Group), Zheng Cao (Alibaba Group), Tao Zhang (Alibaba Group), Jianxi Ye (Alibaba Group), Shaochuang Wang (Alibaba Group), Fei Feng (Alibaba Group), Li Zhao (Alibaba Group), Xiaoyong Liu (Alibaba Group), Liuyihan Song (Alibaba Group), Liwei Peng (Alibaba Group), Yiqun Guo (Alibaba Group), Xiaowei Jiang (Alibaba Group), Lingbo Tang (Alibaba Group), Yin Du (Alibaba Group), Yingya Zhang (Alibaba Group), Pan Pan (Alibaba Group), Yuan Xie (Alibaba Group)
HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems
[Slide]

Xiaowei Ren (UBC/NVIDIA); Daniel Lustig, Evgeny Bolotin, Aamer Jaleel, Oreste Villa, and David Nellans (NVIDIA)
Techniques for Reducing the Connected-Standby Energy Consumption of Mobile Devices

Jawad Haj-Yahya (ETH Zurich), Yanos Sazeides (University of Cyprus), Mohammed Alser (ETH Zurich), Efraim Rotem (Intel), Onur Mutlu (ETH Zurich, Carnegie Mellon University)
Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems
[Slide]

Trinayan Baruah and Yifan Sun (Northeastern University); Ali Tolga Dinçer (Istanbul Technical University); Saiful A. Mojumder (Boston University); José Luis Abellán (Universidad Católica San Antonio de Murcia); Yash Ukidave (Millennium USA); Ajay Joshi (Boston University); Norman Rubin (Northeastern University); John Kim (KAIST); David Kaeli (Northeastern University)
Experiences with ML-Driven Design: A NoC Case Study

Jieming Yin (Advanced Micro Devices), Subhash Sethumurugan (Advanced Micro Devices, University of Minnesota, Twin Cities), Yasuko Eckert (Advanced Micro Devices), Alan Smith (Advanced Micro Devices), Chintan Patel (Advanced Micro Devices), Eric Morton (Advanced Micro Devices), Mark Oskin (University of Washington, Advanced Micro Devices), Natalie Enright Jerger (University of Toronto), Gabriel H. Loh (Advanced Micro Devices)
[10:50 – 11:20] Coffee Break
[11:20 – 12:35] Session 10a: Memory and Memory Hierarchy and Cloud (Capri/Rivera) Session 10b: Accelerators and DSA 3
(Monte Carlo/St. Tropez)
Session Chair: Jung Ho Ahn Session Chair: David Kaeli
Hybrid2: Combining Caching and Migration in Hybrid Memory Systems

Evangelos Vasilakis (Chalmers University of Technology); Vassilis Papaefstathiou (FORTH / University of Crete); Pedro Trancoso and Ioannis Sourdis (Chalmers University of Technology)
Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations
[Slide]

Nitish Kumar Srivastava and Hanchen Jin (Cornell University); Shaden Smith (Intel Parallel Computing Labs); Hongbo Rong (Intel Labs); David Albonesi (Cornell University); Zhiru Zhang (Cornell University)
Charge-Aware DRAM Refresh Reduction with Value Transformation
[Slide]

Seikwon Kim (Samsung Electronics); Wonsang Kwak (RTST); Changdae Kim (ETRI); Daehyeon Baek and Jaehyuk Huh (KAIST)
A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms

Jian Weng, Sihao Liu, Zhengrong Wang, Vidushi Dadu, and Tony Nowatzki (UCLA)
DWT: Decoupled Workload Tracing for Data Centers

Jian Chen, Ying Zhang, Xiaowei Jiang, and Li Zhao (Alibaba Group)
Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions

Adrián Barredo, Juan M. Cebrian, Miquel Moreto, and Marc Casas (Barcelona Supercomputing Center); Mateo Valero (Director)