Saturday, March 2 – Sunday, March 3: Workshops & Tutorials

The program for workshops and tutorials is available here. Please visit the respective webpage of each workshop or tutorial for its detailed schedule.


Opening Reception: March 4, 8:15 AM – 8:30 AM

Location: Pentland/Sidlaw/Fintry

Day 1: Monday, March 4

8:30 AM – 9:30 AM: HPCA Keynote by Derek Chiou (UT-Austin/Microsoft)

Abstract
Server design has traditionally been processor-centric. Processors received each input and decided whether to process it first or pass it to another component, such as an accelerator or memory, to be processed and/or stored. In public clouds that rent virtual machines to tenants, however, the center of the server is moving from processors to SmartNICs/IPUs/DPUs that implement cloud infrastructure functionality such as triage of IO, virtualization, security, and Quality of Service. SmartNICs are complex systems, requiring programmable components for flexibility, ASICs for performance and efficiency, and software to coordinate and manage. This talk (i) motivates moving the center of cloud servers to SmartNICs, (ii) describes what SmartNICs do and how they do it, (iii) discusses the tradeoffs of implementing programmability on cores and FPGAs, and (iv) explores potential future paths for SmartNICs and the functionality they implement.

Speaker
Derek Chiou is a Professor in the Electrical and Computer Engineering Department at The University of Texas at Austin and a Partner Architect at Microsoft, responsible for future infrastructure offload system architecture. He is a co-founder of the Azure Boost project, Microsoft’s SmartNIC effort, and led the Bing FPGA team to the first deployment of Bing ranking on FPGAs. He was an assistant and then associate professor from 2005 to 2016. Before joining UT in 2005, Dr. Chiou was a system architect at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M., and S.B. degrees in Electrical Engineering and Computer Science from MIT.

9:30 AM – 10:00 AM: Coffee Break

10:00 AM – 11:00 AM

Session Chair: TBA
10:00 AM – 10:20 AM
Exploitation of Security Vulnerability on Retirement
Ke Xu, Ming Tang, Quancheng Wang, Han Wang

10:20 AM – 10:40 AM
GadgetSpinner: A New Transient Execution Primitive using the Loop Stream Detector
Yun Chen, Ali Hajiabadi, Trevor E. Carlson

10:40 AM – 11:00 AM
Uncovering and Exploiting AMD Speculative Memory Access Predictors for Fun and Profit
Chang Liu, Dongsheng Wang, Yongqiang Lyu, Pengfei Qiu, Yu Jin, Zhuoyuan Lu, Yinqian Zhang, Gang Qu

Session Chair: TBA
10:00 AM – 10:20 AM
E2EMap: End-to-End Reinforcement Learning for CGRA Compilation via Reverse Mapping
Dajiang Liu, Yuxin Xia, Jiaxing Shang, Jiang Zhong, Peng Ouyang, Shouyi Yin

10:20 AM – 10:40 AM
Revet: A Language and Compiler for Dataflow Threads
Alexander Rucker, Shiv Sundram, Coleman Smith, Matt Vilim, Raghu Prabhakar, Fredrik Kjolstad, Kunle Olukotun

10:40 AM – 11:00 AM
An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation
Weichuang Zhang, Jieru Zhao, Guan Shen, Quan Chen, Chen Chen, Minyi Guo

Session Chair: Jie Zhang (Peking University)
10:00 AM – 10:20 AM
Celeritas: Out-of-Core based Unsupervised Graph Neural Network via Cross-layer Computing
Yi Li, Tsun-Yu Yang, Ming-Chang Yang, Zhaoyan Shen, Bingzhe Li

10:20 AM – 10:40 AM
PruneGNN: An Optimized Algorithm-Hardware Framework for Graph Neural Network Pruning
Deniz Gurevin, Shaoyi Huang, Mohsin Shan, MD Amit Hasan, Caiwen Ding, Omer Khan

10:40 AM – 11:00 AM
MEGA: A Memory-Efficient GNN Accelerator Exploiting Degree-Aware Mixed-Precision Quantization
Zeyu Zhu, Fanrong Li, Gang Li, Zejian Liu, Zitao Mo, Qinghao Hu, Xiaoyao Liang, Jian Cheng

11:00 AM – 11:30 AM: Coffee Break

11:30 AM – 12:50 PM

Session Chair: Jian Li (Futurewei Technologies)
11:30 AM – 11:50 AM
Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory
Jeongmin Hong, Sungjun Cho, Geonwoo Park, Wonhyuk Yang, Young-Ho Gong, Gwangsun Kim

11:50 AM – 12:10 PM
Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators
Jingwei Cai, Zuotong Wu, Sen Peng, Yuchen Wei, Zhanhong Tan, Guiming Shi, Mingyu Gao, Kaisheng Ma

12:10 PM – 12:30 PM
STELLAR: Energy-Efficient and Low-Latency SNN Algorithm and Hardware Co-design with Spatiotemporal Computation
Ruixin Mao, Lin Tang, Xingyu Yuan, Ye Liu, Jun Zhou

12:30 PM – 12:50 PM
MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing
Geraldo Francisco de Oliveira Junior, Ataberk Olgun, Giray Yaglikci, Nisa Bostanci, Juan Gómez Luna, Saugata Ghose, Onur Mutlu

Session Chair: Sam Ainsworth (University of Edinburgh)
11:30 AM – 11:50 AM
Supporting Secure Multi-GPU Computing with Dynamic and Batched Metadata Management
Seonjin Na, Jungwoo Kim, Sunho Lee, Jaehyuk Huh

11:50 AM – 12:10 PM
Data Enclave: A Data-Centric Trusted Execution Environment
Yuanchao Xu, James Pangia, Chencheng Ye, Yan Solihin, Xipeng Shen

12:10 PM – 12:30 PM
Salus: Efficient Security Support for CXL-Expanded GPU Memory
Rahaf Abdullah, Hyokeun Lee, Huiyang Zhou, Amro Awad

12:30 PM – 12:50 PM
Morphling: A Throughput-Maximized TFHE-based Accelerator using Transform-domain Reuse
Prasetiyo, Adiwena Putra, Joo-Young Kim

Session Chair: Sukhan Lee (Samsung Electronics)
11:30 AM – 11:50 AM
Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
Bongjoon Hyun, Taehun Kim, Dongjae Lee, Minsoo Rhu

11:50 AM – 12:10 PM
Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis
Ismail Emir Yuksel, Yahya Can Tuğrul, Ataberk Olgun, Nisa Bostanci, Giray Yaglikci, Geraldo Francisco de Oliveira Junior, Haocong Luo, Juan Gómez Luna, Mohammad Sadrosadati, Onur Mutlu

12:10 PM – 12:30 PM
StreamPIM: Streaming Matrix Computation in Racetrack Memory
Yuda An, Yunxiao Tang, Shushu Yi, Li Peng, Xiurui Pan, Guangyu Sun, Zhaochu Luo, Qiao Li, Jie Zhang

12:30 PM – 12:50 PM
SmartDIMM: In-Memory Acceleration of Upper Layer I/O Protocols
Neel Patel, Amin Mamandipoor, Mohammad Nouri, Mohammad Alian

12:50 PM – 2:20 PM: Lunch

Location: Cromdale Hall

2:20 PM – 3:40 PM

Session Chair: Hung-Wei Tseng (University of California, Riverside)
2:20 PM – 2:40 PM
BeaconGNN: Large-Scale GNN Acceleration with Asynchronous In-Storage Computing
Yuyue Wang, Xiurui Pan, Yuda An, Jie Zhang, Glenn Reinman

2:40 PM – 3:00 PM
Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
Hongsun Jang, Jaeyong Song, Jaewon Jung, Jaeyoung Park, Youngsok Kim, Jinho Lee

3:00 PM – 3:20 PM
FlashGNN: An In-SSD Accelerator for GNN Training
Fuping Niu, Jianhui Yue, Jiangqiu Shen, Xiaofei Liao, Hai Jin

3:20 PM – 3:40 PM
DockerSSD: Containerized In-Storage Processing and Hardware Acceleration for Computational SSDs
Donghyun Gouk, Miryeong Kwon, Hanyeoreum Bae, Myoungsoo Jung

Session Chair: Andrew Hilton (Duke University)
2:20 PM – 2:40 PM
PrefetchX: Cross-Core Cache-Agnostic Prefetcher-Based Side-Channel Attacks
Yun Chen, Ali Hajiabadi, Lingfeng Pei, Trevor E. Carlson

2:40 PM – 3:00 PM
Modeling, Derivation, and Automated Analysis of Branch Predictor Security Vulnerabilities
Quancheng Wang, Ming Tang, Ke Xu, Han Wang

3:00 PM – 3:20 PM
SegScope: Probing Fine-grained Interrupts via Architectural Footprints
Xin Zhang, Zhi Zhang, Qingni Shen, Wenhao Wang, Yansong Gao, Zhuoxi Yang, Jiliang Zhang

3:20 PM – 3:40 PM
Differential-Matching Prefetcher for Indirect Memory Access
Gelin Fu, Tian Xia, Zhongpei Luo, Ruiyang Chen, Wenzhe Zhao, Pengju Ren

Session Chair: Jackson Woodruff (University of Edinburgh)
2:20 PM – 2:40 PM
SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving
Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Junwon Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi

2:40 PM – 3:00 PM
Rapper: A Parameter-Aware Repair-in-Memory Accelerator for Blockchain Storage Platform
Chenlin Ma, Yingping Wang, Fuwen Chen, Jing Liao, Yi Wang, Rui Mao

3:00 PM – 3:20 PM
MOPED: Efficient Motion Planning Engine with Flexible Dimension Support
Lingyi Huang, Yu Gong, Yang Sui, Xiao Zang, Bo Yuan

3:20 PM – 3:40 PM
TALCO: Tiling Genome Sequence Alignment using Convergence of Traceback Pointers
Sumit Walia, Cheng Ye, Arkid Kalyan Bera, Dhruvi Prakash Lodhavia, Yatish Turakhia

3:40 PM – 4:10 PM: Coffee Break

4:10 PM – 5:40 PM: Poster Session

5:45 PM – 6:30 PM: Awards Session

Location: Pentland

6:30 PM – 8:00 PM: Business Meeting

Location: Carrick 1-3


Day 2: Tuesday, March 5

8:30 AM – 9:30 AM: CGO Keynote by Kunle Olukotun (Stanford University)

Abstract
Generative AI applications, with their ability to produce natural language, computer code, and images, are transforming all aspects of society. These applications are powered by huge foundation models such as GPT-4, which are trained on massive unlabeled datasets. Foundation models have tens of billions of parameters and have obtained state-of-the-art quality in natural language processing, vision, and speech applications. These models are computationally challenging because they require hundreds of petaFLOPS of computing capacity for training and inference. Future foundation models will have even greater capabilities, provided by more complex model architectures with longer sequence lengths, irregular data access (sparsity), and irregular control flow. In this talk I will describe how the evolving characteristics of foundation models will impact the design of the optimized computing systems required for training and serving them. A key element of improving the performance and lowering the cost of deploying future foundation models will be optimizing the data movement (Dataflow) within the model using specialized hardware. In contrast to human-in-the-loop applications such as conversational AI, an emerging use of foundation models is in continuous processing applications that operate without human supervision. I will describe how continuous processing and real-time machine learning can be used to create an intelligent network data plane.

Speaker
Kunle Olukotun is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been on the faculty since 1991. Olukotun is well known for leading the Stanford Hydra research project, which developed one of the first chip multiprocessors with support for thread-level speculation (TLS). Olukotun founded Afara Websystems to develop high-throughput, low-power server systems with chip multiprocessor technology. Afara was acquired by Sun Microsystems; the Afara microprocessor technology, called Niagara, is at the center of Sun’s throughput computing initiative, and Niagara-based systems have become one of Sun’s fastest-ramping products ever. Olukotun is actively involved in research in computer architecture, parallel programming environments, and scalable parallel systems. He currently co-leads the Transactional Coherence and Consistency project, whose goal is to make parallel programming accessible to average programmers, and directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of parallelism in all application areas. Olukotun is an ACM Fellow (2006) for contributions to multiprocessors on a chip and multithreaded processor design. He has authored many papers on CMP design and parallel software and recently completed a book on CMP architecture. Olukotun received his Ph.D. in Computer Engineering from the University of Michigan.

9:30 AM – 10:00 AM: Coffee Break

10:00 AM – 11:00 AM

Session Chair: Gabriel Loh (AMD Research)
10:00 AM – 10:20 AM
Effective Context-Sensitive Memory Dependence Prediction
Sebastian S. Kim, Alberto Ros

10:20 AM – 10:40 AM
A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering
Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas

10:40 AM – 11:00 AM
gem5-MARVEL: Microarchitecture-Level Resilience Analysis of Heterogeneous SoC Architectures
Odysseas Chatzopoulos, George Papadimitriou, Vasileios Karakostas, Dimitris Gizopoulos

Session Chair: Mattan Erez (UT Austin)
10:00 AM – 10:20 AM
Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions
Abdullah Giray Yaglikci, Geraldo Francisco de Oliveira Junior, Yahya Can Tugrul, Ismail Yuksel, Ataberk Olgun, Haocong Luo, Onur Mutlu

10:20 AM – 10:40 AM
START: Scalable Tracking for Any Rowhammer Threshold
Anish Saxena, Moinuddin Qureshi

10:40 AM – 11:00 AM
CoMeT: Count-Min Sketch-based Row Tracking to Mitigate RowHammer with Low Cost
Nisa Bostanci, Ismail Emir Yuksel, Ataberk Olgun, Konstantinos Kanellopoulos, Yahya Can Tuğrul, Giray Yaglikci, Mohammad Sadrosadati, Onur Mutlu

Session Chair: John Kim (KAIST)
10:00 AM – 10:20 AM
A Quantum Computer Trusted Execution Environment
Theodoros Trochatos, Chuanqi Xu, Sanjay Deshpande, Yao Lu, Yongshan Ding, Jakub Szefer

10:20 AM – 10:40 AM
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models
Jaewan Choi, Jaehyun Park, Kwanhee Kyung, Nam Sung Kim, Jung Ho Ahn

10:40 AM – 11:00 AM
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications
Joonseop Sim, Soohong Ahn, Taeyoung Ahn, Seungyong Lee, Myunghyun Rhee, Jooyoung Kim, Kwangsik Shin, Donguk Moon, Euiseok Kim, Kyoung Park

11:00 AM – 11:30 AM: Coffee Break

11:30 AM – 12:50 PM

Session Chair: Saugata Ghose (University of Illinois Urbana-Champaign)
11:30 AM – 11:50 AM
LearnedFTL: A Learning-based Page-level FTL for Reducing Double Reads in Flash-based SSDs
Shengzhe Wang, Zihang Lin, Suzhen Wu, Hong Jiang, Jie Zhang, Bo Mao

11:50 AM – 12:10 PM
Are Superpages Super-fast? Distilling Flash Blocks to Unify Flash Pages of a Superpage in an SSD
Shih-Hung Tseng, Tseng-Yi Chen, Ming-Chang Yang

12:10 PM – 12:30 PM
RiF: Improving Read Performance of Modern SSDs Using an On-Die Early-Retry Engine
Myoungjun Chun, Jaeyong Lee, Myungsuk Kim, Jisung Park, Jihong Kim

12:30 PM – 12:50 PM
Midas Touch: Invalid-Data Assisted Reliability and Performance Boost for 3D High-Density Flash
Qiao Li, Hongyang Dang, Zheng Wan, Congming Gao, Min Ye, Jie Zhang, Tei-Wei Kuo, Chun Jason Xue

Session Chair: Huiyang Zhou (NC State University)
11:30 AM – 11:50 AM
ECO-CHIP: Estimation of the Carbon Footprint of Chiplet-based Architectures for Sustainable VLSI
Chetan Choppali Sudarshan, Nikhil Matkar, Sarma Vrudhula, Sachin S. Sapatnekar, Vidya A. Chhabria

11:50 AM – 12:10 PM
Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accelerator
Hanqing Zhu, Jiaqi Gu, Hanrui Wang, Zixuan Jiang, Zhekai Zhang, Rongxin Tang, Chenghao Feng, Song Han, Ray T. Chen, David Pan

12:10 PM – 12:30 PM
MIRAGE: Quantum Circuit Decomposition and Routing Collaborative Design using Mirror Gates
Evan McKinney, Michael Hatridge, Alex K. Jones

12:30 PM – 12:50 PM
SACHI: A Stationarity-Aware, All-Digital, Near-Cache, Ising Architecture
Siddhartha Raman Sundara Raman, Lizy John, Jaydeep Kulkarni

Session Chair: David Kaeli (Northeastern University)
11:30 AM – 11:50 AM
BitWave: Exploiting Column-Based Bit-Level Sparsity for Deep Learning Acceleration
Man Shi, Vikram Jain, Antony Joseph, Maurice Meijer, Marian Verhelst

11:50 AM – 12:10 PM
LUTein: Dense-Sparse Bit-slice Architecture with Radix-4 LUT-based Slice-Tensor Processing Units
Dongseok Im, Hoi-Jun Yoo

12:10 PM – 12:30 PM
FIGNA: Integer Unit-based Accelerator Design for FP-INT GEMM Preserving Numerical Accuracy
Jaeyong Jang, Yulhwa Kim, Juheun Lee, Jaejoon Kim

12:30 PM – 12:50 PM
ASADI: Accelerating Sparse Attention using Diagonal-based In-situ Computing
Huize Li, Zhaoying Li, Zhenyu Bai, Tulika Mitra

12:50 PM – 2:20 PM: Lunch

Location: Cromdale Hall

2:20 PM – 3:40 PM

Session Chair: Krishnan Kailas (IBM Research)
2:20 PM – 2:40 PM
Enabling Large Dynamic Neural Network Training with Learning-based Memory Management
Jie Ren, Dong Xu, Shuangyan Yang, Jiacheng Zhao, Zhicheng Li, Christian Navasca, Chenxi Wang, Harry Xu, Dong Li

2:40 PM – 3:00 PM
Tetris: Boosting Distributed DNN Execution with Flexible Schedule Search
Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

3:00 PM – 3:20 PM
SpecFL: An Efficient Speculative Federated Learning System for Tree-based Model Training
Yuhui Zhang, Lutan Zhao, Cheng Che, XiaoFeng Wang, Dan Meng, Rui Hou

3:20 PM – 3:40 PM
Enhancing Collective Communication in MCM Accelerator for Deep Learning Training
Sabuj Laskar, Pranati Majhi, Sungkeun Kim, Farabi Mahmud, Abdullah Muzahid, EJ Kim

Session Chair: José Cano (University of Glasgow)
2:20 PM – 2:40 PM
TinyTS: Memory-Efficient TinyML Model Compiler Framework on Microcontrollers
Yu-Yuan Liu, Hong-Sheng Zheng, Yu-Fang Hu, Chen-Fong Hsu, Tsung Tai Yeh

2:40 PM – 3:00 PM
CAMEL: Co-Designing AI Models and eDRAMs for Efficient On-Device Learning
Sai Qian Zhang, Thierry Tambe, Nestor Cuevas, Gu-Yeon Wei, David Brooks

3:00 PM – 3:20 PM
FlipBit: Approximate Flash Memory for IoT Devices
Alex Buck, Karthik Ganesan, Natalie Enright Jerger

3:20 PM – 3:40 PM
Uṣás: A Sustainable Continuous-Learning Framework for Edge Servers
Cyan Subhra Mishra, Jack Sampson, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan, Chita Das

Session Chair: Mohammad Alian (University of Kansas)
2:20 PM – 2:40 PM
Cepheus: Accelerating Datacenter Applications with High-Performance RoCE-Capable Multicast
Wenxue Li, Junyi Zhang, Yufei Liu, Gaoxiong Zeng, Zilong Wang, Chaoliang Zeng, Pengpeng Zhou, Qiaoling Wang, Kai Chen

2:40 PM – 3:00 PM
LibPreemptible: Enabling Fast, Adaptive, and Hardware-Assisted User-Space Scheduling
Yueying Li, Nikita Lazarev, David Koufaty, Yijun Yin, Andy Anderson, Zhiru Zhang, G. Edward Suh, Kostis Kaffes, Christina Delimitrou

3:00 PM – 3:20 PM
MINOS: Distributed Consistency and Persistency Implementation and Offloading to SmartNICs
Antonis Psistakis, Fabien Chaix, Josep Torrellas

3:20 PM – 3:40 PM
Ursa: Lightweight Resource Management for Cloud-Native Microservices
Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, Christina Delimitrou

3:40 PM – 4:10 PM: Coffee Break

4:10 PM – 5:30 PM: Panel

Moderator
Lieven Eeckhout (Ghent University)

Panelists
TBA

6:30 PM – 11:00 PM: Banquet

Location: National Museum of Scotland


Day 3: Wednesday, March 6

8:30 AM – 9:30 AM: PPoPP Keynote by Nir Shavit (MIT)

Abstract
Our brain executes very sparse computation, allowing for great speed and energy savings. Deep neural networks can also be made to exhibit high levels of sparsity without significant accuracy loss. As their size grows, it is becoming imperative that we use sparsity to improve their efficiency. This is a challenging task because the memory systems and SIMD operations that dominate today's CPUs and GPUs do not lend themselves easily to the irregular data patterns sparsity introduces. This talk will survey the role of sparsity in neural network computation, and the parallel algorithms and hardware features that nevertheless allow us to make effective use of it.

Speaker
Nir Shavit received B.Sc. and M.Sc. degrees in Computer Science from the Technion - Israel Institute of Technology in 1984 and 1986, and a Ph.D. in Computer Science from the Hebrew University of Jerusalem in 1990. Shavit is a co-author of the book The Art of Multiprocessor Programming. He is a recipient of the 2004 Gödel Prize in theoretical computer science for his work on applying tools from algebraic topology to model shared memory computability, and of the 2012 Dijkstra Prize in Distributed Computing for the introduction of Software Transactional Memory. For many years his main interests were techniques for designing, implementing, and reasoning about multiprocessor algorithms. These days he is interested in understanding the relationship between deep learning and how neural tissue computes, and is part of an effort to do so by extracting connectivity maps of the brain, a field called connectomics. Nir is the principal investigator of the Multiprocessor Algorithmics Group and the Computational Connectomics Group.

9:30 AM – 10:00 AM: Coffee Break

10:00 AM – 11:00 AM

Session Chair: Josep Torrellas (University of Illinois Urbana-Champaign)
10:00 AM – 10:20 AM
An LPDDR-based CXL-PNM Platform for TCO-Efficient GPT Inference
Sang-Soo Park, KyungSoo Kim, Jinin So, Jin Jung, Jonggeon Lee, Kyoungwan Woo, Nayeon Kim, Younghyun Lee, Hyungyo Kim, Yongsuk Kwon, Jinhyun Kim, Jieun Lee, YeonGon Cho, Yongmin Tai, Jeonghyeon Cho, Hoyoung Song, Jung Ho Ahn, Nam Sung Kim

10:20 AM – 10:40 AM
LightPool: A NVMe-oF-based High-performance and Lightweight Storage Pool Architecture for Cloud-Native Distributed Database
Jiexiong Xu, Yiquan Chen, Yijing Wang, Wenhui Shi, Guoju Fang, Yi Chen, Huasheng Liao, Yang Wang, Hai Lin, Zhen Jin, Qiang Liu, Wenzhi Chen

10:40 AM – 11:00 AM
Enterprise-Class Cache Compression Design
Alper Buyuktosunoglu, David Trilla, Bulent Abali, Deanna Berger, Craig Walters, Jang-Soo Lee

11:00 AM – 11:30 AM: Coffee Break

11:30 AM – 12:50 PM

Session Chair: Jinho Lee (Seoul National University)
11:30 AM – 11:50 AM
HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures
Gerasimos Gerogiannis, Sriram Aananthakrishnan, Josep Torrellas, Ibrahim Hur

11:50 AM – 12:10 PM
SPARK: Scalable and Precision-Aware Acceleration of Neural Networks via Efficient Encoding
Fangxin Liu, Ning Yang, Haomin Li, Zongwu Wang, Zhuoran Song, Songwen Pei, Li Jiang

12:10 PM – 12:30 PM
Data Motion Acceleration to Chain Cross-Domain Multi Accelerators
Shu-Ting Wang, Hanyang Xu, Amin Mamandipoor, Rohan Mahapatra, Byung Hoon Ahn, Soroush Ghodrati, Krishnan Kailas, Mohammad Alian, Hadi Esmaeilzadeh

12:30 PM – 12:50 PM
RELIEF: Relieving Memory Pressure In SoCs Via Data Movement-Aware Accelerator Scheduling
Sudhanshu Gupta, Sandhya Dwarkadas

Session Chair: Gwangsun Kim (POSTECH)
11:30 AM – 11:50 AM
GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement
Yueqi Wang, Bingyao Li, Aamer Jaleel, Jun Yang, Xulong Tang

11:50 AM – 12:10 PM
WASP: Exploiting Pipeline Parallelism with GPU Hardware and Compiler Support
Neal Crago, Sana Damani, Karu Sankaralingam, Stephen W. Keckler

12:10 PM – 12:30 PM
Guser: A GPGPU Power Stressmark Generator
Yalong Shan, Yongkui Yang, Xuehai Qian, Zhibin Yu

12:30 PM – 12:50 PM
GPU Scale-Model Simulation
Hossein SeyyedAghaei, Mahmood Naderan-Tahan, Lieven Eeckhout

Session Chair: Chang Hyun Park (Uppsala University)
11:30 AM – 11:50 AM
Agile-DRAM: Agile Trade-Offs in Memory Capacity, Latency, and Energy for Data Centers
Jaeyoon Lee, Wonyeong Jung, Dongwhee Kim, Daero Kim, Junseung Lee, Jungrae Kim

11:50 AM – 12:10 PM
CHROME: Concurrency-Aware Holistic Cache Management Framework with Online Reinforcement Learning
Xiaoyang Lu, Hamed Najafi, Jason Liu, Xian-He Sun

12:10 PM – 12:30 PM
Prosper: Program Stack Persistence in Hybrid Memory Systems
Arun KP, Debadatta Mishra, Biswabandan Panda

12:30 PM – 12:50 PM
Mitigating Write Disturbance in Non-Volatile Memory via Coupling Machine Learning with Out-of-Place Updates
Ronglong Wu, Zhirong Shen, Zhiwei Yang, Jiwu Shu