Server design has traditionally been processor-centric: the processor received each input and decided whether to process it itself or pass it to another component, such as an accelerator or memory, to be processed and/or stored. In public clouds that rent virtual machines to tenants, however, the center of the server is moving from processors to SmartNICs/IPUs/DPUs that implement cloud infrastructure functionality such as triage of IO, virtualization, security, and Quality of Service. SmartNICs are complex systems, requiring programmable components for flexibility, ASICs for performance and efficiency, and software to coordinate and manage them. This talk (i) motivates moving the center of cloud servers to SmartNICs, (ii) describes what SmartNICs do and how they do it, (iii) discusses the tradeoffs of implementing programmability on cores and on FPGAs, and (iv) explores potential future paths for SmartNICs and the functionality they implement.
Derek Chiou is a Professor in the Electrical and Computer Engineering Department at The University of Texas at Austin and a Partner Architect at Microsoft, responsible for future infrastructure offload system architecture. He is a co-founder of the Azure Boost project, Microsoft's SmartNIC effort, and led the Bing FPGA team to the first deployment of Bing ranking on FPGAs. He was an assistant and associate professor from 2005 to 2016. Before joining UT in 2005, Dr. Chiou was a system architect at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M., and S.B. degrees in Electrical Engineering and Computer Science from MIT.
Generative AI applications, with their ability to produce natural language, computer code, and images, are transforming all aspects of society. These applications are powered by huge foundation models such as GPT-4, which are trained on massive unlabeled datasets. Foundation models have tens of billions of parameters and have achieved state-of-the-art quality in natural language processing, vision, and speech applications. These models are computationally challenging because they require hundreds of petaFLOPS of computing capacity for training and inference. Future foundation models will have even greater capabilities, provided by more complex model architectures with longer sequence lengths, irregular data access (sparsity), and irregular control flow. In this talk I will describe how the evolving characteristics of foundation models will impact the design of the optimized computing systems required for training and serving these models. A key element of improving the performance and lowering the cost of deploying future foundation models will be optimizing the data movement (Dataflow) within the model using specialized hardware. In contrast to human-in-the-loop applications such as conversational AI, an emerging application of foundation models is continuous processing applications that operate without human supervision. I will describe how continuous processing and real-time machine learning can be used to create an intelligent network data plane.
Kunle Olukotun is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been on the faculty since 1991. Olukotun is well known for leading the Stanford Hydra research project, which developed one of the first chip multiprocessors with support for thread-level speculation (TLS). Olukotun founded Afara Websystems to develop high-throughput, low-power server systems with chip multiprocessor technology. Afara was acquired by Sun Microsystems; the Afara microprocessor technology, called Niagara, is at the center of Sun's throughput computing initiative, and Niagara-based systems have become one of Sun's fastest-ramping products ever. Olukotun is actively involved in research in computer architecture, parallel programming environments, and scalable parallel systems. He currently co-leads the Transactional Coherence and Consistency project, whose goal is to make parallel programming accessible to average programmers, and directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of parallelism in all application areas. Olukotun is an ACM Fellow (2006) for contributions to multiprocessors on a chip and multithreaded processor design. He has authored many papers on CMP design and parallel software and recently completed a book on CMP architecture. Olukotun received his Ph.D. in Computer Engineering from The University of Michigan.
Lieven Eeckhout (Ghent University)
Our brain executes very sparse computation, allowing for great speed and energy savings. Deep neural networks can also be made to exhibit high levels of sparsity without significant accuracy loss. As their size grows, it is becoming imperative that we use sparsity to improve their efficiency. This is a challenging task because the memory systems and SIMD operations that dominate today's CPUs and GPUs do not lend themselves easily to the irregular data patterns sparsity introduces. This talk will survey the role of sparsity in neural network computation, and the parallel algorithms and hardware features that nevertheless allow us to make effective use of it.
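The irregular access patterns the abstract alludes to can be seen in a compressed sparse row (CSR) matrix-vector multiply, sketched below as a toy illustration in plain Python (the CSR layout is standard, but this example is not drawn from the talk itself):

```python
# Toy CSR (compressed sparse row) sparse matrix-vector multiply.
# The indirect read x[col_idx[j]] is the data-dependent "gather"
# that defeats the contiguous-memory and wide-SIMD assumptions
# dense kernels rely on.

def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i+1]].
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x[col_idx[j]]  # irregular, indirect access
    return y

# The 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form:
values  = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Only the nonzeros are stored and multiplied, so work scales with the number of nonzeros rather than the full matrix size; the cost is the indirection through `col_idx`, which is exactly what specialized sparse hardware and algorithms aim to make efficient.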
Nir Shavit received B.Sc. and M.Sc. degrees in Computer Science from the Technion - Israel Institute of Technology in 1984 and 1986, and a Ph.D. in Computer Science from the Hebrew University of Jerusalem in 1990. Shavit is a co-author of the book The Art of Multiprocessor Programming. He is a recipient of the 2004 Gödel Prize in theoretical computer science for his work on applying tools from algebraic topology to model shared memory computability, and of the 2012 Dijkstra Prize in Distributed Computing for the introduction of Software Transactional Memory. For many years his main interests were techniques for designing, implementing, and reasoning about multiprocessor algorithms. These days he is interested in understanding the relationship between deep learning and how neural tissue computes, and is part of an effort to do so by extracting connectivity maps of the brain, a field called connectomics. Nir is the principal investigator of the Multiprocessor Algorithmics Group and the Computational Connectomics Group.