Hard Real-Time Systems and Many-Core Platforms
This talk will illustrate the difficulties of obtaining timing guarantees from multi- and many-core
platforms. Global scheduling, partitioned scheduling and semi-partitioned scheduling will be
addressed. The requirements for a NoC (Network-on-Chip) will be outlined and the wormhole
protocol, and its associated analysis, will be covered. Problems that have arisen with this
analysis will be reviewed, along with the current state of the art. Finally, a different
means of analysing complex platforms will be presented; here predictability is engineered
to be an emergent property.
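As a rough illustration of why partitioned scheduling can be difficult, the sketch below assigns tasks to cores first-fit and accepts a core only while its total utilization stays at or below 1. This is the standard test for preemptive EDF on a single core with independent, implicit-deadline periodic tasks. The function name, the first-fit heuristic, and the task utilizations are assumptions made for illustration and are not taken from the talk.

```python
def first_fit_partition(utilizations, num_cores):
    """Assign tasks to cores first-fit.

    Returns a list of task-index lists (one per core), or None if some
    task does not fit on any core.
    """
    loads = [0.0] * num_cores
    assignment = [[] for _ in range(num_cores)]
    for task, u in enumerate(utilizations):
        for core in range(num_cores):
            # Single-core EDF with implicit deadlines: schedulable iff U <= 1.
            if loads[core] + u <= 1.0:
                loads[core] += u
                assignment[core].append(task)
                break
        else:
            return None  # no core can host this task without splitting it
    return assignment
```

Three tasks of utilization 0.6 cannot be partitioned onto two cores even though the total demand (1.8) is below the platform capacity (2.0), which is the kind of capacity loss that motivates global and semi-partitioned approaches.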
Professor Alan Burns is a member of the Department of Computer Science, University of York, U.K. His research interests cover a number of aspects of real-time systems including the assessment of languages for use in the real-time domain, distributed operating systems, the formal specification of scheduling algorithms and implementation strategies, and the design of dependable user interfaces to real-time applications. Professor Burns has authored/co-authored 500 papers/reports and books. Most of these are in the real-time area. His teaching activities include courses in Operating Systems and Real-time Systems. In 2009 Professor Burns was elected a Fellow of the Royal Academy of Engineering. In 2012 he was elected a Fellow of the IEEE.
Making the case for the safety of machine learning for highly automated driving
Machine learning technologies such as neural networks show great potential for enabling automated driving functions in an open-world context. However, these technologies can only be released for series production if they can be demonstrated to be sufficiently safe. As a result, convincing arguments need to be made for the safety of automated driving systems based on such technologies. This talk examines the various forms in which machine learning can be applied to automated driving and the resulting functional safety challenges. A systems engineering approach is proposed to derive a precise definition of the performance requirements on the function to be implemented on which to base the safety case. A systematic approach to structuring the safety case is introduced and a number of open research questions are presented.
Dr. Simon Burton graduated in computer science at the University of York, where he also achieved his PhD on the topic of the verification and validation of safety-critical systems. Dr. Burton has a background in a number of safety-critical industries. He has spent the last 16 years mainly focusing on automotive, working in research and development projects within a major OEM as well as leading consulting, engineering service and product organisations supporting OEMs and their supply chain with solutions for process improvement, embedded software, safety and security. He currently has the role of Chief Expert within the Robert Bosch GmbH Central Research division, where he coordinates research strategy in the area of safety, security, reliability and availability of software-intensive systems.
High-end GPU processors implement a throughput-oriented architecture
that has been highly successful for CPU acceleration in supercomputers
and datacenters. GPUs can be characterized as manycore processors,
that is, parallel processors where a large number of cores are
distributed across compute units. Each GPU compute unit or « streaming
multiprocessor » is composed of multi-threaded processing cores
sharing a control unit, a local memory and a global memory hierarchy.
However effective, GPU architecture entails significant limitations in
the areas of expressiveness of programming environments, effectiveness
on diverging parallel computations, and execution time predictability.
We discuss the architectural options and programming models of
accelerators designed to address high-performance applications ranging
from embedded to extreme computing. Similarly to GPU architectures,
such accelerators comprise multiple compute units connected by on-chip
global fabrics to external memory systems and network interfaces.
Selecting compute units composed of fully programmable cores,
coprocessors and asynchronous data transfer engines makes it possible to match
the acceleration performance and energy efficiency of GPU processors,
while avoiding their limitations. This discussion is illustrated by
the co-design of the 3rd-generation MPPA manycore processor for
automated driving, and is related to the involvement of Kalray in the
Mont-Blanc 2020 and the European Processor Initiative projects that
target exascale computing.
Benoît is the Kalray VLIW core main architect, and co-architect of the Kalray Multi-Purpose Processing Array (MPPA). He is also a direct contributor to several components of the AccessCore software development environment.
Before joining Kalray, Benoît was in charge of Research and Development for the STMicroelectronics Software, Tools, Services division. He was promoted to STMicroelectronics Fellow in 2008. Prior to STMicroelectronics, Benoît worked at the Cray Research park (Minnesota, USA), where he developed the software pipeliner of the Cray T3E production compilers.
Benoît earned an engineering degree in Radar and Telecommunications from the Ecole Nationale Supérieure de l’Aéronautique et de l’Espace (Toulouse, France), and a doctoral degree in computer systems from the University Pierre et Marie Curie (Paris) under the direction of Prof. P. Feautrier. He completed his post-doctoral studies at McGill University (Montreal, Canada) at the ACAPS laboratory led by Prof. G. R. Gao.
Benoît has published over 50 conference papers, journal articles and book chapters, and holds 10 hardware patents.
From Here to MARS: Self-Aware Middleware for Adaptive Reflective Computer Systems
Self-awareness has a long history in biology, psychology, medicine, engineering and (more recently) computing. In the past decade this has inspired new self-aware strategies for emerging computing substrates (e.g., complex heterogeneous MPSoCs) that must cope with the (often conflicting) challenges of resiliency, energy, heat, cost, performance, security, etc. in the face of highly dynamic operational behaviors and environmental conditions. Earlier we had championed the concept of CyberPhysical-Systems-on-Chip (CPSoC), a new class of sensor-actuator rich many-core computing platforms that intrinsically couples on-chip and cross-layer sensing and actuation to enable self-awareness. Unlike a traditional MPSoC, CPSoC exploits self-aware models to enable intelligent co-design of the control, communication, and computing (C3) substrates of the SoC to adaptively achieve desired objectives and Quality-of-Service (QoS). The CPSoC design paradigm achieves self-awareness through introspection (i.e., modeling and observing its own internal and external behaviors) combined with both reflexive and reflective adaptations via cross-layer physical and virtual sensing and actuations applied across multiple layers of the hardware/software system stack. The closed loop control used for adaptation to dynamic variation -- commonly known as the observe-decide-act (ODA) loop -- is implemented using an adaptive, reflective middleware layer.
In this talk I present MARS: a Middleware for Adaptive Reflective Computer Systems that performs both reactive and proactive resource allocation decisions and power management by leveraging concepts from reflective systems. Reflection enables dynamic adaptation based on both external feedback and introspection (i.e., self-assessment). In our context, this translates into performing resource management actuations considering both sensing information (e.g., readings from performance counters, power sensors, etc.) to assess the current system state, as well as models to predict the behavior of other system components before performing an action. I will summarize results leveraging our MARS toolchain to i) perform energy-efficient task mapping on heterogeneous architectures, ii) explore the design space of novel HMP architectures, and iii) extend the lifetime of mobile devices.
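The observe-decide-act loop described above can be sketched in a few lines. The code below is only an illustration and is not the MARS API: the power model, its coefficients, and every function name are hypothetical. It combines a sensed value (utilization) with a predictive model (estimated power at each candidate frequency) before acting, mirroring the reflective idea of predicting the effect of an action before performing it.

```python
def predict_power(freq_ghz, utilization):
    """Hypothetical power model in watts: static term plus a dynamic term."""
    return 0.5 + 1.2 * utilization * freq_ghz ** 2


def decide(freqs, utilization, power_budget):
    """Pick the highest frequency whose predicted power fits the budget."""
    feasible = [f for f in sorted(freqs)
                if predict_power(f, utilization) <= power_budget]
    # Fall back to the lowest frequency if nothing fits the budget.
    return feasible[-1] if feasible else min(freqs)


def oda_loop(read_utilization, set_frequency, freqs, power_budget, steps):
    """Run `steps` iterations of observe -> decide -> act."""
    history = []
    for _ in range(steps):
        utilization = read_utilization()                  # observe (sensor)
        freq = decide(freqs, utilization, power_budget)   # decide (model)
        set_frequency(freq)                               # act (actuator)
        history.append((utilization, freq))
    return history
```

In this sketch `read_utilization` and `set_frequency` are callbacks supplied by the caller, standing in for whatever sensors and actuators a real platform exposes.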
Nikil Dutt is a Chancellor's Professor of CS, Cognitive Sciences, and EECS at the University of California, Irvine. He received a PhD from the University of Illinois at Urbana-Champaign (1989). His research interests are in embedded systems, EDA, computer architecture and compilers, distributed systems, and brain-inspired architectures and computing. He has received numerous best paper awards and is coauthor of 7 books. Professor Dutt has served as EiC of ACM TODAES and AE for ACM TECS and IEEE TVLSI. He is on the steering, organizing, and program committees of several premier EDA and Embedded System Design conferences and workshops, and has also been on the advisory boards of ACM SIGBED, ACM SIGDA, ACM TECS and IEEE ESL. He is an ACM Fellow, IEEE Fellow, and recipient of the IFIP Silver Core Award.
The SpiNNaker - Spiking Neural Network Architecture - machine is a brain-inspired massively-parallel neuromorphic computer openly available under the auspices of the EU Flagship Human Brain Project. Establishing the reliable operation of a million-core machine, developed with limited resources, has required considerable attention to both hardware and software issues, and many of the lessons learned will be relevant to other projects of similar scale. Failures can occur at all levels, from wafer test escapes to PCB assembly issues through to radiation-induced soft memory errors. It has proved easier (and cheaper!) to accommodate faults than to eliminate them.
Steve Furber CBE FRS FREng is ICL Professor of Computer Engineering in the School of Computer Science at the University of Manchester, UK. After completing a BA in mathematics and a PhD in aerodynamics at the University of Cambridge, UK, he spent the 1980s at Acorn Computers, where he was a principal designer of the BBC Microcomputer and the ARM 32-bit RISC microprocessor. Over 100 billion variants of the ARM processor have since been manufactured, powering much of the world's mobile and embedded computing. He moved to the ICL Chair at Manchester in 1990 where he leads research into asynchronous and low-power systems and, more recently, neural systems engineering, where the SpiNNaker project is delivering a computer incorporating a million ARM processors optimised for brain modelling applications.
Neural networks have been established as a generic and powerful method to approach challenging problems such as image classification, object detection or decision making and will be widely adopted in the future, for example for autonomous driving systems. While their successful deployment requires an enormous amount of compute, it has been proven that the quantization of network parameters reduces the hardware implementation cost so effectively that the feasible scope of applications is expanded even to very space- and energy-constrained compute environments. Reconfigurable logic enables custom-tailored architectures that fully exploit the advantages of quantization.
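To make the idea of parameter quantization concrete, the following minimal sketch maps floating-point weights to small signed integer codes using a single scale factor. The uniform symmetric scheme, the function names, and the example values are assumptions chosen for illustration; the talk may use different quantization methods.

```python
def quantize_symmetric(weights, bits):
    """Uniform symmetric quantization of a list of floats to signed codes."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric range")
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]
```

Each weight is then stored in `bits` bits instead of 32, and the reconstruction error per weight is at most half the scale step, which is the trade-off a hardware designer tunes when choosing the bit width.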
Dr. Giulio Gambardella is working as a research scientist in Xilinx Research, which he joined in 2016. His expertise covers hardware acceleration of machine learning algorithms, with a particular emphasis on quantized neural networks.
Before joining Xilinx, he was a visiting scientist at ABB Corporate Research Centre in Oslo, working on dependability evaluation for embedded systems in safety-critical applications.
He obtained his PhD in Computer Engineering from Politecnico di Torino (Italy) with a thesis, “Dynamic Partial Reconfiguration for Dependable Systems”, targeting FPGA adoption in safety-critical applications. Further research interests include memory testing, fault-tolerant reconfigurable systems and software-based self-test, working in strong collaboration with CINI, the Italian National Interuniversity Consortium for Informatics, and several private companies. He also served as a reviewer for several international journals and conferences.
Run-time power management of multi- and many-core systems
Power- and energy-efficiency continues to be a primary concern in the design and management of computing systems, from mobile devices (battery life and temperature) through to HPC (electricity bills and temperature). In this talk I will give a summary of our research into the runtime management (RTM) of multi- and many-core computing systems that has come out of the PRiME (www.prime-project.org) and Graceful research projects. I will present a range of different approaches that we have developed and experimentally validated, and the key findings that we have made along the way. These encompass 1) exploring RTM on both novel and heterogeneous/homogeneous COTS multi-core platforms, 2) the impact of core scaling on RTMs, 3) issues and approaches for managing concurrently executing workloads on shared resources, and 4) comparing the impact of offline vs online characterisation approaches. I will also present a range of open-source tools that we have developed and released through these projects, ranging from simulation and runtime power models for multi-core CPUs to a framework that lets researchers incorporate multi-core runtime management into their systems and enables a level comparison with the state of the art (SoA).
Geoff V. Merrett is an Associate Professor at the School of Electronics and Computer Science, University of Southampton, where he is Head of the Centre for Internet of Things and Pervasive Systems. He received the BEng (1st, Hons) and PhD degrees in Electronic Engineering from Southampton in 2004 and 2009 respectively. He is internationally known for his research into the system-level energy management of mobile and self-powered embedded systems, and he has published over 150 journal and conference papers in these areas. He has given invited talks on his research (e.g. DAC, DATE), and had a number of best paper nominations and awards (e.g. DATE, CODES-ISSS, IJCAI). He is technical manager of the Arm-ECS Research Centre, an award-winning industry-academia collaboration between the University of Southampton and ARM. He has edited a number of research books, and is currently co-editing an IET Press book titled "Multi- and Many-Core Computing: Software and Hardware" due for publication at the beginning of 2019. He is an Associate Editor for the IET CDS (IF: 1.092) and MDPI Sensors (IF: 2.677) journals, has been guest editor on a number of special issues in areas related to his research interests, serves as a reviewer for a number of leading journals, and sits on TPCs for a range of conferences.
Methodologies for Application Mapping for NoC-Based MPSoCs
In this talk, we give an overview of novel techniques for systematically mapping applications to NoC-based multi-core architectures
(MPSoCs). Complex applications requiring heterogeneous processing resources are often described by task graphs
with data dependencies. Here, the nodes represent actors or tasks which are typically activated periodically based on the
availability of data. One prominent domain of applications fitting this model is stream processing. Here, it is often important to guarantee
either bandwidth or execution time requirements. More recently, security, energy and reliability aspects have also imposed
constraints on the mapping of tasks to cores and of their communication to routes in the underlying NoC.
Concerning mapping methodologies, we first present a class of algorithms that perform "Self-Embedding". The idea here is that
a source node issues a request to find appropriate resources to embed its successor tasks, and so on.
The next class of techniques introduced is called "Hybrid Application Mapping (HAM)". Here, a careful analysis and
characterization of symmetric mappings by constellations of cores and routes is explored in a static (compile-time)
phase called "Design Space Exploration (DSE)". At run-time, the operating system then only needs to search within such
pre-analysed constellations to find a concrete mapping that satisfies the given non-functional constraints by construction.
We present ideas of how timing constraints may be statically analysed in case of compositional MPSoC architectures such that
deadlines or throughput requirements will be automatically met for streaming applications.
Finally, we conclude with a discussion on resource constellations that may satisfy certain security requirements on an MPSoC.
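The run-time half of hybrid application mapping can be illustrated with a deliberately simplified sketch. Here each pre-analysed constellation is reduced to a count of required cores per type, and the run-time step returns the first constellation that the currently free cores can host. The data layout and the function name are hypothetical; the actual technique also accounts for NoC routes and their constraints, which this sketch omits.

```python
from collections import Counter


def find_mapping(constellations, free_cores):
    """Return the name of the first constellation that fits, or None.

    constellations: list of (name, {core_type: count}) from the
        design-time exploration, ordered by preference.
    free_cores: {core_type: count} available at run time.
    """
    available = Counter(free_cores)  # missing core types count as zero
    for name, needed in constellations:
        if all(available[core_type] >= count
               for core_type, count in needed.items()):
            return name
    return None
```

Because every listed constellation was verified at design time, the run-time search only checks resource availability rather than repeating the timing or security analysis.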
Jürgen Teich is with Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, where he has been head of the Chair of Hardware/Software Codesign since 2003. He received the M.Sc. degree (Dipl.-Ing.; with honors) from the University of Kaiserslautern, Germany, in 1989 and the Ph.D. degree (Dr.-Ing.; summa cum laude) from the University of Saarland, Saarbrücken, Germany, in 1993.
Prof. Teich has organized various ACM/IEEE conferences/symposia as program chair including CODES+ISSS 2007, FPL 2008, ASAP 2010, DATE 2016, and was vice general chair of DATE 2018. He currently serves as the general chair of DATE 2019 and on the editorial boards of diverse scientific journals such as ACM TODAES, IEEE Design and Test, IET Cyber-Physical Systems, and JES. He has edited two textbooks on hardware/software codesign and the Handbook of Hardware/Software Codesign (Springer).
Since 2010, he has also been the principal coordinator of the Transregional Research Center 89 “Invasive Computing” on multicore research funded by the German Research Foundation (DFG). He is a member of Academia Europaea, the Academy of Europe, and a Fellow of the IEEE.
Continuous on-line adaptation in many-core systems: The GRACEFUL project
Imagine a many-core system with thousands or millions of processing nodes that gets better and better with time at executing an application, “gracefully” providing optimal power usage while maximizing performance levels and tolerating component failures. Applications running on this system would be able to autonomously vary the number of nodes in use to overcome three critical issues related to the implementation of many-core systems: reliability, energy efficiency, and on-line optimisation.
The approach developed in the GRACEFUL project explores hardware mechanisms centred around two basic processes: graceful degradation implies that the system will be able to cope with faults (permanent or temporary) or potentially damaging power consumption peaks by lowering its performance; graceful amelioration implies that the system will constantly seek alternative implementations that represent an improvement from the perspective of some user-defined parameter (e.g. execution speed, power consumption).
Dr. Gianluca Tempesti received a B.S.E. in electrical engineering from Princeton University in 1991 and an M.S.E. in computer science and engineering from the University of Michigan at Ann Arbor in 1993. In 1998 he received a Ph.D. from the Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland. In 2003 he was granted a young professorship award from the Swiss National Science Foundation (FNS) and created the Cellular Architecture Research Group (CARG). In 2006 he joined the Department of Electronic Engineering at the University of York as a Reader in Intelligent Systems. His research interests include bio-inspired digital hardware and software, built-in self-test and self-repair, programmable logic, and many-core systems, and he has published over 80 articles in these areas.
ARM was founded as a spin-off from Acorn and Apple in 1990. Its first multicore-capable microarchitecture entered the market fifteen years later. In the 13 years since, core counts have exploded, and Arm-based systems-on-chip (SoCs) now have upwards of 48 cores, with Arm servers having configurations with up to 256 threads. The company has expanded its focus from embedded and mobile phone technology to infrastructure-based products incorporating networking, server, and high performance computing market segments. Along with core-count increases, Arm has explored a variety of multi-core options including different microarchitecture configurations, fine-grained multicore power management, and heterogeneous multi-core options incorporating accelerators into the SoC. This talk will discuss Arm’s history in multicore, focusing on software/hardware co-design, and present some of the ongoing research projects to increase efficiency in large-scale and increasingly heterogeneous multi-core sockets.
Eric is currently a fellow in the Research division at Arm in Austin, TX, leading the software and large-scale systems research group. The group's activities include exploring the place of Arm within high performance computing and data centers, and investigating next-generation concepts in operating systems, runtimes, and systems software.
Previously, he was a research staff member in the Future Systems department at IBM's Austin Research Lab. Over twelve years at IBM, he worked on distributed operating systems for high performance computing, low-power dense server and network processor appliance blades, DRAM power management, full system simulation, high performance computing, hypervisors, and the Linux operating system. Before coming to IBM, he worked for four years at Lucent Technologies Bell Laboratories on the Plan 9 and Inferno operating systems. His current research focuses on exploring new operating system and distributed system techniques for systems with hundreds of thousands to millions of cores. Eric received a B.S. in Computer Science from the Rochester Institute of Technology in 1996 and has attended graduate courses at Stanford.