Invited Speakers

We are very proud to announce the following Invited Speakers:

  • Pete Beckman − Argonne National Laboratory, USA
    [ 16:25 - 17:25, Wednesday, 3 Dec @ Auditorium, Kobe U. 2F ]
    "The Dynamic Duo for Exascale Computing: Power and Resilience" (PDF 3 MB)
    Abstract For the last several decades, many large-scale scientific applications have used a block synchronous programming paradigm. These applications took the very straightforward approach of partitioning equal-sized chunks of data across computational elements and then exchanging data when the computational phase was complete. To improve scalability of these applications, system software writers and hardware architects focused on reducing latency in the messaging system and eliminating variances in computational speed. These variances were often caused when some CPUs ran slightly slower or when system software multitasked between the user’s computation and message processing or memory management. As we look forward, will the Dynamic Duo of power management and resilience response change how we write scalable system software and applications for extreme-scale systems? What can we expect as we look into the future? Can our research community team up with the Dynamic Duo, rather than fight against them?
  • William Harrod − Department of Energy, USA
    [ 13:45 - 14:45, Tuesday, 2 Dec @ Auditorium, Kobe U. 2F ]
    "Delivering the Promise of Exascale" (PDF 15 MB)

    The advance of exascale computing in serving multidisciplinary fields in science, technology, industry and business, and social welfare in the US and internationally is challenged by constraints on power, size, cost, resilience, and productivity. Integrated advances are required in technology and architecture, operating and runtime system software, programming interfaces and environments, and fundamental algorithms if extreme scale computing is to be practically applied in the next decade.

    To prepare to benefit from these potential opportunities, new research initiatives are being undertaken worldwide, to a degree unprecedented in recent years, including through international collaboration. The following topics will be discussed during this talk:

    • The key technical challenges confronting the field, and innovative strategies that may yield revolutionary breakthroughs to address them.
    • Ongoing and possible future programs to energize and coordinate a broad campaign of research, with an emphasis on how their independent outcomes will combine in synergy to drive the future of extreme-scale computing.
  • Thomas Schulthess − Swiss National Supercomputing Center, Switzerland
    [ 08:30 - 09:30, Wednesday, 3 Dec @ Auditorium, Kobe U. 2F ]
    "High-performance or high-productivity, can we have both?"
    Abstract Since the beginnings of electronic computing, supercomputing, loosely defined as the most powerful scientific computing at any given time, has led the way in technology development. Yet, the way we interact with supercomputers today has not changed much since the days we stopped using punch cards. Enhancing productivity on supercomputers has been a priority of several programs in the past, but apparently with little success. More recent efforts, such as the European Human Brain Project, strive to go further by making supercomputers interactive instruments. Here I will argue that we should first learn from data science how to adopt more descriptive developer environments that rely on performance-portable, domain- and data-structure-specific tools. I will show how energy efficiency considerations can help amortize development costs of such high-performance computing tools. In fact, the direct study of energy consumption, which now is possible with fine granularity at the application level, can give us hints on how computing systems should be designed and optimized in the future.
  • Pavan Balaji − Argonne National Laboratory, USA
    [ 08:45 - 09:30, Thursday, 4 Dec @ Auditorium, Kobe U. 2F ]
    "Debunking the Myths of MPI Programming" (PDF 12 MB)
    Abstract MPI has long been considered the de facto standard for parallel programming. One of the primary strengths of MPI is its continuously evolving nature that allows it to absorb and incorporate the best practices in parallel computing in a standard and portable form. The MPI Forum has recently announced the MPI-3 standard and is working on the MPI-4 standard to extend traditional message passing into more dynamic, one-sided and fault tolerant communication capabilities. Nevertheless, given the disruptive architectural trends for Exascale computing, there is room for more. In this talk, I’ll first describe some of the capabilities that have been added in the recent MPI-3 standard and those that are being considered for the upcoming MPI-4 standard. Next I’ll describe some research efforts to extend MPI to work in massively multithreaded and heterogeneous environments for highly dynamic and irregular applications.
  • Venkat Chakravarthy − IBM India, India
    [ 15:05 - 15:50, Tuesday, 2 Dec @ Auditorium, Kobe U. 2F ]
    "Single Source Shortest Paths on Massive Graphs" (PDF 5 MB)
    Abstract In the single-source shortest path (SSSP) problem, the goal is to find the shortest paths from a source vertex v to all the other vertices in an input graph. We discuss parallel algorithms derived from the Bellman-Ford and Delta-stepping algorithms.
    The algorithms employ various pruning techniques, such as edge classification and direction-optimization, to dramatically reduce inter-node communication traffic, and load balancing strategies to handle higher-degree vertices.
    The extensive performance analysis shows that the algorithms work well on scale-free and real-world graphs. In the largest tested configuration, an R-MAT graph with 256 billion vertices and four trillion edges on 32,768 Blue Gene/Q nodes, the algorithms achieve a processing rate of three Trillion Edges Per Second (TTEPS), a four orders of magnitude improvement over the best published results.
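    The Delta-stepping idea the abstract builds on can be sketched briefly: vertices are grouped into distance buckets of width delta, light edges (weight <= delta) are relaxed repeatedly within the lowest non-empty bucket, and heavy edges once that bucket settles. The following is a minimal sequential sketch of that idea, not the authors' distributed implementation; the adjacency-dict graph layout and helper names are illustrative.

```python
import math

def delta_stepping(graph, source, delta):
    """Sequential sketch of Delta-stepping SSSP.
    graph: {u: [(v, w), ...]} adjacency lists with non-negative weights."""
    dist, buckets = {}, {}

    def relax(v, d):
        # Move v into the bucket matching its improved tentative distance.
        if d < dist.get(v, math.inf):
            old = dist.get(v)
            if old is not None:
                buckets.get(int(old // delta), set()).discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    relax(source, 0.0)
    while buckets:
        i = min(buckets)
        settled = set()
        # Phase 1: repeatedly relax light edges (w <= delta) inside bucket i.
        while buckets.get(i):
            frontier = buckets.pop(i)
            settled |= frontier
            for u in frontier:
                for v, w in graph.get(u, []):
                    if w <= delta:
                        relax(v, dist[u] + w)
        # Phase 2: relax heavy edges (w > delta) of the settled vertices once.
        for u in settled:
            for v, w in graph.get(u, []):
                if w > delta:
                    relax(v, dist[u] + w)
        buckets.pop(i, None)
    return dist
```

In the parallel setting each bucket's frontier is what gets partitioned across nodes, which is where the pruning and load-balancing techniques above come into play.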
  • Harald Köstler − Universität Erlangen-Nürnberg, Germany
    [ 12:45 - 13:30, Thursday, 4 Dec @ Auditorium, Kobe U. 2F ]
    "ExaStencils - Automatic Code Generation for Highly Scalable Numerical Algorithms" (PDF 4 MB)
    Abstract In recent years, real-time simulations and more complex models for various applications in computational science and engineering have become feasible on large HPC clusters, mostly because of progress in parallel computer architectures such as GPUs and modern multi-core CPUs.
    However, the implementation effort increases if one wants to achieve good performance.
    A solution to this problem is to formulate the underlying physical models and numerical algorithms in an abstract way in a domain-specific language (DSL) and then automatically generate efficient C++ or CUDA code.
    A multi-layered approach is sketched that allows users to describe applications in a natural way, from the mathematical model down to the program specification. As an example we show the scalability of a multigrid solver on JUQUEEN, currently one of the largest Blue Gene/Q systems.
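    The DSL-to-code idea can be illustrated with a toy generator: a stencil is described abstractly as offset-to-coefficient pairs, and C source is emitted from that description. This is a hypothetical sketch of the concept, not the actual ExaStencils tool chain; the function and parameter names are invented for illustration.

```python
def generate_stencil_c(name, coeffs):
    """Emit C source for a 1D stencil given abstractly as
    {offset: coefficient} pairs -- a toy stand-in for generating
    C++/CUDA from a DSL-level stencil description."""
    terms = " + ".join(
        f"({c}) * u[i + ({off})]" for off, c in sorted(coeffs.items())
    )
    halo = max(abs(off) for off in coeffs)  # loop bounds from the stencil shape
    return (
        f"void {name}(const double *u, double *out, int n) {{\n"
        f"    for (int i = {halo}; i < n - {halo}; ++i)\n"
        f"        out[i] = {terms};\n"
        f"}}\n"
    )

# e.g. a 1D Laplace stencil described at the DSL level:
print(generate_stencil_c("laplace1d", {-1: 1.0, 0: -2.0, 1: 1.0}))
```

A real generator would additionally apply the blocking, vectorization, and parallelization transformations that make the emitted code fast; the point here is only that the model stays abstract while the target code is concrete.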
  • Wolfgang De Meuter − Vrije Universiteit Brussel, Belgium
    [ 09:50 - 10:35, Wednesday, 3 Dec @ Seminar Room, AICS 1F ]
  • Rob Ross − Argonne National Laboratory, USA
    [ 13:20 - 14:05, Wednesday, 3 Dec @ Auditorium, Kobe U. 2F ]
    "Post-Petascale System Software: Applying Lessons Learned" (PDF 10 MB)

    As a niche community, the HPC community has greatly benefitted from hardware and software technologies developed for other uses. The Beowulf computing approach and the use of GPUs for scientific computing serve as two good examples of this phenomenon. As we consider the challenges inherent in post-petascale HPC, it is again time for us to examine what we might "borrow" from other computing domains.

    In this talk I consider changes taking place in how HPC systems are architected and used for scientific computing and look outward for promising solutions to some of our challenges. I also identify opportunities for HPC system software researchers to influence and improve the development of more general-purpose system software, giving back to the communities that so often provide us assistance.

  • Olaf Schenk − Università della Svizzera italiana, Switzerland
    [ 13:20 - 14:05, Wednesday, 3 Dec @ Lecture Hall, AICS 6F ]
    "Performance Engineering for Extreme-Scale Stochastic Optimizations on Petascale Architectures" (PDF 6 MB)
    Abstract We present a scalable approach that solves stochastic optimization problems under uncertainty in operationally compatible time. Several algorithmic and performance engineering advances are discussed for these stochastic optimization problems. The new developments include novel incomplete augmented multicore sparse factorizations, multicore- and GPU-based dense matrix implementations, and communication-avoiding Krylov solvers. We also improve the interprocess communication on Cray systems to solve 24-hour horizon power grid problems from electrical power grid systems of realistic size with up to 1.95 billion decision variables and 1.94 billion constraints. Full-scale results are reported on Cray XC30 and BG/Q, where we observe very good parallel efficiencies and solution times within an operationally defined time interval. To our knowledge, "real-time"-compatible performance on a broad range of architectures for this class of optimization problems has not been possible prior to the present work.
  • Martin Schulz − Lawrence Livermore National Laboratory, USA
    [ 09:50 - 10:35, Wednesday, 3 Dec @ Auditorium, Kobe U. 2F ]
    "Performance Analysis for the Post-Petascale Era: Going Beyond Just Measuring Flop/s and Cache Misses" (PDF 25 MB)

    With rising system and application complexity, performance analysis is more important, but also more difficult than ever. In such environments, traditional tools, which measure simple performance metrics such as Flop/s and cache misses and relate them to source code, are no longer sufficient. Instead we require novel approaches that allow a deeper correlation of performance data with application and communication structure, that provide multiple views on measured data, and that offer intuitive visualizations to enable actionable insight.

    In this talk I will discuss a general methodology that enables such tools, and I will present two novel performance visualization tools as case studies: MemAxes, a memory analysis tool to gather and display data movement within NUMA systems, and Ravel, a trace visualizer based on logical time. Both tools provide the developer with an application-centric view of performance data, which aids in capturing the performance behavior of the application and thereby enables optimization.

  • Jeffrey Vetter − Oak Ridge National Laboratory, USA
    [ 13:20 - 14:05, Wednesday, 3 Dec @ Seminar Room, AICS 1F ]
  • Felix Wolf − German Research School for Simulation Sciences, Germany
    [ 09:15 - 10:00, Thursday, 4 Dec @ Seminar Room, AICS 1F ]
    "Mass-Producing Insightful Performance Models of Parallel Applications" (PDF 10 MB)
    Abstract Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made – a point where remediation can be difficult. However, creating performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful bottlenecks. By automatically generating empirical performance models for each function in the program, we make this powerful methodology easier to use and expand its coverage. This presentation gives an overview of the method, illustrates its usage in a range of examples, and assesses its potential.
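    The core of such empirical modeling is fitting a simple scaling law to a handful of small-scale measurements and extrapolating it. A toy sketch, assuming a single-term model t(p) ≈ c · p^a fitted by least squares in log-log space (the real method searches a much richer space of candidate model terms per function):

```python
import math

def fit_power_law(procs, times):
    """Fit t(p) ~= c * p**a by ordinary least squares in log-log
    space -- a toy instance of deriving an empirical scaling model
    from a few small-scale measurements."""
    xs = [math.log(p) for p in procs]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the log-log regression line is the scaling exponent a.
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    c = math.exp(my - a * mx)
    return c, a
```

Fitting such a model to every function in the program and extrapolating to large p flags the routines whose predicted cost grows faster than the rest of the code, before a full-scale run is ever attempted.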
  • Eduard Ayguadé − Barcelona Supercomputing Center, Spain
    [ 09:50 - 10:35, Wednesday, 3 Dec @ Lecture Hall, AICS 6F ]
    "Runtime-centric parallel programming models" (PDF 19 MB)
    Abstract When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well defined Instruction Set Architecture (ISA). This simple interface allowed developing applications without worrying too much about the underlying hardware, while hardware designers were able to aggressively exploit instruction-level parallelism (ILP) in superscalar processors. However this is not true anymore with the new homogeneous and heterogeneous multicore/accelerator-based architectures, which have contributed to raising the so-called programmability wall. In this talk we will discuss the opportunities, in terms of architecture-runtime co-design, offered by runtime systems implementing new task-based parallel programming models for current and future multi-/many-core architectures. The runtime has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores have. OmpSs, the task-based parallel programming model proposed at BSC which has influenced the evolution of the task model in OpenMP, will be used to demonstrate such opportunities.
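    The in/out data-dependency idea underlying task-based models such as OmpSs can be sketched with a toy sequential runtime: each task declares what it reads and writes, and the runtime derives the execution order from those declarations. The class and method names below are invented for illustration and are not the OmpSs API, which expresses the same dependences as compiler directives and runs tasks concurrently.

```python
class ToyTaskRuntime:
    """Sequential sketch of a dataflow task runtime: a task runs
    once everything it reads (ins) has been produced, mirroring the
    in/out dependence clauses of task-based programming models."""

    def __init__(self):
        self._tasks = []

    def submit(self, fn, ins=(), outs=()):
        self._tasks.append((fn, set(ins), set(outs)))

    def run(self):
        pending = list(self._tasks)
        # Data never written by any submitted task is assumed present.
        produced = set().union(*(o for _, _, o in pending)) if pending else set()
        available = set()
        for _, ins, _ in pending:
            available |= ins - produced
        order = []
        while pending:
            for task in pending:
                fn, ins, outs = task
                if ins <= available:       # all inputs ready: run the task
                    fn()
                    available |= outs      # its outputs unblock successors
                    pending.remove(task)
                    order.append(fn.__name__)
                    break
            else:
                raise RuntimeError("unsatisfiable (cyclic) dependencies")
        return order
```

Submission order is irrelevant: a task reading "x" will not run until the task writing "x" has, which is exactly the property that lets a real runtime extract parallelism and schedule around slow resources.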