Publications 

Title | Authors | Date | Link | Citation
Characterizing Deep-Learning I/O Workloads in TensorFlow
Steven W. D. Chien, Stefano Markidis, Chaitanya Prasad Sishtla, Luis Santos, Pawel Herman, Sai Narasimhamurthy, Erwin Laure
6/10/2018 | DOI: 10.1109/PDSW-DISCS.2018.00011
Chien, S. W., Markidis, S., Sishtla, C. P., Santos, L., Herman, P., Narasimhamurthy, S., & Laure, E. (2018, November). Characterizing deep-learning I/O workloads in TensorFlow. In 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS) (pp. 54-63). IEEE.
TensorFlow Doing HPC
Steven W. D. Chien, Stefano Markidis, Vyacheslav Olshevsky, Yaroslav Bulatov, Erwin Laure, Jeffrey S. Vetter
11/03/2019 | DOI: 10.1109/IPDPSW.2019.00092
Chien, S. W., Markidis, S., Olshevsky, V., Bulatov, Y., Laure, E., & Vetter, J. (2019, May). TensorFlow doing HPC. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 509-518). IEEE.
Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code
Chaitanya Prasad Sishtla, Steven W. D. Chien, Vyacheslav Olshevsky, Erwin Laure, Stefano Markidis
7/05/2019 | DOI: 10.1007/978-3-030-22750-0_58
Sishtla, C. P., Chien, S. W., Olshevsky, V., Laure, E., & Markidis, S. (2019, June). Multi-GPU acceleration of the iPIC3D implicit particle-in-cell code. In International Conference on Computational Science (pp. 612-618). Springer, Cham.
Posit NPB: Assessing the Precision Improvement in HPC Scientific Applications
Steven W. D. Chien, Ivy B. Peng, Stefano Markidis
12/07/2019 | DOI: 10.1007/978-3-030-43229-4_26
Chien, S. W., Peng, I. B., & Markidis, S. (2019, September). Posit NPB: Assessing the precision improvement in HPC scientific applications. In International Conference on Parallel Processing and Applied Mathematics (pp. 301-310). Springer, Cham.
Automated classification of plasma regions using 3D particle energy distribution
Vyacheslav Olshevsky, Yuri V. Khotyaintsev, Andrey Divin, Gian Luca Delzanno, Sven Anderzén, Pawel Herman, Steven W. D. Chien, Levon Avanov, Stefano Markidis
15/08/2019 | https://arxiv.org/abs/1908.05715
Olshevsky, V., Khotyaintsev, Y. V., Lalti, A., Divin, A., Delzanno, G. L., Anderzén, S., ... & Markidis, S. (2021). Automated classification of plasma regions using 3D particle energy distributions. Journal of Geophysical Research: Space Physics, 126(10), e2021JA029620.
Exposition, Clarification, and Expansion of MPI Semantic Terms and Conventions
Purushotham V. Bangalore, Rolf Rabenseifner, Daniel J. Holmes, Julien Jaeger, Guillaume Mercier, Claudia Blaas-Schenner, Anthony Skjellum
11/09/2019 | DOI: 10.1145/3343211.3343213
Bangalore, P. V., Rabenseifner, R., Holmes, D. J., Jaeger, J., Mercier, G., Blaas-Schenner, C., & Skjellum, A. (2019, September). Exposition, clarification, and expansion of MPI semantic terms and conventions: is a nonblocking MPI function permitted to block? In Proceedings of the 26th European MPI Users' Group Meeting (pp. 1-10).
MPI Sessions: Evaluation of an Implementation in Open MPI
Nathan Hjelm, Howard Pritchard, Samuel Gutiérrez, Daniel Holmes, Ralph Castain, Anthony Skjellum
8/10/2019 | DOI: 10.1109/CLUSTER.2019.8891002
Hjelm, N., Pritchard, H., Gutiérrez, S. K., Holmes, D. J., Castain, R., & Skjellum, A. (2019, September). MPI sessions: Evaluation of an implementation in Open MPI. In 2019 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 1-11). IEEE.
Performance Evaluation of Advanced Features in CUDA Unified Memory
Steven W. D. Chien, Ivy Peng, Stefano Markidis
21/10/2019 | DOI: 10.1109/MCHPC49590.2019.00014
Chien, S., Peng, I., & Markidis, S. (2019, November). Performance evaluation of advanced features in CUDA unified memory. In 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC) (pp. 50-57). IEEE.
Streaming Message Interface: High-performance distributed memory programming on reconfigurable hardware
T. De Matteis, J. de Fine Licht, J. Beránek, T. Hoefler
17/11/2019 | DOI: 10.1145/3295500.3356201
De Matteis, T., de Fine Licht, J., Beránek, J., & Hoefler, T. (2019, November). Streaming Message Interface: High-performance distributed memory programming on reconfigurable hardware. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-33).
Data Movement Is All You Need: A Case Study on Optimizing Transformers
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
30/6/2020 | Preprint
Ivanov, A., Dryden, N., Ben-Nun, T., Li, S., & Hoefler, T. (2021). Data movement is all you need: A case study on optimizing transformers. Proceedings of Machine Learning and Systems, 3, 711-732.
Why is MPI (perceived to be) so complex?: Part 1—Does strong progress simplify MPI?
Daniel J. Holmes, Anthony Skjellum, Derek Schafer
21/9/2020 | DOI: 10.1145/3416315.3416318
Holmes, D. J., Skjellum, A., & Schafer, D. (2020, September). Why is MPI (perceived to be) so complex? Part 1—Does strong progress simplify MPI? In 27th European MPI Users' Group Meeting (pp. 21-30).
Communication and Timing Issues with MPI Virtualization
Alexandr Nigay, Lukas Mosimann, Timo Schneider, Torsten Hoefler
21/9/2020 | DOI: 10.1145/3416315.3416317
Nigay, A., Mosimann, L., Schneider, T., & Hoefler, T. (2020, September). Communication and Timing Issues with MPI Virtualization. In 27th European MPI Users' Group Meeting (pp. 11-20).
Neko: A Modern, Portable, and Scalable Framework for High-Fidelity Computational Fluid Dynamics
Niclas Jansson, Martin Karp, Artur Podobas, Stefano Markidis, Philipp Schlatter
21/9/2020 | Preprint
Jansson, N., Karp, M., Podobas, A., Markidis, S., & Schlatter, P. (2021). Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics. arXiv preprint arXiv:2107.01243.
Collectives and Communicators: A Case for Orthogonality: (Or: How to get rid of MPI neighbor and enhance Cartesian collectives)
Jesper Larsson Träff, Sascha Hunold, Guillaume Mercier, Daniel J. Holmes
21/9/2020 | DOI: 10.1145/3416315.3416319
Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2020, September). Collectives and Communicators: A Case for Orthogonality: (Or: How to get rid of MPI neighbor and enhance Cartesian collectives). In 27th European MPI Users' Group Meeting (pp. 31-38).
Learning representations in Bayesian Confidence Propagation neural networks
Naresh Balaji Ravichandran, Anders Lansner, Pawel Herman
28/9/2020 | DOI: 10.1109/IJCNN48605.2020.9207061
Ravichandran, N. B., Lansner, A., & Herman, P. (2020, July). Learning representations in Bayesian Confidence Propagation neural networks. In 2020 International Joint Conference on Neural Networks (IJCNN) (pp. 1-7). IEEE.
sputniPIC: An Implicit Particle-in-Cell Code for Multi-GPU Systems
Steven W. D. Chien, Jonas Nylund, Gabriel Bengtsson, Ivy B. Peng, Artur Podobas, Stefano Markidis
22/10/2020 | DOI: 10.1109/SBAC-PAD49847.2020.00030
Chien, S. W., Nylund, J., Bengtsson, G., Peng, I. B., Podobas, A., & Markidis, S. (2020, September). sputniPIC: An implicit particle-in-cell code for multi-GPU systems. In 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (pp. 149-156). IEEE.
Spectral Element Simulations on the NEC SX-Aurora TSUBASA
Niclas Jansson
20/1/2021 | DOI: 10.1145/3432261.3432265
Jansson, N. (2021, January). Spectral Element Simulations on the NEC SX-Aurora TSUBASA. In The International Conference on High Performance Computing in Asia-Pacific Region (pp. 32-39).
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
G. Kwasniewski, M. Kabić, T. Ben-Nun, A. Nikolaos Ziogas, J. Eirik Saethre, A. Gaillard, T. Schneider, M. Besta, A. Kozhevnikov, J. VandeVondele, T. Hoefler
17/02/2021 | DOI: 10.1145/3458817.3476167
Kwasniewski, G., Kabic, M., Ben-Nun, T., Ziogas, A. N., Saethre, J. E., Gaillard, A., ... & Hoefler, T. (2021, November). On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15).
FBLAS: Streaming Linear Algebra on FPGA
Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler
22/2/2021 | DOI: 10.1109/SC41405.2020.00063
De Matteis, T., de Fine Licht, J., & Hoefler, T. (2020, November). FBLAS: Streaming linear algebra on FPGA. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-13). IEEE.
The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers?
Stefano Markidis
12/3/2021 | Preprint
Markidis, S. (2021). The old and the new: Can physics-informed deep-learning replace traditional linear solvers? Frontiers in Big Data, 92.
Automatic Particle Trajectory Classification in Plasma Simulations
Stefano Markidis, Ivy Peng, Artur Podobas, Itthinat Jongsuebchoke, Gabriel Bengtsson, Pawel Herman
23/4/2021 | DOI: 10.1109/MLHPCAI4S51975.2020.00014
Markidis, S., Peng, I., Podobas, A., Jongsuebchoke, I., Bengtsson, G., & Herman, P. (2020, November). Automatic particle trajectory classification in plasma simulations. In 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S) (pp. 64-71). IEEE.
A RISC-V in-network accelerator for flexible high-performance low-power packet processing
S. Di Girolamo, A. Kurth, A. Calotoiu, T. Benz, T. Schneider, J. Beránek, L. Benini, T. Hoefler
14/06/2021 | DOI: 10.1109/ISCA52012.2021.00079
Di Girolamo, S., Kurth, A., Calotoiu, A., Benz, T., Schneider, T., Beránek, J., ... & Hoefler, T. (2021, June). A RISC-V in-network accelerator for flexible high-performance low-power packet processing. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) (pp. 958-971). IEEE.
Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers
Martin Svedin, Steven W. D. Chien, Gibson Chikafa, Niclas Jansson, Artur Podobas
21/6/2021 | DOI: 10.1145/3468044.3468053
Svedin, M., Chien, S. W., Chikafa, G., Jansson, N., & Podobas, A. (2021, June). Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers. In Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (pp. 1-6).
StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs
Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis
21/6/2021 | DOI: 10.1145/3468044.3468052
Podobas, A., Svedin, M., Chien, S. W., Peng, I. B., Ravichandran, N. B., Herman, P., ... & Markidis, S. (2021, June). StreamBrain: An HPC framework for brain-like neural networks on CPUs, GPUs and FPGAs. In Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (pp. 1-6).
rFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing
Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, Torsten Hoefler
25/6/2021 | Preprint
Copik, M., Taranov, K., Calotoiu, A., & Hoefler, T. (2021). rFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing. arXiv preprint arXiv:2106.13859.
Semi-supervised learning with Bayesian Confidence Propagation Neural Network
Naresh Balaji Ravichandran, Anders Lansner, Pawel Herman
29/6/2021 | Preprint
Ravichandran, N. B., Lansner, A., & Herman, P. (2021). Semi-supervised learning with Bayesian confidence propagation neural network. arXiv preprint arXiv:2106.15546.
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li, Torsten Hoefler
14/07/2021 | https://arxiv.org/abs/2107.06925
Li, S., & Hoefler, T. (2021, November). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-14).
MPI collective communication through a single set of interfaces: A case for orthogonality
Jesper Larsson Träff, Sascha Hunold, Guillaume Mercier, Daniel J. Holmes
13/8/2021 | DOI: 10.1016/j.parco.2021.102826
Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2021). MPI collective communication through a single set of interfaces: A case for orthogonality. Parallel Computing, 107, 102826.
A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations
Xavier Aguilar, Stefano Markidis
13/10/2021 | DOI: 10.1109/Cluster48925.2021.00103
Aguilar, X., & Markidis, S. (2021, September). A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations. In 2021 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 692-697). IEEE.
Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain
Martin Svedin, Artur Podobas, Steven W. D. Chien, Stefano Markidis
13/10/2021 | DOI: 10.1109/Cluster48925.2021.00105
Svedin, M., Podobas, A., Chien, S. W., & Markidis, S. (2021, September). Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain. In 2021 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 705-710). IEEE.
A Data-Centric Optimization Framework for Machine Learning
Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler
20/10/2021 | Preprint
Rausch, O., Ben-Nun, T., Dryden, N., Ivanov, A., Li, S., & Hoefler, T. (2021). A Data-Centric Optimization Framework for Machine Learning. arXiv preprint arXiv:2110.10802.
Flare: Flexible In-Network Allreduce
D. De Sensi, S. Di Girolamo, S. Ashkboos, S. Li, T. Hoefler
14/11/2021 | DOI: 10.1145/3458817.3476178
De Sensi, D., Di Girolamo, S., Ashkboos, S., Li, S., & Hoefler, T. (2021, November). Flare: flexible in-network allreduce. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-16).
Brain-Like Approaches to Unsupervised Learning of Hidden Representations - A Comparative Study
Naresh Balaji Ravichandran, Anders Lansner, Pawel Herman
07/09/2021 | DOI: 10.1007/978-3-030-86383-8_13
Ravichandran, N. B., Lansner, A., & Herman, P. (2021, September). Brain-like approaches to unsupervised learning of hidden representations-a comparative study. In International Conference on Artificial Neural Networks (pp. 162-173). Springer, Cham.
Data Movement Is All You Need: A Case Study on Optimizing Transformers
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
08/11/2021 | Preprint
Ivanov, A., Dryden, N., Ben-Nun, T., Li, S., & Hoefler, T. (2021). Data movement is all you need: A case study on optimizing transformers. Proceedings of Machine Learning and Systems, 3, 711-732.
Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems
T. Dykes, C. Foyer, H. Richardson, M. Svedin, A. Podobas, N. Jansson, S. Markidis, A. Tate, S. McIntosh-Smith
14/11/2021 | DOI: 10.1109/P3HPC54578.2021.00005
Dykes, T., Foyer, C., Richardson, H., Svedin, M., Podobas, A., Jansson, N., ... & McIntosh-Smith, S. (2021, November). Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems. In 2021 International Workshop on Performance, Portability and Productivity in HPC (P3HPC) (pp. 10-21). IEEE.

Posters

Title | Authors | Date - Event | Link
Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code
Chaitanya Prasad Sishtla, Steven W. D. Chien, Vyacheslav Olshevsky, Erwin Laure, Stefano Markidis
12/06/2019 - ICCS 2019 | Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code
User-level schedules
Derek Schafer, Martin Ruefenacht, Anthony Skjellum, Daniel Holmes
11/09/2019 - EuroMPI 2019 | User-level schedules
MPI Semantic Terms and Conventions Explained
Claudia Blaas-Schenner, Daniel Holmes, Rolf Rabenseifner, Anthony Skjellum, Guillaume Mercier, Julien Jaeger, Purushotham V. Bangalore
11/09/2019 - EuroMPI 2019 | MPI Semantic Terms and Conventions Explained