Adrián Castelló

Cited by

	All	Since 2019
Citations	559	425
h-index	13	10
i10-index	18	13

Citations per year
Year	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024
Citations	19	22	32	54	40	54	62	56	102	110

Public access

40 articles available
7 articles not available

Based on funding mandates

Co-authors

Enrique S. Quintana-Ortí, Universitat Politècnica de València, Spain (verified email at disca.upv.es)
Manuel F. Dolz, Universitat Jaume I (verified email at icc.uji.es)
Jose Duato, Universitat Politècnica de València (verified email at disca.upv.es)
Antonio J. Peña, Barcelona Supercomputing Center (BSC) (verified email at bsc.es)
Pavan Balaji, Argonne National Laboratory (verified email at anl.gov)
Sangmin Seo, Klaytn Foundation (verified email at klaytn.foundation)
Francisco D. Igual, Universidad Complutense de Madrid (verified email at ucm.es)
Pedro Alonso-Jordá, Universitat Politècnica de València (verified email at upv.es)
Sergio Iserte, Senior Researcher @ BSC (verified email at bsc.es)
Sandra Catalán, Universitat Jaume I (verified email at uji.es)

Adrián Castelló

Generalitat Valenciana APOSTD Fellow @ Universitat Politècnica de València (UPV)

Verified email at disca.upv.es

Code Auto-generation, Programming Models, High Performance Computing, Lightweight threading, Deep


Title	Authors	Venue	Cited by	Year
Argobots: A lightweight low-level threading and tasking framework	S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ...	IEEE Transactions on Parallel and Distributed Systems 29 (3), 512-526, 2017	163	2017
SLURM support for remote GPU virtualization: Implementation and performance study	S Iserte, A Castelló, R Mayo, ES Quintana-Ortí, F Silla, J Duato, C Reano, ...	2014 IEEE 26th International Symposium on Computer Architecture and High …, 2014	34	2014
High Performance and Portable Convolution Operators for Multicore Processors	P San Juan, A Castelló, MF Dolz, P Alonso-Jordá, ES Quintana-Ortí	SBAC-PAD 2020, 2020	28*	2020
Improving the User Experience of the rCUDA Remote GPU Virtualization Framework	C Reano, F Silla, A Castelló, AJ Pena, R Mayo, ES Quintana-Ortí, J Duato		24	2014
PyDTNN: a user-friendly and extensible framework for distributed deep learning	S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre	The Journal of Supercomputing 77, 9971-9987, 2021	21	2021
Reformulating the direct convolution for high-performance deep learning inference on ARM processors	S Barrachina, A Castelló, MF Dolz, TM Low, H Martínez, ES Quintana-Ortí, ...	Journal of Systems Architecture 135, 102806, 2023	20	2023
Analysis of model parallelism for distributed neural networks	A Castelló, MF Dolz, ES Quintana-Ortí, J Duato	Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019	20	2019
A Review of Lightweight Thread Approaches for High Performance Computing	A Castelló, AJ Peña, S Seo, R Mayo, P Balaji, ES Quintana-Ortí	2016 IEEE International Conference on Cluster Computing (CLUSTER 2016), 471-480, 2016	19	2016
Micro-kernels for portable and efficient matrix multiplication in deep learning	G Alaejos, A Castelló, H Martínez, P Alonso-Jordá, FD Igual, ...	The Journal of Supercomputing 79 (7), 8124-8147, 2023	17	2023
Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks	A Castelló, MF Dolz, ES Quintana-Ortí, J Duato	2nd High Performance Machine Learning Workshop (HPML 2019), 534-541, 2019	15	2019
On the use of remote GPUs and low-power processors for the acceleration of scientific applications	A Castelló, J Duato, R Mayo, AJ Pena, ES Quintana-Ortí, V Roca, F Silla	The Fourth International Conference on Smart Grids, Green Communications and …, 2014	15	2014
Anatomy of the BLIS family of algorithms for matrix multiplication	A Castelló, ES Quintana-Ortí, FD Igual	2022 30th Euromicro International Conference on Parallel, Distributed and …, 2022	13	2022
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations	A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña	International Conference on Parallel Processing (ICPP-2017), 60-69, 2017	13	2017
Enabling GPU Virtualization in Cloud Environments	S Iserte, FJ Clemente-Castelló, A Castelló, R Mayo, ES Quintana-Ortí	CLOSER 2016, 2016	13	2016
A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor	C Ramírez, A Castelló, ES Quintana-Orti	The Journal of Supercomputing 78 (16), 18051-18060, 2022	11	2022
Accelerating distributed deep neural network training with pipelined MPI allreduce	A Castelló, ES Quintana-Ortí, J Duato	Cluster Computing 24 (4), 3797-3813, 2021	11	2021
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS	A Castelló, S Barrachina, MF Dolz, ES Quintana-Ortí, P San Juan, ...	Journal of Systems Architecture 125, 102459, 2022	10	2022
A flexible research-oriented framework for distributed training of deep neural networks	S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre	2021 IEEE International Parallel and Distributed Processing Symposium …, 2021	10	2021
GLT: A unified API for lightweight thread libraries	A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña	Euro-Par 2017: Parallel Processing: 23rd International Conference on …, 2017	8	2017
Performance–energy trade-offs of deep learning convolution algorithms on ARM processors	MF Dolz, S Barrachina, H Martínez, A Castelló, A Maciá, G Fabregat, ...	The Journal of Supercomputing 79 (9), 9819-9836, 2023	7	2023

Articles 1–20
