An analysis of traces from a production mapreduce cluster | S Kavulya, J Tan, R Gandhi, P Narasimhan | Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster …, 2010 | 450 | 2010 |
Causes of failure in web applications | S Pertet, P Narasimhan | Parallel Data Laboratory, Carnegie Mellon University, CMU-PDL-05-109, 2005 | 261 | 2005 |
SALSA: analyzing logs as state machines | J Tan, X Pan, S Kavulya, R Gandhi, P Narasimhan | Proceedings of the First USENIX conference on Analysis of system logs, 6-6, 2008 | 169 | 2008 |
Ganesha: blackBox diagnosis of MapReduce systems | X Pan, J Tan, S Kavulya, R Gandhi, P Narasimhan | ACM SIGMETRICS Performance Evaluation Review 37 (3), 8-13, 2010 | 120 | 2010 |
Mochi: visual log-analysis based tools for debugging hadoop | J Tan, X Pan, S Kavulya, R Gandhi, P Narasimhan | USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA 6 (3), 2009 | 101 | 2009 |
Kahuna: Problem diagnosis for mapreduce-based cloud computing environments | J Tan, X Pan, E Marinelli, S Kavulya, R Gandhi, P Narasimhan | 2010 IEEE Network Operations and Management Symposium-NOMS 2010, 112-119, 2010 | 97 | 2010 |
MEAD: support for Real‐Time Fault‐Tolerant CORBA | P Narasimhan, TA Dumitraş, AM Paulos, SM Pertet, CF Reverte, ... | Concurrency and Computation: Practice and Experience 17 (12), 1527-1545, 2005 | 95 | 2005 |
Tiresias: Black-box failure prediction in distributed systems | AW Williams, SM Pertet, P Narasimhan | 2007 IEEE International Parallel and Distributed Processing Symposium, 1-8, 2007 | 92 | 2007 |
Draco: Statistical diagnosis of chronic problems in large distributed systems | SP Kavulya, S Daniels, K Joshi, M Hiltunen, R Gandhi, P Narasimhan | IEEE/IFIP International Conference on Dependable Systems and Networks (DSN …, 2012 | 80 | 2012 |
Optimizing skewed joins in big data | SP Kavulya, MR Alton, A Shahbazi, T Lisonbee | US Patent App. 14/757,748, 2017 | 79 | 2017 |
Visual, log-based causal tracing for performance debugging of mapreduce systems | J Tan, S Kavulya, R Gandhi, P Narasimhan | 2010 IEEE 30th International Conference on Distributed Computing Systems …, 2010 | 79 | 2010 |
Performance Troubleshooting in Data Centers: An Annotated Bibliography | C Wang, SP Kavulya, J Tan, L Hu, M Kutare, M Kasick, K Schwan, ... | ACM SIGOPS Operating Systems Review 47 (3), 50-62, 2013 | 53 | 2013 |
Failure Diagnosis of Complex Systems | SP Kavulya, K Joshi, F Di Giandomenico, P Narasimhan | Resilience Assessment and Evaluation of Computing Systems, 239-261, 2012 | 53 | 2012 |
Diagnosis in automotive systems: A survey | PE Lanigan, S Kavulya, P Narasimhan, TE Fuhrman, MA Salman | Tech. Rep. CMU-PDL-11-110, Carnegie Mellon University Parallel Data Lab, 2011 | 53 | 2011 |
Proactive recovery in distributed corba applications | S Pertet, P Narasimhan | International Conference on Dependable Systems and Networks, 2004, 357-366, 2004 | 44 | 2004 |
Theia: visual signatures for problem diagnosis in large hadoop clusters | E Garduno, SP Kavulya, J Tan, R Gandhi, P Narasimhan | Proceedings of the 26th international conference on Large Installation …, 2012 | 37 | 2012 |
Fingerpointing correlated failures in replicated systems | S Pertet, R Gandhi, P Narasimhan | Proceedings of the 2nd USENIX workshop on Tackling computer systems problems …, 2007 | 29 | 2007 |
Experiences with fault-injection in a Byzantine fault-tolerant protocol | R Martins, R Gandhi, P Narasimhan, S Pertet, A Casimiro, D Kreutz, ... | ACM/IFIP/USENIX International Conference on Distributed Systems Platforms …, 2013 | 25 | 2013 |
Lightweight Black-box Failure Detection for Distributed Systems | J Tan, S Kavulya, R Gandhi, P Narasimhan | Proceedings of the 2012 workshop on Management of big data systems, 13-18, 2012 | 24 | 2012 |
Gumshoe: Diagnosing performance problems in replicated file-systems | S Kavulya, R Gandhi, P Narasimhan | Reliable Distributed Systems, 2008. SRDS'08. IEEE Symposium on, 137-146, 2008 | 24* | 2008 |