THE HYBRID PROCESS AND DATA MONITORING TOOLS FOR HIGH PERFORMANCE COMPUTING SYSTEMS
Volume 3 (1), June 2020, Pages 139-146
Driven by growing demand for parallel programs and for mathematical and industrial problems that require high performance computing, cluster systems have become an essential part of the computing world. Clusters have varied features, and many topics can be studied when working with these systems. This paper discusses and analyzes issues related to visualization in clusters. A cluster environment involves several closely related kinds of visualization, such as hardware usage, process flow, data flow, and environmental data. However, most contemporary monitoring tools were built with a particular target in mind; for example, Nagios was developed mainly to monitor the network and the devices related to its operation. Hence, it is almost impossible to find a single tool that serves most of the purposes mentioned above. This paper examines the possibility of a single monitoring tool that covers these purposes for most cluster systems.
High performance computing, Clusters, Visualization tools, Daemons, Monitoring