Optimized Systems for Big Data: Workload analysis for applying NVRAM components (2012 - 2013)
Research Intern, September 2012 ~ June 2013
Overview
Researched the impact of adding NVRAM components to systems for big data analytics. Built a research-prototype workload characterization module that extracts condensed application I/O models on the fly, avoiding the difficulty of collecting large trace files from distributed, data-intensive applications such as Hadoop. Intercepted the open/close, read/write, and lseek system calls per I/O thread to extract multiple Markov models from runtime traces, with less than 5% performance degradation of the host program; the condensed Markov models were then used to reproduce the application's I/O behavior. Developed roughly 10k LoC of C and 2.1k LoC of Python to demonstrate the idea.
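The condensation step described above can be sketched as follows: instead of storing the raw I/O trace, each observed call updates a small first-order Markov transition-count matrix. This is a minimal illustrative sketch, not the prototype's actual model; the event set, type names, and functions here are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical sketch: condense a per-thread I/O call trace into a
   first-order Markov transition-count matrix, so only the small matrix
   (not the raw trace) needs to be retained. */

enum io_op { OP_OPEN, OP_READ, OP_WRITE, OP_LSEEK, OP_CLOSE, OP_COUNT };

typedef struct {
    long counts[OP_COUNT][OP_COUNT]; /* counts[a][b]: op a followed by op b */
    int prev;                        /* previous op; -1 before first event */
} markov_model;

void model_init(markov_model *m) {
    for (int i = 0; i < OP_COUNT; i++)
        for (int j = 0; j < OP_COUNT; j++)
            m->counts[i][j] = 0;
    m->prev = -1;
}

/* Record one intercepted I/O call as a state transition. */
void model_record(markov_model *m, enum io_op op) {
    if (m->prev >= 0)
        m->counts[m->prev][op]++;
    m->prev = (int)op;
}

/* Estimated transition probability P(next = b | current = a). */
double model_prob(const markov_model *m, enum io_op a, enum io_op b) {
    long row = 0;
    for (int j = 0; j < OP_COUNT; j++)
        row += m->counts[a][j];
    return row ? (double)m->counts[a][b] / (double)row : 0.0;
}
```

Replaying such a matrix (sampling the next call type from the current state's row) is one way the condensed model could reproduce application behavior.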
Technology
- LD_PRELOAD-based dynamic I/O function call interception
- Markov models
- C and Python
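The LD_PRELOAD interception technique works by defining wrappers for libc I/O functions in a shared object and resolving the real implementations with dlsym(RTLD_NEXT, ...). The sketch below, a simplified assumption rather than the prototype's code, counts read() calls; the real module would feed each event into its per-thread Markov models.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

/* Hypothetical interposer sketch. Built as a shared object and injected
   into an unmodified program, e.g.:
 *   gcc -shared -fPIC -o libiohook.so iohook.c -ldl
 *   LD_PRELOAD=./libiohook.so <application>
 */

static ssize_t (*real_read)(int, void *, size_t) = NULL;
static long read_count = 0; /* stands in for the model-update step */

/* Our read() shadows libc's; we record the event, then forward the call. */
ssize_t read(int fd, void *buf, size_t count) {
    if (!real_read)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");
    read_count++;
    return real_read(fd, buf, count);
}
```

The same pattern applies to open, close, write, and lseek, which is how the module observed each I/O thread without modifying or recompiling the host program.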
