Optimized Systems for Big Data: Workload analysis for applying NVRAM components (2012 - 2013)

Sep 8, 2012 · 1 min read
Internship, IBM Austin Research Laboratory, USA
Research Intern, September 2012 – June 2013

Overview

Researched the impact of adding NVRAM components to systems for big data analytics. Built a research-prototype workload characterization module that extracts condensed application I/O models on the fly, sidestepping the difficulty of collecting large trace files from distributed data-intensive applications such as Hadoop. Hooked I/O system calls (open, close, read, write, and lseek) per I/O thread to extract multiple Markov models from runtime traces, with less than 5% performance overhead on the host program. The condensed Markov models were then used to reproduce the application's I/O behavior. In total, roughly 10k lines of C and 2.1k lines of Python were written to demonstrate the idea.
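The condensed model here is essentially a first-order Markov model over syscall states, built per I/O thread. A minimal sketch of that idea in C, assuming a simple transition-count matrix; the state names and function names are illustrative, not the prototype's actual code:

```c
/* Sketch: condense a per-thread syscall trace into a first-order Markov
 * model. States and names are illustrative, not the prototype's code. */

enum { S_OPEN, S_READ, S_WRITE, S_LSEEK, S_CLOSE, N_STATES };

/* transition[i][j]: how often state i was immediately followed by state j */
static int transition[N_STATES][N_STATES];

/* Accumulate transition counts from one observed syscall trace. */
void count_transitions(const int *trace, int len)
{
    for (int i = 0; i + 1 < len; i++)
        transition[trace[i]][trace[i + 1]]++;
}

/* Normalized transition probability P(to | from). */
double transition_prob(int from, int to)
{
    int total = 0;
    for (int j = 0; j < N_STATES; j++)
        total += transition[from][j];
    return total ? (double)transition[from][to] / total : 0.0;
}
```

A trace replayer can then walk this matrix, sampling the next syscall type from the current state's row, to reproduce the application's I/O pattern without storing the full trace.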

Technology

  • LD_PRELOAD-based dynamic interception of I/O function calls
  • Markov models
  • C and Python
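The LD_PRELOAD technique works by defining wrappers with the same signatures as the libc calls and forwarding to the real implementations via dlsym(RTLD_NEXT, ...). A minimal sketch covering a subset of the hooked calls; the global counter io_call_count is illustrative (the prototype kept richer per-thread state):

```c
/* Sketch of an LD_PRELOAD I/O interception shim. Wrapper names mirror the
 * libc calls; io_call_count is an illustrative stand-in for per-thread
 * model state. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stddef.h>
#include <unistd.h>

/* Number of intercepted I/O calls observed so far. */
long io_call_count = 0;

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    if (!real_open)  /* resolve the next (libc) definition once */
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    io_call_count++;

    if (flags & O_CREAT) {  /* the mode argument exists only with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode_t mode = va_arg(ap, mode_t);
        va_end(ap);
        return real_open(path, flags, mode);
    }
    return real_open(path, flags);
}

ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t);
    if (!real_read)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");
    io_call_count++;
    return real_read(fd, buf, count);
}

int close(int fd)
{
    static int (*real_close)(int);
    if (!real_close)
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
    io_call_count++;
    return real_close(fd);
}
```

Built as a shared library (e.g. `gcc -shared -fPIC shim.c -o shim.so -ldl`) and injected with `LD_PRELOAD=./shim.so <program>`, the wrappers observe every matching libc call the target makes, with no change to the target binary.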