Publications
2024
- GhOST: a GPU out-of-order scheduling technique for stall reduction. Ishita Chaturvedi, Bhargav Reddy Godala, Yucan Wu, and 8 more authors. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 2024.
Graphics Processing Units (GPUs) use massive multi-threading coupled with static scheduling to hide instruction latencies. Despite this, memory instructions pose a challenge as their latencies vary throughout the application’s execution, leading to stalls. Out-of-order (OoO) execution has been shown to effectively mitigate these types of stalls. However, prior OoO proposals involve costly techniques such as reordering loads and stores, register renaming, or two-phase execution, amplifying implementation overhead and consequently creating a substantial barrier to adoption in GPUs. This paper introduces GhOST, a minimal yet effective OoO technique for GPUs. Without expensive components, GhOST can manifest a substantial portion of the instruction reorderings found in an idealized OoO GPU. GhOST leverages the decode stage’s existing pool of decoded instructions and the existing issue stage’s information about instructions in the pipeline to select instructions for OoO execution with little additional hardware. A comprehensive evaluation of GhOST and the prior state-of-the-art OoO technique across a range of diverse GPU benchmarks yields two surprising insights: (1) Prior works utilized Nvidia’s intermediate representation PTX for evaluation; however, the optimized static instruction scheduling of the final binary form negates many purported improvements from OoO execution; and (2) The prior state-of-the-art OoO technique results in an average slowdown across this set of benchmarks. In contrast, GhOST achieves a 36% maximum and 6.9% geometric mean speedup on GPU binaries with only a 0.007% area increase, surpassing previous techniques without slowing down any of the measured benchmarks.
2023
- EMISSARY: Enhanced Miss Awareness Replacement Policy for L2 Instruction Caching. Nayana Prasad Nagendra, Bhargav Reddy Godala, Ishita Chaturvedi, and 7 more authors. In Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023.
For decades, architects have designed cache replacement policies to reduce cache misses. Since not all cache misses affect processor performance equally, researchers have also proposed cache replacement policies focused on reducing the total miss cost rather than the total miss count. However, all prior cost-aware replacement policies have been proposed specifically for data caching and are either inappropriate or unnecessarily complex for instruction caching. This paper presents EMISSARY, the first cost-aware cache replacement family of policies specifically designed for instruction caching. Observing that modern architectures entirely tolerate many instruction cache misses, EMISSARY resists evicting those cache lines whose misses cause costly decode starvations. In the context of a modern processor with fetch-directed instruction prefetching and other aggressive front-end features, EMISSARY applied to L2 cache instructions delivers an impressive 3.24% geomean speedup (up to 23.7%) and a geomean energy savings of 2.1% (up to 17.7%) when evaluated on widely used server applications with large code footprints. This speedup is 21.6% of the total speedup obtained by an unrealizable L2 cache with a zero-cycle miss latency for all capacity and conflict instruction misses.
2022
- NOELLE Offers Empowering LLVM Extensions. Angelo Matni, Enrico Armenio Deiana, Yian Su, and 8 more authors. In 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2022.
Modern and emerging architectures demand increasingly complex compiler analyses and transformations. As the emphasis on compiler infrastructure moves beyond support for peephole optimizations and the extraction of instruction-level parallelism, compilers should support custom tools designed to meet these demands with higher-level analysis-powered abstractions and functionalities of wider program scope. This paper introduces NOELLE, a robust open-source domain-independent compilation layer built upon LLVM providing this support. NOELLE extends abstractions and functionalities provided by LLVM enabling advanced, program-wide code analyses and transformations. This paper shows the power of NOELLE by presenting a diverse set of 11 custom tools built upon it.
2019
- Role of Y3+ ion substitution in the enhanced electrical properties of Ba(0.9−x)YxCa0.1Zr0.07Ti0.93O3 lead-free piezoceramics. Shreya Mittal, Ishita Chaturvedi, and K Chandramani Singh. Journal of Materials Science: Materials in Electronics, 2019.
Ba(0.9−x)YxCa0.1Zr0.07Ti0.93O3 lead-free piezoelectric ceramics were synthesized for x=0–0.035 in steps of 0.005. Lead-free piezoceramics are gaining importance because of the urgent demand to replace the highly toxic PZT family of piezoceramics. Ba0.9Ca0.1Zr0.07Ti0.93O3 (BCZT) is one such system. It shows strong electrical properties but suffers from a low Curie temperature, which restricts its use to the low-temperature range. In the present study, the Curie temperature has been improved by 9 °C. A polymorphic phase transition (PPT) has also been observed around x=0.015, consisting of orthorhombic and tetragonal phases, which gives the polarization vector a larger number of favorable directions. As a result, the remnant polarization (Pr), piezoelectric charge coefficient (d33) and electromechanical coupling coefficient (kp) attain their maximum values of 5.21 μC/cm², 200 pC/N and 24.78%, respectively, for x=0.015. This increase in transition temperature and in the other electrical properties makes BCZT a potential candidate for a lead-free piezoelectric system.