cfaed Publications

Compression-Aware and Performance-Efficient Insertion Policies for Long-Lasting Hybrid LLCs

Reference

Carlos Escuin, Asif Ali Khan, Pablo Ibáñez-Marín, Teresa Monreal, Jeronimo Castrillon, Víctor Viñals-Yúfera, "Compression-Aware and Performance-Efficient Insertion Policies for Long-Lasting Hybrid LLCs", In Proceedings of the 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA'23), IEEE Computer Society, pp. 179–192, Los Alamitos, CA, USA, Mar 2023. [doi]

Abstract

Emerging non-volatile memory (NVM) technologies can potentially replace large SRAM memories such as the last-level cache (LLC). However, despite recent advances, NVMs suffer from higher write latency and limited write endurance. Recently, NVM-SRAM hybrid LLCs have been proposed to combine the best of both worlds. Several policies have been proposed to improve the performance and lifetime of hybrid LLCs by intelligently steering incoming LLC blocks into either the SRAM or the NVM part, based on the cache behavior of the blocks and the SRAM/NVM device properties. However, these policies neither consider compressing the contents of the cache block nor using partially worn-out NVM cache blocks.

This paper proposes new insertion policies for byte-level fault-tolerant hybrid LLCs that collaboratively optimize for lifetime and performance. Specifically, we leverage data compression to utilize partially defective NVM cache entries, thereby improving the LLC hit rate. The key to our approach is to guide the insertion policy by both the reuse properties of the block and its compressed size. A block is inserted in NVM only if it is a read-reuse block or its compressed size is lower than a threshold. It is inserted in SRAM if the block is a write-reuse block or its compressed size is greater than the threshold. We use set-dueling to tune the compression threshold at runtime. This compression threshold provides a knob to control the NVM write rate and, together with a rule-based mechanism, allows balancing performance and lifetime.

Overall, our evaluation shows that, with affordable hardware overheads, the proposed schemes can nearly reach the performance of an SRAM cache with the same associativity while improving lifetime by 17x compared to a hybrid NVM-unaware LLC. Our proposed scheme outperforms state-of-the-art insertion policies by 9% while achieving a comparable lifetime. The rule-based mechanism shows that by sacrificing, for instance, 1.1% and 1.9% performance, the NVM lifetime can be further increased by 28% and 44%, respectively.
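As a rough illustration of the insertion rule described in the abstract, the following minimal C++ sketch shows how the SRAM/NVM placement decision could be expressed. All names and the fallback handling are hypothetical assumptions for illustration, not the paper's implementation; it assumes a reuse predictor, a measured compressed block size, and a compression threshold supplied by the runtime set-dueling mechanism.

// Illustrative sketch of the insertion rule described in the abstract
// (hypothetical names; not the authors' implementation).
#include <cstddef>

enum class Partition { SRAM, NVM };

struct BlockInfo {
    bool   predictedReadReuse;   // assumed output of a reuse predictor
    bool   predictedWriteReuse;  // assumed output of a reuse predictor
    size_t compressedSizeBytes;  // block size after compression
};

// 'threshold' is the compression threshold, tuned at runtime via set-dueling.
Partition chooseInsertionPartition(const BlockInfo& blk, size_t threshold) {
    // SRAM rule: write-reuse blocks and poorly compressible blocks stay in
    // SRAM, limiting costly NVM writes.
    if (blk.predictedWriteReuse || blk.compressedSizeBytes > threshold)
        return Partition::SRAM;
    // NVM rule: read-reuse or well-compressed blocks may use NVM entries,
    // including partially defective ones large enough for the compressed data.
    if (blk.predictedReadReuse || blk.compressedSizeBytes < threshold)
        return Partition::NVM;
    // Fallback when neither rule applies; defaulting to SRAM is an assumption.
    return Partition::SRAM;
}

Checking the SRAM rule first keeps the sketch consistent with the abstract's wording that a block enters NVM only if it is a read-reuse block or compresses below the threshold.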

Bibtex

@InProceedings{escuin_hpca23,
author = {Carlos Escuin and Asif Ali Khan and Pablo Ibáñez-Marín and Teresa Monreal and Jeronimo Castrillon and Víctor Viñals-Yúfera},
booktitle = {Proceedings of the 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA'23)},
title = {Compression-Aware and Performance-Efficient Insertion Policies for Long-Lasting Hybrid LLCs},
organization = {IEEE},
pages = {179--192},
abstract = {Emerging non-volatile memory (NVM) technologies can potentially replace large SRAM memories such as the last-level cache (LLC). However, despite recent advances, NVMs suffer from higher write latency and limited write endurance. Recently, NVM-SRAM hybrid LLCs have been proposed to combine the best of both worlds. Several policies have been proposed to improve the performance and lifetime of hybrid LLCs by intelligently steering incoming LLC blocks into either the SRAM or the NVM part, based on the cache behavior of the blocks and the SRAM/NVM device properties. However, these policies neither consider compressing the contents of the cache block nor using partially worn-out NVM cache blocks. This paper proposes new insertion policies for byte-level fault-tolerant hybrid LLCs that collaboratively optimize for lifetime and performance. Specifically, we leverage data compression to utilize partially defective NVM cache entries, thereby improving the LLC hit rate. The key to our approach is to guide the insertion policy by both the reuse properties of the block and its compressed size. A block is inserted in NVM only if it is a read-reuse block or its compressed size is lower than a threshold. It is inserted in SRAM if the block is a write-reuse block or its compressed size is greater than the threshold. We use set-dueling to tune the compression threshold at runtime. This compression threshold provides a knob to control the NVM write rate and, together with a rule-based mechanism, allows balancing performance and lifetime. Overall, our evaluation shows that, with affordable hardware overheads, the proposed schemes can nearly reach the performance of an SRAM cache with the same associativity while improving lifetime by 17x compared to a hybrid NVM-unaware LLC. Our proposed scheme outperforms state-of-the-art insertion policies by 9\% while achieving a comparable lifetime. The rule-based mechanism shows that by sacrificing, for instance, 1.1\% and 1.9\% performance, the NVM lifetime can be further increased by 28\% and 44\%, respectively.},
doi = {10.1109/HPCA56546.2023.10070968},
url = {https://doi.ieeecomputersociety.org/10.1109/HPCA56546.2023.10070968},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = mar,
year = {2023},
}

Downloads

2302_Escuin_HPCA [PDF]

Related Paths

Orchestration Path

Permalink

https://esim-project.eu/publications?pubId=3427

