苗宇豪,范中磊,张墨翟,等.基于Ceph存储的数据均衡分布算法[J]. 微电子学与计算机,2024,41(3):90-97. doi: 10.19304/J.ISSN1000-7180.2023.0180
MIAO Y H,FAN Z L,ZHANG M D,et al. A data balanced distribution algorithm based on Ceph storage[J]. Microelectronics & Computer,2024,41(3):90-97. doi: 10.19304/J.ISSN1000-7180.2023.0180

基于Ceph存储的数据均衡分布算法

A data balanced distribution algorithm based on Ceph storage

  • 摘要: 针对Ceph分布式存储系统中可扩展哈希下的受控复制(Controlled Replication Under Scalable Hashing, CRUSH)数据分布算法导致设备间存储数据容量之差达到40%,进而在数据量大、高并发情况下“热点”成为系统性能瓶颈的问题,本文对CRUSH算法进行深入研究,设计并实现了Writing_Balance算法来对数据分布进行性能优化,以达到消除“热点”所导致的负载失衡以及磁盘利用率过高的问题。通过实验发现,Writing_Balance算法可使“热点”的PG数量分布优化率较之前提升4.4%;磁盘利用率稳定性提高了3%左右;并且在较小输入key空间下对于数据整体均衡度优化也有明显的提升。

     

    Abstract: In the Ceph distributed storage system, the Controlled Replication Under Scalable Hashing (CRUSH) data distribution algorithm can cause the amount of data stored on different devices to differ by as much as 40%, so that under large data volumes and high concurrency the resulting "hot spots" become a bottleneck for system performance. This paper studies the CRUSH algorithm in depth and designs and implements the Writing_Balance algorithm to optimize data distribution, with the aim of eliminating the load imbalance and excessive disk utilization caused by hot spots. Experiments show that, compared with a storage system that does not use Writing_Balance, the algorithm improves the optimization rate of the placement group (PG) count distribution on hot spots by 4.4% and improves the stability of disk utilization by about 3%. It also noticeably improves overall data balance when the input key space is small.

     
