Accelerating Ceph OSDs with bcache on PVE: a hands-on guide

A. Current state of the bcache bug in PVE

# As of PVE 7, the web GUI still cannot create an OSD on a bcache-backed device. The bug and its status are tracked here:
    https://bugzilla.proxmox.com/show_bug.cgi?id=4679

# Creating the OSD with pveceph (fails: the device is reported as not found)
    pveceph osd create /dev/bcache0
    pveceph osd create /dev/bcache1
    pveceph osd create /dev/bcache2

root@2ceph01:~# pveceph osd create /dev/bcache0
unable to get device info for '/dev/bcache0'

# This forum thread notes that ceph-volume lvm prepare does not work in this case; ceph-volume lvm create can be used instead and does everything in one step
    https://forum.proxmox.com/threads/pve-gui-doesnt-recognize-kernel-bcache-device.109761/

B. Installing bcache-tools on PVE

# 1. Install the bcache management tools
    apt install bcache-tools

# 2. Load and check the kernel module
    modprobe bcache
    lsmod | grep bcache             # once a bcache device has been created, the module is loaded automatically at boot
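
A loaded bcache module also exposes the directory /sys/fs/bcache, which gives a second way to confirm the step above. The following is a minimal sketch; the helper name bcache_loaded is hypothetical and only tests for that directory.

```shell
#!/bin/sh
# bcache_loaded: hypothetical helper. Succeeds if the given directory exists;
# defaults to /sys/fs/bcache, which the kernel creates when bcache is loaded.
bcache_loaded() {
    [ -d "${1:-/sys/fs/bcache}" ]
}

if bcache_loaded; then
    echo "bcache module is loaded"
else
    echo "bcache module is not loaded; try: modprobe bcache"
fi
```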

# 3. Port the bcache and bcache-status tools from SUSE Linux
    wget https://qiniu.wsfnk.com/bokefiles/bcache -P /usr/sbin/  ; chmod +x /usr/sbin/bcache
    wget https://qiniu.wsfnk.com/bokefiles/bcache-status -P /usr/sbin/ ; chmod +x /usr/sbin/bcache-status

    # Note: bcache is a binary, while bcache-status is a pure Python 3 script. If bcache does not run, check its shared-library dependencies with ldd.
    root@3-ceph15-1:~# ldd /usr/sbin/bcache
        linux-vdso.so.1 (0x00007ffea1743000)
        libblkid.so.1 => /lib/x86_64-linux-gnu/libblkid.so.1 (0x00007fd2e881e000)
        libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007fd2e8815000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd2e8641000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd2e8887000)
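
ldd marks every library it cannot resolve with "not found", so the check can be scripted. A minimal sketch, assuming a hypothetical helper named missing_libs that filters ldd output read from standard input:

```shell
#!/bin/sh
# missing_libs: hypothetical helper. Reads ldd output on stdin and prints
# the name of every library that ldd reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on a real node (prints nothing when all libraries resolve):
#   ldd /usr/sbin/bcache | missing_libs
```

Any name it prints is a library that has to be installed before the ported binary will start.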

Recommended reading

https://support.huawei.com/enterprise/zh/doc/EDOC1100284237/c067e9ff
https://support.huawei.com/enterprise/zh/doc/EDOC1100180377/5510fed8

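I. Practical experience with bcache on PVE

1. Hardware preparation
    1. Make sure there is plenty of RAM; bcache consumes a lot of memory.
    2. The cache device must have a supercapacitor (power-loss protection), and it must hold its write speed after a certain amount of data has been stored on it.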

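1. Hardware layout: each Ceph node has one 1.92 TiB enterprise SSD with a supercapacitor and three ordinary 10 T HDDs
    # Split the 1.92T SSD into 4 partitions:
    three partitions of 200G each, used as cache devices for the backing HDDs;
    one final partition of 1.2 TiB, used for the OSD DB and WAL
    parted -s /dev/vde mklabel gpt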
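
The parted line above only writes an empty GPT label; the four partitions still have to be created. One way to sketch the layout is to print the mkpart commands for review before running them. The helper name plan_partitions, the partition names cache1 to cache3 and dbwal, and the GiB boundaries are assumptions made for illustration, not part of the original procedure.

```shell
#!/bin/sh
# plan_partitions: hypothetical helper. Prints (does not run) the parted
# commands for COUNT equal cache partitions followed by one partition that
# takes the rest of the disk for the OSD DB and WAL.
plan_partitions() {
    dev=$1
    count=$2
    size_gib=$3
    start=1MiB          # leave the first MiB free for alignment
    i=1
    while [ "$i" -le "$count" ]; do
        end="$((i * size_gib))GiB"
        echo "parted -s $dev mkpart cache$i $start $end"
        start=$end
        i=$((i + 1))
    done
    echo "parted -s $dev mkpart dbwal $start 100%"
}

# Three 200 GiB cache partitions on /dev/vde, remainder for DB/WAL.
plan_partitions /dev/vde 3 200
```

Because the first partition starts at 1MiB it is slightly smaller than the others. The printed commands destroy existing data on the disk, so they should only be run after checking the device name.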

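2. Create the backing and cache devices in one step, setting block to 4 KB (this should match the sector size of the backing device; HDDs typically use 512 B, SSDs typically 4 KB) and bucket to 2 MB (this should match the erase block size of the cache device)
    # Example: a 4K-sector HDD paired with an SSD whose erase block size is 2 MB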
    #make-bcache --wipe-bcache --writeback --block 4K --bucket 2M -C /dev/vde -B /dev/vdb  /dev/vdc /dev/vdd
    make-bcache --wipe-bcache --writeback --block 4K --bucket 2M -C /dev/vde1 -B /dev/vdb
    make-bcache --wipe-bcache --writeback --block 4K --bucket 2M -C /dev/vde2 -B /dev/vdc
    make-bcache --wipe-bcache --writeback --block 4K --bucket 2M -C /dev/vde3 -B /dev/vdd
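
After the make-bcache commands it is worth confirming that each /dev/bcacheN exists and really runs in writeback mode. The sysfs file cache_mode lists all modes and puts the active one in square brackets, and state reports whether a cache is attached. A minimal sketch; the helper name active_mode is hypothetical.

```shell
#!/bin/sh
# active_mode: hypothetical helper. Reads a bcache cache_mode line on stdin
# and prints only the entry enclosed in [brackets], i.e. the active mode.
active_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Report state and active cache mode for every bcache device on this host.
# On a host without bcache devices the loop body is skipped.
for d in /sys/block/bcache*/bcache; do
    [ -d "$d" ] || continue
    printf '%s state=%s mode=%s\n' "$d" "$(cat "$d/state")" "$(active_mode < "$d/cache_mode")"
done
```

A state of "no cache" would mean the backing device was created but no cache set is attached to it.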

# Options accepted by ceph-volume lvm create --bluestore
    --data /dev/bcache0         # device that holds the OSD data
    --block.db /dev/nvme0n1     # device that holds the OSD DB and WAL
    --block.db-size=10G         # size of the DB
    --osd-id 8                  # create the OSD with this OSD id
    --crush-device-class hdd    # create the OSD with this device class

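# To use this command in a PVE environment, first create an OSD with pveceph so that the Ceph keyring gets added, then destroy that OSD
    # Alternatively: ln -s /var/lib/ceph/bootstrap-osd/ceph.keyring /etc/pve/priv/ceph.client.bootstrap-osd.keyring  (this should probably work as well)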
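
Whichever route is taken, ceph-volume needs a non-empty bootstrap-osd keyring at /var/lib/ceph/bootstrap-osd/ceph.keyring before it can register a new OSD. A minimal sketch of a pre-flight check; the helper name has_keyring is hypothetical.

```shell
#!/bin/sh
# has_keyring: hypothetical helper. Succeeds if the given file exists
# and is not empty.
has_keyring() {
    [ -s "$1" ]
}

keyring=/var/lib/ceph/bootstrap-osd/ceph.keyring
if has_keyring "$keyring"; then
    echo "bootstrap-osd keyring present: $keyring"
else
    echo "bootstrap-osd keyring missing: $keyring"
fi
```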

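# Points to watch once bcache is in use
    # 1. Specify the class type by hand when creating the OSD, otherwise it will always be set to ssd
    # 2. However tight the budget, set aside dedicated space on the fast disk for the DB and WAL, so that a wrong bcache caching decision cannot drag performance down
    # ceph-volume lvm create --bluestore --data /dev/bcache0    Using bcache like this, without a separate non-bcache DB and WAL, is strongly discouraged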
    ceph-volume lvm create --bluestore --data /dev/bcache0 --block.db /dev/nvme0n1

    ceph-volume lvm create --bluestore --data /dev/bcache0 --block.db /dev/nvme0n1 --block.db-size=10G
    ceph-volume lvm create --bluestore --data /dev/bcache0 --osd-id 8 --crush-device-class hdd --block.db /dev/nvme0n1 --block.db-size=10G

    ceph-volume lvm activate --bluestore 0 2f6b54af-aec8-414e-a231-3cce47249463     # activates a single OSD on its own (OSD id, then OSD fsid)

3. Create the OSDs (an OSD on bcache gets class type ssd by default)
    ceph-volume lvm create --bluestore --data /dev/bcache0 --block.db /dev/vde4 --block.db-size=30G
    ceph-volume lvm create --bluestore --data /dev/bcache1 --block.db /dev/vde4 --block.db-size=30G
    ceph-volume lvm create --bluestore --data /dev/bcache2 --block.db /dev/vde4 --block.db-size=30G
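
If the OSDs were created without --crush-device-class, the class can be corrected afterwards: Ceph requires the old class to be removed before a new one is set. The sketch below only prints the commands. The helper name reclass_cmds is hypothetical, and the OSD ids 0, 1 and 2 are an assumption; the real ids can be read from ceph osd tree.

```shell
#!/bin/sh
# reclass_cmds: hypothetical helper. Prints (does not run) the commands that
# move the given OSD ids to the given CRUSH device class.
reclass_cmds() {
    class=$1
    shift
    for id in "$@"; do
        echo "ceph osd crush rm-device-class osd.$id"
        echo "ceph osd crush set-device-class $class osd.$id"
    done
}

# OSD ids 0 1 2 are assumed for illustration.
reclass_cmds hdd 0 1 2
```

After running the printed commands, the CLASS column of ceph osd tree should show hdd for those OSDs.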

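4. Tuning (a sequential_cutoff of 0 disables the sequential I/O bypass, so sequential writes also go through the cache)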
    echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
    echo 0 > /sys/block/bcache1/bcache/sequential_cutoff
    echo 0 > /sys/block/bcache2/bcache/sequential_cutoff
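
The three echo lines can be collapsed into a loop that covers however many bcache devices a node has. A minimal sketch; the helper name set_cutoff and its optional root-directory argument are hypothetical. As far as I know this value lives only in memory and returns to its default after a reboot, so a script like this would have to be re-run at boot (for example from a systemd unit); treat that as an assumption to verify on your own nodes.

```shell
#!/bin/sh
# set_cutoff: hypothetical helper. Writes VALUE into the sequential_cutoff
# file of every bcache device found below ROOT (default /sys/block) and
# prints what each file reads back.
set_cutoff() {
    value=$1
    root=${2:-/sys/block}
    for f in "$root"/bcache*/bcache/sequential_cutoff; do
        [ -w "$f" ] || continue     # skip when no device matches or not writable
        echo "$value" > "$f"
        echo "$f = $(cat "$f")"
    done
}

# 0 disables the sequential bypass on all bcache devices of this host.
set_cutoff 0
```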

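M. How to stop using bcache under Ceph

    1. Stop the OSD
    2. Detach bcache. If you skip this, creating the OSD again may fail with --> RuntimeError: Device /dev/bcache0 has a filesystem.
    3. Watch the bcache dirty data until writeback has brought it down to 0, then start the OSD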
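
The steps above can be sketched with the sysfs files that bcache provides: writing 1 to detach flushes dirty data and then releases the cache, state changes to "no cache" once that has finished, and dirty_data shows how much is still waiting. The helper names is_detached and detach_and_wait, the 5-second poll interval, and the OSD id in the example are assumptions.

```shell
#!/bin/sh
# is_detached: hypothetical helper. Succeeds when a bcache "state" value
# says that no cache is attached any more.
is_detached() {
    [ "$1" = "no cache" ]
}

# detach_and_wait: hypothetical helper. Asks bcache to detach the cache from
# one device (e.g. bcache0) and polls until writeback and detach are done.
detach_and_wait() {
    dir=/sys/block/$1/bcache
    echo 1 > "$dir/detach"
    until is_detached "$(cat "$dir/state")"; do
        echo "$1: state=$(cat "$dir/state") dirty_data=$(cat "$dir/dirty_data")"
        sleep 5
    done
    echo "$1: cache detached"
}

# Example, to be run as root on the Ceph node (OSD id 0 is assumed):
#   systemctl stop ceph-osd@0
#   detach_and_wait bcache0
```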

N. Performance with and without bcache

# Test commands
    rados bench -p volumes 60 write --no-cleanup
    # after the test, remove the benchmark data
    rados purge volumes --yes-i-really-really-mean-it

Configurations (three Ceph disks in every case):
    (1) no DB
    (2) with DB
    (3) bcache + DB, writeback
    (4) bcache + DB, writeback, sequential I/O tuned

# First test run
Metric                    (1)         (2)         (3)         (4)
Total time run:           61.2611     61.5114     60.6557     60.7928
Total writes made:        498         603         905         1254
Write size:               4194304     4194304     4194304     4194304
Object size:              4194304     4194304     4194304     4194304
Bandwidth (MB/sec):       32.5165     39.2123     59.6811     82.5097
Stddev Bandwidth:         10.978      8.70601     19.7928     20.6992
Max bandwidth (MB/sec):   56          56          100         124
Min bandwidth (MB/sec):   0           16          0           28
Average IOPS:             8           9           14          20
Stddev IOPS:              2.74449     2.1765      4.94821     5.1748
Max IOPS:                 14          14          25          31
Min IOPS:                 0           4           0           7
Average Latency(s):       1.94895     1.61558     1.06966     0.774093
Stddev Latency(s):        1.16162     1.09933     0.370768    0.321407
Max latency(s):           6.37118     6.87208     2.91389     1.88844
Min latency(s):           0.275261    0.204856    0.277915    0.0656808

# Second test run
Metric                    (1)         (2)         (3)         (4)
Total time run:           61.3131     61.1041     60.551      60.2961
Total writes made:        554         646         721         1248
Write size:               4194304     4194304     4194304     4194304
Object size:              4194304     4194304     4194304     4194304
Bandwidth (MB/sec):       36.1424     42.2885     47.6293     82.7914
Stddev Bandwidth:         10.9664     10.4169     17.8813     18.4918
Max bandwidth (MB/sec):   56          64          84          132
Min bandwidth (MB/sec):   4           8           0           36
Average IOPS:             9           10          11          20
Stddev IOPS:              2.7416      2.60422     4.47668     4.62296
Max IOPS:                 14          16          21          33
Min IOPS:                 1           2           0           9
Average Latency(s):       1.75992     1.50196     1.34148     0.771946
Stddev Latency(s):        1.02127     0.873313    0.527519    0.297788
Max latency(s):           7.92322     5.53811     3.18072     1.60706
Min latency(s):           0.286233    0.233545    0.230318    0.0546931
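
To put the bandwidth figures into proportion, the tuned bcache result can be divided by the no-DB baseline. A minimal sketch using the Bandwidth (MB/sec) values from the two runs above; the helper name speedup is hypothetical.

```shell
#!/bin/sh
# speedup: hypothetical helper. Prints NEW / BASE rounded to two decimals.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

# No-DB baseline versus bcache + DB with writeback and sequential I/O tuning.
speedup 32.5165 82.5097    # first run,  prints 2.54
speedup 36.1424 82.7914    # second run, prints 2.29
```

Both runs are single 60-second samples, so the ratios are indicative rather than precise.
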
Statement: this is an original article by 辣条①号. When reposting, please keep this statement and the article link: https://boke.wsfnk.com/archives/1161.html
Last edited: 2023/7/4. Author: 辣条①号