Common Ceph faults: OSD fails to start

After a system reboot, the OSD fails to start because the LVM information was lost

Others have run into this problem as well; see the linked report for details.

Manually starting the OSD for debugging (fails)

root@node16072:~# /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
2023-05-04T21:39:39.250+0800 7f8d9aae2240 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2023-05-04T21:39:39.250+0800 7f8d9aae2240 -1 AuthRegistry(0x56400f1ac140) no keyring found at /var/lib/ceph/osd/ceph-0/keyring, disabling cephx
2023-05-04T21:39:39.250+0800 7f8d9aae2240 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2023-05-04T21:39:39.250+0800 7f8d9aae2240 -1 AuthRegistry(0x7ffe9af34500) no keyring found at /var/lib/ceph/osd/ceph-0/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)
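
The missing keyring usually just means /var/lib/ceph/osd/ceph-0 was never mounted and populated: for a BlueStore OSD deployed with ceph-volume lvm, that directory is typically a small tmpfs that ceph-volume fills in during activation, which cannot happen if the underlying LVM volume is not visible. A quick sanity check (a sketch; device names and the OSD id will differ on your node):

    # Is the OSD data directory mounted and populated?
    findmnt /var/lib/ceph/osd/ceph-0
    ls -l /var/lib/ceph/osd/ceph-0/

    # Does LVM still see the OSD's PV/VG/LV? Empty output here points to lost LVM metadata.
    pvs && vgs && lvs
    ceph-volume lvm list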

Check the system logs

root@node16072:~# dmesg -T |grep ceph 
[Thu May  4 21:32:58 2023] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[Thu May  4 21:32:58 2023] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[Thu May  4 21:32:58 2023] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[Thu May  4 21:32:58 2023] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[Thu May  4 21:32:58 2023] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[Thu May  4 21:32:58 2023] systemd[1]: remote-fs-pre.target: Found dependency on ceph.target/start
[Thu May  4 21:32:58 2023] systemd[1]: remote-fs-pre.target: Found dependency on ceph-mds.target/start
[Thu May  4 21:32:58 2023] systemd[1]: remote-fs-pre.target: Found dependency on ceph-mon.target/start
[Thu May  4 21:32:58 2023] systemd[1]: remote-fs-pre.target: Found dependency on ceph-mon@node16072.service/start
[Thu May  4 21:32:58 2023] systemd[1]: remote-fs-pre.target: Found ordering cycle on ceph-mon@node16072.service/start
[Thu May  4 21:32:58 2023] systemd[1]: remote-fs-pre.target: Job ceph-mon@node16072.service/start deleted to break ordering cycle starting with remote-fs-pre.target/start
[Thu May  4 21:32:58 2023] systemd[1]: ceph-mgr@node16072.service: Found ordering cycle on pve-cluster.service/start
[Thu May  4 21:32:58 2023] systemd[1]: ceph-mgr@node16072.service: Found dependency on rrdcached.service/start
[Thu May  4 21:32:58 2023] systemd[1]: ceph-mgr@node16072.service: Found dependency on remote-fs.target/start
[Thu May  4 21:32:58 2023] systemd[1]: ceph-mgr@node16072.service: Found dependency on remote-fs-pre.target/start
[Thu May  4 21:32:58 2023] systemd[1]: ceph-mgr@node16072.service: Found dependency on ceph-mgr@node16072.service/start
[Thu May  4 21:32:58 2023] systemd[1]: ceph-mgr@node16072.service: Job pve-cluster.service/start deleted to break ordering cycle starting with ceph-mgr@node16072.service/start
[Thu May  4 21:32:58 2023] systemd[1]: Created slice system-ceph\x2dmgr.slice.
[Thu May  4 21:32:58 2023] systemd[1]: Created slice system-ceph\x2dmon.slice.
[Thu May  4 21:32:58 2023] systemd[1]: Created slice system-ceph\x2dvolume.slice.
[Thu May  4 21:32:58 2023] systemd[1]: Reached target ceph target allowing to start/stop all ceph-fuse@.service instances at once.
[Thu May  4 21:32:58 2023] systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
[Thu May  4 21:32:58 2023] systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.
[Thu May  4 21:32:58 2023] systemd[1]: Reached target ceph target allowing to start/stop all ceph-osd@.service instances at once.
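
Most of the dmesg output above is systemd noise (KillMode warnings and ordering cycles); the more useful question is whether the ceph-volume activation units actually ran at boot. One way to check (unit names are examples, yours will carry your own OSD id and fsid):

    # Did any ceph-volume activation unit run, and did it fail?
    systemctl list-units --all 'ceph-volume@*'

    # Follow up on the OSD itself with the journal
    journalctl -b -u ceph-osd@0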

Possible causes

# 1. When an OSD is created, lvm2-lvmetad.service and lvm2-lvmetad.socket must be running;
#    otherwise the LVM information on the disks may be lost after the machine reboots.

# 2. The best practice is to enable lvm2-lvmetad.service and then reboot the machine:
    systemctl start lvm2-lvmetad.service
    systemctl enable lvm2-lvmetad.service

# 3. If lvm2-lvmetad was not running when the OSD was created, the OSD log will contain a WARNING about it.
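
To verify whether lvmetad is present and enabled on the node, something like the following should work (note: newer lvm2 releases, roughly 2.03 and later, dropped lvmetad entirely, so these units may simply not exist there):

    systemctl status lvm2-lvmetad.socket lvm2-lvmetad.service
    systemctl enable --now lvm2-lvmetad.socket lvm2-lvmetad.service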

Solution

# Fix: manually start the ceph-volume activation unit for the affected OSD
    systemctl start ceph-volume@lvm-1-fb045fd1-ce5b-4503-a37e-1c63061058ab.service

    # If that does not help, trigger ceph-volume directly
    /usr/sbin/ceph-volume lvm trigger 1-fb045fd1-ce5b-4503-a37e-1c63061058ab
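
The instance name in the unit above follows the pattern lvm-<osd id>-<osd fsid>. If you do not know the fsid, ceph-volume can look it up, or simply reactivate every OSD it finds (a generic fallback, not specific to this node):

    # Show all OSDs ceph-volume knows about, including each osd fsid
    ceph-volume lvm list

    # Activate one OSD explicitly, or all of them at once
    ceph-volume lvm activate 1 fb045fd1-ce5b-4503-a37e-1c63061058ab
    ceph-volume lvm activate --all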

After modifying the OSD settings in ceph.conf, the OSD fails to start with start-limit-hit

The following error is reported:
    root@fuse01:~# systemctl status ceph-osd@3.service 
    ● ceph-osd@3.service - Ceph object storage daemon osd.3
        Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
        Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
                └─ceph-after-pve-cluster.conf
        Active: failed (Result: start-limit-hit) since Fri 2023-12-08 13:39:54 CST; 5min ago
        Process: 2686756 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 3 (code=exited, status=0/SUCCESS)
        Process: 2686760 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 3 --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
    Main PID: 2686760 (code=exited, status=0/SUCCESS)
            CPU: 23.794s

    Dec 08 13:41:11 fuse01 systemd[1]: Failed to start Ceph object storage daemon osd.3.
    Dec 08 13:41:33 fuse01 systemd[1]: ceph-osd@3.service: Start request repeated too quickly.
    Dec 08 13:41:33 fuse01 systemd[1]: ceph-osd@3.service: Failed with result 'start-limit-hit'.

Solution
    root@fuse01:~# systemctl daemon-reload
    root@fuse01:~# systemctl restart ceph-osd@3
    root@fuse01:~# systemctl restart ceph-osd@2
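
start-limit-hit means systemd has stopped retrying because the unit failed too many times within a short window; daemon-reload plus restart usually clears it. If systemd still refuses to start the unit, resetting its failed state first should help (shown for osd.3 as in the example above):

    systemctl reset-failed ceph-osd@3.service
    systemctl restart ceph-osd@3.service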