Possible data damage: 1 pg inconsistent
Reference:
https://ceph.com/geen-categorie/ceph-manually-repair-object/
Check the Ceph status
[root@node-1 ~]# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 2.9b is active+clean+scrubbing+deep+inconsistent+repair, acting [38,41,13]
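Before repairing, it can help to see exactly which object the deep scrub flagged. An optional check not part of the original write-up (rados ships with Ceph; the pgid is taken from the output above):
[root@node-1 ~]# rados list-inconsistent-obj 2.9b --format=json-pretty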
1. Repair the PG
Normally running ceph pg repair [pgid] is enough, but in this case it did not resolve the inconsistency, so the steps below were needed.
[root@node-1 ~]# ceph pg repair 2.9b
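The repair runs asynchronously, so the PG state may not change right away. One way to follow its progress (an optional check, not from the original article) is to watch the cluster log:
[root@node-1 ~]# ceph -w | grep 2.9b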
2. Query the PG's primary OSD
[root@node-1 ~]# ceph pg 2.9b query|grep primary
"same_primary_since": 29426,
"num_objects_missing_on_primary": 0,
"up_primary": 38,
"acting_primary": 38,
"same_primary_since": 29426,
"num_objects_missing_on_primary": 0,
"up_primary": 38,
"acting_primary": 38,
"same_primary_since": 29426,
"num_objects_missing_on_primary": 0,
"up_primary": 38,
"acting_primary": 38,
3. Find the physical host for that OSD
[root@node-1 ~]# ceph osd find 38
{
    "osd": 38,
    "addrs": {
        "addrvec": [
            {
                "type": "v2",
                "addr": "10.1.2.10:6816",
                "nonce": 3635776
            },
            {
                "type": "v1",
                "addr": "10.1.2.10:6817",
                "nonce": 3635776
            }
        ]
    },
    "osd_fsid": "1d3dd4ee-1303-4709-9fa0-e8120a930d0f",
    "host": "node-10",
    "crush_location": {
        "host": "node-10",
        "root": "default"
    }
}
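If jq is installed, the hostname can be pulled straight out of the JSON shown above (an optional shortcut):
[root@node-1 ~]# ceph osd find 38 | jq -r '.host'
node-10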
4. Stop the OSD
[root@node-10 ~]# systemctl stop ceph-osd@38.service
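A common precaution not mentioned in the original procedure: if the OSD will stay down for more than a few minutes, set the noout flag first so the cluster does not start rebalancing data away from it, and clear the flag once the OSD is back up:
[root@node-1 ~]# ceph osd set noout
(perform steps 4-6)
[root@node-1 ~]# ceph osd unset noout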
5. Flush the journal (check echo $?; if it is non-zero, ignore it for now and just try starting the OSD)
[root@node-10 ~]# ceph-osd -i 38 --flush-journal
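Note that --flush-journal only applies to FileStore OSDs; a BlueStore OSD has no separate journal, so this step would be skipped there. To capture the exit status the step refers to, run the flush and the check together:
[root@node-10 ~]# ceph-osd -i 38 --flush-journal; echo $?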
6. Start the OSD
[root@node-10 ~]# systemctl start ceph-osd@38.service
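Before re-checking cluster health, it can be worth confirming the daemon actually came back up (an extra sanity check, not in the original write-up):
[root@node-10 ~]# systemctl status ceph-osd@38.service
[root@node-1 ~]# ceph osd tree | grep 'osd.38'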
7. Check which OSDs the PG is on
[root@node-1 ~]# ceph health detail
HEALTH_OK
[root@node-1 ~]# ceph pg ls|grep 2.9b
2.9b 4164 0 0 0 15635105244 0 0 3005 active+clean 78m 29447'25665617 29447:26608464 [38,41,13]p38 [38,41,13]p38 2022-08-08 11:32:30.636330 2022-08-08 11:32:30.636330
This confirms the cluster has recovered; the PG is still on osd.38.
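For extra assurance that the inconsistency is gone for good, an optional follow-up is to trigger a fresh deep scrub on the PG and re-run ceph health detail once it finishes:
[root@node-1 ~]# ceph pg deep-scrub 2.9b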