How do I handle two failed drives in a RAID 5 array on a PE 2950 server?

This question has been answered by Eahua (Dell Technology)

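Environment: PE 2950 server, six 146 GB drives in a RAID 5 array, Windows Server 2003.

Fault:

1. Two drives were replaced a long time ago. Now another drive is showing an amber light, accompanied by repeated reboots.

2. On inspection, the drive in slot 2 is in the Ready state and the drive in slot 3 is the one showing the amber light. From this we concluded that drive 2 was not participating in the RAID 5 array and that drive 3 has now failed as well, which means the RAID 5 array is effectively broken.

3. What we did: booted into Ctrl+R, confirmed that drive 2 could not be rebuilt, and replaced it with a new drive. A rebuild then started automatically.

4. Afterwards we booted into the OS and replaced drive 3 with a new drive. It started synchronizing. Once drive 3 had finished synchronizing, the D: drive could not be opened, although the files on C: were all fine.

What I would like to confirm: has all the data been synchronized onto the two new drives? If it has, can the D: drive be repaired? If it has not, can we roll back to the original state?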
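For readers wondering why a second missing drive is fatal to RAID 5, here is a minimal sketch of single-parity (XOR) recovery. It is a simplified model, not the controller's actual on-disk layout: the block contents, the 4-byte block size, and the helper name `xor_blocks` are illustrative assumptions.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe on a hypothetical 6-drive RAID 5: five data blocks plus one parity block.
data = [bytes([value] * 4) for value in (0x11, 0x22, 0x33, 0x44, 0x55)]
parity = xor_blocks(data)
stripe = data + [parity]

# One drive missing (index 2): XOR of the five survivors returns the lost block.
rebuilt = xor_blocks(stripe[:2] + stripe[3:])
print(rebuilt == data[2])

# Two drives missing (indexes 2 and 3): the survivors only yield the XOR of the
# two lost blocks, so neither block can be recovered on its own.
mixed = xor_blocks(stripe[:2] + stripe[4:])
print(mixed == xor_blocks([data[2], data[3]]), mixed == data[2])
```

With one member missing, every stripe can still be recomputed. With two missing, or with one missing plus an unreadable sector on a surviving member, the affected stripes cannot be recomputed.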

Verified answer
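  • Based on the log analysis, the server's array currently has serious problems that significantly affect the data. The analysis is as follows:

    When the logs were collected, only three of the six drives (0, 1, 4) were in the array. Drive 2 is a non-certified drive in the Ready state and is not in the array, drive 3 is in the Foreign state, offline and not in the array, and drive 5 is missing.

    T21: EVT#1110284-T21: 236=PD 02(e0x20/s2) is not a certified drive

    In addition, all of the drives in the array show puncture errors. A puncture is a logical error that can cause data loss and make data inaccessible.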

    12/02/14 20:31:16: bbmMarkBadBlock: pd=03, pdLBA=7548b16
    12/02/14 20:31:16: EVT#1111953-12/02/14 20:31:16:  97=Puncturing bad block on PD 03(e0x20/s3) at 7548b16
    12/02/14 20:31:18: EVT#1111954-12/02/14 20:31:18: 113=Unexpected sense: PD 02(e0x20/s2), CDB: 28 00 07 55 34 80 00 00 80 00, Sense: f0 00 03 07 55 34 bb 0a 00 00 00 00 11 00 81 80 0
    12/02/14 20:31:18: DEV_REC:Medium Error DevId[2] Tgt 2 retires=0
    12/02/14 20:31:18: ErrLBAOffset (3b) LBA(7553480) BadLba=75534bb
    12/02/14 20:31:18: EVT#1111955-12/02/14 20:31:18: 111=Unrecoverable medium error during recovery on PD 02(e0x20/s2) at 75534bb
    12/02/14 20:31:18: BBMProcessReadError: RECOVERY, pd=02, pdErrLba=75534bb - puncture source/target drives
    12/02/14 20:31:18: bbmMarkBadBlock: pd=02, pdLBA=75534bb
    12/02/14 20:31:18: EVT#1111956-12/02/14 20:31:18:  97=Puncturing bad block on PD 02(e0x20/s2) at 75534bb
    12/02/14 20:31:18: bbmMarkBadBlock: pd=03, pdLBA=75534bb
    12/02/14 20:31:18: EVT#1111957-12/02/14 20:31:18:  97=Puncturing bad block on PD 03(e0x20/s3) at 75534bb
    12/02/14 20:31:18: EVT#1111958-12/02/14 20:31:18: 113=Unexpected sense: PD 04(e0x20/s4), CDB: 28 00 07 55 34 80 00 00 80 00, Sense: f0 00 03 07 55 34 bb 0a 00 00 00 00 11 00 81 80 0
    12/02/14 20:31:18: DEV_REC:Medium Error DevId[4] Tgt 4 retires=0
    12/02/14 20:31:18: ErrLBAOffset (3b) LBA(7553480) BadLba=75534bb
    12/02/14 20:31:18: EVT#1111959-12/02/14 20:31:18: 111=Unrecoverable medium error during recovery on PD 04(e0x20/s4) at 75534bb
    12/02/14 20:31:18: BBMProcessReadError: RECOVERY, pd=04, pdErrLba=75534bb - puncture source/target drives
    12/02/14 20:31:18: bbmMarkBadBlock: pd=04, pdLBA=75534bb
    12/02/14 20:31:18: EVT#1111960-12/02/14 20:31:18:  97=Puncturing bad block on PD 04(e0x20/s4) at 75534bb
    12/02/14 20:31:18: bbmMarkBadBlock: pd=03, pdLBA=75534bb
    12/02/14 20:31:18: EVT#1111961-12/02/14 20:31:18:  97=Puncturing bad block on PD 03(e0x20/s3) at 75534bb
    12/02/14 20:31:18: EVT#1111962-12/02/14 20:31:18: 103=Rebuild progress on PD 03(e0x20/s3) is 43.09%(529s)
    12/02/14 20:31:25: EVT#1111963-12/02/14 20:31:25: 113=Unexpected sense: PD 02(e0x20/s2), CDB: 28 00 07 63 47 80 00 00 80 00, Sense: f0 00 03 07 63 47 c0 0a 00 00 00 00 11 00 81 80 0
    12/02/14 20:31:25: DEV_REC:Medium Error DevId[2] Tgt 2 retires=0
    12/02/14 20:31:25: ErrLBAOffset (40) LBA(7634780) BadLba=76347c0
    12/02/14 20:31:25: EVT#1111964-12/02/14 20:31:25: 111=Unrecoverable medium error during recovery on PD 02(e0x20/s2) at 76347c0
    12/02/14 20:31:25: BBMProcessReadError: RECOVERY, pd=02, pdErrLba=76347c0 - puncture source/target drives
    12/02/14 20:31:25: bbmMarkBadBlock: pd=02, pdLBA=76347c0
    12/02/14 20:31:25: EVT#1111965-12/02/14 20:31:25:  97=Puncturing bad block on PD 02(e0x20/s2) at 76347c0
    12/02/14 20:31:25: bbmMarkBadBlock: pd=03, pdLBA=76347c0
    12/02/14 20:31:25: EVT#1111966-12/02/14 20:31:25:  97=Puncturing bad block on PD 03(e0x20/s3) at 76347c0
    12/02/14 20:31:25: EVT#1111967-12/02/14 20:31:25: 113=Unexpected sense: PD 04(e0x20/s4), CDB: 28 00 07 63 47 80 00 00 80 00, Sense: f0 00 03 07 63 47 c0 0a 00 00 00 00 11 00 81 80 0
    12/02/14 20:31:25: DEV_REC:Medium Error DevId[4] Tgt 4 retires=0
    12/02/14 20:31:25: ErrLBAOffset (40) LBA(7634780) BadLba=76347c0
    12/02/14 20:31:25: EVT#1111968-12/02/14 20:31:25: 111=Unrecoverable medium error during recovery on PD 04(e0x20/s4) at 76347c0
    12/02/14 20:31:25: BBMProcessReadError: RECOVERY, pd=04, pdErrLba=76347c0 - puncture source/target drives
    12/02/14 20:31:25: bbmMarkBadBlock: pd=04, pdLBA=76347c0
    12/02/14 20:31:25: EVT#1111969-12/02/14 20:31:25:  97=Puncturing bad block on PD 04(e0x20/s4) at 76347c0
    12/02/14 20:31:25: bbmMarkBadBlock: pd=03, pdLBA=76347c0
    12/02/14 20:31:25: EVT#1111970-12/02/14 20:31:25:  97=Puncturing bad block on PD 03(e0x20/s3) at 76347c0
    12/02/14 20:31:27: EVT#1111971-12/02/14 20:31:27: 113=Unexpected sense: PD 02(e0x20/s2), CDB: 28 00 07 63 ce 00 00 00 80 00, Sense: f0 00 03 07 63 ce 29 0a 00 00 00 00 11 00 81 80 0
    12/02/14 20:31:27: DEV_REC:Medium Error DevId[2] Tgt 2 retires=0
    12/02/14 20:31:27: ErrLBAOffset (29) LBA(763ce00) BadLba=763ce29
    12/02/14 20:31:27: EVT#1111972-12/02/14 20:31:27: 111=Unrecoverable medium error during recovery on PD 02(e0x20/s2) at 763ce29
    12/02/14 20:31:27: BBMProcessReadError: RECOVERY, pd=02, pdErrLba=763ce29 - puncture source/target drives
    12/02/14 20:31:27: bbmMarkBadBlock: pd=02, pdLBA=763ce29
    12/02/14 20:31:27: EVT#1111973-12/02/14 20:31:27:  97=Puncturing bad block on PD 02(e0x20/s2) at 763ce29
    12/02/14 20:31:27: bbmMarkBadBlock: pd=03, pdLBA=763ce29
    12/02/14 20:31:27: EVT#1111974-12/02/14 20:31:27:  97=Puncturing bad block on PD 03(e0x20/s3) at 763ce29

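    The array on this server is in a serious state. If the data is important, go straight to a data recovery company. If it is not important or you have a backup, try forcing drive 2 online in the RAID BIOS and then importing drive 3 into the array to see whether it recovers. Because of the puncture errors, the data may still be unreadable.

    For the procedure, refer to the RAID controller's user guide.

    http://downloads.dell.com/Manuals/all-products/esuprt_ser_stor_net/esuprt_dell_adapters/poweredge-rc-6e_User's%20Guide_zh-cn.pdf

    For an explanation of puncture errors (puncturing), see:

    www.dell.com/.../EN

    If you reinstall the operating system, be sure to delete all of the existing RAID configuration on the controller and choose full initialization after recreating the array; otherwise the puncture errors will persist.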
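The log excerpt in this answer can be summarized with a short script. The sketch below counts puncture events per physical drive and decodes one "Unexpected sense" line. The regular expressions are inferred from the excerpt alone and may not match other firmware versions, and the function names are hypothetical. The byte offsets used for decoding are the standard SCSI READ(10) CDB and fixed-format sense data layouts.

```python
import re
from collections import Counter

# Four lines copied verbatim from the log excerpt above.
LOG = """\
12/02/14 20:31:18: EVT#1111954-12/02/14 20:31:18: 113=Unexpected sense: PD 02(e0x20/s2), CDB: 28 00 07 55 34 80 00 00 80 00, Sense: f0 00 03 07 55 34 bb 0a 00 00 00 00 11 00 81 80 0
12/02/14 20:31:18: EVT#1111956-12/02/14 20:31:18:  97=Puncturing bad block on PD 02(e0x20/s2) at 75534bb
12/02/14 20:31:18: EVT#1111957-12/02/14 20:31:18:  97=Puncturing bad block on PD 03(e0x20/s3) at 75534bb
12/02/14 20:31:18: EVT#1111960-12/02/14 20:31:18:  97=Puncturing bad block on PD 04(e0x20/s4) at 75534bb
"""

# Patterns inferred from the excerpt only (assumption, not a documented format).
PUNCTURE = re.compile(r"97=Puncturing bad block on PD (\d+)\(")
SENSE = re.compile(r"CDB: ([0-9a-f ]+), Sense: ([0-9a-f ]+)")

def punctures_per_drive(text):
    """Count 'Puncturing bad block' events for each physical drive (PD)."""
    return Counter(m.group(1) for m in PUNCTURE.finditer(text))

def decode_read_error(cdb_hex, sense_hex):
    """Decode a READ(10) CDB and the fixed-format sense data returned for it."""
    cdb = [int(b, 16) for b in cdb_hex.split()]
    sense = [int(b, 16) for b in sense_hex.split()]
    if cdb[0] != 0x28 or (sense[0] & 0x7F) != 0x70:
        raise ValueError("expected READ(10) and fixed-format sense data")
    start_lba = int.from_bytes(bytes(cdb[2:6]), "big")   # CDB bytes 2-5
    bad_lba = int.from_bytes(bytes(sense[3:7]), "big")   # sense INFORMATION field
    return {
        "start_lba": start_lba,
        "blocks": int.from_bytes(bytes(cdb[7:9]), "big"),  # transfer length
        "sense_key": sense[2] & 0x0F,
        "asc": sense[12],
        "bad_lba": bad_lba,
        "offset": bad_lba - start_lba,
    }

counts = punctures_per_drive(LOG)
cdb_hex, sense_hex = SENSE.search(LOG).groups()
info = decode_read_error(cdb_hex, sense_hex)
print(dict(counts))
print({key: hex(value) for key, value in info.items()})
```

For the sample line, the read starts at LBA 0x7553480 for 0x80 blocks, the sense key is 0x3 (medium error), the additional sense code is 0x11 (unrecovered read error), and the failing LBA 0x75534bb lies 0x3b blocks into the read, which matches the `ErrLBAOffset (3b)` line in the log.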

All replies
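  • In this situation, first confirm the RAID controller configuration of the whole server, for example how many arrays there are and how many virtual disks were created.

    Download the following tool and run it once on the server to collect the server configuration for analysis. Please upload the resulting file to a file-sharing service such as Baidu so that it can be downloaded.

    http://ftp.us.dell.com/diags/Dell_DSET_2.1.0.113_A00.msi

    Instructions for the tool:

    http://zh.community.dell.com/support_forums/poweredge/f/280/t/9607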

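  • Thanks for the reply. The system has one array and one VD.

  • Did you collect any logs? We need to analyze the controller's rebuild process.

  • The OS files are now damaged and the system on C: will not boot, so I can no longer collect logs.

  • In this case, if the data is important, you need to take it to a data recovery center.

    If you have a backup of the data, download the following ISO and boot from it. Once it has booted, if you can see the original data, you can try backing up what you need, and you can also collect logs:

    zh.community.dell.com/.../2728

  • OK, I will try it. Thanks.

  • Hello. I can now confirm that everything on C: is still there; only D: cannot be opened.

    Log link: pan.baidu.com/.../1gd9ZZxx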
