Steps for Disk Replacement in IBM ESS
In this post, we will see how to replace a failed disk in an IBM ESS (Elastic Storage Server).
Suppose a disk has failed and IBM is sending an engineer to replace it.
Let us walk through the steps involved.
========== Replacing a failed disk on IBM ESS Server ==========
1. Check which disks are not OK.
ngeess1--> mmlspdisk all --not-ok
pdisk:
   replacementPriority = 3.42
   name = "e2d2s32"
   device = ""
   recoveryGroup = "rg_ngeess2-da"
   declusteredArray = "DA1"
   state = "failing/replace"
   internalState = 00009.1c0
   capacity = 8001524072448
   freeSpace = 7997229105152
   fru = "00LY450"
   location = "78R039C-2-32"
   WWN = "naa.5000C50094E9E8A7"
   server = "ngeess2-da.india.ngelinux.com"
   reads = 75139318
   writes = 53230540
   bytesReadInGiB = 66080.249
   bytesWrittenInGiB = 43387.205
   IOErrors = 2
   IOTimeouts = 5
   mediaErrors = 0
   checksumErrors = 0
   pathErrors = 0
   relativePerformance = 0.836
   dataBadness = 0.000
   rgIndex = 37
   userLocation = "NGE LAB, E1 N00-15, Enclosure 00XX-0XX-OXOXOXX Drawer 2 Slot 32"
   hardware = "IBM-ESXS STX000NM00XX E5 ECE4 XX19KXXX0000RXXXNWHA"
   hardwareType = Rotating 7200
   nPaths = 0 active 0 total
   nsdFormatVersion = Unknown
   paxosAreaOffset = Unknown
   paxosAreaSize = Unknown
   logicalBlockSize = 4096
   ssdEndurancePercentage =
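The stanza output above is verbose. As a rough sketch, the shell function below condenses it to one tab-separated line per pdisk (name, recovery group, state and location). The name `summarize_pdisks` is hypothetical, and the parsing assumes the `pdisk:` / `key = value` layout shown above; it is not an IBM-provided tool.

```shell
# Hypothetical helper: condense "mmlspdisk" stanza output into one
# tab-separated line per pdisk: name, recoveryGroup, state, location.
# Assumes each stanza starts with "pdisk:" followed by "key = value" lines.
summarize_pdisks() {
    awk '
        # A new stanza begins: print the previous pdisk, if any, and reset.
        /^pdisk:/ {
            if (name != "") print name "\t" rg "\t" state "\t" loc
            name = rg = state = loc = ""
            next
        }
        # Split "key = value" lines on the first " = " and strip quotes.
        {
            eq = index($0, " = ")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            sub(/^[ \t]+/, "", key)
            val = substr($0, eq + 3)
            gsub(/"/, "", val)
            if (key == "name") name = val
            else if (key == "recoveryGroup") rg = val
            else if (key == "state") state = val
            else if (key == "location") loc = val
        }
        # Print the last pdisk once the input ends.
        END { if (name != "") print name "\t" rg "\t" state "\t" loc }
    '
}

# Usage on an ESS node:
#   mmlspdisk all --not-ok | summarize_pdisks
```

With the output above, this would reduce the whole stanza to a single line for e2d2s32.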
2. List the pdisks in recovery group rg_ngeess2-da that need replacement.
ngeess1--> mmlspdisk rg_ngeess2-da --replace
pdisk:
   replacementPriority = 3.42
   name = "e2d2s32"
   device = ""
   recoveryGroup = "rg_ngeess2-da"
   declusteredArray = "DA1"
   state = "failing/replace"
   internalState = 00009.1c0
   capacity = 8001524072448
   freeSpace = 7997229105152
   fru = "00LY450"
   location = "78R039C-2-32"
   WWN = "naa.5000C50094E9E8A7"
   server = "ngeess2-da.india.ngelinux.com"
   reads = 75139318
   writes = 53230540
   bytesReadInGiB = 66080.249
   bytesWrittenInGiB = 43387.205
   IOErrors = 2
   IOTimeouts = 5
   mediaErrors = 0
   checksumErrors = 0
   pathErrors = 0
   relativePerformance = 0.836
   dataBadness = 0.000
   rgIndex = 37
   userLocation = "NGE LAB, E1 N00-15, Enclosure 00XX-0XX-OXOXOXX Drawer 2 Slot 32"
   hardware = "IBM-ESXS STX000NM00XX E5 ECE4 XX19KXXX0000RXXXNWHA"
   hardwareType = Rotating 7200
   nPaths = 0 active 0 total
   nsdFormatVersion = Unknown
   paxosAreaOffset = Unknown
   paxosAreaSize = Unknown
   logicalBlockSize = 4096
   ssdEndurancePercentage =
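This stanza carries the two values that the mmvdisk commands in steps 5 and 7 need: the recovery group and the pdisk name. Below is a minimal sketch (the function name `replace_commands` is hypothetical) that only prints those two commands for each flagged pdisk so they can be reviewed first. It does not run them, and it assumes the stanza layout shown above.

```shell
# Hypothetical dry-run helper: read "mmlspdisk <rg> --replace" stanza output
# and print (not run) the prepare and replace commands for each pdisk.
replace_commands() {
    awk '
        # Emit both commands for the pdisk collected so far.
        function flush() {
            if (name == "") return
            printf "mmvdisk pdisk replace --prepare --recovery-group %s --pdisk %s\n", rg, name
            printf "mmvdisk pdisk replace --recovery-group %s --pdisk %s\n", rg, name
        }
        /^pdisk:/ { flush(); name = rg = ""; next }
        # Pick "name" and "recoveryGroup" out of the "key = value" lines.
        {
            eq = index($0, " = ")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            sub(/^[ \t]+/, "", key)
            val = substr($0, eq + 3)
            gsub(/"/, "", val)
            if (key == "name") name = val
            else if (key == "recoveryGroup") rg = val
        }
        END { flush() }
    '
}

# Usage on an ESS node:
#   mmlspdisk rg_ngeess2-da --replace | replace_commands
```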
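3. Check the recovery group and its pdisks. Declustered array DA1 shows "needs service" = yes, and pdisk e2d2s32 has 0 active paths and state failing/replace.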
ngeess1--> mmlsrecoverygroup rg_ngeess2-da -L --pdisk

                   declustered                  current         allowable
recovery group        arrays    vdisks  pdisks  format version  format version
-----------------  -----------  ------  ------  --------------  --------------
rg_ngeess2-da                3       5      86  4.2.2.0         5.0.5.1

declustered  needs                            replace                      scrub     background activity
   array     service  vdisks  pdisks  spares  threshold  trim  free space  duration  task     progress  priority
-----------  -------  ------  ------  ------  ---------  ----  ----------  --------  ---------------------------
NVR          no            1       2     0,0          1  no      3632 MiB  14 days   scrub          62%  low
DA1          yes           3      83    2,44          2  no        10 TiB  14 days   scrub          45%  low
SSD          no            1       1     0,0          1  no       744 GiB  14 days   scrub          33%  low

             n. active,   declustered              state,
pdisk        total paths  array        free space  remarks
-----------  -----------  -----------  ----------  ---------------
e1d1s01ssd   2, 4         SSD             744 GiB  ok
e1d1s02      2, 4         DA1             220 GiB  ok
e1d1s03      2, 4         DA1             216 GiB  ok
e1d1s04      2, 4         DA1             220 GiB  ok
e1d1s20      2, 4         DA1             216 GiB  ok
e1d1s21      2, 4         DA1             220 GiB  ok
e1d1s29      2, 4         DA1             220 GiB  ok
e1d1s30      2, 4         DA1             220 GiB  ok
e1d1s31      2, 4         DA1             220 GiB  ok
e1d1s32      2, 4         DA1             220 GiB  ok
e2d2s20      2, 4         DA1             216 GiB  ok
e2d2s21      2, 4         DA1             216 GiB  ok
e2d2s29      2, 4         DA1             216 GiB  ok
e2d2s30      2, 4         DA1             216 GiB  ok
e2d2s31      2, 4         DA1             216 GiB  ok
e2d2s32      0, 0         DA1            7448 GiB  failing/replace
e2d2s33      2, 4         DA1             216 GiB  ok
e2d2s34      2, 4         DA1             216 GiB  ok
e2d2s35      2, 4         DA1             216 GiB  ok
n001v001     1, 1         NVR            1816 MiB  ok
n002v001     1, 1         NVR            1816 MiB  ok

                                                declustered                             checksum
vdisk                       RAID code           array        vdisk size  block size  granularity  state  remarks
--------------------------  ------------------  -----------  ----------  ----------  -----------  -----  ------------
rg_ngeess2_da_logtip        2WayReplication     NVR              48 MiB       2 MiB         4096  ok     logTip
rg_ngeess2_da_logtipbackup  Unreplicated        SSD              48 MiB       2 MiB         4096  ok     logTipBackup
rg_ngeess2_da_loghome       4WayReplication     DA1              72 GiB       2 MiB         4096  ok     log
rg_ngeess2_da_Meta_512K_1   3WayReplication     DA1              17 TiB     512 KiB       32 KiB  ok
rg_ngeess2_da_Data_8M_1     8+2p                DA1             420 TiB       8 MiB       32 KiB  ok

config data    declustered array  spare space  remarks
-------------  -----------------  -----------  -------
rebuild space  DA1                47 pdisk

config data    disk group fault tolerance         remarks
-------------  ---------------------------------  ------------------------
rg descriptor  1 drawer + 1 pdisk                 limiting fault tolerance
system index   1 drawer + 1 pdisk                 limited by rg descriptor

vdisk                       disk group fault tolerance         remarks
--------------------------  ---------------------------------  ------------------------
rg_ngeess2_da_logtip        1 pdisk
rg_ngeess2_da_logtipbackup  0 pdisk
rg_ngeess2_da_loghome       1 drawer + 1 pdisk                 limited by rg descriptor
rg_ngeess2_da_Meta_512K_1   1 drawer + 1 pdisk                 limited by rg descriptor
rg_ngeess2_da_Data_8M_1     2 pdisk

active recovery group server                     servers
-----------------------------------------------  -------
ngeess2-da.india.ngelinux.com                    ngeess2-da.india.ngelinux.com,ngeess3-da.india.ngelinux.com
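When this output is long, a rough filter helps spot the relevant rows. The sketch below is a heuristic, not an official parser. It assumes the column order shown above: a declustered array name is followed by its "needs service" value, and a pdisk name is followed by its active and total path counts, with the state in the seventh whitespace-separated field. The function name `recovery_group_problems` is hypothetical, and the column positions may differ between releases.

```shell
# Hypothetical heuristic over "mmlsrecoverygroup <rg> -L --pdisk" output.
# Assumes the column order shown above; adjust if your release differs.
recovery_group_problems() {
    awk '
        # Declustered-array rows: array name, then "yes"/"no" for needs service.
        $2 == "yes" { print "array " $1 " needs service" }
        # Pdisk rows: name, "active," count, total count, array, free space
        # (number and unit), then the state in field 7.
        $2 ~ /^[0-9]+,$/ && $7 != "ok" {
            print "pdisk " $1 " is " $7 " (" $2 " " $3 " paths)"
        }
    '
}

# Usage on an ESS node:
#   mmlsrecoverygroup rg_ngeess2-da -L --pdisk | recovery_group_problems
```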
4. Verify the pdisk state inside the recovery group with mmvdisk.
ngeess1--> mmvdisk pdisk list --recovery-group rg_ngeess2-da

                              declustered
recovery group  pdisk         array        paths  capacity  free space  FRU (type)       state
--------------  ------------  -----------  -----  --------  ----------  ---------------  -----
rg_ngeess2-da   e1d1s02       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s03       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s04       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s05       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s06       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s07       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s15       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s16       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s17       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s18       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s19       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s20       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s21       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s29       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s30       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s31       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s32       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s33       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s34       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s35       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s01       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s02       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s03       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s04       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s05       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s06       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s07       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s15       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s16       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s17       DA1              2  7452 GiB     228 GiB  00LY450          ok
rg_ngeess2-da   e1d2s18       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s19       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s20       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s21       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s29       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s30       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s31       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s32       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s33       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e1d2s34       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d2s35       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d1s01       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e2d1s02       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e2d1s03       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e2d1s04       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d1s05       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d1s06       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d1s07       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d2s19       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s20       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s21       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s29       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s30       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s31       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s32       DA1              0  7452 GiB    7448 GiB  00LY450          failing/replace
rg_ngeess2-da   e2d2s33       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s34       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s35       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   n001v001      NVR              1  1992 MiB    1816 MiB  IPR-10 68C8730   ok
rg_ngeess2-da   n002v001      NVR              1  1992 MiB    1816 MiB  IPR-10 68C8C10   ok
rg_ngeess2-da   e1d1s01ssd    SSD              2   745 GiB     744 GiB  00LY451          ok
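To pick the problem pdisks out of this list, a small awk filter over the `mmvdisk pdisk list` output is enough. This is a sketch that assumes the column layout above: recovery group first, pdisk second, paths fourth and state last. The function name `not_ok_pdisks` is hypothetical.

```shell
# Hypothetical filter over "mmvdisk pdisk list --recovery-group <rg>" output.
# $1 is the recovery group name. Header and dash lines are skipped because
# their first field is not the recovery group. Prints: pdisk, paths, state.
not_ok_pdisks() {
    awk -v rg="$1" '$1 == rg && $NF != "ok" { print $2 "\t" $4 "\t" $NF }'
}

# Usage on an ESS node:
#   mmvdisk pdisk list --recovery-group rg_ngeess2-da | not_ok_pdisks rg_ngeess2-da
```

Using the last field for the state keeps the filter working for the NVR rows, whose FRU column ("IPR-10 68C8730") contains a space.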
5. Prepare the pdisk for replacement. This suspends the pdisk and releases its carrier.
ngeess1--> mmvdisk pdisk replace --prepare --recovery-group rg_ngeess2-da --pdisk e2d2s32
mmvdisk: Suspending pdisk e2d2s32 of RG rg_ngeess2-da in location 78R039C-2-32.
mmvdisk: Location 78R039C-2-32 is Rack Pyramid Park, E5 U11-15, Enclosure 5147-084-78R039C Drawer 2 Slot 32.
mmvdisk: Carrier released.
mmvdisk:
mmvdisk: - Remove carrier.
mmvdisk: - Replace disk in location 78R039C-2-32 with type '00LY450'.
mmvdisk: - Reinsert carrier.
mmvdisk: - Issue the following command:
mmvdisk:
mmvdisk:     mmvdisk pdisk replace --recovery-group rg_ngeess2-da --pdisk 'e2d2s32'
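Because the prepare step suspends the pdisk and releases its carrier, it can be worth wrapping it in a confirmation prompt. Here is a minimal sketch. The wrapper `prepare_pdisk` is hypothetical; only the `mmvdisk pdisk replace --prepare` invocation comes from the procedure above.

```shell
# Hypothetical wrapper: ask for an explicit "yes" before running the
# prepare step, which suspends the pdisk and releases its carrier.
prepare_pdisk() {
    rg=$1
    pdisk=$2
    if [ -z "$rg" ] || [ -z "$pdisk" ]; then
        echo "usage: prepare_pdisk <recovery-group> <pdisk>" >&2
        return 2
    fi
    # Prompt on stderr so stdout carries only mmvdisk's own output.
    printf 'Suspend pdisk %s in %s and release its carrier? Type yes: ' \
        "$pdisk" "$rg" >&2
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "aborted" >&2
        return 1
    fi
    mmvdisk pdisk replace --prepare --recovery-group "$rg" --pdisk "$pdisk"
}

# Usage on an ESS node:
#   prepare_pdisk rg_ngeess2-da e2d2s32
```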
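6. Now ask the IBM engineer to replace the failed disk: remove the carrier, replace the disk in location 78R039C-2-32 with a disk of type '00LY450', and reinsert the carrier.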
7. After the engineer has replaced the disk, run the command printed by the prepare step to bring the new disk into use.
ngeess1--> mmvdisk pdisk replace --recovery-group rg_ngeess2-da --pdisk 'e2d2s32'
mmvdisk:
mmvdisk: mmchcarrier : [I] Preparing a new pdisk for use may take many minutes.
mmvdisk:
mmvdisk: 2021-08-04_12:34:01.261+0100: [I] Callback: /usr/lpp/mmfs/bin/tspreparenewpdiskforuse /dev/sdhu.
mmvdisk: Attempting to update firmware if necessary. Failure will not prevent drive replacement.
mmvdisk: Command: mmchfirmware --type drive --serial-number XX1XXXXR0000C020LXXX --new-pdisk
mmvdisk: Command: err 0: mmchfirmware --type drive --serial-number XX1XXXXR0000C020LXXX --new-pdisk
mmvdisk:
mmvdisk: The following pdisks will be formatted on node ngeess2:
mmvdisk: //ngeess2-da/dev/sdbz,//ngeess2-da/dev/sdhu,//ngeess3-da/dev/sdy,//ngeess3-da/dev/sdhu
mmvdisk: Pdisk e2d2s32 of RG rg_ngeess2-da successfully replaced.
mmvdisk: Resuming pdisk e2d2s32#0037 of RG rg_ngeess2-da.
mmvdisk: Carrier resumed.
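Below is a hedged sketch of a wrapper for this step. It runs the same `mmvdisk pdisk replace` command, prints its output, and treats the run as successful only if mmvdisk exits with status 0 and prints a "successfully replaced" line, as in the output above. The function name `finish_replacement` is hypothetical, and the exact message text may differ between ESS releases.

```shell
# Hypothetical wrapper around the final replace command. Succeeds only if
# mmvdisk exits 0 and reports "successfully replaced".
finish_replacement() {
    rg=$1
    pdisk=$2
    # Capture stdout and stderr together so the message can be searched.
    output=$(mmvdisk pdisk replace --recovery-group "$rg" --pdisk "$pdisk" 2>&1)
    status=$?
    printf '%s\n' "$output"
    if [ "$status" -ne 0 ]; then
        echo "mmvdisk exited with status $status" >&2
        return "$status"
    fi
    # Message text taken from the output above; it may vary by release.
    if printf '%s\n' "$output" | grep -q "successfully replaced"; then
        return 0
    fi
    echo "no 'successfully replaced' message found; check the output" >&2
    return 1
}

# Usage on an ESS node:
#   finish_replacement rg_ngeess2-da e2d2s32
```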
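8. Check the disk status. The replaced pdisk e2d2s32 should now show 2 paths and state ok. Its free space stays high (7448 GiB) until data is rebalanced onto the new drive.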
ngeess1--> mmvdisk pdisk list --recovery-group rg_ngeess2-da

                              declustered
recovery group  pdisk         array        paths  capacity  free space  FRU (type)       state
--------------  ------------  -----------  -----  --------  ----------  ---------------  -----
rg_ngeess2-da   e1d1s02       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s03       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s04       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s05       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s06       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s07       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s15       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s16       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s17       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s18       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s19       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s20       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e1d1s21       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s29       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s30       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s31       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s32       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s33       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e1d1s34       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e2d1s33       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d1s34       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d1s35       DA1              2  7452 GiB     224 GiB  00LY450          ok
rg_ngeess2-da   e2d2s01       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s02       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s03       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s04       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s05       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e2d2s06       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s07       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s15       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s16       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s17       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s18       DA1              2  7452 GiB     220 GiB  00LY450          ok
rg_ngeess2-da   e2d2s19       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s20       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s21       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s29       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s30       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s31       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s32       DA1              2  7452 GiB    7448 GiB  00LY450          ok
rg_ngeess2-da   e2d2s33       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s34       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   e2d2s35       DA1              2  7452 GiB     216 GiB  00LY450          ok
rg_ngeess2-da   n001v001      NVR              1  1992 MiB    1816 MiB  IPR-10 68C8730   ok
rg_ngeess2-da   n002v001      NVR              1  1992 MiB    1816 MiB  IPR-10 68C8C10   ok
rg_ngeess2-da   e1d1s01ssd    SSD              2   745 GiB     744 GiB  00LY451          ok
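Instead of re-running the listing by hand, you can poll it. The sketch below assumes the column layout above (recovery group, pdisk, array, paths, ..., state). It treats a pdisk as healthy when it has at least one path and its state is `ok`. The function names, the default of 10 checks and the 60-second interval are illustrative assumptions, not IBM recommendations.

```shell
# Hypothetical check: read "mmvdisk pdisk list" text on stdin and succeed if
# pdisk $2 of recovery group $1 has a nonzero path count and state "ok".
pdisk_is_ok() {
    awk -v rg="$1" -v pd="$2" '
        $1 == rg && $2 == pd { found = 1; if ($4 > 0 && $NF == "ok") good = 1 }
        END { exit !(found && good) }
    '
}

# Hypothetical poller: re-run the listing until the pdisk is ok.
# Arguments: recovery group, pdisk, [number of checks], [seconds between].
wait_for_pdisk_ok() {
    rg=$1
    pdisk=$2
    tries=${3:-10}
    interval=${4:-60}
    i=1
    while [ "$i" -le "$tries" ]; do
        if mmvdisk pdisk list --recovery-group "$rg" | pdisk_is_ok "$rg" "$pdisk"; then
            echo "pdisk $pdisk in $rg is ok"
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    echo "pdisk $pdisk in $rg not ok after $tries checks" >&2
    return 1
}

# Usage on an ESS node:
#   wait_for_pdisk_ok rg_ngeess2-da e2d2s32
```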