RAID1 failure tests with mdadm
Background
Failure tests of Linux software RAID (mdadm) across several scenarios.
I. Initial state
Kernel version:
root@omv30:~# uname -a
Linux omv30 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1 (2018-09-13) x86_64 GNU/Linux
After creating the RAID1 in OMV, query sdb. At this point sdb carries the Device UUID ending in 8ac693c5 and is device number 1 in the array:
root@omv30:~# mdadm --query /dev/sdb
/dev/sdb: is not an md array
/dev/sdb: device 1 in 2 device undetected raid1 /dev/md0.  Use mdadm --examine for more detail.
root@omv30:~# mdadm --examine /dev/sdb
/dev/sdb:
  Magic : a92b4efc
  Version : 1.2
  Feature Map : 0x0
  Array UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Name : omv30:raid1  (local to host omv30)
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Raid Devices : 2
  Avail Dev Size : 2095104 (1023.00 MiB 1072.69 MB)
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Data Offset : 2048 sectors
  Super Offset : 8 sectors
  Unused Space : before=1960 sectors, after=0 sectors
  State : clean
  Device UUID : 64a58fb5:c7e76b1a:29453878:8ac693c5
  Update Time : Mon Oct 1 13:20:56 2018
  Bad Block Log : 512 entries available at offset 72 sectors
  Checksum : 2e1fb65b - correct
  Events : 21
  Device Role : Active device 1
  Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 19:46:42 2018
  State : clean
  Active Devices : 2
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 25
  Number   Major   Minor   RaidDevice State
     0       8       32        0      active sync   /dev/sdc
     1       8       16        1      active sync   /dev/sdb
Configuration file:
root@omv30:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
# Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
# To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
# used if no RAID devices are configured.
DEVICE partitions
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# definitions of existing MD arrays
ARRAY /dev/md0 metadata=1.2 name=omv30:raid1 UUID=921a8946:b273e00e:3fa4b99d:040a4437
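II. Device-name shuffle tests
1. Swapping device names
In VirtualBox, under Storage > SATA, select each of the two disks in turn and swap their SATA port numbers in the attributes pane on the right. This swaps their device names.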
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 19:52:46 2018
  State : clean
  Active Devices : 2
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 29
  Number   Major   Minor   RaidDevice State
     0       8       16        0      active sync   /dev/sdb
     1       8       32        1      active sync   /dev/sdc
The device names have swapped: RaidDevice 0 is now /dev/sdb and RaidDevice 1 is /dev/sdc.
root@omv30:~# mdadm --examine /dev/sdb
/dev/sdb:
  Magic : a92b4efc
  Version : 1.2
  Feature Map : 0x0
  Array UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Name : omv30:raid1  (local to host omv30)
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Raid Devices : 2
  Avail Dev Size : 2095104 (1023.00 MiB 1072.69 MB)
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Data Offset : 2048 sectors
  Super Offset : 8 sectors
  Unused Space : before=1960 sectors, after=0 sectors
  State : clean
  Device UUID : 6e545465:3dcf10df:1d5bb938:fe840307
  Update Time : Mon Oct 1 19:52:46 2018
  Bad Block Log : 512 entries available at offset 72 sectors
  Checksum : a47a7d1f - correct
  Events : 29
  Device Role : Active device 0
  Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
root@omv30:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active (auto-read-only) raid1 sdb[0] sdc[1]
      1047552 blocks super 1.2 [2/2] [UU]
unused devices: <none>
root@omv30:~# fdisk -l
...(omitted)
Disk /dev/md0: 1023 MiB, 1072693248 bytes, 2095104 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
After mounting, the filesystem is accessible as normal.
2. Moving a disk to a different port
Shut down, add a new disk in the port previously used by sdc, and boot:
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 19:52:46 2018
  State : clean
  Active Devices : 2
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 29
  Number   Major   Minor   RaidDevice State
     0       8       16        0      active sync   /dev/sdb
     1       8       48        1      active sync   /dev/sdd
root@omv30:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active (auto-read-only) raid1 sdb[0] sdd[1]
      1047552 blocks super 1.2 [2/2] [UU]
unused devices: <none>
The former sdc is now sdd; the RAID1 is intact and unaffected.
3. Removing a disk
Shut down, remove one disk, and boot:
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Raid Level : raid0
  Total Devices : 1
  Persistence : Superblock is persistent
  State : inactive
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 29
  Number   Major   Minor   RaidDevice
     -       8       16        -        /dev/sdb
root@omv30:~# mdadm --query /dev/sdb
/dev/sdb: is not an md array
/dev/sdb: device 0 in 2 device undetected raid1 /dev/md0.  Use mdadm --examine for more detail.
root@omv30:~# mdadm --examine /dev/sdb
/dev/sdb:
  Magic : a92b4efc
  Version : 1.2
  Feature Map : 0x0
  Array UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Name : omv30:raid1  (local to host omv30)
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Raid Devices : 2
  Avail Dev Size : 2095104 (1023.00 MiB 1072.69 MB)
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Data Offset : 2048 sectors
  Super Offset : 8 sectors
  Unused Space : before=1960 sectors, after=0 sectors
  State : clean
  Device UUID : 6e545465:3dcf10df:1d5bb938:fe840307
  Update Time : Mon Oct 1 19:52:46 2018
  Bad Block Log : 512 entries available at offset 72 sectors
  Checksum : a47a7d1f - correct
  Events : 29
  Device Role : Active device 0
  Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
root@omv30:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdb[0](S)
      1047552 blocks super 1.2
unused devices: <none>
root@omv30:~# fdisk -l
Disk /dev/sda: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x8c9b0fb9
Device     Boot    Start      End  Sectors Size Id Type
/dev/sda1  *        2048 12582911 12580864   6G 83 Linux
/dev/sda2       12584958 16775167  4190210   2G  5 Extended
/dev/sda5       12584960 16775167  4190208   2G 82 Linux swap / Solaris
Disk /dev/sdb: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
The RAID1 is now inactive (and no /dev/md0 disk appears in fdisk -l), but the RAID metadata itself is stored on the remaining disk and is not lost.
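The inactive state seen in /proc/mdstat can be spotted with a one-line filter. This is a minimal sketch, assuming the "mdN : inactive ..." line format shown in the output above; the function name inactive_arrays is hypothetical and not part of mdadm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mdadm): read /proc/mdstat-style text on
# stdin and print the name of every array whose state field is "inactive".
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Sample lines taken from the /proc/mdstat output in this test.
sample_mdstat='md0 : inactive sdb[0](S)
      1047552 blocks super 1.2'

printf '%s\n' "$sample_mdstat" | inactive_arrays
```

On a live system the same function would be fed with `inactive_arrays < /proc/mdstat`.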
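4. Conclusions
- mdadm does not record RAID membership by device name (/dev/sdX). However the names change, the RAID information stays consistent.
- mdadm tells member disks apart by the Device UUID stored in each disk's superblock. This differs from a hardware RAID enclosure, which records the drive slot number.
- Each mdadm member disk can therefore go into any enclosure or slot, with no need to keep track of positions.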
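The conclusions above can be checked by reading the identity straight out of the superblock instead of trusting the device name. The following is a minimal sketch, assuming `mdadm --examine` prints "Device UUID" and "Device Role" lines as in the outputs earlier; member_identity is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mdadm): read `mdadm --examine` output on
# stdin and print "<device uuid> (<role>)", which stays the same no matter
# which /dev/sdX name the disk currently has.
member_identity() {
    awk -F' : ' '
        tolower($1) ~ /device uuid/ { uuid = $2 }
        tolower($1) ~ /device role/ { role = $2 }
        END { if (uuid != "") print uuid, "(" role ")" }
    '
}

# Sample input abbreviated from the --examine output shown earlier.
sample_examine='     Array UUID : 921a8946:b273e00e:3fa4b99d:040a4437
    Device UUID : 6e545465:3dcf10df:1d5bb938:fe840307
    Device Role : Active device 0'

printf '%s\n' "$sample_examine" | member_identity
```

On a real host it would be used as `mdadm --examine /dev/sdb | member_identity` (as root), once per candidate disk.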
III. Degraded-array recovery tests
Scenario: one disk of a normally running RAID1 suddenly fails, and the array is rebuilt.
Method: either simulate the failure with mdadm, or use VirtualBox's hot-plug feature to pull a disk.
1. Simulated failure
Initial state:
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 20:29:44 2018
  State : clean
  Active Devices : 2
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 31
  Number   Major   Minor   RaidDevice State
     0       8       16        0      active sync   /dev/sdb
     1       8       48        1      active sync   /dev/sdd
(1) Manually mark sdd as failed:
root@omv30:~# mdadm /dev/md0 --fail /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md0
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 20:29:59 2018
  State : clean, degraded
  Active Devices : 1
  Working Devices : 1
  Failed Devices : 1
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 33
  Number   Major   Minor   RaidDevice State
     0       8       16        0      active sync   /dev/sdb
     -       0        0        1      removed
     1       8       48        -      faulty   /dev/sdd
If the failed disk has not yet been removed from the array, remove it first (in this run the remove command reported an error):
root@omv30:~# mdadm /dev/md0 -r /dev/sdd
mdadm: hot remove failed for /dev/sdd: No such device or address
(2) Add a new disk. The disk added here is 2 GB, larger than the original members; testing showed this does not affect the RAID1 rebuild.
root@omv30:~# mdadm /dev/md0 --add /dev/sdc
mdadm: added /dev/sdc
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 20:36:22 2018
  State : clean, degraded, recovering
  Active Devices : 1
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 1
  Rebuild Status : 76% complete
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 48
  Number   Major   Minor   RaidDevice State
     0       8       16        0      active sync   /dev/sdb
     2       8       32        1      spare rebuilding   /dev/sdc
The rebuild is in progress.
Run the command again a little later:
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 20:36:24 2018
  State : clean
  Active Devices : 2
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 53
  Number   Major   Minor   RaidDevice State
     0       8       16        0      active sync   /dev/sdb
     2       8       32        1      active sync   /dev/sdc
root@omv30:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc[2] sdb[0]
      1047552 blocks super 1.2 [2/2] [UU]
unused devices: <none>
The rebuild has completed successfully.
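The fail, remove and add steps used in this test can be collected into one small script. This is only a sketch under stated assumptions: replace_member and run are hypothetical names, the script defaults to a dry run that merely prints the commands, and the final `mdadm --wait` (which blocks until resync or recovery finishes) is an addition that the test itself did not use.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the steps used in this test.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and
# run as root to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

replace_member() {
    local array="$1" old="$2" new="$3"
    run mdadm "$array" --fail "$old"      # mark the old member faulty
    run mdadm "$array" --remove "$old"    # detach it from the array
    run mdadm "$array" --add "$new"       # add the replacement; rebuild starts
    run mdadm --wait "$array"             # assumption: block until recovery ends
}

replace_member /dev/md0 /dev/sdd /dev/sdc
```

Keeping the dry run as the default makes it possible to review the exact commands before anything touches the array.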
2. Hot-unplug test
In the VirtualBox storage settings, tick Hot-pluggable for sdb, then boot.
While the system is running, remove the sdb disk in the VirtualBox storage settings, then check the state:
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 1
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 21:13:45 2018
  State : clean, degraded
  Active Devices : 1
  Working Devices : 1
  Failed Devices : 0
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 56
  Number   Major   Minor   RaidDevice State
     -       0        0        0      removed
     2       8       32        1      active sync   /dev/sdc
The kernel log shows the disk going offline and the RAID1 being degraded (having this trail in the system log is one advantage of software RAID):
root@omv30:/var/log# dmesg | tail -20
[  340.551533] md: recovery of RAID array md0
[  345.881625] md: md0: recovery done.
[  657.932091] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
[ 2571.324851] ata2: SATA link down (SStatus 0 SControl 300)
[ 2576.667851] ata2: SATA link down (SStatus 0 SControl 300)
[ 2582.044796] ata2: SATA link down (SStatus 0 SControl 300)
[ 2582.044864] ata2.00: disabled
[ 2582.045573] ata2.00: detaching (SCSI 3:0:0:0)
[ 2582.058467] sd 3:0:0:0: [sdb] Synchronizing SCSI cache
[ 2582.058528] sd 3:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2582.058536] sd 3:0:0:0: [sdb] Stopping disk
[ 2582.058552] sd 3:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2586.709149] md/raid1:md0: Disk failure on sdb, disabling device.
               md/raid1:md0: Operation continuing on 1 devices.
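Because the failure is recorded in the kernel log, it can be picked out mechanically. The following is a rough sketch, assuming the "md/raid1:md0: Disk failure on sdb" message format shown in the dmesg output above; failed_members is a hypothetical helper name and the match is case-insensitive on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print
# "<array> <disk>" for every md "Disk failure on ..." message.
failed_members() {
    awk '
        {
            line = tolower($0)
            if (match(line, /md\/raid[0-9]+:md[0-9]+: disk failure on [a-z0-9]+/)) {
                hit = substr(line, RSTART, RLENGTH)
                n = split(hit, word, /[ :]+/)
                print word[2], word[n]   # array name, failed disk
            }
        }
    '
}

# Sample line taken from the dmesg output in this test.
printf '%s\n' '[ 2586.709149] md/raid1:md0: Disk failure on sdb, disabling device.' | failed_members
```

On a live system this would be run as `dmesg | failed_members`.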
Add a new disk. The new disk must not be smaller than the existing member, but it may be larger.
root@omv30:~# mdadm /dev/md0 --add /dev/sdc
Rebuild starts:
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 21:38:36 2018
  State : clean, degraded, recovering
  Active Devices : 1
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 1
  Rebuild Status : 56% complete
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 69
  Number   Major   Minor   RaidDevice State
     3       8       32        0      spare rebuilding   /dev/sdc
     2       8       16        1      active sync   /dev/sdb
root@omv30:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc[3] sdb[2]
      1047552 blocks super 1.2 [2/1] [_U]
      [===================>.]  recovery = 95.0% (996288/1047552) finish=0.0min speed=249072K/sec
unused devices: <none>
Rebuild complete:
root@omv30:~# mdadm -D /dev/md0
/dev/md0:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 21:38:39 2018
  State : clean
  Active Devices : 2
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 0
  Name : omv30:raid1  (local to host omv30)
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 78
  Number   Major   Minor   RaidDevice State
     3       8       32        0      active sync   /dev/sdc
     2       8       16        1      active sync   /dev/sdb
root@omv30:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc[3] sdb[2]
      1047552 blocks super 1.2 [2/2] [UU]
unused devices: <none>
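Rebuild progress can be read from /proc/mdstat rather than by re-running the detail command by hand. A minimal sketch, assuming the "recovery = NN.N%" line format seen in the output above; recovery_progress is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# recovery percentage, or nothing when no recovery is in progress.
recovery_progress() {
    sed -n 's/.*recovery = *\([0-9.]*\)%.*/\1/p'
}

# Sample line taken from the /proc/mdstat output during the rebuild.
sample_line='      [===================>.]  recovery = 95.0% (996288/1047552) finish=0.0min speed=249072K/sec'

printf '%s\n' "$sample_line" | recovery_progress
```

An empty result from `recovery_progress < /proc/mdstat` means no array is rebuilding at that moment.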
IV. Recovering the RAID1 after an OS reinstall
1. Both RAID1 disks healthy
Attach both disks to a freshly installed Ubuntu system and boot.
root@ub13:/home/op# fdisk -l
...(omitted)
Disk /dev/md127: 1023 MiB, 1072693248 bytes, 2095104 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@ub13:/home/op# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 sdd[1] sdc[0]
      1047552 blocks super 1.2 [2/2] [UU]
unused devices: <none>
The array has been detected automatically as /dev/md127.
root@ub13:/mnt/md# mdadm -D /dev/md127
/dev/md127:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 19:31:08 2018
  State : clean
  Active Devices : 2
  Working Devices : 2
  Failed Devices : 0
  Spare Devices : 0
  Consistency Policy : resync
  Name : omv30:raid1
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 25
  Number   Major   Minor   RaidDevice State
     0       8       32        0      active sync   /dev/sdc
     1       8       48        1      active sync   /dev/sdd
The system detected and assembled the RAID1 on its own; there was no need to run mdadm --assemble --scan.
/etc/mdadm/mdadm.conf also had the md entry added automatically.
After mounting, the filesystem is accessible as normal.
For how to rename md127 back to md0, see the end of the next subsection.
2. One RAID1 disk has failed
Only one disk is usable, so the RAID1 has to be rebuilt on the new system.
root@op:/home/op# fdisk -l
(no new md device found; detailed output omitted)
root@op:/home/op# mdadm --examine /dev/sdb
/dev/sdb:
  Magic : a92b4efc
  Version : 1.2
  Feature Map : 0x0
  Array UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Name : omv30:raid1
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Raid Devices : 2
  Avail Dev Size : 4192256 (2047.00 MiB 2146.44 MB)
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 2095104 (1023.00 MiB 1072.69 MB)
  Data Offset : 2048 sectors
  Super Offset : 8 sectors
  Unused Space : before=1960 sectors, after=2097152 sectors
  State : clean
  Device UUID : 6e2fc709:35a8f6fb:d4c0e242:6905437d
  Update Time : Mon Oct 1 21:40:25 2018
  Bad Block Log : 512 entries available at offset 72 sectors
  Checksum : e65b30a8 - correct
  Events : 78
  Device Role : Active device 1
  Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
root@op:/home/op# cat /proc/mdstat
Personalities :
unused devices: <none>
This shows that the RAID1 metadata on the newly attached disk is intact; the array only needs to be assembled again:
root@op:/home/op# mdadm --assemble --scan
mdadm: /dev/md/raid1 has been started with 1 drive (out of 2).
root@op:/home/op# fdisk -l
...(omitted)
Disk /dev/md127: 1023 MiB, 1072693248 bytes, 2095104 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@op:/home/op# mdadm -D /dev/md127
/dev/md127:
  Version : 1.2
  Creation Time : Sun Sep 30 22:31:39 2018
  Raid Level : raid1
  Array Size : 1047552 (1023.00 MiB 1072.69 MB)
  Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
  Raid Devices : 2
  Total Devices : 1
  Persistence : Superblock is persistent
  Update Time : Mon Oct 1 21:40:25 2018
  State : clean, degraded
  Active Devices : 1
  Working Devices : 1
  Failed Devices : 0
  Spare Devices : 0
  Consistency Policy : resync
  Name : omv30:raid1
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 78
  Number   Major   Minor   RaidDevice State
     -       0        0        0      removed
     2       8       16        1      active sync   /dev/sdb
If the system is rebooted at this point, md127 no longer appears in fdisk -l, and mdadm -D reports the RAID1 as inactive.
root@op:/home/op# mdadm -D /dev/md127
/dev/md127:
  Version : 1.2
  Raid Level : raid0
  Total Devices : 1
  Persistence : Superblock is persistent
  State : inactive
  Working Devices : 1
  Name : omv30:raid1
  UUID : 921a8946:b273e00e:3fa4b99d:040a4437
  Events : 78
  Number   Major   Minor   RaidDevice
     -       8       16        -        /dev/sdb
In that case, first stop the stale md127 with mdadm -S /dev/md127, then run mdadm --assemble --scan again.
Then add a disk and let the rebuild finish. The command below is shown with the original host's names; on this host the array is /dev/md127.
root@omv30:~# mdadm /dev/md0 --add /dev/sdc
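The decision described above (stop and reassemble when the array is inactive, add a disk when it is merely degraded) can be sketched as a small function that reads the State line of `mdadm -D`. This is an illustration only: next_step is a hypothetical name, it prints a suggested command rather than running anything, and the "wait" branch for a recovering array is an assumption added for completeness.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `mdadm -D` output on stdin and print the next
# command to consider, based on the array's State line.
next_step() {
    local array="$1" new_disk="$2" state
    # Pick the array "State" line only (not "Array State" from --examine).
    state=$(awk -F' : ' 'tolower($1) ~ /^ *state$/ { print $2; exit }')
    case "$state" in
        inactive*)    printf 'mdadm --stop %s && mdadm --assemble --scan\n' "$array" ;;
        *recovering*) printf 'wait: rebuild in progress\n' ;;
        *degraded*)   printf 'mdadm %s --add %s\n' "$array" "$new_disk" ;;
        *)            printf 'nothing to do (state: %s)\n' "$state" ;;
    esac
}

# Sample State line as reported for the inactive array after the reboot.
printf '          State : inactive\n' | next_step /dev/md127 /dev/sdc
```

Printing the command instead of executing it keeps the operator in the loop, which seems prudent for anything that stops an array.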
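After this, rebooting has no further effect. The only difference is that the array was md0 on the original host and is now md127.
To fix that:
Edit /etc/mdadm/mdadm.conf,
change the second column of the ARRAY line from: /dev/md/raid1
to: /dev/md0
Then run:
update-initramfs -u
Reboot, and the array comes up as /dev/md0.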
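The edit to mdadm.conf can also be scripted. The sketch below is an assumption-laden example: rename_array is a hypothetical helper that rewrites only the second field of matching ARRAY lines, and the sample ARRAY line combines the /dev/md/raid1 name mentioned above with the metadata fields from the original host's config, so the real line on the new host may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read mdadm.conf text on stdin and print it with the
# device name (second field) of matching ARRAY lines replaced.
rename_array() {
    local old="$1" new="$2"
    awk -v old="$old" -v new="$new" '
        toupper($1) == "ARRAY" && $2 == old { $2 = new }
        { print }
    '
}

# Assumed sample; the actual ARRAY line on the new host may look different.
sample_conf='# definitions of existing MD arrays
ARRAY /dev/md/raid1 metadata=1.2 name=omv30:raid1 UUID=921a8946:b273e00e:3fa4b99d:040a4437'

printf '%s\n' "$sample_conf" | rename_array /dev/md/raid1 /dev/md0
```

In practice the output would be written to a temporary file, reviewed, copied over /etc/mdadm/mdadm.conf, and followed by update-initramfs -u as described above.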
V. Common commands
Check status
mdadm -D /dev/md0
cat /proc/mdstat
mdadm --examine /dev/sdb
Stop (remove) an md device
mdadm -S /dev/md0
Activate (assemble) an md device
mdadm -A /dev/md0
Re-import the RAID on a new OS
mdadm --assemble --scan
Rebuild
mdadm /dev/md0 --add /dev/sdc