Android R fails to find the super partition at boot
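This is a write-up of a first-stage init failure I hit after enabling the super dynamic partition on Android R, partly as a record of the pitfalls for my own future reference.
While porting an Android R project, I did the following:
- Set BOARD_AVB_ENABLE := true and added the dynamic partition configuration
- Added a super partition to the physical partition table
- Checked that the DM-related options in the kernel defconfig were all enabled
- Added the logical partitions that live in super to the fstab
- Enabled the AVB-related configuration and disabled the AVB-related partition entries in the dts
With that done, I built the images. super.img built fine, the vbmeta and vbmeta_system images were fine too, and their dumps looked right, so I flashed the images and booted.
Sure enough, it still failed. Boot reached the first stage of init and printed the following errors: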
[ 7.231810] {1}[1:init]init: init first stage started!
[ 7.237895] {1}[1:init]init: Unable to open /lib/modules, skipping module loading.
[ 7.247187] {1}[1:init]init: [libfs_mgr]ReadFstabFromDt(): failed to read fstab from dt
[ 7.428625] {1}[1:init]init: Using Android DT directory /proc/device-tree/firmware/android/
[ 7.803085] {1}[1:init]init: realpath failed: /dev/block/by-name/super: No such file or directory
[ 7.813305] {1}[1:init]init: Failed to mount required partitions early ...
[ 7.822096] {1}[1:init]init: InitFatalReboot: signal 6
[ 7.847731] {1}[1:init]init: #00 pc 00000000003043e0 /init (UnwindStackCurrent::UnwindFromContext(unsigned long, void*)+96)
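With a problem like this there is no shortcut; the only option is to grit your teeth and dig through it step by step.
Start looking for clues at this line: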
[ 7.803085] {1}[1:init]init: realpath failed: /dev/block/by-name/super: No such file or directory
Reading the code, the message comes from first_stage_mount.cpp: first-stage init fails while initializing the physical partition device nodes.
bool FirstStageMount::InitDevices() {
std::set<std::string> devices;
GetSuperDeviceName(&devices);
if (!GetDmVerityDevices(&devices)) {
return false;
}
if (!InitRequiredDevices(std::move(devices))) {
return false;
}
if (IsDmLinearEnabled()) {
auto super_symlink = "/dev/block/by-name/"s + super_partition_name_;
if (!android::base::Realpath(super_symlink, &super_path_)) {
PLOG(ERROR) << "realpath failed: " << super_symlink;
return false;
}
}
return true;
}
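Judging from the Realpath call, the super_symlink link does not exist after boot. But we did configure the physical super partition, and flashing it worked, so where is the problem?
When the cause isn't obvious, adding log output is a good approach. So I added some code to print super_partition_name_ and super_path_, and to call access() to check whether the /dev/block/by-name/super node exists: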
if (IsDmLinearEnabled()) {
auto super_symlink = "/dev/block/by-name/"s + super_partition_name_;
LOG(INFO) << " super_partition_name_ is " << super_partition_name_
<< ", super_path_ is " << super_path_;
if (access("/dev/block/by-name/super", F_OK) == 0) {
LOG(INFO) << "super partition node is existed.";
} else {
LOG(INFO) << "super partition node is not existed.";
}
if (!android::base::Realpath(super_symlink, &super_path_)) {
PLOG(ERROR) << "realpath failed: " << super_symlink;
return false;
}
}
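After building and running, execution took the else branch: the /dev/block/by-name/super node does not exist. This is unlikely to be a bug in stock Android; the base code should be fine, so some configuration elsewhere is the more likely culprit. Time to dig further.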
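One caveat about the check above: access() follows symlinks, so by itself it cannot tell a link that was never created from a link whose target is gone. Below is a minimal sketch of a probe that separates the two cases with lstat() and realpath(). ProbeLink and LinkState are hypothetical names made up for illustration; they are not part of init.

```cpp
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>

#include <string>

// Hypothetical helper for illustration only; not part of AOSP init.
enum class LinkState { kMissing, kUnresolvable, kResolved };

// lstat() inspects the path itself without following a final symlink, so it
// tells us whether anything exists at |path|. realpath() then follows the
// link(s) and fails if the target cannot be resolved.
LinkState ProbeLink(const std::string& path, std::string* resolved) {
    struct stat st;
    if (lstat(path.c_str(), &st) != 0) {
        return LinkState::kMissing;
    }
    char buf[PATH_MAX];
    if (realpath(path.c_str(), buf) == nullptr) {
        return LinkState::kUnresolvable;
    }
    *resolved = buf;
    return LinkState::kResolved;
}
```

In this investigation the interesting answer would be kMissing for /dev/block/by-name/super, which points at symlink creation rather than at the partition itself.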
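Recall how Android mounts partitions at boot; it helps to have the rough flow in mind. In the first stage, init reads the fstab from the ramdisk and then mounts the partitions it lists.
Before mounting, it has to wait until the kernel has the partition nodes ready under /sys; otherwise userspace has nothing to mount.
In first-stage init this is what the InitDevices function does. What are its main steps?
- It calls GetSuperDeviceName to get the name of the super partition node and stores it in the devices set.
- It calls GetDmVerityDevices, which collects the partition devices related to device mapper. In my case these are boot_a, super, vbmeta_a and vbmeta_system_a. The _a suffix is there because I have A/B partitions enabled. Whether it is _a or _b is decided by fs_mgr_get_slot_suffix in fs_mgr_slotselect.cpp, which takes the value from the kernel cmdline; it is normally appended to the cmdline by a lower layer (typically the bootloader).
- It calls InitRequiredDevices. This function is the important one, so let's expand it here: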
bool FirstStageMount::InitRequiredDevices(std::set<std::string> devices) {
if (!block_dev_init_.InitDeviceMapper()) {
return false;
}
if (devices.empty()) {
return true;
}
return block_dev_init_.InitDevices(std::move(devices));
}
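The first step is InitDeviceMapper. It is fairly long and goes down into the kernel DM module, so I plan to cover it separately in a write-up of the AVB flow and skip it here.
Next it calls block_dev_init_.InitDevices. Note that devices is passed by value (moved in); inside InitDevices, the uevent callback gets a pointer to that local set and removes each partition from it as it is found.
That lands in the InitDevices function in block_dev_initializer.cpp. I suspect quite a few people are a bit lost at this point: what does the code below actually do?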
@block_dev_initializer.cpp
bool BlockDevInitializer::InitDevices(std::set<std::string> devices) {
auto uevent_callback = [&, this](const Uevent& uevent) -> ListenerAction {
return HandleUevent(uevent, &devices);
};
uevent_listener_.RegenerateUevents(uevent_callback);
// UeventCallback() will remove found partitions from |devices|. So if it
// isn't empty here, it means some partitions are not found.
if (!devices.empty()) {
LOG(INFO) << __PRETTY_FUNCTION__
<< ": partition(s) not found in /sys, waiting for their uevent(s): "
<< android::base::Join(devices, ", ");
Timer t;
uevent_listener_.Poll(uevent_callback, 10s);
LOG(INFO) << "Wait for partitions returned after " << t;
}
if (!devices.empty()) {
LOG(ERROR) << __PRETTY_FUNCTION__ << ": partition(s) not found after polling timeout: "
<< android::base::Join(devices, ", ");
return false;
}
return true;
}
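Put simply, think of it this way: at boot the kernel creates the device nodes, but how does the init process running from the ramdisk know whether the kernel side is ready? How do the two tell each other their state?
That is what uevents are for: the kernel reports each device over a netlink socket, and init waits on that socket with poll() (if Linux poll/epoll-style waiting is unfamiliar, read up on it first).
If the uevents reported by the kernel show that these partition nodes exist, is there anything left to do? Yes: create the symlinks. You can't expect everyone to refer to physical partitions by names like /dev/block/mmcblk0p24.
With that in mind, look at the code above again: RegenerateUevents first replays the uevents for devices already present under /sys; if some partitions are still missing, Poll waits up to 10 seconds for further HandleUevent callbacks, and if they have not all shown up by then, InitDevices fails.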
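The "wait for an event, but give up after a timeout" part can be sketched in isolation. This is a minimal sketch that assumes a plain pipe as a stand-in for the netlink uevent socket, so it runs without special privileges; WaitReadable and PipeDemo are hypothetical names, and the 50 ms timeout is arbitrary (init uses 10 s).

```cpp
#include <poll.h>
#include <unistd.h>

#include <chrono>

// Returns true if |fd| becomes readable before |timeout| expires.
bool WaitReadable(int fd, std::chrono::milliseconds timeout) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    const int nr = poll(&pfd, 1, static_cast<int>(timeout.count()));
    return nr > 0 && (pfd.revents & POLLIN) != 0;
}

// Stand-in for the uevent socket (an assumption for this demo): a pipe that
// either stays silent or receives one byte before we start waiting.
bool PipeDemo(bool send_event) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    bool ready = false;
    const char byte = 'x';
    if (!send_event || write(fds[1], &byte, 1) == 1) {
        ready = WaitReadable(fds[0], std::chrono::milliseconds(50));
    }
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

PipeDemo(false) models the "partition never shows up" case and returns false after the timeout; PipeDemo(true) models an event that is already pending and returns true immediately.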
In my case InitDevices ran to completion without timing out, so where is the problem? Keep digging.
Starting from HandleUevent, I added a log line:
ListenerAction BlockDevInitializer::HandleUevent(const Uevent& uevent,
std::set<std::string>* devices) {
...
auto iter = devices->find(name);
if (iter == devices->end()) {
return ListenerAction::kContinue;
}
LOG(VERBOSE) << __PRETTY_FUNCTION__ << ": found partition: " << name;
LOG(ERROR) << "HandleUevent found partition: " << name; // added by me; ERROR level is the lazy way to make sure the line shows up
devices->erase(iter);
device_handler_->HandleUevent(uevent); // the key line: this one call does a lot of work
return devices->empty() ? ListenerAction::kStop : ListenerAction::kContinue;
}
The log output is below. These are indeed the partitions we want, and super is among them:
[ 7.471662] {1}[1:init]init: HandleUevent found partition: boot_a
[ 7.557741] {1}[1:init]init: HandleUevent found partition: super
[ 7.635216] {1}[1:init]init: HandleUevent found partition: vbmeta_system_a
[ 7.712557] {1}[1:init]init: HandleUevent found partition: vbmeta_a
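The bookkeeping behind these four lines is just a set that shrinks as matching uevents arrive. Below is a minimal sketch under that reading; MissingAfter is a hypothetical helper, not an init function, and it ignores everything about a uevent except the partition name.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the required partitions that are still missing after the given
// sequence of partition names has been seen. Names that are not required
// are ignored, and processing stops early once nothing is missing.
std::set<std::string> MissingAfter(std::set<std::string> required,
                                   const std::vector<std::string>& seen) {
    for (const auto& name : seen) {
        required.erase(name);
        if (required.empty()) break;
    }
    return required;
}
```

Feeding it the four names from the log above leaves nothing missing, which matches the observation that InitDevices did not time out.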
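Next, what happens inside device_handler_->HandleUevent(uevent)?
device_handler_ is an instance of DeviceHandler from devices.cpp. Here is its HandleUevent function; after reading it, my first reaction was: who am I, and where am I?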
void DeviceHandler::HandleUevent(const Uevent& uevent) {
...
std::string devpath;
std::vector<std::string> links;
bool block = false;
if (uevent.subsystem == "block") {
block = true;
devpath = "/dev/block/" + Basename(uevent.path);
if (StartsWith(uevent.path, "/devices")) {
links = GetBlockDeviceSymlinks(uevent);
}
} else if (const auto subsystem =
std::find(subsystems_.cbegin(), subsystems_.cend(), uevent.subsystem);
subsystem != subsystems_.cend()) {
devpath = subsystem->ParseDevPath(uevent);
...
} else {
devpath = "/dev/" + Basename(uevent.path);
}
mkdir_recursive(Dirname(devpath), 0755);
HandleDevice(uevent.action, devpath, block, uevent.major, uevent.minor, links);
}
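Time to untangle it thread by thread; that's the job.
The kernel is certainly reporting these as block devices, and there is a links variable, which looks promising: it smells like symlinks.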
links = GetBlockDeviceSymlinks(uevent);
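Take a look at the GetBlockDeviceSymlinks function.
The line links.emplace_back("/dev/block/by-name/" + uevent.device_name), and the matching one just above it that uses partition_name_sanitized, feels close to the truth. Remember the error reported from InitDevices in first_stage_mount.cpp earlier? If not, scroll back up.
The /dev/block/by-name/super node could not be found. From GetBlockDeviceSymlinks we can see that this path is assembled by concatenation, so why do we get an error?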
std::vector<std::string> DeviceHandler::GetBlockDeviceSymlinks(const Uevent& uevent) const {
// ... parsing of the uevent fields omitted; FindPlatformDevice walks up the device path level by level
std::vector<std::string> links;
PLOG(ERROR) << "found " << type << " device " << device;
auto link_path = "/dev/block/" + type + "/" + device;
bool is_boot_device = boot_devices_.find(device) != boot_devices_.end();
//bool is_boot_device = true;
PLOG(ERROR) << " is_boot_device: " << is_boot_device;
if (!uevent.partition_name.empty()) {
std::string partition_name_sanitized(uevent.partition_name);
SanitizePartitionName(&partition_name_sanitized);
if (partition_name_sanitized != uevent.partition_name) {
PLOG(ERROR) << " Linking partition '" << uevent.partition_name << "' as '"
<< partition_name_sanitized << "'";
}
links.emplace_back(link_path + "/by-name/" + partition_name_sanitized);
// Adds symlink: /dev/block/by-name/<partition_name>.
if (is_boot_device) {
links.emplace_back("/dev/block/by-name/" + partition_name_sanitized);
}
} else if (is_boot_device) {
// If we don't have a partition name but we are a partition on a boot device, create a
// symlink of /dev/block/by-name/<device_name> for symmetry.
PLOG(ERROR) << " else is_boot_device: " << is_boot_device;
links.emplace_back("/dev/block/by-name/" + uevent.device_name);
}
auto last_slash = uevent.path.rfind('/');
links.emplace_back(link_path + "/" + uevent.path.substr(last_slash + 1));
return links;
}
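First add some logging, including the is_boot_device value, since the logic below depends on it.
After building and running, the output is below. All four uevents report the device as 0.soc/fa507000.sdhci, and is_boot_device is 0 every time, so the problem is probably here. (The stray ": No such file or directory" after one is_boot_device line is only PLOG appending strerror(errno).)
From the code above, this value apparently needs to be true for the /dev/block/by-name/super node to exist when it is time to mount.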
[ 7.490103] {1}[1:init]init: is_boot_device: 0
[ 7.557741] {1}[1:init]init: HandleUevent found partition: super
[ 7.566017] {1}[1:init]init: found platform device 0.soc/fa507000.sdhci
[ 7.576258] {1}[1:init]init: is_boot_device: 0
[ 7.635216] {1}[1:init]init: HandleUevent found partition: vbmeta_system_a
[ 7.644339] {1}[1:init]init: found platform device 0.soc/fa507000.sdhci
[ 7.654568] {1}[1:init]init: is_boot_device: 0
[ 7.712557] {1}[1:init]init: HandleUevent found partition: vbmeta_a
[ 7.721111] {1}[1:init]init: found platform device 0.soc/fa507000.sdhci
[ 7.731347] {1}[1:init]init: is_boot_device: 0: No such file or directory
[ 7.796012] {1}[1:init]init: super partition node is existed.
[ 7.803085] {1}[1:init]init: realpath failed: /dev/block/by-name/super: No such file or directory
[ 7.813305] {1}[1:init]init: Failed to mount required partitions early ...
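To make the effect of is_boot_device concrete, here is a simplified restatement of the branch logic in GetBlockDeviceSymlinks as a standalone function. It is a sketch, not the AOSP code: BuildLinks and HasLink are hypothetical names, partition name sanitizing is left out, and the kernel device name mmcblk0p24 used in the usage note is only a placeholder.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Simplified sketch of the symlink decision. The global
// /dev/block/by-name/<name> link is only produced when |device| is listed in
// |boot_devices|; the per-controller links are produced unconditionally.
std::vector<std::string> BuildLinks(const std::string& type, const std::string& device,
                                    const std::string& partition_name,
                                    const std::string& kernel_name,
                                    const std::set<std::string>& boot_devices) {
    std::vector<std::string> links;
    const std::string link_path = "/dev/block/" + type + "/" + device;
    const bool is_boot_device = boot_devices.count(device) > 0;
    if (!partition_name.empty()) {
        links.push_back(link_path + "/by-name/" + partition_name);
        if (is_boot_device) {
            links.push_back("/dev/block/by-name/" + partition_name);
        }
    } else if (is_boot_device) {
        links.push_back("/dev/block/by-name/" + kernel_name);
    }
    links.push_back(link_path + "/" + kernel_name);
    return links;
}

bool HasLink(const std::vector<std::string>& links, const std::string& path) {
    return std::find(links.begin(), links.end(), path) != links.end();
}
```

With a boot device set that does not contain 0.soc/fa507000.sdhci, the result for super still includes /dev/block/platform/0.soc/fa507000.sdhci/by-name/super but not /dev/block/by-name/super, which is exactly the path Realpath failed on.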
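But why does stock code fail in my setup?
And where does the 0.soc/fa507000.sdhci in the "found ... device" line come from?
I vaguely remembered seeing it in the kernel uevent logs, so I searched the source tree for 0.soc and, sure enough, found it defined in the cmdline in BoardConfig.mk: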
androidboot.bootdevices=34458000.sdhci androidboot.boot_devices=0.soc/34458000.sdhci
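Now the cause is clear: the boot device value in the cmdline does not match what the kernel reports (34458000.sdhci versus fa507000.sdhci), most likely because the value was carried over from a different project's configuration. Fix it: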
androidboot.bootdevices=fa507000.sdhci androidboot.boot_devices=0.soc/fa507000.sdhci
Make sure the debug override bool is_boot_device = true; shown above is commented out, then rebuild, flash and boot. Fixed: the error no longer appears.
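The mismatch is easy to check mechanically. The sketch below pulls androidboot.boot_devices out of a cmdline string and tests whether the device reported by the kernel is in it. CmdlineValue, SplitCommaList and IsBootDevice are hypothetical helpers, not fs_mgr functions; treating the value as a comma-separated list is an assumption of this sketch, and quoted cmdline values are not handled.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns the value of |key| in a space-separated key=value cmdline,
// or an empty string if the key is absent.
std::string CmdlineValue(const std::string& cmdline, const std::string& key) {
    std::istringstream in(cmdline);
    const std::string prefix = key + "=";
    std::string token;
    while (in >> token) {
        if (token.compare(0, prefix.size(), prefix) == 0) {
            return token.substr(prefix.size());
        }
    }
    return "";
}

// Splits a comma-separated value into a set, dropping empty items.
std::set<std::string> SplitCommaList(const std::string& value) {
    std::set<std::string> out;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (!item.empty()) out.insert(item);
    }
    return out;
}

// True if |device| (as reported in the uevent) is listed in
// androidboot.boot_devices on |cmdline|.
bool IsBootDevice(const std::string& cmdline, const std::string& device) {
    return SplitCommaList(CmdlineValue(cmdline, "androidboot.boot_devices")).count(device) > 0;
}
```

Run against the two cmdline values above, IsBootDevice is false for 0.soc/fa507000.sdhci with the old value and true with the corrected one.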
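The change itself is tiny, but the analysis was long, and it requires a reasonably clear picture of the whole Android partition mount flow. That's the full analysis; I'm noting it down here to give a direction to anyone who runs into a similar problem.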