
Android R: super partition not found at boot

程序员文章站 2024-02-14 15:09:28

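This is a write-up of a case where first-stage init failed to boot after enabling the super dynamic partition on Android R, partly so I can look back later at the pitfalls I hit.
While porting and adapting an Android R project, I did the following:
Enabled BOARD_AVB_ENABLE := true and added the dynamic partition configuration
Added a super partition to the physical partition table
Checked that the DM-related options in the kernel defconfig are all enabled
Added the super-backed logical partitions to fstab
Enabled the AVB-related configuration and disabled the avb-related partition entries in the dts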

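With that groundwork done, I built the images. super.img built fine, the vbmeta and vbmeta_system images were fine too, and dumping them showed the expected contents. Then I flashed the images and booted.
Sure enough, it still failed: boot got as far as first-stage init and then reported the following errors: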

[    7.231810]  {1}[1:init]init: init first stage started!
[    7.237895]  {1}[1:init]init: Unable to open /lib/modules, skipping module loading.
[    7.247187]  {1}[1:init]init: [libfs_mgr]ReadFstabFromDt(): failed to read fstab from dt
[    7.428625]  {1}[1:init]init: Using Android DT directory /proc/device-tree/firmware/android/
[    7.803085]  {1}[1:init]init: realpath failed: /dev/block/by-name/super: No such file or directory
[    7.813305]  {1}[1:init]init: Failed to mount required partitions early ...
[    7.822096]  {1}[1:init]init: InitFatalReboot: signal 6
[    7.847731]  {1}[1:init]init: #00 pc 00000000003043e0  /init (UnwindStackCurrent::UnwindFromContext(unsigned long, void*)+96)

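There is no shortcut for this kind of problem; the only option is to grit my teeth and dig through it step by step.
The line below is the starting clue: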
[ 7.803085] {1}[1:init]init: realpath failed: /dev/block/by-name/super: No such file or directory

Reading the code, the message comes from first_stage_mount.cpp: first-stage init hits the error while initializing the physical partition device nodes.

bool FirstStageMount::InitDevices() {
    std::set<std::string> devices;
    GetSuperDeviceName(&devices);
    if (!GetDmVerityDevices(&devices)) {
        return false;
    }
    if (!InitRequiredDevices(std::move(devices))) {
        return false;
    }
    if (IsDmLinearEnabled()) {
        auto super_symlink = "/dev/block/by-name/"s + super_partition_name_;
        if (!android::base::Realpath(super_symlink, &super_path_)) {
            PLOG(ERROR) << "realpath failed: " << super_symlink;
            return false;
        }
    }
    return true;
}

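Judging from the Realpath call, the super_symlink node simply does not exist after boot. But the super physical partition was added to the partition table and flashing super went fine, so where is the problem?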
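To make the failure mode concrete, here is a minimal sketch of what a Realpath-style helper does, written against plain POSIX realpath(). ResolvePath is a hypothetical name for illustration and is not the android::base implementation; it only assumes the same "bool plus out-parameter" shape seen in the call above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: resolve |path| through all symlinks.
// Returns false (errno set by realpath) if any component does not exist,
// which is what happens when /dev/block/by-name/super was never created.
bool ResolvePath(const std::string& path, std::string* result) {
    // Passing nullptr asks realpath() to allocate the output buffer (POSIX.1-2008).
    char* resolved = realpath(path.c_str(), nullptr);
    if (resolved == nullptr) {
        return false;
    }
    *result = resolved;
    free(resolved);
    return true;
}
```

The point is that realpath cannot succeed unless the symlink itself exists, so the error says nothing about whether the underlying block device is present.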

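When the cause is not obvious, adding log output is a good tool. So I added some code to check whether the super partition node exists: it logs super_partition_name_ and super_path_ (super_path_ has not been filled in by Realpath yet at this point), and uses access() to test whether /dev/block/by-name/super exists.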

    if (IsDmLinearEnabled()) {
        auto super_symlink = "/dev/block/by-name/"s + super_partition_name_;
        LOG(INFO) << " super_partition_name_ is " << super_partition_name_ 
                    << ", super_path_ is " << super_path_;
        if (access("/dev/block/by-name/super", F_OK) == 0) {
            LOG(INFO) << "super partition node is existed.";
        } else {
            LOG(INFO) << "super partition node is not existed.";
        }

        if (!android::base::Realpath(super_symlink, &super_path_)) {
            PLOG(ERROR) << "realpath failed: " << super_symlink;
            return false;
        }
    }

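After rebuilding and running, the else branch was taken: the /dev/block/by-name/super node does not exist. Fine. This is unlikely to be a bug in stock Android, since the base code is presumably good, so it is probably related to configuration somewhere else. More digging is needed.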

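Recall the partition mount flow at Android boot; it helps to have the rough outline in mind. The first stage reads the fstab partition configuration from the ramdisk and then mounts those partitions.
Before mounting, it has to wait until the kernel has the partition nodes ready under /sys; otherwise userspace has nothing to mount.
In first-stage init this is the job of the InitDevices function shown above. What does it do at a high level?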

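  1. Call GetSuperDeviceName to get the super partition name and store it in the devices set.
  2. Call GetDmVerityDevices, which collects the partitions involved in device mapper / verity. In my case these are boot_a, super, vbmeta_a and vbmeta_system_a. Why the _a suffix? Because I enabled the A/B partition configuration.
    As for how it knows whether to use _a or _b: anyone who has read the relevant code will have seen that fs_mgr_get_slot_suffix in fs_mgr_slotselect.cpp handles it. The value comes from the kernel cmdline, and is normally appended to the cmdline by a lower layer such as the bootloader.
  3. Call InitRequiredDevices. This one is the key, so let's expand it: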
bool FirstStageMount::InitRequiredDevices(std::set<std::string> devices) {
    if (!block_dev_init_.InitDeviceMapper()) {
        return false;
    }
    if (devices.empty()) {
        return true;
    }
    return block_dev_init_.InitDevices(std::move(devices));
}

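The first step is InitDeviceMapper. It is fairly long and I plan to cover it separately when describing the AVB flow; device-mapper initialization goes down into the kernel's device-mapper driver (under drivers/md), so I skip it here.
Then block_dev_init_.InitDevices is called. Note that devices is passed by value here (moved in). Inside BlockDevInitializer::InitDevices a pointer to that local set is handed to HandleUevent, which erases each partition from the set as it is found.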

That lands in InitDevices in block_dev_initializer.cpp. I suspect many people are a bit lost at this point: what does the code below actually do?

@block_dev_initializer.cpp
bool BlockDevInitializer::InitDevices(std::set<std::string> devices) {
    auto uevent_callback = [&, this](const Uevent& uevent) -> ListenerAction {
        return HandleUevent(uevent, &devices);
    };
    uevent_listener_.RegenerateUevents(uevent_callback);

    // UeventCallback() will remove found partitions from |devices|. So if it
    // isn't empty here, it means some partitions are not found.
    if (!devices.empty()) {
        LOG(INFO) << __PRETTY_FUNCTION__
                  << ": partition(s) not found in /sys, waiting for their uevent(s): "
                  << android::base::Join(devices, ", ");
        Timer t;
        uevent_listener_.Poll(uevent_callback, 10s);
        LOG(INFO) << "Wait for partitions returned after " << t;
    }

    if (!devices.empty()) {
        LOG(ERROR) << __PRETTY_FUNCTION__ << ": partition(s) not found after polling timeout: "
                   << android::base::Join(devices, ", ");
        return false;
    }
    return true;
}

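Put simply, think of it this way: at boot the kernel creates the device nodes, but how does the init process in the ramdisk know whether the kernel side is ready, and how do the two sides tell each other about state?
That is what uevents are for: the kernel reports device events over a netlink socket, and init listens on that socket with a poll-style wait (if poll/epoll on Linux is unfamiliar, it is worth reading up on first).
Suppose the uevents reported by the kernel show that these partitions exist. Is there anything left to do? Yes: create the symlinks. You can't expect everyone to use a raw physical node such as /dev/block/mmcblk0p24.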

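With those concepts in place, look at BlockDevInitializer::InitDevices again: RegenerateUevents first replays the uevents for devices the kernel already knows about; if some partitions are still missing after that, Poll waits up to 10 seconds for HandleUevent to be called back, and if they are still missing after the timeout the function returns false and boot dies.
In my case InitDevices ran to completion without timing out, so where is the problem? Time to dig deeper.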
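The bookkeeping behind "no timeout" can be modelled in a few lines. The sketch below is a simplified stand-in, not the AOSP code: OnEvent and WaitFor are hypothetical names, and a plain vector of names replaces the real uevent stream.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Action { kContinue, kStop };

// Hypothetical stand-in for the uevent callback: erase |name| from |pending|
// when it is one of the partitions being waited for.
Action OnEvent(const std::string& name, std::set<std::string>* pending) {
    auto it = pending->find(name);
    if (it == pending->end()) {
        return Action::kContinue;  // not a partition we care about
    }
    pending->erase(it);
    return pending->empty() ? Action::kStop : Action::kContinue;
}

// Feed events to the callback until it asks to stop; whatever is left in the
// returned set was never reported.
std::set<std::string> WaitFor(std::set<std::string> pending,
                              const std::vector<std::string>& events) {
    for (const auto& name : events) {
        if (OnEvent(name, &pending) == Action::kStop) {
            break;
        }
    }
    return pending;
}
```

Note what an empty result does and does not prove: every required partition name showed up in some uevent, but nothing here says that the by-name symlink was actually created for it. That gap is why the search has to continue inside the handler.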

Start the analysis from HandleUevent and add some log output:

ListenerAction BlockDevInitializer::HandleUevent(const Uevent& uevent,
                                                 std::set<std::string>* devices) {
...
    auto iter = devices->find(name);
    if (iter == devices->end()) {
        return ListenerAction::kContinue;
    }

    LOG(VERBOSE) << __PRETTY_FUNCTION__ << ": found partition: " << name;
    LOG(ERROR) << "HandleUevent found partition: " << name; // added by me; using ERROR is the lazy way to make sure the line always shows up

    devices->erase(iter);
    device_handler_->HandleUevent(uevent); // this is the key line: a lot happens behind this single call
    return devices->empty() ? ListenerAction::kStop : ListenerAction::kContinue;
}

The log from that run is below. These are indeed the partitions we want, and super is among them:
[ 7.471662] {1}[1:init]init: HandleUevent found partition: boot_a
[ 7.557741] {1}[1:init]init: HandleUevent found partition: super
[ 7.635216] {1}[1:init]init: HandleUevent found partition: vbmeta_system_a
[ 7.712557] {1}[1:init]init: HandleUevent found partition: vbmeta_a

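Next, what happens inside device_handler_->HandleUevent(uevent)?
device_handler_ is an instance of DeviceHandler from devices.cpp. Here is its HandleUevent function; after reading it my only reaction was "who am I, where am I?"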

void DeviceHandler::HandleUevent(const Uevent& uevent) {
...
    std::string devpath;
    std::vector<std::string> links;
    bool block = false;

    if (uevent.subsystem == "block") {
        block = true;
        devpath = "/dev/block/" + Basename(uevent.path);

        if (StartsWith(uevent.path, "/devices")) {
            links = GetBlockDeviceSymlinks(uevent);
        }
    } else if (const auto subsystem =
                   std::find(subsystems_.cbegin(), subsystems_.cend(), uevent.subsystem);
               subsystem != subsystems_.cend()) {
        devpath = subsystem->ParseDevPath(uevent);
		...
    } else {
        devpath = "/dev/" + Basename(uevent.path);
    }

    mkdir_recursive(Dirname(devpath), 0755);
    HandleDevice(uevent.action, devpath, block, uevent.major, uevent.minor, links);
}

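Time to unravel it thread by thread; that is what being a programmer means.
What the kernel reports here is certainly the block subsystem, and there is a links variable, which looks promising because it smells like symlinks.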
links = GetBlockDeviceSymlinks(uevent);

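Now look at GetBlockDeviceSymlinks.
The line links.emplace_back("/dev/block/by-name/" + uevent.device_name) feels close to the truth. Remember the error reported by InitDevices in first_stage_mount.cpp earlier? If not, scroll back up and have another look.
The node /dev/block/by-name/super could not be found, and GetBlockDeviceSymlinks shows that this path is built by string concatenation. So why do we get the error?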

std::vector<std::string> DeviceHandler::GetBlockDeviceSymlinks(const Uevent& uevent) const {

    // ... parsing of the uevent fields omitted; FindPlatformDevice walks up the device path level by level

    std::vector<std::string> links;

    PLOG(ERROR) << "found " << type << " device " << device;

    auto link_path = "/dev/block/" + type + "/" + device;

    bool is_boot_device = boot_devices_.find(device) != boot_devices_.end();
    //bool is_boot_device = true;
    PLOG(ERROR) << " is_boot_device: " << is_boot_device;

    if (!uevent.partition_name.empty()) {
        std::string partition_name_sanitized(uevent.partition_name);
        SanitizePartitionName(&partition_name_sanitized);
        if (partition_name_sanitized != uevent.partition_name) {
            PLOG(ERROR) << " Linking partition '" << uevent.partition_name << "' as '"
                         << partition_name_sanitized << "'";
        }
        links.emplace_back(link_path + "/by-name/" + partition_name_sanitized);
        // Adds symlink: /dev/block/by-name/<partition_name>.
        if (is_boot_device) {
            links.emplace_back("/dev/block/by-name/" + partition_name_sanitized);
        }
    } else if (is_boot_device) {
        // If we don't have a partition name but we are a partition on a boot device, create a
        // symlink of /dev/block/by-name/<device_name> for symmetry.
        PLOG(ERROR) << " else is_boot_device: " << is_boot_device;
        links.emplace_back("/dev/block/by-name/" + uevent.device_name);
    }

    auto last_slash = uevent.path.rfind('/');
    links.emplace_back(link_path + "/" + uevent.path.substr(last_slash + 1));

    return links;
}

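Add some logging first, and print the is_boot_device variable, since the logic below depends on it.
After rebuilding and running, the output is shown below. The device returned all four times is 0.soc/fa507000.sdhci, which looks like where the problem might be, and is_boot_device is 0 every time.
From the code above, this value apparently needs to be true; only then is the /dev/block/by-name/super link created so that it can be found at mount time.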

[    7.490103]  {1}[1:init]init: is_boot_device: 0
[    7.557741]  {1}[1:init]init: HandleUevent found partition: super
[    7.566017]  {1}[1:init]init: found platform device 0.soc/fa507000.sdhci
[    7.576258]  {1}[1:init]init: is_boot_device: 0
[    7.635216]  {1}[1:init]init: HandleUevent found partition: vbmeta_system_a
[    7.644339]  {1}[1:init]init: found platform device 0.soc/fa507000.sdhci
[    7.654568]  {1}[1:init]init: is_boot_device: 0
[    7.712557]  {1}[1:init]init: HandleUevent found partition: vbmeta_a
[    7.721111]  {1}[1:init]init: found platform device 0.soc/fa507000.sdhci
[    7.731347]  {1}[1:init]init: is_boot_device: 0: No such file or directory
[    7.796012]  {1}[1:init]init: super partition node is existed.
[    7.803085]  {1}[1:init]init: realpath failed: /dev/block/by-name/super: No such file or directory
[    7.813305]  {1}[1:init]init: Failed to mount required partitions early ...
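The effect of is_boot_device being 0 can be reproduced in isolation. The following is a simplified model of the by-name branch of GetBlockDeviceSymlinks, not the real function: BuildLinks and Contains are hypothetical names, and the caller supplies the boot-device set instead of init reading it from the system.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Simplified model: which symlinks get created for a named partition?
std::vector<std::string> BuildLinks(const std::string& type, const std::string& device,
                                    const std::string& partition_name,
                                    const std::set<std::string>& boot_devices) {
    std::vector<std::string> links;
    const std::string link_path = "/dev/block/" + type + "/" + device;
    const bool is_boot_device = boot_devices.count(device) != 0;
    if (!partition_name.empty()) {
        // The per-device link is always created.
        links.push_back(link_path + "/by-name/" + partition_name);
        // The global /dev/block/by-name/ link only exists for boot devices.
        if (is_boot_device) {
            links.push_back("/dev/block/by-name/" + partition_name);
        }
    }
    return links;
}

// Small helper for checking whether a link path is in the list.
bool Contains(const std::vector<std::string>& links, const std::string& path) {
    return std::find(links.begin(), links.end(), path) != links.end();
}
```

If the boot-device set holds any value other than 0.soc/fa507000.sdhci, the list for super contains only /dev/block/platform/0.soc/fa507000.sdhci/by-name/super and never /dev/block/by-name/super, which matches the realpath failure in the log.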

But why would stock code fail on my setup?
And where does the found device value 0.soc/fa507000.sdhci come from?

I vaguely remembered seeing it earlier in the kernel uevent logs, so I searched the source tree for 0.soc. Sure enough, BoardConfig.mk defines it in the kernel cmdline:
androidboot.bootdevices=34458000.sdhci androidboot.boot_devices=0.soc/34458000.sdhci
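That explains it: the boot device values in the cmdline do not match the device the kernel actually reports. Most likely a value from a different project's configuration was carried over. The corrected setting: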
androidboot.bootdevices=fa507000.sdhci androidboot.boot_devices=0.soc/fa507000.sdhci
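Comment out the temporary bool is_boot_device = true; line from the code above (it was only there to force the check to pass while testing), rebuild, flash and boot. Done: the error no longer appears.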
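For reference, the mismatch can be checked mechanically. The sketch below assumes that the boot-device set is taken from a comma-separated androidboot.boot_devices value on the kernel cmdline, which is my understanding of how init obtains it. GetCmdlineValue and ParseBootDevices are hypothetical helper names, not fs_mgr functions.

```cpp
#include <set>
#include <sstream>
#include <string>

// Return the value of |key| in a space-separated kernel cmdline, or "" if absent.
std::string GetCmdlineValue(const std::string& cmdline, const std::string& key) {
    std::istringstream in(cmdline);
    std::string token;
    const std::string prefix = key + "=";
    while (in >> token) {
        if (token.compare(0, prefix.size(), prefix) == 0) {
            return token.substr(prefix.size());
        }
    }
    return "";
}

// Split the comma-separated androidboot.boot_devices value into a set.
std::set<std::string> ParseBootDevices(const std::string& cmdline) {
    std::set<std::string> result;
    std::istringstream in(GetCmdlineValue(cmdline, "androidboot.boot_devices"));
    std::string item;
    while (std::getline(in, item, ',')) {
        if (!item.empty()) {
            result.insert(item);
        }
    }
    return result;
}
```

Fed the original cmdline, the resulting set does not contain 0.soc/fa507000.sdhci; fed the corrected one, it does. Comparing this value against the "found platform device" line in the log is a quick first check on any new board.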

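The actual change was tiny, but the analysis that led to it was long and requires a reasonably clear picture of the whole Android partition mount flow. That is the full analysis; I am noting it down to give anyone who hits a similar problem a direction to look in.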
