February 2020

Tracing disk writes under qemu-kvm

Silly me — the previous debugging session ran without -enable-kvm, and hardware virtualization was disabled on the machine anyway. Pretending that never happened: with everything properly set up, I restarted the debugging in KVM mode. Having finally installed qemu and the rest of the tooling on another old Linux box, I pick up from here: break on qcow2_pre_write_overlap_check — the stack there is identical to TCG mode — then continue with b blk_aio_prwv.

(gdb) bt
#0  qcow2_pre_write_overlap_check (bs=0x558eef1841a0, ign=0, offset=1670656, 
    size=4096, data_file=true) at block/qcow2-refcount.c:2817
#1  0x0000558eedcb12e6 in qcow2_co_pwritev_part (bs=0x558eef1841a0, 
    offset=1879080448, bytes=4096, qiov=0x7fa0e4236760, qiov_offset=0, flags=0)
    at block/qcow2.c:2513
#2  0x0000558eedcfe0de in bdrv_driver_pwritev (bs=0x558eef1841a0, 
    offset=1879080448, bytes=4096, qiov=0x7fa0e4236760, qiov_offset=0, flags=0)
    at block/io.c:1171
#3  0x0000558eedd000a5 in bdrv_aligned_pwritev (child=0x558eef191900, 
    req=0x7fa0b8acae10, offset=1879080448, bytes=4096, align=1, 
    qiov=0x7fa0e4236760, qiov_offset=0, flags=0) at block/io.c:1980
#4  0x0000558eedd0087f in bdrv_co_pwritev_part (child=0x558eef191900, 
    offset=1879080448, bytes=4096, qiov=0x7fa0e4236760, qiov_offset=0, flags=0)
    at block/io.c:2137
#5  0x0000558eedce6f6d in blk_co_pwritev_part (blk=0x558eef183e40, 
    offset=1879080448, bytes=4096, qiov=0x7fa0e4236760, qiov_offset=0, flags=0)
    at block/block-backend.c:1211
#6  0x0000558eedce6fbf in blk_co_pwritev (blk=0x558eef183e40, 
    offset=1879080448, bytes=4096, qiov=0x7fa0e4236760, flags=0)
    at block/block-backend.c:1221
#7  0x0000558eedce7795 in blk_aio_write_entry (opaque=0x7fa0e4238780)
    at block/block-backend.c:1415
#8  0x0000558eedddcc2f in coroutine_trampoline (i0=-467430144, i1=32672)
    at util/coroutine-ucontext.c:115
#9  0x00007fa0f56c8000 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007fa0e9cbad90 in ?? ()
#11 0x0000000000000000 in ?? ()

Once the breakpoint hits, you can see that apart from the top-level loop now belonging to kvm, everything below is the same: the guest writes directly to an ioport, and control transfers to the corresponding backend handler. That about wraps up this round of debugging. I've since set up a site to push myself to keep reading the code, qemu.world — I'll update it when I remember to.

(gdb) bt
#0  blk_aio_prwv (blk=0x558eef183e40, offset=0, bytes=0, iobuf=0x0, 
    co_entry=0x558eedce7a28 <blk_aio_flush_entry>, flags=0, 
    cb=0x558eedaad47c <ide_flush_cb>, opaque=0x558eefc24730)
    at block/block-backend.c:1360
#1  0x0000558eedce7ab1 in blk_aio_flush (blk=0x558eef183e40, 
    cb=0x558eedaad47c <ide_flush_cb>, opaque=0x558eefc24730)
    at block/block-backend.c:1503
#2  0x0000558eedaad5da in ide_flush_cache (s=0x558eefc24730)
    at hw/ide/core.c:1088
#3  0x0000558eedaae5b3 in cmd_flush_cache (s=0x558eefc24730, cmd=231 '\347')
    at hw/ide/core.c:1554
#4  0x0000558eedaaf8c5 in ide_exec_cmd (bus=0x558eefc246b0, val=231)
    at hw/ide/core.c:2085
#5  0x0000558eedaaddef in ide_ioport_write (opaque=0x558eefc246b0, addr=503, 
    val=231) at hw/ide/core.c:1294
#6  0x0000558eed85cd3f in portio_write (opaque=0x558eefcbff30, addr=7, 
    data=231, size=1) at /home/leon/qemu-4.2.0/ioport.c:201
#7  0x0000558eed861fbc in memory_region_write_accessor (mr=0x558eefcbff30, 
    addr=7, value=0x7fa0e9cbb818, size=1, shift=0, mask=255, attrs=...)
    at /home/leon/qemu-4.2.0/memory.c:483
#8  0x0000558eed8621a6 in access_with_adjusted_size (addr=7, 
    value=0x7fa0e9cbb818, size=1, access_size_min=1, access_size_max=4, 
    access_fn=0x558eed861efc <memory_region_write_accessor>, 
    mr=0x558eefcbff30, attrs=...) at /home/leon/qemu-4.2.0/memory.c:544
#9  0x0000558eed8650d7 in memory_region_dispatch_write (mr=0x558eefcbff30, addr=7, data=231, op=MO_8, attrs=...) at /home/leon/qemu-4.2.0/memory.c:1475
#10 0x0000558eed803386 in flatview_write_continue (fv=0x7fa0e410c970, addr=503, attrs=..., buf=0x7fa0f86ac000 "\347\200\354\036", len=1, addr1=7, l=1, mr=0x558eefcbff30) at /home/leon/qemu-4.2.0/exec.c:3129
#11 0x0000558eed8034cb in flatview_write (fv=0x7fa0e410c970, addr=503, attrs=..., buf=0x7fa0f86ac000 "\347\200\354\036", len=1) at /home/leon/qemu-4.2.0/exec.c:3169
#12 0x0000558eed803818 in address_space_write (as=0x558eee7a4b60 <address_space_io>, addr=503, attrs=..., buf=0x7fa0f86ac000 "\347\200\354\036", len=1) at /home/leon/qemu-4.2.0/exec.c:3259
#13 0x0000558eed803885 in address_space_rw (as=0x558eee7a4b60 <address_space_io>, addr=503, attrs=..., buf=0x7fa0f86ac000 "\347\200\354\036", len=1, is_write=true) at /home/leon/qemu-4.2.0/exec.c:3269
#14 0x0000558eed87cf9f in kvm_handle_io (port=503, attrs=..., data=0x7fa0f86ac000, direction=1, size=1, count=1) at /home/leon/qemu-4.2.0/accel/kvm/kvm-all.c:2104
#15 0x0000558eed87d737 in kvm_cpu_exec (cpu=0x558eef1b29b0) at /home/leon/qemu-4.2.0/accel/kvm/kvm-all.c:2350
#16 0x0000558eed853017 in qemu_kvm_cpu_thread_fn (arg=0x558eef1b29b0) at /home/leon/qemu-4.2.0/cpus.c:1318
#17 0x0000558eeddc042b in qemu_thread_start (args=0x558eef1da7e0) at util/qemu-thread-posix.c:519
#18 0x00007fa0f5a2a4a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007fa0f576cd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Debugging the qemu disk I/O path

It's been a while since my last throwaway post... Bored at home, and since I've been digging into virtualization lately anyway, let's trace the file-write flow inside qemu.

The "write" in question: when file I/O happens inside a virtual machine started by qemu, how does qemu know to update the corresponding virtual disk file? I'm pretty green at qemu — honestly, less than a week in — so there's no shortage of easy material to write about, and this post will most likely contain mistakes... no matter, let's just start here.

First, the configure options, pasted here so I don't have to hunt for them again after switching machines — straight copy:
./configure --target-list=x86_64-softmmu --enable-kvm --enable-debug --enable-debug-info --enable-modules --enable-vnc --disable-strip

To keep debugging light, the guest is TinyCore Linux (http://www.tinycorelinux.net/). I'm still at my family home with no Linux machine at hand, so the actual environment is Windows running VirtualBox, running Linux, which in turn runs Qemu. A full-blown Linux guest would probably bring this old box to its knees, so everything is minimized: a command-line-only install of this distro is enough.

(Postscript: because I got the launch parameters wrong, the whole VM was actually running in tcg mode, which is why it was still slow. Never mind — let's first see how a disk write gets signaled under tcg, and whether it differs from kvm.)

The VM's disk image is in qcow2 format. So the question is where to start — in other words, which function do I break on? As is well known (or perhaps not), most block-device code lives under block/. So I searched block/ for qcow2 AND write and quickly found a few candidates, one of which, qcow2_pre_write_overlap_check, looks like a useful validation function. Attach gdb to qemu, set a breakpoint, and it hits almost immediately.

Thread 5 (Thread 0x7f8f31d33700 (LWP 23615)):
#0  0x0000562359abf4f0 in qcow2_pre_write_overlap_check (bs=0x56235abb8280, ign=0, offset=359936, size=4096, data_file=true) at block/qcow2-refcount.c:2817
#1  0x0000562359ab132a in qcow2_co_pwritev_part (bs=0x56235abb8280, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/qcow2.c:2513
#2  0x0000562359afe694 in bdrv_driver_pwritev (bs=0x56235abb8280, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/io.c:1171
#3  0x0000562359b0066a in bdrv_aligned_pwritev (child=0x56235aa76db0, req=0x7f8f183e9e10, offset=32256, bytes=4096, align=1, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/io.c:1980
#4  0x0000562359b00e44 in bdrv_co_pwritev_part (child=0x56235aa76db0, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/io.c:2137
#5  0x0000562359ae736b in blk_co_pwritev_part (blk=0x56235aaa6ed0, offset=32256, bytes=4096, qiov=0x7f8f14136db0, qiov_offset=0, flags=0) at block/block-backend.c:1211
#6  0x0000562359ae73bd in blk_co_pwritev (blk=0x56235aaa6ed0, offset=32256, bytes=4096, qiov=0x7f8f14136db0, flags=0) at block/block-backend.c:1221
#7  0x0000562359ae7b93 in blk_aio_write_entry (opaque=0x7f8f14024650) at block/block-backend.c:1415
#8  0x0000562359beafcb in coroutine_trampoline (i0=335845504, i1=32655) at util/coroutine-ucontext.c:115
#9  0x00007f8f504286b0 in __start_context () at /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007f8f31d2ef80 in  ()
#11 0x0000000000000000 in  ()

coroutine_trampoline is the heart of qemu's coroutine implementation, and the coroutine's entry point here is blk_aio_write_entry.

Searching for references to blk_aio_write_entry turns up just these two call sites:

block-backend.c
1424    return blk_aio_prwv(blk, offset, count, NULL, blk_aio_write_entry,
1428                        blk_aio_write_entry, flags, cb, opaque);

They are, respectively:

1424: blk_aio_pwrite_zeroes -> blk_aio_prwv

1428: blk_aio_pwritev -> blk_aio_prwv

And in blk_aio_prwv, the coroutine's creation is plainly visible:

static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
                                void *iobuf, CoroutineEntry co_entry,
                                BdrvRequestFlags flags,
                                BlockCompletionFunc *cb, void *opaque)
{
    BlkAioEmAIOCB *acb;
    Coroutine *co;

    blk_inc_in_flight(blk);
    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
    acb->rwco = (BlkRwCo) {
        .blk    = blk,
        .offset = offset,
        .iobuf  = iobuf,
        .flags  = flags,
        .ret    = NOT_DONE,
    };
    acb->bytes = bytes;
    acb->has_returned = false;

    /* HERE */
    co = qemu_coroutine_create(co_entry, acb);
    bdrv_coroutine_enter(blk_bs(blk), co);

    acb->has_returned = true;
    if (acb->rwco.ret != NOT_DONE) {
        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
                                         blk_aio_complete_bh, acb);
    }

    return &acb->common;
}

Coroutines are quite similar to threads, but coroutines are cooperatively multitasked whereas threads are typically preemptively multitasked — meaning coroutines give you concurrency but not parallelism.
With the coroutine's creation site known, the rest is straightforward: move the breakpoint one level up, to blk_aio_prwv.

Soon we get a stack like this — one that runs all the way up to the cpu loop — which is a good sign the breakpoint is in the right place.

#0  blk_aio_prwv (blk=0x55a4a09c5800, offset=0, bytes=4096, iobuf=0x7f1dc8036c60, co_entry=0x55a49e41d9d0 <blk_aio_read_entry>, flags=0, cb=0x55a49e0ddbc2 <dma_blk_cb>, opaque=0x7f1dc8036c00)
    at block/block-backend.c:1360
#1  0x000055a49e41ddc5 in blk_aio_preadv (blk=0x55a4a09c5800, offset=0, qiov=0x7f1dc8036c60, flags=0, cb=0x55a49e0ddbc2 <dma_blk_cb>, opaque=0x7f1dc8036c00) at block/block-backend.c:1479
#2  0x000055a49e0de16a in dma_blk_read_io_func (offset=0, iov=0x7f1dc8036c60, cb=0x55a49e0ddbc2 <dma_blk_cb>, cb_opaque=0x7f1dc8036c00, opaque=0x55a4a09c5800) at dma-helpers.c:243
#3  0x000055a49e0dde9a in dma_blk_cb (opaque=0x7f1dc8036c00, ret=0) at dma-helpers.c:168
#4  0x000055a49e0de119 in dma_blk_io (ctx=0x55a4a08876d0, sg=0x55a4a171b788, offset=0, align=512, io_func=0x55a49e0de11f <dma_blk_read_io_func>, io_func_opaque=0x55a4a09c5800, 
    cb=0x55a49e1cadf1 <ide_dma_cb>, opaque=0x55a4a171b460, dir=DMA_DIRECTION_FROM_DEVICE) at dma-helpers.c:232
#5  0x000055a49e0de1c7 in dma_blk_read (blk=0x55a4a09c5800, sg=0x55a4a171b788, offset=0, align=512, cb=0x55a49e1cadf1 <ide_dma_cb>, opaque=0x55a4a171b460) at dma-helpers.c:250
#6  0x000055a49e1cb11f in ide_dma_cb (opaque=0x55a4a171b460, ret=0) at hw/ide/core.c:915
#7  0x000055a49e1d4d79 in bmdma_cmd_writeb (bm=0x55a4a171c5b0, val=9) at hw/ide/pci.c:306
#8  0x000055a49e1d5aad in bmdma_write (opaque=0x55a4a171c5b0, addr=0, val=9, size=1) at hw/ide/piix.c:75
#9  0x000055a49df42831 in memory_region_write_accessor (mr=0x55a4a171c700, addr=0, value=0x7f1dd8ea5a48, size=1, shift=0, mask=255, attrs=...) at /home/leon/qemu-4.2.0/memory.c:483
#10 0x000055a49df42a18 in access_with_adjusted_size (addr=0, value=0x7f1dd8ea5a48, size=1, access_size_min=1, access_size_max=4, access_fn=0x55a49df42771 <memory_region_write_accessor>, 
    mr=0x55a4a171c700, attrs=...) at /home/leon/qemu-4.2.0/memory.c:544
#11 0x000055a49df459c2 in memory_region_dispatch_write (mr=0x55a4a171c700, addr=0, data=9, op=MO_8, attrs=...) at /home/leon/qemu-4.2.0/memory.c:1475
#12 0x000055a49dee5a07 in address_space_stb (as=0x55a49eeac0e0 <address_space_io>, addr=49216, val=9, attrs=..., result=0x0) at /home/leon/qemu-4.2.0/memory_ldst.inc.c:378
#13 0x000055a49e0a7d16 in helper_outb (env=0x55a4a0bfa3e0, port=49216, data=9) at /home/leon/qemu-4.2.0/target/i386/misc_helper.c:33
#14 0x00007f1dbd998d65 in code_gen_buffer ()
#15 0x000055a49df7ad63 in cpu_tb_exec (cpu=0x55a4a0bf1b80, itb=0x7f1dbde60980 <code_gen_buffer+31852886>) at /home/leon/qemu-4.2.0/accel/tcg/cpu-exec.c:172
#16 0x000055a49df7bc47 in cpu_loop_exec_tb (cpu=0x55a4a0bf1b80, tb=0x7f1dbde60980 <code_gen_buffer+31852886>, last_tb=0x7f1dd8ea6078, tb_exit=0x7f1dd8ea6070)
    at /home/leon/qemu-4.2.0/accel/tcg/cpu-exec.c:618
#17 0x000055a49df7bf61 in cpu_exec (cpu=0x55a4a0bf1b80) at /home/leon/qemu-4.2.0/accel/tcg/cpu-exec.c:731
#18 0x000055a49df33eb8 in tcg_cpu_exec (cpu=0x55a4a0bf1b80) at /home/leon/qemu-4.2.0/cpus.c:1473
#19 0x000055a49df3470e in qemu_tcg_cpu_thread_fn (arg=0x55a4a0bf1b80) at /home/leon/qemu-4.2.0/cpus.c:1781
#20 0x000055a49e50488c in qemu_thread_start (args=0x55a4a0956070) at util/qemu-thread-posix.c:519
#21 0x00007f1df39476db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#22 0x00007f1df366988f in clone () from /lib/x86_64-linux-gnu/libc.so.6

So this is essentially the direct ioport-write path: the guest writes straight to the IDE controller's bus-master DMA register (the PIIX bmdma device in this stack, per frames #7–#8 in hw/ide/pci.c and hw/ide/piix.c), which triggers bmdma_write and the chain of functions behind it. I'll dig into the details later; once I'm back on a Linux machine I'll check whether Kvm's notification path is any different — though my guess is it's the same.