FFmpeg-Based Muxing into MP4 (TS) Container Formats

程序员文章站 (Programmer Article Station), 2022-07-12 21:20:54


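I. How MP4 muxing works

Every audio or video frame has a duration.
The sampling rate is the number of amplitude samples taken per second when an analog sound waveform is digitized. Human hearing covers roughly 20 Hz to 20 kHz, so by the Nyquist sampling theorem the sampling rate must be at least about 40 kHz to reproduce that range without distortion. Common audio sampling rates include 8 kHz, 11.025 kHz, 16 kHz, 22.05 kHz, 37.8 kHz, 44.1 kHz and 48 kHz; higher sampling rates can reach DVD audio quality.
When decoding AAC audio sampled at 44.1 kHz, one frame must be decoded within 23.22 ms, which is the playback time of that frame.
Background:
(One raw AAC frame carries 1024 samples covering a span of time, plus related data.)
Analysis:
1) AAC
Playback time of an audio frame = number of samples in one AAC frame / sampling rate (in seconds).
One frame holds 1024 samples. At a sample rate of 44100 Hz there are 44100 samples per second, so the playback time of one AAC frame is
1024 * 1000 / 44100 = 23.22 ms
2) MP3
An MPEG-1 Layer III (MP3) frame always holds 1152 samples (samples, not bytes), so:
frame_duration = 1152 * 1000000 / sample_rate (in microseconds)
For example, at sample_rate = 44100 Hz this gives 26122 µs = 26.122 ms, which is where the often-quoted "an MP3 frame always plays for 26 ms" comes from.
3) H.264
Video playback time depends on the frame rate: frame_duration = 1000 / fps (in milliseconds)
For example, at fps = 25.00 the duration is 40 ms, which is the "40 ms per video frame" that people in the field refer to.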
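
The three formulas above can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and results are kept in integer microseconds (truncated), which is an assumption of this example rather than something the article prescribes.

```c
#include <stdint.h>

/* Duration of one audio frame in microseconds:
 * samples_per_frame / sample_rate seconds, scaled by 1,000,000.
 * AAC: 1024 samples per frame; MPEG-1 Layer III: 1152 samples per frame. */
int64_t audio_frame_duration_us(int samples_per_frame, int sample_rate)
{
    return (int64_t)samples_per_frame * 1000000 / sample_rate;
}

/* Duration of one video frame in microseconds for a frame rate given
 * as a fraction fps_num / fps_den (e.g. 25/1). */
int64_t video_frame_duration_us(int fps_num, int fps_den)
{
    return (int64_t)1000000 * fps_den / fps_num;
}
```

Calling audio_frame_duration_us(1024, 44100) gives the AAC figure, audio_frame_duration_us(1152, 44100) the MP3 figure, and video_frame_duration_us(25, 1) the 40 ms video figure, each in microseconds.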

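In theory, audio/video (playback) synchronization works like this:
With the duration of every frame known, audio and video frames are interleaved in the container along a single time axis (values in ms, AAC at 44.1 kHz and video at 25 fps):
Timeline: 0       23.22   40      46.44   69.66   80      92.88   116.10  120     ...
Audio:    0       23.22           46.44   69.66           92.88   116.10          ...
Video:    0               40                      80                      120     ...
In other words, compare the running sum of video durations with the running sum of audio durations, and write the next frame from whichever stream has the smaller sum.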
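
The "write whichever running sum is smaller" rule can be sketched as follows. The function name and the 'A'/'V' output convention are hypothetical, and letting video win a tie is an assumption chosen to match the `<= 0` test on av_compare_ts() in the listing later in this article.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills order[0..n-1] with 'V' or 'A', the stream each successive frame
 * is taken from. Durations are in microseconds so that the running sums
 * stay in exact integer arithmetic. */
void interleave_order(int64_t audio_dur_us, int64_t video_dur_us,
                      char *order, size_t n)
{
    int64_t next_audio_us = 0;  /* start time of the next unwritten audio frame */
    int64_t next_video_us = 0;  /* start time of the next unwritten video frame */

    for (size_t i = 0; i < n; i++) {
        if (next_video_us <= next_audio_us) {   /* tie goes to video */
            order[i] = 'V';
            next_video_us += video_dur_us;
        } else {
            order[i] = 'A';
            next_audio_us += audio_dur_us;
        }
    }
}
```

With a 23220 µs audio frame and a 40000 µs video frame, the first ten frames come out as V A A V A A V A A V, which is the same interleaving as the timeline.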

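In practice, however, this does not hold for playback.

1: First, one question needs answering

Why not simply let audio and video each play on their own schedule, that is, play an audio frame at 23.22 ms and a video frame at 40 ms as in the timeline above?

Because the application cannot time those 23.22 ms or 40 ms intervals accurately; they do not match the time the sound card actually takes to play the data. What we need to know is how long the sound card takes to play one frame, or rather one buffer, of audio.

2: The sound card plays one sample at a time, not one frame at a time. Losing even a single audio sample is audible, whereas the same is not true of video.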


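3: Audio/video synchronization, method 1: callback

Suppose the sound card has two buffers, "A" and "B", that both hold PCM audio waiting to be played, and buffer "B" is currently playing. A few points to establish first:

(1) The buffer size is fixed, so the time to play one buffer is fixed too; assume 30 ms.

(2) When buffer "B" finishes playing (is used up), buffer "A" plays next, which keeps the PCM audio continuous.

(3) When a buffer finishes playing, 30 ms have elapsed on the system (sound card) clock. The real elapsed time may be 40 ms, but that does not matter here: each callback marks one 30 ms step.

(4) Video is then matched against this 30 ms audio step, and the timing obtained this way is accurate: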

Timeline: 0               30                      60                      90                      120     ...
Audio:    0       23.22                   46.44           69.66                   92.88   116.10          ...
Video:    0                       40                              80                              120     ...

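(5) One question remains: how is the 10 ms gap between 30 ms and 40 ms on the video side accounted for? It can be ignored, because the human eye cannot perceive a 10 ms difference.

That is, when the 30 ms audio callback fires, the second video frame can be played, as in the timeline above:

First callback (30 ms): play the 40 ms video frame.

Second callback (60 ms): play the 80 ms video frame.

Third callback (90 ms): play no video.

Fourth callback (120 ms): play the 120 ms video frame.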
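
The article does not state the exact rule that decides whether a callback shows a video frame. One rule that reproduces the four callbacks listed above is "show the next video frame if its timestamp falls before the next callback". The sketch below encodes that assumed rule; the function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Called once per audio-buffer callback. Returns the timestamp (ms) of the
 * video frame to show now, or -1 if no frame is due.
 * *next_frame is the index of the next undisplayed video frame and is
 * advanced when a frame is chosen. */
int64_t video_due_at_callback(int64_t callback_ms, int64_t buf_ms,
                              int64_t video_frame_ms, int64_t *next_frame)
{
    int64_t pts_ms = *next_frame * video_frame_ms;

    /* Assumed rule: the frame is due if it would otherwise be late
     * by the time the next callback (callback_ms + buf_ms) arrives. */
    if (pts_ms < callback_ms + buf_ms) {
        (*next_frame)++;
        return pts_ms;
    }
    return -1;
}
```

Starting with *next_frame = 1 (frame 0 is shown when playback starts), a 30 ms buffer and a 40 ms video frame, the callbacks at 30, 60, 90 and 120 ms return 40, 80, -1 and 120.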


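4: Audio/video synchronization, method 2: blocking

Consider the same two-buffer setup as above.

(1) While buffer "B" is playing, the external buffer that feeds buffer "A" hands over its data, but the call does not return immediately; it returns only once buffer "B" has finished playing.

The time from entering the call to coming out of the block is therefore the duration of one buffer, for example the 30 ms above.

(2) Then, while buffer "A" is playing, the external buffer that feeds buffer "B" hands over its data, and again the call does not return until buffer "A" has finished playing.

The time from entering the call to coming out of the block is again the duration of one buffer, for example the 30 ms above.

(3) Repeating (1) and (2) yields the same 30 ms step as the callback method. Everything after that is identical; see (4) and (5) of the callback method.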

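II. Container format processing with FFmpeg

This article documents an FFmpeg-based audio/video muxer (Simplest FFmpeg muxer). A muxer combines compressed video data (e.g. H.264) and compressed audio data (e.g. AAC) into a single container format (e.g. MKV), as shown in the figure. No encoding or decoding is involved in this process.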


The program documented here combines an H.264 video bitstream file and an MP3 audio bitstream file into a single MP4 file.

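Workflow

As the program's flowchart shows, three AVFormatContext structs are initialized in total: two for input and one for output. Once all three are initialized, avcodec_copy_context() copies the parameters of the input video/audio into the AVCodecContext of the output video/audio stream. The program then calls av_read_frame() on the video input and on the audio input to fetch a video AVPacket and an audio AVPacket respectively, and writes each fetched AVPacket to the output file. Along the way it uses a less common function, av_compare_ts(), which compares timestamps; its result decides whether video or audio is written next.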


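The muxer described here does not require the video input to be a raw H.264 stream, nor the audio input to be an audio-only file. Two already-muxed audio/video files can serve as input: the program picks the video stream out of the video input file and the audio stream out of the audio input file, then muxes the two selected streams together.
PS1: H.264 taken from certain containers (e.g. MP4/FLV/MKV) needs the bitstream filter named "h264_mp4toannexb" when the output expects Annex B data (e.g. MPEG-TS). The filter puts SPS/PPS in front of IDR frames and a start code in front of each NALU.
PS2: AAC written into certain containers (e.g. MP4/FLV/MKV) needs the bitstream filter named "aac_adtstoasc" when the input carries ADTS headers (e.g. a raw .aac file or MPEG-TS).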

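A brief description of the key functions in the workflow:

avformat_open_input(): opens an input file.
avcodec_copy_context(): copies the parameters of an AVCodecContext.
avformat_alloc_output_context2(): initializes the output file context.
avio_open(): opens the output file.
avformat_write_header(): writes the file header.
av_compare_ts(): compares two timestamps to decide whether video or audio is written next. This function is seen less often than the others.
av_read_frame(): reads one AVPacket from an input file.
av_interleaved_write_frame(): writes one AVPacket to the output file.
av_write_trailer(): writes the file trailer.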
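
To make the role of av_compare_ts() concrete: the two timestamps live in different time bases, so they have to be brought to a common scale before they can be compared. The sketch below is not FFmpeg's implementation; `Rational` and `compare_ts` are hypothetical stand-ins, and the code assumes positive denominators and products that fit in 64 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for AVRational: a time base of num/den seconds. */
typedef struct {
    int num;
    int den;
} Rational;

/* Returns -1, 0 or 1 when ts_a (in units of tb_a) is earlier than,
 * equal to, or later than ts_b (in units of tb_b).
 * ts_a * tb_a.num / tb_a.den is compared with ts_b * tb_b.num / tb_b.den
 * by cross-multiplying, which avoids any division. */
int compare_ts(int64_t ts_a, Rational tb_a, int64_t ts_b, Rational tb_b)
{
    int64_t a = ts_a * tb_a.num * tb_b.den;
    int64_t b = ts_b * tb_b.num * tb_a.den;
    return (a > b) - (a < b);
}
```

For example, a video timestamp of 3600 in a 1/90000 time base (40 ms) compares as later than an audio timestamp of 1024 in a 1/44100 time base (about 23.22 ms), so a muxer loop built on this comparison would write the audio packet first.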

Code

The code is listed below:

 /**
  * Simplest FFmpeg Muxer
  *
  * This program muxes a video bitstream and an audio bitstream
  * together into one container file. It does not change the
  * codec of either stream (nothing is decoded or re-encoded).
  * As configured below, it muxes an H.264 bitstream file and
  * an MP3 audio file into an MP4 file.
  *
  */
 
#include <stdio.h>  
 
#define __STDC_CONSTANT_MACROS  
 
#ifdef _WIN32  
//Windows  
extern "C"  
{  
#include "libavformat/avformat.h"  
};  
#else  
//Linux...  
#ifdef __cplusplus  
extern "C"  
{  
#endif  
#include <libavformat/avformat.h>  
#ifdef __cplusplus  
};  
#endif  
#endif  
  
/* 
FIX: H.264 in some container format (FLV, MP4, MKV etc.) need  
"h264_mp4toannexb" bitstream filter (BSF) 
  *Add SPS,PPS in front of IDR frame 
  *Add start code ("0,0,0,1") in front of NALU 
H.264 in some container (MPEG2TS) don't need this BSF. 
*/  
//'1': Use H.264 Bitstream Filter   
#define USE_H264BSF 0  
  
/* 
FIX:AAC in some container format (FLV, MP4, MKV etc.) need  
"aac_adtstoasc" bitstream filter (BSF) 
*/  
//'1': Use AAC Bitstream Filter   
#define USE_AACBSF 0  
  
  
  
int main(int argc, char* argv[])  
{  
    AVOutputFormat *ofmt = NULL;  
    //Input AVFormatContext and Output AVFormatContext  
    AVFormatContext *ifmt_ctx_v = NULL, *ifmt_ctx_a = NULL,*ofmt_ctx = NULL;  
    AVPacket pkt;  
    int ret, i;  
    int videoindex_v=-1,videoindex_out=-1;  
    int audioindex_a=-1,audioindex_out=-1;  
    int frame_index=0;  
    int64_t cur_pts_v=0,cur_pts_a=0;  
  
    //const char *in_filename_v = "cuc_ieschool.ts";//Input file URL  
    const char *in_filename_v = "cuc_ieschool.h264";  
    //const char *in_filename_a = "cuc_ieschool.mp3";  
    //const char *in_filename_a = "gowest.m4a";  
    //const char *in_filename_a = "gowest.aac";  
    const char *in_filename_a = "huoyuanjia.mp3";  
  
    const char *out_filename = "cuc_ieschool.mp4";//Output file URL  
    av_register_all();  
    //Input  
    if ((ret = avformat_open_input(&ifmt_ctx_v, in_filename_v, 0, 0)) < 0) {
        printf( "Could not open input video file.\n");
        goto end;
    }
    if ((ret = avformat_find_stream_info(ifmt_ctx_v, 0)) < 0) {
        printf( "Failed to retrieve input video stream information.\n");
        goto end;
    }

    if ((ret = avformat_open_input(&ifmt_ctx_a, in_filename_a, 0, 0)) < 0) {
        printf( "Could not open input audio file.\n");
        goto end;
    }
    if ((ret = avformat_find_stream_info(ifmt_ctx_a, 0)) < 0) {
        printf( "Failed to retrieve input audio stream information.\n");
        goto end;
    }
    printf("===========Input Information==========\n");  
    av_dump_format(ifmt_ctx_v, 0, in_filename_v, 0);  
    av_dump_format(ifmt_ctx_a, 0, in_filename_a, 0);  
    printf("======================================\n");  
    //Output  
    avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, out_filename);  
    if (!ofmt_ctx) {  
        printf( "Could not create output context\n");  
        ret = AVERROR_UNKNOWN;  
        goto end;  
    }  
    ofmt = ofmt_ctx->oformat;  
  
    for (i = 0; i < ifmt_ctx_v->nb_streams; i++) {  
        //Create output AVStream according to input AVStream  
        if(ifmt_ctx_v->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO){
            AVStream *in_stream = ifmt_ctx_v->streams[i];
            AVStream *out_stream = avformat_new_stream(ofmt_ctx, in_stream->codec->codec);
            videoindex_v=i;
            if (!out_stream) {
                printf( "Failed allocating output stream\n");
                ret = AVERROR_UNKNOWN;
                goto end;
            }
            videoindex_out=out_stream->index;
            //Copy the settings of AVCodecContext
            //Keep the error code in ret so that the cleanup path reports the failure
            if ((ret = avcodec_copy_context(out_stream->codec, in_stream->codec)) < 0) {
                printf( "Failed to copy context from input to output stream codec context\n");
                goto end;
            }
            out_stream->codec->codec_tag = 0;
            if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
                out_stream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
            break;
        }
    }  
  
    for (i = 0; i < ifmt_ctx_a->nb_streams; i++) {  
        //Create output AVStream according to input AVStream  
        if(ifmt_ctx_a->streams[i]->codec->codec_type==AVMEDIA_TYPE_AUDIO){  
            AVStream *in_stream = ifmt_ctx_a->streams[i];  
            AVStream *out_stream = avformat_new_stream(ofmt_ctx, in_stream->codec->codec);  
            audioindex_a=i;  
            if (!out_stream) {  
                printf( "Failed allocating output stream\n");  
                ret = AVERROR_UNKNOWN;  
                goto end;  
            }  
            audioindex_out=out_stream->index;  
            //Copy the settings of AVCodecContext  
            //Keep the error code in ret so that the cleanup path reports the failure
            if ((ret = avcodec_copy_context(out_stream->codec, in_stream->codec)) < 0) {
                printf( "Failed to copy context from input to output stream codec context\n");
                goto end;
            }
            out_stream->codec->codec_tag = 0;  
            if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)  
                out_stream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;  
  
            break;  
        }  
    }  
  
    printf("==========Output Information==========\n");  
    av_dump_format(ofmt_ctx, 0, out_filename, 1);  
    printf("======================================\n");  
    //Open output file  
    if (!(ofmt->flags & AVFMT_NOFILE)) {
        if ((ret = avio_open(&ofmt_ctx->pb, out_filename, AVIO_FLAG_WRITE)) < 0) {
            printf( "Could not open output file '%s'\n", out_filename);
            goto end;
        }
    }
    //Write file header
    if ((ret = avformat_write_header(ofmt_ctx, NULL)) < 0) {
        printf( "Error occurred when opening output file\n");
        goto end;
    }
  
  
    //FIX  
#if USE_H264BSF  
    AVBitStreamFilterContext* h264bsfc =  av_bitstream_filter_init("h264_mp4toannexb");   
#endif  
#if USE_AACBSF  
    AVBitStreamFilterContext* aacbsfc =  av_bitstream_filter_init("aac_adtstoasc");   
#endif  
  
    while (1) {  
        AVFormatContext *ifmt_ctx;  
        int stream_index=0;  
        AVStream *in_stream, *out_stream;  
  
        //Get an AVPacket  
        if(av_compare_ts(cur_pts_v,ifmt_ctx_v->streams[videoindex_v]->time_base,cur_pts_a,ifmt_ctx_a->streams[audioindex_a]->time_base) <= 0){  
            ifmt_ctx=ifmt_ctx_v;  
            stream_index=videoindex_out;  
  
            if(av_read_frame(ifmt_ctx, &pkt) >= 0){  
                do{  
                    in_stream  = ifmt_ctx->streams[pkt.stream_index];  
                    out_stream = ofmt_ctx->streams[stream_index];  
  
                    if(pkt.stream_index==videoindex_v){  
                        //FIX：No PTS (Example: Raw H.264)  
                        //Simple Write PTS  
                        if(pkt.pts==AV_NOPTS_VALUE){  
                            //Write PTS  
                            AVRational time_base1=in_stream->time_base;  
                            //Duration between 2 frames (us)  
                            int64_t calc_duration=(double)AV_TIME_BASE/av_q2d(in_stream->r_frame_rate);  
                            //Parameters  
                            pkt.pts=(double)(frame_index*calc_duration)/(double)(av_q2d(time_base1)*AV_TIME_BASE);  
                            pkt.dts=pkt.pts;  
                            pkt.duration=(double)calc_duration/(double)(av_q2d(time_base1)*AV_TIME_BASE);  
                            frame_index++;  
                        }  
  
                        cur_pts_v=pkt.pts;  
                        break;  
                    }  
                }while(av_read_frame(ifmt_ctx, &pkt) >= 0);  
            }else{  
                break;  
            }  
        }else{  
            ifmt_ctx=ifmt_ctx_a;  
            stream_index=audioindex_out;  
            if(av_read_frame(ifmt_ctx, &pkt) >= 0){  
                do{  
                    in_stream  = ifmt_ctx->streams[pkt.stream_index];  
                    out_stream = ofmt_ctx->streams[stream_index];  
  
                    if(pkt.stream_index==audioindex_a){  
  
                        //FIX: No PTS
                        //Audio streams have no frame rate, so r_frame_rate cannot be used here.
                        //Derive the packet duration from samples per frame and sample rate,
                        //and count audio packets separately from the video frame_index.
                        if(pkt.pts==AV_NOPTS_VALUE){
                            static int64_t audio_frame_index=0;
                            AVCodecContext *actx=in_stream->codec;
                            if(actx->frame_size>0 && actx->sample_rate>0){
                                //One packet lasts frame_size samples at 1/sample_rate seconds each
                                AVRational sample_tb={1,actx->sample_rate};
                                pkt.duration=av_rescale_q(actx->frame_size,sample_tb,in_stream->time_base);
                                pkt.pts=audio_frame_index*pkt.duration;
                                pkt.dts=pkt.pts;
                                audio_frame_index++;
                            }
                        }
                        cur_pts_a=pkt.pts;  
  
                        break;  
                    }  
                }while(av_read_frame(ifmt_ctx, &pkt) >= 0);  
            }else{  
                break;  
            }  
  
        }  
  
        //FIX:Bitstream Filter  
#if USE_H264BSF  
        av_bitstream_filter_filter(h264bsfc, in_stream->codec, NULL, &pkt.data, &pkt.size, pkt.data, pkt.size, 0);  
#endif  
#if USE_AACBSF  
        av_bitstream_filter_filter(aacbsfc, out_stream->codec, NULL, &pkt.data, &pkt.size, pkt.data, pkt.size, 0);  
#endif  
  
  
        //Convert PTS/DTS  
        pkt.pts = av_rescale_q_rnd(pkt.pts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));  
        pkt.dts = av_rescale_q_rnd(pkt.dts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));  
        pkt.duration = av_rescale_q(pkt.duration, in_stream->time_base, out_stream->time_base);  
        pkt.pos = -1;  
        pkt.stream_index=stream_index;  
  
        //pkt.pts is int64_t; cast it so that it matches %lld on every platform
        printf("Write 1 Packet. size:%5d\tpts:%lld\n",pkt.size,(long long)pkt.pts);
        //Write  
        if (av_interleaved_write_frame(ofmt_ctx, &pkt) < 0) {  
            printf( "Error muxing packet\n");  
            break;  
        }  
        av_free_packet(&pkt);  
  
    }  
    //Write file trailer  
    av_write_trailer(ofmt_ctx);  
  
#if USE_H264BSF  
    av_bitstream_filter_close(h264bsfc);  
#endif  
#if USE_AACBSF  
    av_bitstream_filter_close(aacbsfc);  
#endif  
  
end:  
    avformat_close_input(&ifmt_ctx_v);  
    avformat_close_input(&ifmt_ctx_a);  
    /* close output */  
    if (ofmt_ctx && !(ofmt->flags & AVFMT_NOFILE))  
        avio_close(ofmt_ctx->pb);  
    avformat_free_context(ofmt_ctx);  
    if (ret < 0 && ret != AVERROR_EOF) {  
        printf( "Error occurred.\n");  
        return -1;  
    }  
    return 0;  
}
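
The "Convert PTS/DTS" step in the listing moves each timestamp from the input stream's time base to the output stream's time base. The arithmetic behind that conversion can be sketched in plain C as below. This is not FFmpeg's av_rescale_q_rnd(): `Rational` and `rescale_ts` are hypothetical names, and the sketch assumes non-negative timestamps, positive time bases and no 64-bit overflow, and always rounds to nearest.

```c
#include <stdint.h>

/* Hypothetical stand-in for AVRational: a time base of num/den seconds. */
typedef struct {
    int num;
    int den;
} Rational;

/* Converts ts from time base `from` to time base `to`:
 *   ts * (from.num / from.den) / (to.num / to.den)
 * computed as one integer division, rounded to the nearest integer. */
int64_t rescale_ts(int64_t ts, Rational from, Rational to)
{
    int64_t num = ts * from.num * to.den;
    int64_t den = (int64_t)from.den * to.num;
    return (num + den / 2) / den;
}
```

For instance, one frame at 25 fps (timestamp 1 in a 1/25 time base) becomes 3600 in a 1/90000 time base, and 1024 audio samples at 44.1 kHz become 2090 ticks of 1/90000 second.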

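Results

Input files:
Video: cuc_ieschool.ts (in the listing above this line is commented out and cuc_ieschool.h264 is selected instead)

Audio: huoyuanjia.mp3

Output file:
cuc_ieschool.mp4
The output file carries the "cuc_ieschool" video together with the "霍元甲" (Huo Yuanjia) audio.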

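Reference blogs
The simplest FFmpeg-based container format processing: audio/video muxer (最简单的基于FFmpeg的封装格式处理：视音频复用器（muxer）)
http://blog.csdn.net/leixiaohua1020/article/details/39802913/
Principles of audio/video synchronization during playback (音视频同步(播放)原理)
http://blog.csdn.net/zhuweigangzwg/article/details/25815851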
