@[toc]
1. Introduction
Below are the key log entries captured with Hive running in debug mode, in chronological order, with a few inline annotations added. If you would rather not wade through the messy log output, here is the result up front: when Hive loads external data, it first reads the external file, then copies it into the local warehouse directory (file:/user/hive/warehouse/...), and finally deletes the external file (a pretty wild move; why would Hive do that?).
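For reference, the statement that produced these logs was presumably something like the sketch below (the s3a path and the text_gzip5 table name come straight from the log lines; the HiveServer2 JDBC URL is my assumption):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadFromS3 {
    public static void main(String[] args) throws Exception {
        // HiveServer2 URL is an assumption; adjust host/port for your setup.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
            // Without LOCAL, LOAD DATA is a *move* into the table directory,
            // which is exactly what the logs below show.
            stmt.execute("LOAD DATA INPATH 's3a://BucketKun/bbb/000000.gz' INTO TABLE text_gzip5");
        }
    }
}
```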
2. DEBUG Log Details
By default, the operations run on the main thread:

```
s3a.S3AFileSystem: op_glob_status += 1 -> 2
s3a.S3AFileSystem: op_get_file_status += 1 -> 2
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz) //get path status
s3a.S3AFileSystem: object_metadata_requests += 1 -> 2
s3a.S3AFileSystem: Found exact file: normal file //found the exact file (normal)
s3a.S3AFileSystem: List status for path: s3a://BucketKun/bbb/000000.gz //List path
s3a.S3AFileSystem: op_list_status += 1 -> 1
s3a.S3AFileSystem: op_get_file_status += 1 -> 3
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz) //get path status
s3a.S3AFileSystem: object_metadata_requests += 1 -> 3
s3a.S3AFileSystem: Found exact file: normal file //found the exact file (normal)
s3a.S3AFileSystem: Adding: rd (not a dir): s3a://BucketKun/bbb/000000.gz //???
s3a.S3AFileSystem: op_get_file_status += 1 -> 4
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz) //get path status
s3a.S3AFileSystem: object_metadata_requests += 1 -> 4
s3a.S3AFileSystem: Found exact file: normal file //found the exact file (normal)
s3a.S3AFileSystem: Opening 's3a://BucketKun/bbb/000000.gz' for reading. //open the file for reading
s3a.S3AFileSystem: op_get_file_status += 1 -> 5
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz) //get path status
s3a.S3AFileSystem: object_metadata_requests += 1 -> 5
s3a.S3AFileSystem: Found exact file: normal file //found the exact file (normal)
s3a.S3AInputStream: reopen(s3a://BucketKun/bbb/000000.gz) for read from new offset range[0-386], length=4, streamPosition=0, nextReadPosition=0, policy=normal //reopen for read
s3a.S3AInputStream: Closing stream close() operation: soft //close
s3a.S3AInputStream: Drained stream of 382 bytes
s3a.S3AInputStream: Stream s3a://BucketKun/bbb/000000.gz closed: close() operation; remaining=382 streamPos=4, nextReadPos=4, request range 0-386 length=386
s3a.S3AInputStream: Statistics of stream bbb/000000.gz //stream statistics for bbb/000000.gz
StreamStatistics{OpenOperations=1, CloseOperations=1, Closed=1, Aborted=0, SeekOperations=0, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=0, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=0, BytesRead=4, BytesRead excluding skipped=4, ReadOperations=1, ReadFullyOperations=0, ReadsIncomplete=0, BytesReadInClose=382, BytesDiscardedInAbort=0, InputPolicy=0, InputPolicySetCount=1}
FileOperations: moving s3a://BucketKun/bbb/000000.gz to file:/user/hive/warehouse/text_gzip5 (replace = KEEP_EXISTING)
s3a.S3AFileSystem: op_glob_status += 1 -> 3
s3a.S3AFileSystem: op_get_file_status += 1 -> 6
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz)
s3a.S3AFileSystem: object_metadata_requests += 1 -> 6
s3a.S3AFileSystem: Found exact file: normal file
```
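A quick reading of this block: the main thread globs the path, stats it several times, lists it, then opens it and reads just 4 bytes (note length=4 and BytesRead=4), which is consistent with sniffing a file header such as the gzip magic bytes before deciding how to handle the file. A minimal sketch, assuming the stock Hadoop FileSystem API, of the kind of calls that bump these S3A counters:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path src = new Path("s3a://BucketKun/bbb/000000.gz");
        FileSystem fs = src.getFileSystem(conf);    // resolves to S3AFileSystem

        FileStatus[] globbed = fs.globStatus(src);  // -> op_glob_status
        FileStatus status = fs.getFileStatus(src);  // -> op_get_file_status
        FileStatus[] listed = fs.listStatus(src);   // -> op_list_status

        byte[] header = new byte[4];
        try (FSDataInputStream in = fs.open(src)) { // -> "Opening ... for reading"
            in.readFully(0, header);                // a 4-byte read, like the log's length=4
        }
    }
}
```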
The move-thread-0 thread takes over:

```
s3a.S3AFileSystem: op_get_file_status += 1 -> 7
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz) //get path status
s3a.S3AFileSystem: object_metadata_requests += 1 -> 7
s3a.S3AFileSystem: Found exact file: normal file //found the exact file (normal)
s3a.S3AFileSystem: Opening 's3a://BucketKun/bbb/000000.gz' for reading.
s3a.S3AFileSystem: op_get_file_status += 1 -> 8
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz)
s3a.S3AFileSystem: object_metadata_requests += 1 -> 8
s3a.S3AFileSystem: Found exact file: normal file
s3a.S3AInputStream: reopen(s3a://BucketKun/bbb/000000.gz) for read from new offset range[0-386], length=4096, streamPosition=0, nextReadPosition=0, policy=normal
s3a.S3AInputStream: Closing stream close() operation: soft
s3a.S3AInputStream: Drained stream of 0 bytes
s3a.S3AInputStream: Stream s3a://BucketKun/bbb/000000.gz closed: close() operation; remaining=0 streamPos=386, nextReadPos=386, request range 0-386 length=386
s3a.S3AInputStream: Statistics of stream bbb/000000.gz
StreamStatistics{OpenOperations=1, CloseOperations=1, Closed=1, Aborted=0, SeekOperations=0, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=0, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=0, BytesRead=386, BytesRead excluding skipped=386, ReadOperations=1, ReadFullyOperations=0, ReadsIncomplete=1, BytesReadInClose=0, BytesDiscardedInAbort=0, InputPolicy=0, InputPolicySetCount=1}
s3a.S3AFileSystem: op_get_file_status += 1 -> 9
s3a.S3AFileSystem: Getting path status for s3a://BucketKun/bbb/000000.gz (bbb/000000.gz)
s3a.S3AFileSystem: object_metadata_requests += 1 -> 9
s3a.S3AFileSystem: Found exact file: normal file
s3a.S3AFileSystem: Delete path s3a://BucketKun/bbb/000000.gz - recursive true
s3a.S3AFileSystem: delete: Path is a file
s3a.S3AFileSystem: object_delete_requests += 1 -> 1
s3a.S3AFileSystem: object_metadata_requests += 1 -> 10
s3a.S3AFileSystem: object_metadata_requests += 1 -> 11
s3a.S3AFileSystem: Found file (with /): fake directory
```
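So move-thread-0 reads the full 386 bytes of the object and then deletes it. That is what a "move" must look like when source and destination live on different file systems: rename() cannot cross from s3a:// to file:/, so the move degrades to copy-then-delete. A minimal sketch of that fallback using the stock Hadoop helper, not Hive's actual code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CrossFsMove {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path src = new Path("s3a://BucketKun/bbb/000000.gz");
        Path dst = new Path("file:/user/hive/warehouse/text_gzip5/000000.gz");

        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = dst.getFileSystem(conf);

        // deleteSource=true: read and write all the bytes, then delete the S3 object,
        // matching the read-everything-then-delete sequence in the log above.
        FileUtil.copy(srcFs, src, dstFs, dst, true /* deleteSource */, conf);
    }
}
```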
Back on the main thread:

```
metadata.Hive: Moved src: s3a://BucketKun/bbb/000000.gz, to dest: file:/user/hive/warehouse/text_gzip5/000000_copy_1.gz
```
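Note the destination name: 000000.gz arrived as 000000_copy_1.gz. With replace = KEEP_EXISTING (see the FileOperations line in the first block), an existing file in the table directory is not overwritten; a non-conflicting _copy_N name is chosen instead. A hypothetical sketch of that naming behavior (nonConflictingName is my own helper, not a Hive API):

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class CopySuffix {
    // Probe name, name_copy_1, name_copy_2, ... until a free name is found.
    static Path nonConflictingName(FileSystem fs, Path dir, String name) throws IOException {
        int dot = name.indexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot); // "000000"
        String ext  = dot < 0 ? ""   : name.substring(dot);    // ".gz"
        Path candidate = new Path(dir, name);
        for (int i = 1; fs.exists(candidate); i++) {
            candidate = new Path(dir, base + "_copy_" + i + ext); // e.g. 000000_copy_1.gz
        }
        return candidate;
    }
}
```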
3. Conclusion
To sum up: when Hive loads external data, it first reads the external file, copies it into the local warehouse directory (file:/user/hive/warehouse/...), and then deletes the external file (which looks pretty wild at first; why would Hive do that?).
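The likely answer to that question: the Hive documentation describes LOAD DATA (without the LOCAL keyword) as a pure move, not a copy, so the source file is supposed to disappear; and since a rename cannot cross from S3 to the local warehouse, the move is implemented as copy-then-delete. If the data should stay in S3, skip LOAD DATA entirely and point an external table at the bucket so Hive reads it in place. A sketch, again over JDBC (the single-column schema is a placeholder):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ExternalTableInstead {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
            // The (line STRING) schema is a placeholder; use your real columns.
            stmt.execute("CREATE EXTERNAL TABLE text_gzip5_ext (line STRING) "
                       + "STORED AS TEXTFILE LOCATION 's3a://BucketKun/bbb/'");
        }
    }
}
```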