FSShell-EC-du
2022-04-11 07:57:20
How du is invoked on the client side:
public static class Du extends FsUsage
public ContentSummary getContentSummary(Path f) throws IOException {
FileStatus status = getFileStatus(f);
if (status.isFile()) {
// f is a file
long length = status.getLen();
return new ContentSummary.Builder().length(length).
fileCount(1).directoryCount(0).spaceConsumed(length).build();
}
// f is a directory
long[] summary = {0, 0, 1};
for(FileStatus s : listStatus(f)) {
long length = s.getLen();
ContentSummary c = s.isDirectory() ? getContentSummary(s.getPath()) :
new ContentSummary.Builder().length(length).
fileCount(1).directoryCount(0).spaceConsumed(length).build();
summary[0] += c.getLength();
summary[1] += c.getFileCount();
summary[2] += c.getDirectoryCount();
}
return new ContentSummary.Builder().length(summary[0]).
fileCount(summary[1]).directoryCount(summary[2]).
spaceConsumed(summary[0]).build();
}
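If the path is a directory, getContentSummary recurses into each child directory until only files remain, accumulating length, file count and directory count along the way.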
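The recursion above can be sketched without Hadoop on the classpath. This is a minimal illustration only: `DuSketch` and `Node` are hypothetical stand-ins for the file-system entries, not Hadoop classes, and the returned array mirrors the `summary[]` array in the excerpt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a directory tree; not part of Hadoop.
final class DuSketch {
    static final class Node {
        final boolean isDir;
        final long len;                        // only meaningful for files
        final List<Node> children = new ArrayList<>();

        Node(boolean isDir, long len) {
            this.isDir = isDir;
            this.len = len;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns {length, fileCount, directoryCount}.
    static long[] summarize(Node n) {
        if (!n.isDir) {
            return new long[] {n.len, 1, 0};   // a file: its own length, one file
        }
        long[] summary = {0, 0, 1};            // a directory counts itself once
        for (Node child : n.children) {
            long[] c = summarize(child);       // files return at once, directories recurse
            summary[0] += c[0];
            summary[1] += c[1];
            summary[2] += c[2];
        }
        return summary;
    }

    public static void main(String[] args) {
        Node root = new Node(true, 0)
            .add(new Node(false, 100))
            .add(new Node(true, 0).add(new Node(false, 50)));
        long[] s = summarize(root);
        System.out.println(s[0] + " bytes, " + s[1] + " files, " + s[2] + " dirs");
    }
}
```

Note that in the excerpt this generic path sets spaceConsumed equal to the logical length, so it says nothing about the extra space taken by EC parity; that accounting happens on the NameNode side.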
On HDFS the call goes through DFSClient, whose method issues the RPC to the NameNode:
ContentSummary getContentSummary(String src) throws IOException {
checkOpen();
try (TraceScope ignored = newPathTraceScope("getContentSummary", src)) {
return namenode.getContentSummary(src);
} catch (RemoteException re) {
throw re.unwrapRemoteException(AccessControlException.class,
FileNotFoundException.class,
UnresolvedPathException.class);
}
}
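**When debugging the NameNode with breakpoints, always trigger the request from the command line. Do not debug the client Shell at the same time.** This comes from experience: otherwise the call never returns a result.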
iip:
INodesInPath: path = /ec/t.log
inodes = [, ec, t.log], length=3
isSnapshot = false
snapshotId = 2147483646
Key method:
ContentSummary cs = targetNode.computeAndConvertContentSummary(
iip.getPathSnapshotId(), cscc);
This jumps into INode:
public final ContentSummary computeAndConvertContentSummary(int snapshotId,
ContentSummaryComputationContext summary) throws AccessControlException {
computeContentSummary(snapshotId, summary);
final ContentCounts counts = summary.getCounts();
final ContentCounts snapshotCounts = summary.getSnapshotCounts();
final QuotaCounts q = getQuotaCounts();
return new ContentSummary.Builder().
length(counts.getLength()).
fileCount(counts.getFileCount() + counts.getSymlinkCount()).
directoryCount(counts.getDirectoryCount()).
quota(q.getNameSpace()).
spaceConsumed(counts.getStoragespace()).
spaceQuota(q.getStorageSpace()).
typeConsumed(counts.getTypeSpaces()).
typeQuota(q.getTypeSpaces().asArray()).
snapshotLength(snapshotCounts.getLength()).
snapshotFileCount(snapshotCounts.getFileCount()).
snapshotDirectoryCount(snapshotCounts.getDirectoryCount()).
snapshotSpaceConsumed(snapshotCounts.getStoragespace()).
erasureCodingPolicy(summary.getErasureCodingPolicyName(this)).
build();
}
//
public abstract ContentSummaryComputationContext computeContentSummary(
int snapshotId, ContentSummaryComputationContext summary)
throws AccessControlException;
Computing the physical storage of an EC file:
// TODO: support EC with heterogeneous storage
public final QuotaCounts storagespaceConsumedStriped() {
QuotaCounts counts = new QuotaCounts.Builder().build();
for (BlockInfo b : blocks) {
Preconditions.checkState(b.isStriped());
long blockSize = b.isComplete() ?
((BlockInfoStriped)b).spaceConsumed() : getPreferredBlockSize() *
((BlockInfoStriped)b).getTotalBlockNum();
counts.addStorageSpace(blockSize);
}
return counts;
}
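numBytes is the logical size of the block group.
For every complete block, ((BlockInfoStriped)b).spaceConsumed() is called to compute the actual storage; a block that is still under construction is charged getPreferredBlockSize() * getTotalBlockNum() instead.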
public long spaceConsumed() {
// In case striped blocks, total usage by this striped blocks should
// be the total of data blocks and parity blocks because
// `getNumBytes` is the total of actual data block size.
return StripedBlockUtil.spaceConsumedByStripedBlock(getNumBytes(),
ecPolicy.getNumDataUnits(), ecPolicy.getNumParityUnits(),
ecPolicy.getCellSize());
}
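spaceConsumedByStripedBlock ends with:
return numDataBlkBytes + numParityBlkBytes;
That is, data space plus parity space.
The data space is the logical size mentioned above, so it is known directly.
Parity space: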
long numParityBlkBytes = getInternalBlockLength(numDataBlkBytes, cellSize,
dataBlkNum, parityIndex) * parityBlkNum;
StripedBlockUtil:
public static long getInternalBlockLength(long dataSize,
int cellSize, int numDataBlocks, int idxInBlockGroup) {
Preconditions.checkArgument(dataSize >= 0);
Preconditions.checkArgument(cellSize > 0);
Preconditions.checkArgument(numDataBlocks > 0);
Preconditions.checkArgument(idxInBlockGroup >= 0);
// Size of each stripe (only counting data blocks)
final int stripeSize = cellSize * numDataBlocks;
// If block group ends at stripe boundary, each internal block has an equal
// share of the group
final int lastStripeDataLen = (int)(dataSize % stripeSize);
if (lastStripeDataLen == 0) {
return dataSize / numDataBlocks;
}
final int numStripes = (int) ((dataSize - 1) / stripeSize + 1);
return (numStripes - 1L)*cellSize
+ lastCellSize(lastStripeDataLen, cellSize,
numDataBlocks, idxInBlockGroup);
}
The stripe count is computed with a neat trick (integer ceiling division, the usual grouping calculation):
final int numStripes = (int) ((dataSize - 1) / stripeSize + 1);
return (numStripes - 1L)*cellSize
+ lastCellSize(lastStripeDataLen, cellSize,
numDataBlocks, idxInBlockGroup);
lastStripeDataLen: the number of data bytes in the last stripe.
numDataBlocks: 3 under the 3-2 policy.
idxInBlockGroup: the caller passes dataBlkNum + 1, which is 4 under the 3-2 policy.
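To see why the stripe count expression works, here is a small standalone sketch of the same integer ceiling division. The class name `StripeCount` and the 4-byte cell size are assumptions chosen only to keep the numbers readable; the formula itself is the one from the excerpt. It is valid for dataSize > 0; in the excerpt the stripe-aligned case, including dataSize == 0, returns before this line is reached.

```java
// Hypothetical demo class; only the formula comes from the excerpt above.
final class StripeCount {
    // Ceiling of dataSize / stripeSize for dataSize > 0.
    static int numStripes(long dataSize, int stripeSize) {
        return (int) ((dataSize - 1) / stripeSize + 1);
    }

    public static void main(String[] args) {
        int stripeSize = 4 * 3;   // assumed: 4-byte cells, 3 data blocks
        for (long size : new long[] {1, 11, 12, 13, 24, 25}) {
            System.out.println(size + " bytes -> "
                + numStripes(size, stripeSize) + " stripe(s)");
        }
    }
}
```

Subtracting 1 before dividing keeps an exactly full stripe (12 or 24 bytes here) from being counted as the start of the next one, while a single extra byte (13 or 25) opens a new stripe.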
private static int lastCellSize(int size, int cellSize, int numDataBlocks,
int i) {
if (i < numDataBlocks) { // the caller passed numDataBlocks + 1, so this branch is not taken here
// parity block size (i.e. i >= numDataBlocks) is the same as
// the first data block size (i.e. i = 0).
size -= i*cellSize;
if (size < 0) {
size = 0;
}
}
return size > cellSize? cellSize: size; // the parity call goes straight to this line
}
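The behaviour of lastCellSize per block index is easier to see with concrete numbers. The sketch below copies the logic of the excerpt into a standalone class (`LastCellDemo` is a hypothetical name) and uses an assumed 4-byte cell with 3 data blocks, so a last stripe of 10 bytes fills two cells completely and the third only partly.

```java
// Hypothetical demo class; the method body follows the lastCellSize excerpt above.
final class LastCellDemo {
    static int lastCellSize(int size, int cellSize, int numDataBlocks, int i) {
        if (i < numDataBlocks) {
            // Data block i only sees what is left after blocks 0..i-1 took a cell each.
            size -= i * cellSize;
            if (size < 0) {
                size = 0;
            }
        }
        // Parity indexes (i >= numDataBlocks) skip the subtraction above,
        // so they get the same result as data block 0.
        return Math.min(size, cellSize);
    }

    public static void main(String[] args) {
        int lastStripeDataLen = 10;   // assumed bytes in the last stripe
        int cellSize = 4;             // assumed cell size
        int numDataBlocks = 3;
        for (int i = 0; i <= 4; i++) {
            System.out.println("index " + i + " -> "
                + lastCellSize(lastStripeDataLen, cellSize, numDataBlocks, i));
        }
    }
}
```

With these inputs the data blocks at indexes 0, 1 and 2 receive 4, 4 and 2 bytes (10 in total), and both parity indexes 3 and 4 receive 4 bytes, the same as index 0.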
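Summary
- You need to know the total logical size, e.g. a 300 MB file.
- You need to know how many block groups the file is split into. Under the 3-2 policy with 128 MB blocks, one block group can hold 128 * 3 = 384 MB of logical data.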
public static long spaceConsumedByStripedBlock(long numDataBlkBytes,
int dataBlkNum, int parityBlkNum, int cellSize) {
int parityIndex = dataBlkNum + 1;
long numParityBlkBytes = getInternalBlockLength(numDataBlkBytes, cellSize,
dataBlkNum, parityIndex) * parityBlkNum;
return numDataBlkBytes + numParityBlkBytes;
}
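As a worked example of the whole calculation, the sketch below applies it to the 300 MB file from the summary under a 3-2 policy. It is an illustration under stated assumptions, not the Hadoop implementation: `StripedSpaceDemo` and `parityBlockLength` are hypothetical names, the 1 MiB cell size is assumed, and the parity length uses a simplified closed form (full stripes times cell size, plus the last cell capped at the cell size) that matches getInternalBlockLength only for parity indexes.

```java
// Hypothetical demo class; a simplified, parity-only restatement of the excerpts above.
final class StripedSpaceDemo {
    static final long MIB = 1024L * 1024L;

    // Length of one parity block for a block group holding dataBytes of logical data.
    static long parityBlockLength(long dataBytes, int cellSize, int dataBlkNum) {
        long stripeSize = (long) cellSize * dataBlkNum;
        long fullStripes = dataBytes / stripeSize;
        long lastStripeLen = dataBytes % stripeSize;
        // A parity cell in the last stripe is as large as the first data cell.
        return fullStripes * cellSize + Math.min(lastStripeLen, cellSize);
    }

    // Logical data plus all parity blocks.
    static long spaceConsumed(long dataBytes, int dataBlkNum, int parityBlkNum,
                              int cellSize) {
        return dataBytes
            + parityBlockLength(dataBytes, cellSize, dataBlkNum) * parityBlkNum;
    }

    public static void main(String[] args) {
        int cellSize = (int) MIB;   // assumed 1 MiB cells
        System.out.println(spaceConsumed(300 * MIB, 3, 2, cellSize) / MIB + " MiB");
        System.out.println(spaceConsumed(301 * MIB, 3, 2, cellSize) / MIB + " MiB");
    }
}
```

A 300 MiB file ends on a stripe boundary, so each parity block is 100 MiB and the total is 500 MiB. A 301 MiB file has one extra cell in its last stripe, so each parity block grows to 101 MiB and the total becomes 503 MiB.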
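Why is this value always dataBlkNum + 1? Judging from lastCellSize above, any index >= numDataBlocks is treated as a parity block, and a parity block has the same size as the first data block. So dataBlkNum + 1 appears to be just one convenient index that selects the parity case; the exact value does not change the result.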