
FSShell-EC-du

Source: 程序员文章站, 2022-04-11 07:57:20

How du is invoked on the client side:
public static class Du extends FsUsage

public ContentSummary getContentSummary(Path f) throws IOException {
    FileStatus status = getFileStatus(f);
    if (status.isFile()) {
      // f is a file
      long length = status.getLen();
      return new ContentSummary.Builder().length(length).
          fileCount(1).directoryCount(0).spaceConsumed(length).build();
    }
    // f is a directory
    long[] summary = {0, 0, 1};
    for(FileStatus s : listStatus(f)) {
      long length = s.getLen();
      ContentSummary c = s.isDirectory() ? getContentSummary(s.getPath()) :
          new ContentSummary.Builder().length(length).
          fileCount(1).directoryCount(0).spaceConsumed(length).build();
      summary[0] += c.getLength();
      summary[1] += c.getFileCount();
      summary[2] += c.getDirectoryCount();
    }
    return new ContentSummary.Builder().length(summary[0]).
        fileCount(summary[1]).directoryCount(summary[2]).
        spaceConsumed(summary[0]).build();
  }

If the path is a directory, getContentSummary recurses into it until it reaches plain files.
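The recursion above can be tried out on a toy tree. The following is a minimal sketch only: `DuSketch` and `Node` are hypothetical in-memory stand-ins for `FileStatus` and `listStatus`, not Hadoop classes. It mirrors the `{length, fileCount, directoryCount}` accumulator, including the initial `1` that counts the directory itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FileStatus/listStatus; not part of Hadoop.
class DuSketch {
    static final class Node {
        final long len;
        final List<Node> children; // null means "this node is a file"

        Node(long len, List<Node> children) {
            this.len = len;
            this.children = children;
        }

        static Node file(long len) { return new Node(len, null); }
        static Node dir(Node... kids) { return new Node(0, Arrays.asList(kids)); }
        boolean isFile() { return children == null; }
    }

    // Returns {length, fileCount, directoryCount}, like the summary[] array above.
    static long[] summarize(Node n) {
        if (n.isFile()) {
            return new long[] {n.len, 1, 0};
        }
        long[] summary = {0, 0, 1}; // the directory counts itself once
        for (Node child : n.children) {
            long[] c = summarize(child);
            summary[0] += c[0];
            summary[1] += c[1];
            summary[2] += c[2];
        }
        return summary;
    }

    public static void main(String[] args) {
        Node root = Node.dir(Node.file(10), Node.dir(Node.file(5), Node.file(7)));
        long[] s = summarize(root);
        // prints "22 bytes, 3 files, 2 dirs"
        System.out.println(s[0] + " bytes, " + s[1] + " files, " + s[2] + " dirs");
    }
}
```

Note that this generic version reports spaceConsumed equal to the logical length, so it cannot account for replication or erasure coding. That accounting happens on the NameNode, which is where the rest of this walkthrough goes.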
For HDFS, the call goes through DFSClient, which issues an RPC to the NameNode:

  ContentSummary getContentSummary(String src) throws IOException {
    checkOpen();
    try (TraceScope ignored = newPathTraceScope("getContentSummary", src)) {
      return namenode.getContentSummary(src);
    } catch (RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class,
          FileNotFoundException.class,
          UnresolvedPathException.class);
    }
  }

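**When debugging the NameNode with breakpoints, always issue the command from a plain command line. Do not debug the client shell at the same time.** This comes from experience: otherwise it does not work.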
iip:

INodesInPath: path = /ec/t.log
  inodes = [, ec, t.log], length=3
  isSnapshot        = false
  snapshotId        = 2147483646


Key method:

ContentSummary cs = targetNode.computeAndConvertContentSummary(
            iip.getPathSnapshotId(), cscc);

Stepping into INode:

 public final ContentSummary computeAndConvertContentSummary(int snapshotId,
      ContentSummaryComputationContext summary) throws AccessControlException {
    computeContentSummary(snapshotId, summary);
    final ContentCounts counts = summary.getCounts();
    final ContentCounts snapshotCounts = summary.getSnapshotCounts();
    final QuotaCounts q = getQuotaCounts();
    return new ContentSummary.Builder().
        length(counts.getLength()).
        fileCount(counts.getFileCount() + counts.getSymlinkCount()).
        directoryCount(counts.getDirectoryCount()).
        quota(q.getNameSpace()).
        spaceConsumed(counts.getStoragespace()).
        spaceQuota(q.getStorageSpace()).
        typeConsumed(counts.getTypeSpaces()).
        typeQuota(q.getTypeSpaces().asArray()).
        snapshotLength(snapshotCounts.getLength()).
        snapshotFileCount(snapshotCounts.getFileCount()).
        snapshotDirectoryCount(snapshotCounts.getDirectoryCount()).
        snapshotSpaceConsumed(snapshotCounts.getStoragespace()).
        erasureCodingPolicy(summary.getErasureCodingPolicyName(this)).
        build();
  }
  //
  public abstract ContentSummaryComputationContext computeContentSummary(
      int snapshotId, ContentSummaryComputationContext summary)
      throws AccessControlException;
      

Computing the physical storage of an EC file:

// TODO: support EC with heterogeneous storage
  public final QuotaCounts storagespaceConsumedStriped() {
    QuotaCounts counts = new QuotaCounts.Builder().build();
    for (BlockInfo b : blocks) {
      Preconditions.checkState(b.isStriped());
      long blockSize = b.isComplete() ?
          ((BlockInfoStriped)b).spaceConsumed() : getPreferredBlockSize() *
          ((BlockInfoStriped)b).getTotalBlockNum();
      counts.addStorageSpace(blockSize);
    }
    return  counts;
  }
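The non-complete branch above charges a whole block group up front. Here is a small sketch of that arithmetic, assuming `getTotalBlockNum()` equals data units plus parity units and using a hypothetical helper name `chargedBytes`:

```java
// Sketch of the "block not yet complete" branch; chargedBytes is a hypothetical name.
class UnderConstructionCharge {
    // Assumption: total block num = data units + parity units.
    static long chargedBytes(long preferredBlockSize, int dataUnits, int parityUnits) {
        return preferredBlockSize * (dataUnits + parityUnits);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 3 data + 2 parity units with a 128 MB preferred block size
        long charged = chargedBytes(128 * mb, 3, 2);
        System.out.println(charged / mb + " MB"); // prints "640 MB"
    }
}
```

Under these assumptions, a striped file that is still being written would be charged 640 MB for its open block group, regardless of how many bytes have been written so far.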

numBytes is the logical size.
For each complete block, ((BlockInfoStriped)b).spaceConsumed() is called to compute the storage actually consumed.

  public long spaceConsumed() {
    // In case striped blocks, total usage by this striped blocks should
    // be the total of data blocks and parity blocks because
    // `getNumBytes` is the total of actual data block size.
    return StripedBlockUtil.spaceConsumedByStripedBlock(getNumBytes(),
        ecPolicy.getNumDataUnits(), ecPolicy.getNumParityUnits(),
        ecPolicy.getCellSize());
    }


    return numDataBlkBytes + numParityBlkBytes;

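That is, data space plus parity space.
The data space is the logical size mentioned above, so it is known directly.
The parity space has to be computed.

Parity space: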

    long numParityBlkBytes = getInternalBlockLength(numDataBlkBytes, cellSize,
        dataBlkNum, parityIndex) * parityBlkNum;

StripedBlockUtil:

public static long getInternalBlockLength(long dataSize,
      int cellSize, int numDataBlocks, int idxInBlockGroup) {
    Preconditions.checkArgument(dataSize >= 0);
    Preconditions.checkArgument(cellSize > 0);
    Preconditions.checkArgument(numDataBlocks > 0);
    Preconditions.checkArgument(idxInBlockGroup >= 0);
    // Size of each stripe (only counting data blocks)
    final int stripeSize = cellSize * numDataBlocks;
    // If block group ends at stripe boundary, each internal block has an equal
    // share of the group
    final int lastStripeDataLen = (int)(dataSize % stripeSize);
    if (lastStripeDataLen == 0) {
      return dataSize / numDataBlocks;
    }

    final int numStripes = (int) ((dataSize - 1) / stripeSize + 1);
    return (numStripes - 1L)*cellSize
        + lastCellSize(lastStripeDataLen, cellSize,
        numDataBlocks, idxInBlockGroup);
  }

The number of stripes is computed quite neatly (a grouping calculation, i.e. a ceiling division):

final int numStripes = (int) ((dataSize - 1) / stripeSize + 1);
    return (numStripes - 1L)*cellSize
        + lastCellSize(lastStripeDataLen, cellSize,
        numDataBlocks, idxInBlockGroup);

lastStripeDataLen: the number of data bytes in the last stripe.
numDataBlocks: 3 under the 3-2 policy.
idxInBlockGroup: numDataBlocks + 1 = 4
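The stripe-count expression above is an integer ceiling division. A minimal sketch, assuming a 1 MB cell and 3 data blocks (so a 3 MB stripe); `StripeCount` is a hypothetical name:

```java
// Integer ceiling division as used for numStripes; valid for dataSize >= 1.
class StripeCount {
    static long numStripes(long dataSize, long stripeSize) {
        return (dataSize - 1) / stripeSize + 1;
    }

    public static void main(String[] args) {
        long stripeSize = 3L * 1024 * 1024; // assumed: 1 MB cell * 3 data blocks
        System.out.println(numStripes(1, stripeSize));              // prints 1
        System.out.println(numStripes(stripeSize, stripeSize));     // prints 1
        System.out.println(numStripes(stripeSize + 1, stripeSize)); // prints 2
    }
}
```

The formula is only meaningful for dataSize of at least 1. In getInternalBlockLength this is not a problem, because the case where the data ends exactly on a stripe boundary (including dataSize == 0) returns earlier.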

  private static int lastCellSize(int size, int cellSize, int numDataBlocks,
      int i) {
    if (i < numDataBlocks) {  // the caller passes numDataBlocks + 1 here, so this branch is skipped
      // parity block size (i.e. i >= numDataBlocks) is the same as
      // the first data block size (i.e. i = 0).
      size -= i*cellSize;
      if (size < 0) {
        size = 0;
      }
    }
    return size > cellSize? cellSize: size;  // execution goes straight to this line
  }
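To see how lastCellSize behaves for each index in a block group, the logic can be copied into a standalone class. This is a sketch for experimentation: the class name `LastCellDemo` and the chosen sizes (1 MB cell, last stripe holding 1.5 cells of data) are assumptions.

```java
// Standalone copy of the lastCellSize logic shown above, for experimentation only.
class LastCellDemo {
    static int lastCellSize(int size, int cellSize, int numDataBlocks, int i) {
        if (i < numDataBlocks) {
            // Data block i only receives what is left after blocks 0..i-1 are filled.
            size -= i * cellSize;
            if (size < 0) {
                size = 0;
            }
        }
        // Parity blocks (i >= numDataBlocks) get the same length as data block 0.
        return size > cellSize ? cellSize : size;
    }

    public static void main(String[] args) {
        int cellSize = 1024 * 1024;                // assumed 1 MB cell
        int lastStripeDataLen = cellSize * 3 / 2;  // 1.5 cells of data in the last stripe
        for (int i = 0; i < 5; i++) {
            System.out.println("index " + i + " -> "
                + lastCellSize(lastStripeDataLen, cellSize, 3, i));
        }
        // index 0 -> 1048576, index 1 -> 524288, index 2 -> 0,
        // index 3 -> 1048576, index 4 -> 1048576
    }
}
```

Indices 3 and 4 are the two parity blocks of a 3-2 group. Both come out equal to the first data block's last cell, which matches the comment in the original code.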

Summary

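  1. Know the total logical size, e.g. a 300 MB file.
  2. Know how many block groups the file is split into. For example, under the 3-2 policy with 128 MB blocks, one block group holds 128 * 3 = 384 MB of logical data.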
  public static long spaceConsumedByStripedBlock(long numDataBlkBytes,
      int dataBlkNum, int parityBlkNum, int cellSize) {
    int parityIndex = dataBlkNum + 1;
    long numParityBlkBytes = getInternalBlockLength(numDataBlkBytes, cellSize,
        dataBlkNum, parityIndex) * parityBlkNum;
    return numDataBlkBytes + numParityBlkBytes;
  }

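Why is this value always dataBlkNum + 1? Judging from lastCellSize, any index >= numDataBlocks is treated as a parity block and gets the same length as data block 0, so dataBlkNum + 1 appears to serve only as an index that is guaranteed to fall in the parity range.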
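One way to probe the question above is to run the whole calculation outside Hadoop. The sketch below copies the three methods shown in this article into a hypothetical class `StripedSpaceDemo` and adds a `parityIndex` parameter (not present in the original signature) so that different indices can be compared. The 1 MB cell size is an assumption.

```java
// Standalone re-creation of the methods quoted in this article; not the Hadoop classes.
class StripedSpaceDemo {
    static int lastCellSize(int size, int cellSize, int numDataBlocks, int i) {
        if (i < numDataBlocks) {
            size -= i * cellSize;
            if (size < 0) {
                size = 0;
            }
        }
        return size > cellSize ? cellSize : size;
    }

    static long getInternalBlockLength(long dataSize, int cellSize,
            int numDataBlocks, int idxInBlockGroup) {
        final int stripeSize = cellSize * numDataBlocks;
        final int lastStripeDataLen = (int) (dataSize % stripeSize);
        if (lastStripeDataLen == 0) {
            return dataSize / numDataBlocks; // data ends on a stripe boundary
        }
        final int numStripes = (int) ((dataSize - 1) / stripeSize + 1);
        return (numStripes - 1L) * cellSize
            + lastCellSize(lastStripeDataLen, cellSize, numDataBlocks, idxInBlockGroup);
    }

    // parityIndex is exposed here only for comparison; the original fixes it to dataBlkNum + 1.
    static long spaceConsumed(long numDataBlkBytes, int dataBlkNum,
            int parityBlkNum, int cellSize, int parityIndex) {
        long numParityBlkBytes = getInternalBlockLength(numDataBlkBytes, cellSize,
            dataBlkNum, parityIndex) * parityBlkNum;
        return numDataBlkBytes + numParityBlkBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        int cell = 1024 * 1024; // assumed 1 MB cell
        // 300 MB under 3-2: 300 MB data + 2 * 100 MB parity
        System.out.println(spaceConsumed(300 * mb, 3, 2, cell, 4) / mb + " MB"); // prints "500 MB"
        // One extra byte adds 1 data byte plus 1 byte on each of the 2 parity blocks.
        System.out.println(spaceConsumed(300 * mb + 1, 3, 2, cell, 4) - 500 * mb); // prints 3
        // Either parity index (3 or 4) gives the same result.
        System.out.println(spaceConsumed(300 * mb + 1, 3, 2, cell, 3)
            == spaceConsumed(300 * mb + 1, 3, 2, cell, 4)); // prints true
    }
}
```

In this sketch the result does not depend on which parity index is passed, as long as it is at least dataBlkNum. For the 300 MB example from the summary, du would report 500 MB of consumed space under these assumptions.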

Tags: HDFS
