Software Engineering Assignment: wc.exe, a Code Statistics Tool

程序员文章站 (Programmer Article Station), 2022-04-25 18:10:10

1. GitHub project URL

https://github.com/edwardliu-aurora/wordcount

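Basic requirements
- [x] -c count the characters in a file (implemented)
- [x] -w count the words in a file (implemented)
- [x] -l count the lines in a file (implemented)
Extended features
- [x] -s recursively process the files in a directory that match the given pattern (implemented)
- [x] -a report the code lines / blank lines / comment lines of a file (implemented)
- [x] support wildcards (*, ?) in file names (implemented)
Advanced features
- [ ] -x graphical user interface (not implemented)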

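2. PSP table

| PSP2.1 | Personal Software Process Stages | Estimated time (min) | Actual time (min) |
|---|---|---|---|
| Planning | Planning | 5 | 5 |
| · Estimate | · Estimate how long the task will take | 600 | 730 |
| Development | Development | 480 | 610 |
| · Analysis | · Requirements analysis (including learning new technologies) | 60 | 60 |
| · Design Spec | · Write the design document | 60 | 60 |
| · Design Review | · Design review (review the design document with colleagues) | 30 | 30 |
| · Coding Standard | · Coding standard (set suitable conventions for this development) | 30 | 10 |
| · Design | · Detailed design | 30 | 60 |
| · Coding | · Coding | 120 | 240 |
| · Code Review | · Code review | 30 | 30 |
| · Test | · Testing (self-testing, fixing code, committing changes) | 120 | 120 |
| Reporting | Reporting | 120 | 120 |
| · Test Report | · Test report | 60 | 60 |
| · Size Measurement | · Measure the workload | 30 | 30 |
| · Postmortem & Process Improvement Plan | · Postmortem and process improvement plan | 30 | 30 |
| Total | | 605 | 735 |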

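3. Approach

(1) Counting the characters in a file

Definition: return the total number of characters in the file, with Chinese characters also counted separately. Because the file is read line by line, line terminators are not included in the total.

Approach: read the file line by line in Java; each line is a String object. Use String.length() to count the characters on the line, and check each character's value range to decide whether it is a Chinese character. Two counters accumulate the total number of characters and the total number of Chinese characters.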
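The idea above can be sketched roughly as follows. This is only an illustration, not the project's code: the class name `CharCountSketch`, the UTF-8 charset in `countFile`, and the choice of U+4E00 to U+9FFF as the range for "Chinese characters" are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class CharCountSketch {
    // Returns {allChars, chineseChars} for the given lines (line terminators are not counted).
    static long[] count(Stream<String> lines) {
        long[] totals = new long[2];
        lines.forEach(line -> {
            totals[0] += line.length();
            totals[1] += line.chars().filter(CharCountSketch::isChinese).count();
        });
        return totals;
    }

    // Assumption: a "Chinese character" is a code unit in the CJK Unified Ideographs block.
    static boolean isChinese(int c) {
        return c >= 0x4E00 && c <= 0x9FFF;
    }

    // Reads the file lazily, one line at a time; UTF-8 is assumed here for simplicity.
    static long[] countFile(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return count(lines);
        }
    }
}
```

Keeping the counting logic separate from the file access makes it possible to try the rule on in-memory strings before pointing it at real files.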

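(2) Counting the words in a file

Definition: a word is a continuous run of characters drawn only from 0-9, a-z, A-Z and _; Chinese characters are not part of a word.

Approach: following Java's regular-expression syntax, write a regex that matches the rule above, then apply it to each line and accumulate the number of matches.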
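A minimal sketch of such a regex-based counter is shown below. The class and method names are hypothetical; the character class is written out explicitly to mirror the definition above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCountSketch {
    // A word is a maximal run of 0-9, a-z, A-Z and _
    private static final Pattern WORD = Pattern.compile("[0-9A-Za-z_]+");

    // Counts the words on a single line; callers add up the per-line results.
    static long countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        long count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

Because the quantifier is greedy, each `find()` consumes a whole run, so `a_1` counts as one word rather than three.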

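(3) Counting the lines in a file

Definition: return the total number of lines in the file (as determined by line breaks).

Approach: read the file line by line in Java and increment a counter for each line.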
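One way to express this, as a sketch with hypothetical names: `BufferedReader.lines()` delivers the lines lazily, so the file is never held in memory as a whole. The charset is left to the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class LineCountSketch {
    // Counts the lines delivered by a reader, reading one line at a time.
    static long countLines(Reader reader) {
        return new BufferedReader(reader).lines().count();
    }

    // Convenience overload for a file; the charset is supplied by the caller.
    static long countLines(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return countLines(reader);
        }
    }
}
```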

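(4) Recursively processing the matching files in a directory

Definition: analyze every file in the directory and its subdirectories, and report the data the user asked for.

Approach: use a function that collects the paths of all matching files under the directory into an ArrayList and returns it to the main function, which then processes each file.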
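The collection step could look roughly like the sketch below. It uses `Files.walk`, which may differ from how the project traverses directories; the class name, the method name and the choice to match against the file name only are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileFinderSketch {
    // Collects the regular files under root (recursively) whose file name matches nameRegex.
    static ArrayList<String> findFiles(Path root, String nameRegex) throws IOException {
        Pattern pattern = Pattern.compile(nameRegex);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> pattern.matcher(p.getFileName().toString()).matches())
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toCollection(ArrayList::new));
        }
    }
}
```

The stream returned by `Files.walk` holds open directory handles, which is why it is wrapped in try-with-resources.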

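(5) Reporting more detailed line statistics

Code line: a line that contains more than one character of code, not counting format-control symbols such as "{}", "()" and ";".

Approach: keep a Set of all format-control characters; if a character on the line is not in the Set, the line counts as a code line. The cases that interact with comment lines need care:

1. When a line contains both // and /*, check which one comes first.
2. A line that lies entirely inside a /* */ comment is not a code line.
3. On the first or last line of a /* */ comment, check the characters before /* or after */.
4. When a line contains only //, check the characters before the //.

Comment line: any line that contains a comment, whether or not it is also a code line; that is, a line that contains // or falls within a /* */ range.

Approach: read line by line and branch on // versus /* */.

Blank line: a line made up entirely of whitespace or format-control characters.

Approach: match using regular expressions together with String.indexOf().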
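As an illustration of the Set-based check, here is a small sketch of the blank-line rule. The contents of `FORMAT_CHARS` are an assumed example list, and the class and method names are hypothetical; the project's own rule may be stricter or looser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BlankLineSketch {
    // Assumption: an illustrative set of format-control symbols; the real list may differ.
    private static final Set<Character> FORMAT_CHARS =
            new HashSet<>(Arrays.asList('{', '}', '(', ')', ';'));

    // A line is blank when every non-whitespace character on it is a format-control symbol.
    static boolean isBlank(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!Character.isWhitespace(c) && !FORMAT_CHARS.contains(c)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule a line such as `});` is blank, while `int a;` is not, because `i`, `n`, `t` and `a` are outside the set.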

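(6) File wildcards

Definition: * matches zero or more arbitrary characters and ? matches exactly one arbitrary character.

Approach: first replace ? and * with their regular-expression equivalents, then escape . so that it matches a literal dot, and finally match the resulting regex against each path.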
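The translation can also be done in a single pass over the pattern, which avoids a later substitution rewriting characters that an earlier one introduced. The sketch below takes that route rather than chained replacements; the class name is hypothetical, and the character class (anything except characters that are not allowed in Windows file names) is an assumption modeled on the description above.

```java
import java.util.regex.Pattern;

class GlobSketch {
    // Converts a wildcard pattern into a compiled regex, one character at a time.
    static Pattern toPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*':
                    // zero or more characters that may appear in a file name
                    regex.append("[^/\\\\:*?<>|]*");
                    break;
                case '?':
                    // exactly one character that may appear in a file name
                    regex.append("[^/\\\\:*?<>|]");
                    break;
                default:
                    // everything else, including '.', is matched literally
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }
}
```

For example, `GlobSketch.toPattern("*.java").matcher(name).matches()` tests a single file name against the pattern.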

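4. Design and implementation

Packages

bean: the composite types that are returned to callers
service: the functions that implement the actual logic
com.edwardliu_aurora: the main function

Classes

CharCount:
- allCharCount total number of characters
- chnCharCount total number of Chinese characters
LineCount:
- blankLineCount number of blank lines
- codeLineCount number of code lines
- commentLineCount number of comment lines
BasicStatistic:
- public CharCount getCharCount(String filePath) returns the character counts
- public long getWordCount(String filePath) returns the word count
- public long getLineCount(String filePath) returns the line count
ExtraStatistic:
- public LineCount getDetailLineCount(String filePath) returns the detailed line statistics
Utils:
- public static Charset charsetRecognize(String filePath) detects the text encoding (only GBK and UTF-8 are supported)
- public static ArrayList<String> getFilesPath(String folderPath, String filePattern) returns the paths of all files under a directory that match the filePattern wildcard
Main:
- handles input and output and calls the functions above as needed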

5. Code walkthrough

- CharCount:
    - allCharCount  total number of characters
    - chnCharCount  total number of Chinese characters

```
package bean;

/**
 * Holds the total character count and the Chinese character count.
 */
public class CharCount {
    // Total number of characters
    long allCharCount = 0;
    // Number of Chinese characters
    long chnCharCount = 0;

    public CharCount(long allCharCount, long chnCharCount) {
        this.allCharCount = allCharCount;
        this.chnCharCount = chnCharCount;
    }

    public long getAllCharCount() {
        return allCharCount;
    }

    public void setAllCharCount(long allCharCount) {
        this.allCharCount = allCharCount;
    }

    public long getChnCharCount() {
        return chnCharCount;
    }

    public void setChnCharCount(long chnCharCount) {
        this.chnCharCount = chnCharCount;
    }

}
```

- LineCount:
    - blankLineCount    number of blank lines
    - codeLineCount     number of code lines
    - commentLineCount  number of comment lines

```
package bean;

// Holds the detailed line statistics
public class LineCount {
    // Blank lines
    int blankLineCount = 0;
    // Code lines
    int codeLineCount = 0;
    // Comment lines
    int commentLineCount = 0;

    public LineCount(int blankLineCount, int codeLineCount, int commentLineCount) {
        this.blankLineCount = blankLineCount;
        this.codeLineCount = codeLineCount;
        this.commentLineCount = commentLineCount;
    }

    public int getBlankLineCount() {
        return blankLineCount;
    }

    public void setBlankLineCount(int blankLineCount) {
        this.blankLineCount = blankLineCount;
    }

    public int getCodeLineCount() {
        return codeLineCount;
    }

    public void setCodeLineCount(int codeLineCount) {
        this.codeLineCount = codeLineCount;
    }

    public int getCommentLineCount() {
        return commentLineCount;
    }

    public void setCommentLineCount(int commentLineCount) {
        this.commentLineCount = commentLineCount;
    }

}
```

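- BasicStatistic:
    - public CharCount getCharCount(String filePath)    returns the character counts

```
// Returns the character counts of a file
public CharCount getCharCount(String filePath){
    // Counters for all characters and for Chinese characters
    long allCharCount = 0, chnCharCount = 0;
    // Create an NIO path object for the file
    Path path = Paths.get(filePath);
    // Use a lazy Stream so that a large file is not loaded into memory at once
    // ... (truncated in the source)
```

    - public long getWordCount(String filePath)         returns the word count

```
// Returns the number of words in a file
public long getWordCount(String filePath){
    long wordCount = 0;
    // Use a lazy Stream so that a large file is not loaded into memory at once
    // ... (truncated in the source)
```

    - public long getLineCount(String filePath)         returns the line count

```
// Returns the number of lines in a file
public long getLineCount(String filePath){
    long lineCount = 0;
    // Use a lazy Stream so that a large file is not loaded into memory at once
    // ... (truncated in the source)
```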

- ExtraStatistic:
    - public LineCount getDetailLineCount(String filePath)  returns the detailed line statistics

```
package service;

import bean.LineCount;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Advanced statistics
public class ExtraStatistic {

    public LineCount getDetailLineCount(String filePath) {
        LineCount lineCount = new LineCount(0, 0, 0);
        // Regex that matches any single non-whitespace character
        Pattern pattern = Pattern.compile("\\S");
        // Count the blank lines
        Path path = Paths.get(filePath);
        try (Stream<String> lines = Files.lines(path, Utils.charsetRecognize(filePath))) {
            lineCount.setBlankLineCount(
                    (int) lines.filter(line -> {
                        if (line.length() == 0) return true;
                        int i = 0;
                        Matcher matcher = pattern.matcher(line);
                        while (matcher.find()) {
                            i++;
                            // More than one non-whitespace character: not a blank line
                            if (i > 1) return false;
                        }
                        // Everything else is a blank line
                        return true;
                    }).count()
            );
        }
        catch (Exception e) {
            e.printStackTrace();
            System.out.println("File does not exist or cannot be accessed");
            return null;
        }
        // Count the comment lines and code lines
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream(filePath),
                        Utils.charsetRecognize(filePath)
                ))) {
            int commentLineCount = 0;
            int codeLineCount = 0;
            // Read the file line by line
            for (String line = bufferedReader.readLine(); line != null; line = bufferedReader.readLine())
            {
                // Position of the single-line comment marker
                int oneLinePos = line.indexOf("//");
                // Position of the multi-line comment marker
                int mulLinePos = line.indexOf("/*");
                // If the line has // and it comes first, drop everything from the first //; comment lines +1
                if (oneLinePos >= 0 && (mulLinePos < 0 || oneLinePos < mulLinePos)) {
                    line = line.substring(0, oneLinePos);
                    commentLineCount++;
                    // With more than one non-whitespace character left, it is also a code line
                    Matcher matcher = pattern.matcher(line);
                    int i = 0;
                    while (matcher.find()) i++;
                    if (i > 1) codeLineCount++;
                }
                // If the line opens a /* comment: comment lines +1, check the text before /*,
                // then keep reading until the line that contains */
                else if (mulLinePos >= 0) {
                    String before = line.substring(0, mulLinePos);
                    commentLineCount++;
                    // With more than one non-whitespace character before /*, it is also a code line
                    Matcher matcher = pattern.matcher(before);
                    int i = 0;
                    while (matcher.find()) i++;
                    boolean countedAsCode = i > 1;
                    if (countedAsCode) codeLineCount++;
                    // Look for */ on the rest of this line first, then on the following lines
                    line = line.substring(mulLinePos + 2);
                    while (line != null && line.indexOf("*/") < 0) {
                        line = bufferedReader.readLine();
                        if (line != null) {
                            commentLineCount++;
                            countedAsCode = false;
                        }
                    }
                    // An unterminated comment runs to the end of the file
                    if (line == null) break;
                    line = line.substring(line.indexOf("*/") + 2);
                    // With more than one non-whitespace character after */, it is also a code line
                    // (unless this same line was already counted above)
                    i = 0;
                    matcher = pattern.matcher(line);
                    while (matcher.find()) i++;
                    if (i > 1 && !countedAsCode) codeLineCount++;
                }
                // No comment on the line: a code line if it has more than one non-whitespace character
                else {
                    int i = 0;
                    Matcher matcher = pattern.matcher(line);
                    while (matcher.find()) i++;
                    if (i > 1) codeLineCount++;
                }
            }
            lineCount.setCodeLineCount(codeLineCount);
            lineCount.setCommentLineCount(commentLineCount);
        }
        catch (Exception e) {
            e.printStackTrace();
            System.out.println("File does not exist or cannot be accessed");
            return null;
        }
        return lineCount;
    }

}
```

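- Utils:
    - public static Charset charsetRecognize(String filePath)                               detects the text encoding (only GBK and UTF-8 are supported)

```
// Simple detection of a file's encoding
public static Charset charsetRecognize(String filePath){
    try{
        File file = new File(filePath);
        InputStream in = new java.io.FileInputStream(file);
        byte[] b = new byte[3];
        in.read(b);
        in.close();
        // EF BB BF is the UTF-8 byte order mark
        if (b[0] == -17 && b[1] == -69 && b[2] == -65)
            return Charset.forName("utf-8");
        else{
            try (Stream
            // ... (truncated in the source)
```

    - public static ArrayList<String> getFilesPath(String folderPath, String filePattern)    returns the paths of all files under a directory that match the filePattern wildcard

```
// Collects the paths of all files under a directory and its subdirectories
public static ArrayList<String> getFilesPath(String folderPath, String filePattern)
// ... (body truncated in the source)
```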

- Main:
    - handles input and output and calls the functions above as needed

```
package com.edwardliu_aurora;

import bean.CharCount;
import bean.LineCount;
import service.BasicStatistic;
import service.ExtraStatistic;
import service.Utils;

import java.io.File;
import java.util.ArrayList;

public class Main {
    public static void main(String[] args) {
        boolean charCount = false;
        boolean wordCount = false;
        boolean lineCount = false;
        boolean directory = false;
        boolean detailLine = false;
        // Every argument except the last is an option; the last one is the file path or pattern
        for(int i=0;i<args.length-1;i++){
            if(args[i].equals("-c")) charCount = true;
            else if(args[i].equals("-w")) wordCount = true;
            else if(args[i].equals("-l")) lineCount = true;
            else if(args[i].equals("-s")) directory = true;
            else if(args[i].equals("-a")) detailLine = true;
        }
        BasicStatistic basicStatistic = new BasicStatistic();
        ExtraStatistic extraStatistic = new ExtraStatistic();
        if(directory) {
            String filePattern = args[args.length-1];
            // Turn the wildcard into a regex: ? -> one character, * -> any run of characters, . -> literal dot
            filePattern = filePattern.
                    replaceAll("\\?","[^/\\\\\\\\:*?<>|]").
                    replaceAll("\\*","[^/\\\\\\\\:*?<>|]*").
                    replaceAll("\\.","\\\\.");
            ArrayList
            // ... (truncated in the source)
```

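6. Test runs

Unit tests (carried out during development, so not shown here)

Test files

Empty file

[screenshot]

File with a single character

[screenshot]

File with a single word

[screenshot]

File with a single line

[screenshot]

A typical source file

[screenshot]

Basic features

Counting the characters, words and lines of a given file

Already shown in the screenshots above

Advanced features

[screenshot]

Code coverage test

[screenshot]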

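7. Actual time spent (already entered in the PSP table in section 2)

8. Project summary

In this project I used the Stream API, which is available only from Java 1.8 onward. The benefit is support for a functional style, which makes it easy to run the statistics over a file in parallel. The cost is that users need JRE 1.8 or later to run the program.
While reading files I ran into the problem of detecting a file's encoding. Encoding detection is inherently tricky: on Windows, some text files carry a BOM header and some do not. For a file with a BOM I can easily tell whether it is UTF-8. Many files have no BOM, however, and for those I can only guess the encoding based on exceptions. As a result my detection function supports only UTF-8 and GBK; if the user's text file uses any other encoding, the program will fail in unpredictable ways.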
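The guessing strategy described above can be sketched as follows. This is not the project's implementation: it works on a byte array and uses a strict `CharsetDecoder` to provoke the decoding exception, and the class and method names are hypothetical. It also assumes the GBK charset is available in the running JRE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetGuessSketch {
    static Charset guess(byte[] data) {
        // A UTF-8 BOM (EF BB BF) identifies the encoding directly
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // No BOM: attempt a strict UTF-8 decode and treat a decoding error as "not UTF-8"
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Assumption: anything that is not valid UTF-8 is taken to be GBK
            return Charset.forName("GBK");
        }
    }
}
```

The fallback is only a guess: a file in some third encoding will be labeled GBK, and a GBK file whose bytes happen to form valid UTF-8 will be labeled UTF-8.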
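I did not validate the command-line arguments beyond a simple check that the file exists. If the user mistypes a command, the program will fail in unpredictable ways.
Designing the project with software-engineering methods may take considerable time up front, but it makes the coding that follows clearer and more orderly, and keeps the overall design under control.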
