欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

[NIO.2] 第三十四篇 编写一个文件搜索应用

程序员文章站 2022-04-25 10:00:07
...
大多数操作系统都提供了独立的工具用于文件搜索(例如,Linux 上的 find 命令,Windows 上的文件搜索工具)。从简单到高级,这些工具提供的搜索模式都大同小异:用户提供搜索条件,等待搜索工具返回搜索结果。如果你想自己编写搜索程序,那么可以利用 FileVisitor 接口。你可以编写按文件名、按文件扩展名、按区域匹配、按文件内容来搜索文件的功能。

利用 FileVisitor 来编写搜索工具,需要明确以下几点:

  •      visitFile() 是用于比较当前文件和搜索条件的最佳地方。在这里,你可以获取当前文件名、文件扩展名、文件属性或者打开文件读取文件内容。这个方法不会搜索目录。
  •     如果要搜索目录,必须将比较代码放入 preVisitDirectory() 或 postVisitDirectory() 方法中,至于放入哪个方法,取决于你的需求。
  •     如果文件还未搜索出来, visitFileFailed() 需要返回 FileVisitResult.CONTINUE,因为你并不需要让整个搜索过程停止。
  •     如果只需要返回一个结果,那么在搜索到结果之后需要在 visitFile() 方法中返回  FileVisitResult.TERMINATE 否则要返回 FileVisitResult.CONTINUE。
  •     搜索过程可以将软连接处理为目标文件,但是在递归删除的时候,建议只删除软链接自身。


通过名称搜索

下面代码演示了如何通过文件名称进行搜索。代码中将会在整个默认文件系统中搜索名为  rafa_1.jpg 的文件,并在搜索到结果后停止。

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;

class Search implements FileVisitor {

    private final Path searchedFile;
    public boolean found;

    public Search(Path searchedFile) {
        this.searchedFile = searchedFile;
        this.found = false;
    }

    void search(Path file) throws IOException {
        Path name = file.getFileName();
        if (name != null && name.equals(searchedFile)) {
            System.out.println("Searched file was found: " + searchedFile
                    + " in " + file.toRealPath().toString());
            found = true;
        }
    }

    @Override
    public FileVisitResult postVisitDirectory(Object dir, IOException exc)
            throws IOException {
        System.out.println("Visited: " + (Path) dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Object dir, BasicFileAttributes attrs)
            throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Object file, BasicFileAttributes attrs)
            throws IOException {
        search((Path) file);
        if (!found) {
            return FileVisitResult.CONTINUE;
        } else {
            return FileVisitResult.TERMINATE;
        }
    }

    @Override
    public FileVisitResult visitFileFailed(Object file, IOException exc)
            throws IOException {
        //report an error if necessary
        return FileVisitResult.CONTINUE;
    }
}

class Main {

    public static void main(String[] args) throws IOException {

        Path searchFile = Paths.get("rafa_1.jpg");
        Search walk = new Search(searchFile);
        EnumSet opts = EnumSet.of(FileVisitOption.FOLLOW_LINKS);

        Iterable<Path> dirs = FileSystems.getDefault().getRootDirectories();
        for (Path root : dirs) {
            if (!walk.found) {
                Files.walkFileTree(root, opts, Integer.MAX_VALUE, walk);
            }
        }

        if (!walk.found) {
            System.out.println("The file " + searchFile + " was not found!");
        }
    }
}


通过区块匹配搜索

有的时候,你可能只知道部分文件名,那么就可以使用区块匹配功能。下面的代码将演示如何在 C:\rafaelnadal 目录树中查找 *.jpg 的文件。整个目录树搜索完才会停止。

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;

class Search implements FileVisitor {

    private final PathMatcher matcher;

    public Search(String glob) {
        matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    void search(Path file) throws IOException {
        Path name = file.getFileName();
        if (name != null && matcher.matches(name)) {
            System.out.println("Searched file was found: " + name
                    + " in " + file.toRealPath().toString());
        }
    }

    @Override
    public FileVisitResult postVisitDirectory(Object dir, IOException exc)
            throws IOException {
        System.out.println("Visited: " + (Path) dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Object dir, BasicFileAttributes attrs)
            throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Object file, BasicFileAttributes attrs)
            throws IOException {
        search((Path) file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Object file, IOException exc)
            throws IOException {
        //report an error if necessary 
        return FileVisitResult.CONTINUE;
    }
}

class Main {

    public static void main(String[] args) throws IOException {
        String glob = "*.jpg";
        Path fileTree = Paths.get("C:/rafaelnadal/");
        Search walk = new Search(glob);
        EnumSet opts = EnumSet.of(FileVisitOption.FOLLOW_LINKS);

        Files.walkFileTree(fileTree, opts, Integer.MAX_VALUE, walk);

    }
}


如果你知道文件的更多属性,那么可以编写更复杂的过滤条件。例如,除了文件名外,你可能还知道文件大小小于多少 KB、或者文件的创建时间、文件的最后编辑时间、文件是否只读、文件是否隐藏、文件所有者是谁等。这些都是文件属性中的一部分,下面的代码将搜索按 *.jpg 匹配并且文件大小小于 100 KB 的文件。

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;

class Search implements FileVisitor {

    private final PathMatcher matcher;
    private final long accepted_size;

    public Search(String glob, long accepted_size) {
        matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        this.accepted_size = accepted_size;
    }

    void search(Path file) throws IOException {
        Path name = file.getFileName();
        long size = (Long) Files.getAttribute(file, "basic:size");

        if (name != null && matcher.matches(name) && size <= accepted_size) {
            System.out.println("Searched file was found: " + name + " in "
                    + file.toRealPath().toString() + " size (bytes):" + size);
        }
    }

    @Override
    public FileVisitResult postVisitDirectory(Object dir, IOException exc)
            throws IOException {
        System.out.println("Visited: " + (Path) dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Object dir, BasicFileAttributes attrs)
            throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Object file, BasicFileAttributes attrs)
            throws IOException {
        search((Path) file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Object file, IOException exc)
            throws IOException {
        //report an error if necessary 
        return FileVisitResult.CONTINUE;
    }
}

class Main {

    public static void main(String[] args) throws IOException {

        String glob = "*.jpg";
        long size = 102400; //100 kilobytes in bytes 
        Path fileTree = Paths.get("C:/rafaelnadal/");
        Search walk = new Search(glob, size);
        EnumSet opts = EnumSet.of(FileVisitOption.FOLLOW_LINKS);

        Files.walkFileTree(fileTree, opts, Integer.MAX_VALUE, walk);
    }
}


通过文件内容进行搜索

文件搜索中比较高级的功能是通过文件内容来进行搜索。你传入一句话或几个单词,返回文件内容中包含这些文本的文件。这种搜索非常消耗时间,因为它需要访问每个文件(每个文件都需要打开、读取、关闭操作)。另外,很多文件格式都支持文本,例如 PDF、
Microsoft Word、 Excel、 PowerPoint、 简单文本文件、 XML、 HTML、 XHTML 等等。这些文件类型的读取各不相同,它们都需要单独的代码来进行内容提取。

下面,我们将开发按文件内容搜索的应用。搜索条件是一系列单词或逗号分隔的句子。例如“Rafael Nadal,tennis,winner of Roland Garros,BNP Paribas tournament draws”。使用 StringTokenizer 类按逗号分隔,将每个单词存入 ArrayList:

… 
String words="Rafael Nadal,tennis,winner of Roland Garros,BNP Paribas tournament draws"; 
ArrayList<String> wordsarray = new ArrayList<>(); 
… 
StringTokenizer st = new StringTokenizer(words, ","); 
while (st.hasMoreTokens()) { 
       wordsarray.add(st.nextToken()); 
}


编写 searchText() 方法,将文件中提取的文本传入,循环前面 ArrayList 依次比较文本是否匹配:

//search text 
private boolean searchText(String text) { 

   boolean flag = false; 
   for (int j = 0; j < wordsarray.size(); j++) { 
        if ((text.toLowerCase()).contains(wordsarray.get(j).toLowerCase())) { 
            flag = true; 
            break; 
       } 
   } 
   return flag; 
}


下面将用一组方法来提取不同格式的文件。为了“不重复发明*”,将使用第三方包。

搜索 PDF 格式

读取 PDF 内容,可以使用比较流行的 iText 和 Apache PDFBox。这两个包可以到  http://itextpdf.com/ 和 http://pdfbox.apache.org/ 下载。下面的代码基于 iText 5.1.2 和 PDFBox 1.6.0。

//使用 iText 搜索 PDF 文件 
boolean searchInPDF_iText(String file) { 
   PdfReader reader = null; 
   boolean flag = false; 
   try { 
       reader = new PdfReader(file); 
       int n = reader.getNumberOfPages(); 
       OUTERMOST: 
       for (int i = 1; i <= n; i++) { 
         String str = PdfTextExtractor.getTextFromPage(reader, i); 
         flag = searchText(str); 
         if (flag) { 
               break OUTERMOST; 
         } 
       } 
   } catch (Exception e) { 
      } finally { 
          if (reader != null) { 
              reader.close(); 
            } 
            return flag; 
      } 
}


如果你更熟悉 PDFBox,那么可以使用下面代码:

boolean searchInPDF_PDFBox(String file) { 
   PDFParser parser = null; 
   String parsedText = null; 
   PDFTextStripper pdfStripper = null; 
   PDDocument pdDoc = null; 
   COSDocument cosDoc = null; 
   boolean flag = false; 
   int page = 0; 

   File pdf = new File(file); 

   try { 
       parser = new PDFParser(new FileInputStream(pdf)); 
       parser.parse(); 

       cosDoc = parser.getDocument(); 
       pdfStripper = new PDFTextStripper(); 
       pdDoc = new PDDocument(cosDoc); 

       OUTERMOST: 
       while (page < pdDoc.getNumberOfPages()) { 
            page++; 
            pdfStripper.setStartPage(page); 
            pdfStripper.setEndPage(page + 1); 
            parsedText = pdfStripper.getText(pdDoc); 

            flag = searchText(parsedText); 
            if (flag) { 
                 break OUTERMOST; 
            } 
       } 
   } catch (Exception e) { 
   } finally { 
          try { 
             if (cosDoc != null) { 
                  cosDoc.close(); 
             } 
             if (pdDoc != null) { 
                  pdDoc.close(); 
             } 
          } catch (Exception e) {} 
   return flag; 
   } 
}


搜索 Microsoft Word、 Excel 和 PowerPoint


Microsoft office 可以用 Apache POI 包来解析,可访问 http://poi.apache.org/ 下载,下面的代码基于 3.7 版本。

处理 word:

boolean searchInWord(String file) { 

   POIFSFileSystem fs = null; 
   boolean flag = false; 

   try { 
       fs = new POIFSFileSystem(new FileInputStream(file)); 

       HWPFDocument doc = new HWPFDocument(fs); 
       WordExtractor we = new WordExtractor(doc); 
       String[] paragraphs = we.getParagraphText(); 

       OUTERMOST: 
       for (int i = 0; i < paragraphs.length; i++) { 

             flag = searchText(paragraphs[i]); 
             if (flag) { 
                   break OUTERMOST; 
             } 
       } 

     } catch (Exception e) { 
     } finally { 
            return flag; 
     } 
}


处理 excel:

boolean searchInExcel(String file) { 

   Row row; 
   Cell cell; 
   String text; 
   boolean flag = false; 
   InputStream xls = null; 

   try { 
       xls = new FileInputStream(file); 
       HSSFWorkbook wb = new HSSFWorkbook(xls); 

       int sheets = wb.getNumberOfSheets(); 

       OUTERMOST: 
       for (int i = 0; i < sheets; i++) { 
            HSSFSheet sheet = wb.getSheetAt(i); 

            Iterator<Row> row_iterator = sheet.rowIterator(); 
            while (row_iterator.hasNext()) { 
                row = (Row) row_iterator.next(); 
                Iterator<Cell> cell_iterator = row.cellIterator(); 
                while (cell_iterator.hasNext()) { 
                    cell = cell_iterator.next(); 
                    int type = cell.getCellType(); 
                    if (type == HSSFCell.CELL_TYPE_STRING) { 
                          text = cell.getStringCellValue(); 
                          flag = searchText(text); 
                          if (flag) { 
                                break OUTERMOST; 
                          } 
                    } 
                } 
            } 
       } 

   } catch (IOException e) { 
   } finally { 
         try { 
             if (xls != null) { 
                    xls.close(); 
             } 
         } catch (IOException e) {} 
   return flag; 
   } 
}


处理 PPT:

boolean searchInPPT(String file) { 

   boolean flag = false; 
   InputStream fis = null; 
   String text; 

   try { 
       fis = new FileInputStream(new File(file)); 
       POIFSFileSystem fs = new POIFSFileSystem(fis); 
       HSLFSlideShow show = new HSLFSlideShow(fs); 

       SlideShow ss = new SlideShow(show); 
       Slide[] slides = ss.getSlides(); 

       OUTERMOST: 
       for (int i = 0; i < slides.length; i++) { 

          TextRun[] runs = slides[i].getTextRuns(); 
          for (int j = 0; j < runs.length; j++) { 
             TextRun run = runs[j]; 
             if (run.getRunType() == TextHeaderAtom.TITLE_TYPE) { 
                 text = run.getText(); 
             } else { 
                 text = run.getRunType() + " " + run.getText(); 
             } 

             flag = searchText(text); 
             if (flag) { 
                    break OUTERMOST; 
             } 
          } 

       Notes notes = slides[i].getNotesSheet(); 
       if (notes != null) { 
           runs = notes.getTextRuns(); 
           for (int j = 0; j < runs.length; j++) { 
                text = runs[j].getText(); 
                flag = searchText(text); 
                if (flag) { 
                      break OUTERMOST; 
                } 
           } 
       } 
     } 
   } catch (IOException e) { 
   } finally { 
         try { 
           if (fis != null) { 
                 fis.close(); 
             } 
          } catch (IOException e) {} 
   return flag; 
   } 
}


搜索文本文件

文本文件(.txt, .html, .xml 等)不需要第三方包,只需要原始的 NIO.2 功能即可完成:

boolean searchInText(Path file) {
   boolean flag = false; 
   Charset charset = Charset.forName("UTF-8"); 
   try (BufferedReader reader = Files.newBufferedReader(file, charset)) { 
        String line = null; 
 
        OUTERMOST: 
        while ((line = reader.readLine()) != null) { 
              flag = searchText(line); 
              if (flag) { 
                   break OUTERMOST; 
              } 
        } 
 
    } catch (IOException e) { 
    } finally { 
          return flag; 
    } 
 }


编写一个完整的按内容搜索应用程序

好了,有了以上的基础,我们把所有代码结合起来:

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.poi.hslf.HSLFSlideShow;
import org.apache.poi.hslf.model.Notes;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.model.TextRun;
import org.apache.poi.hslf.record.TextHeaderAtom;
import org.apache.poi.hslf.usermodel.SlideShow;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

class Search implements FileVisitor {

    ArrayList<String> wordsarray = new ArrayList<>();
    ArrayList<String> documents = new ArrayList<>();
    boolean found = false;

    public Search(String words) {
        wordsarray.clear();
        documents.clear();

        StringTokenizer st = new StringTokenizer(words, ",");
        while (st.hasMoreTokens()) {
            wordsarray.add(st.nextToken().trim());
        }
    }

    void search(Path file) throws IOException {

        found = false;

        String name = file.getFileName().toString();
        int mid = name.lastIndexOf(".");
        String ext = name.substring(mid + 1, name.length());

        if (ext.equalsIgnoreCase("pdf")) {
            found = searchInPDF_iText(file.toString());
            if (!found) {
                found = searchInPDF_PDFBox(file.toString());
            }
        }

        if (ext.equalsIgnoreCase("doc") || ext.equalsIgnoreCase("docx")) {
            found = searchInWord(file.toString());
        }

        if (ext.equalsIgnoreCase("ppt")) {
            searchInPPT(file.toString());
        }

        if (ext.equalsIgnoreCase("xls")) {
            searchInExcel(file.toString());
        }

        if ((ext.equalsIgnoreCase("txt")) || (ext.equalsIgnoreCase("xml")
                || ext.equalsIgnoreCase("html"))
                || ext.equalsIgnoreCase("htm") || ext.equalsIgnoreCase("xhtml")
                || ext.equalsIgnoreCase("rtf")) {
            searchInText(file);
        }

        if (found) {
            documents.add(file.toString());
        }
    }

    //search in text files
    boolean searchInText(Path file) {

        boolean flag = false;
        Charset charset = Charset.forName("UTF-8");
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line = null;

            OUTERMOST:
            while ((line = reader.readLine()) != null) {
                flag = searchText(line);
                if (flag) {
                    break OUTERMOST;
                }
            }

        } catch (IOException e) {
        } finally {
            return flag;
        }
    }

    //search in Excel files
    boolean searchInExcel(String file) {

        Row row;
        Cell cell;
        String text;
        boolean flag = false;
        InputStream xls = null;

        try {
            xls = new FileInputStream(file);
            HSSFWorkbook wb = new HSSFWorkbook(xls);

            int sheets = wb.getNumberOfSheets();
            OUTERMOST:
            for (int i = 0; i < sheets; i++) {
                HSSFSheet sheet = wb.getSheetAt(i);

                Iterator<Row> row_iterator = sheet.rowIterator();
                while (row_iterator.hasNext()) {
                    row = (Row) row_iterator.next();
                    Iterator<Cell> cell_iterator = row.cellIterator();
                    while (cell_iterator.hasNext()) {
                        cell = cell_iterator.next();
                        int type = cell.getCellType();
                        if (type == HSSFCell.CELL_TYPE_STRING) {
                            text = cell.getStringCellValue();
                            flag = searchText(text);
                            if (flag) {
                                break OUTERMOST;
                            }
                        }
                    }
                }
            }

        } catch (IOException e) {
        } finally {
            try {
                if (xls != null) {
                    xls.close();
                }
            } catch (IOException e) {
            }
            return flag;
        }
    }

    //search in PowerPoint files
    boolean searchInPPT(String file) {

        boolean flag = false;
        InputStream fis = null;
        String text;

        try {
            fis = new FileInputStream(new File(file));
            POIFSFileSystem fs = new POIFSFileSystem(fis);
            HSLFSlideShow show = new HSLFSlideShow(fs);

            SlideShow ss = new SlideShow(show);
            Slide[] slides = ss.getSlides();

            OUTERMOST:
            for (int i = 0; i < slides.length; i++) {
                TextRun[] runs = slides[i].getTextRuns();
                for (int j = 0; j < runs.length; j++) {
                    TextRun run = runs[j];
                    if (run.getRunType() == TextHeaderAtom.TITLE_TYPE) {
                        text = run.getText();
                    } else {
                        text = run.getRunType() + " " + run.getText();
                    }

                    flag = searchText(text);
                    if (flag) {
                        break OUTERMOST;
                    }

                }

                Notes notes = slides[i].getNotesSheet();
                if (notes != null) {
                    runs = notes.getTextRuns();
                    for (int j = 0; j < runs.length; j++) {
                        text = runs[j].getText();
                        flag = searchText(text);
                        if (flag) {
                            break OUTERMOST;
                        }
                    }
                }
            }

        } catch (IOException e) {
        } finally {
            try {
                if (fis != null) {
                    fis.close();
                }
            } catch (IOException e) {
            }
            return flag;
        }

    }

    //search in Word files
    boolean searchInWord(String file) {

        POIFSFileSystem fs = null;
        boolean flag = false;

        try {
            fs = new POIFSFileSystem(new FileInputStream(file));

            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            String[] paragraphs = we.getParagraphText();
            OUTERMOST:
            for (int i = 0; i < paragraphs.length; i++) {
                flag = searchText(paragraphs[i]);
                if (flag) {
                    break OUTERMOST;
                }
            }
        } catch (Exception e) {
        } finally {
            return flag;
        }
    }

    //search in PDF files using PDFBox library
    boolean searchInPDF_PDFBox(String file) {
        PDFParser parser = null;
        String parsedText = null;
        PDFTextStripper pdfStripper = null;
        PDDocument pdDoc = null;
        COSDocument cosDoc = null;
        boolean flag = false;
        int page = 0;
        File pdf = new File(file);
        try {
            parser = new PDFParser(new FileInputStream(pdf));
            parser.parse();
            cosDoc = parser.getDocument();
            pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            OUTERMOST:
            while (page < pdDoc.getNumberOfPages()) {
                page++;
                pdfStripper.setStartPage(page);
                pdfStripper.setEndPage(page + 1);
                parsedText = pdfStripper.getText(pdDoc);
                flag = searchText(parsedText);
                if (flag) {
                    break OUTERMOST;
                }
            }
        } catch (Exception e) {
        } finally {
            try {
                if (cosDoc != null) {
                    cosDoc.close();
                }
                if (pdDoc != null) {
                    pdDoc.close();
                }
            } catch (Exception e) {
            }
            return flag;
        }
    }

    //search in PDF files using iText library
    boolean searchInPDF_iText(String file) {

        PdfReader reader = null;
        boolean flag = false;

        try {
            reader = new PdfReader(file);
            int n = reader.getNumberOfPages();

            OUTERMOST:
            for (int i = 1; i <= n; i++) {
                String str = PdfTextExtractor.getTextFromPage(reader, i);

                flag = searchText(str);
                if (flag) {
                    break OUTERMOST;
                }
            }

        } catch (Exception e) {
        } finally {
            if (reader != null) {
                reader.close();
            }
            return flag;
        }

    }

    //search text
    private boolean searchText(String text) {

        boolean flag = false;
        for (int j = 0; j < wordsarray.size(); j++) {
            if ((text.toLowerCase()).contains(wordsarray.get(j).toLowerCase())) {
                flag = true;
                break;
            }
        }

        return flag;
    }

    @Override
    public FileVisitResult postVisitDirectory(Object dir, IOException exc)
            throws IOException {
        System.out.println("Visited: " + (Path) dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Object dir, BasicFileAttributes attrs)
            throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Object file, BasicFileAttributes attrs)
            throws IOException {
        search((Path) file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Object file, IOException exc)
            throws IOException {
        //report an error if necessary

        return FileVisitResult.CONTINUE;
    }
}

class Main {

    public static void main(String[] args) throws IOException {

        String words = "Rafael Nadal, tennis, winner of Roland Garros, BNP Paribas tournament draws";
        Search walk = new Search(words);
        EnumSet opts = EnumSet.of(FileVisitOption.FOLLOW_LINKS);

        Iterable<Path> dirs = FileSystems.getDefault().getRootDirectories();
        for (Path root : dirs) {
            Files.walkFileTree(root, opts, Integer.MAX_VALUE, walk);
        }

        System.out.println("____________________________________________________________");
        for (String path_string : walk.documents) {
            System.out.println(path_string);
        }
        System.out.println("____________________________________________________________");

    }
}


文章来源:http://www.aptusource.org/2014/04/nio-2-writing-a-rile-search-application/
相关标签: Java NIO.2