

程序员文章站 (Programmer Article Station), 2022-04-14 12:43:26


5. Use as much RAM as possible

    The original says:


Use as much RAM as you can afford.
More RAM before flushing means Lucene writes larger segments to begin with which means less merging later. Testing in LUCENE-843 found that around 48 MB is the sweet spot for that content set, but, your application could have a different sweet spot.


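    The more RAM Lucene can use before a flush, the larger the segments it writes, and the larger the segments, the fewer merges are needed later. Testing in LUCENE-843 found that a RAM buffer of around 48 MB performed best for the content set used there. Your application's sweet spot may well be different.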
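To make the "larger segments, fewer merges" reasoning concrete, here is a rough sketch in plain Java. It is a toy model only and not Lucene's real merge policy: the class `FlushModel`, its method names, the 960 MB of data and the rule that each merge combines exactly `mergeFactor` segments are all assumptions made for illustration. In Lucene 2.3 through 3.x the buffer itself is, as far as I know, set with `IndexWriter.setRAMBufferSizeMB(...)`.

```java
// Toy model only: this is NOT Lucene's real merge policy.
final class FlushModel {

    /** Segments written when totalMb of data is flushed bufferMb at a time. */
    static long flushedSegments(long totalMb, long bufferMb) {
        return (totalMb + bufferMb - 1) / bufferMb; // ceiling division
    }

    /**
     * Merges needed to collapse the flushed segments into one, assuming
     * (hypothetically) that each merge combines exactly mergeFactor segments.
     * mergeFactor must be at least 2.
     */
    static long mergesToOneSegment(long segments, int mergeFactor) {
        long merges = 0;
        while (segments > 1) {
            long groups = segments / mergeFactor;       // full groups merged at this level
            if (groups == 0) {                          // fewer than mergeFactor left:
                merges++;                               // one final merge of the leftovers
                break;
            }
            merges += groups;
            segments = groups + segments % mergeFactor; // merged outputs plus leftovers
        }
        return merges;
    }

    public static void main(String[] args) {
        long totalMb = 960;   // assumed amount of indexed data
        int mergeFactor = 10; // assumed merge factor
        for (long bufferMb : new long[] {16, 48}) {
            long segments = flushedSegments(totalMb, bufferMb);
            System.out.println(bufferMb + " MB buffer: " + segments + " segments, "
                    + mergesToOneSegment(segments, mergeFactor) + " merges");
        }
    }
}
```

Under these assumptions the 16 MB buffer flushes 60 segments and needs 7 merges, while the 48 MB buffer flushes 20 segments and needs 3. Real numbers depend on your documents and merge settings, so measure before settling on a value.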







Turn off compound file format.

Call setUseCompoundFile(false). Building the compound file format takes time during indexing (7-33% in testing for LUCENE-888). However, note that doing this will greatly increase the number of file descriptors used by indexing and by searching, so you could run out of file descriptors if mergeFactor is also large.

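     Calling setUseCompoundFile(false) turns off the compound file format. In the LUCENE-888 tests, building the compound file accounted for roughly 7-33% of indexing time. However, turning it off greatly increases the number of files, and therefore file descriptors, used by indexing and searching, ……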
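The file descriptor warning is easier to judge with some back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: the class `DescriptorEstimate` is hypothetical, 8 files per non-compound segment is an assumed figure, and the segment count assumes a logarithmic merge scheme that keeps fewer than `mergeFactor` segments on each of 3 levels. None of these numbers come from the original text.

```java
// Rough estimate only; every constant here is an assumption, not a Lucene guarantee.
final class DescriptorEstimate {

    /** Upper bound on live segments if each level holds fewer than mergeFactor segments. */
    static int maxSegments(int mergeFactor, int levels) {
        return (mergeFactor - 1) * levels;
    }

    /** Files a reader must keep open: one per segment when compound, else all per-segment files. */
    static long openFiles(int segments, int filesPerSegment, boolean compound) {
        return compound ? segments : (long) segments * filesPerSegment;
    }

    public static void main(String[] args) {
        int filesPerSegment = 8; // assumed number of files in a non-compound segment
        int levels = 3;          // assumed number of merge levels
        for (int mergeFactor : new int[] {10, 50}) {
            int segments = maxSegments(mergeFactor, levels);
            System.out.println("mergeFactor=" + mergeFactor
                    + " segments=" + segments
                    + " compound=" + openFiles(segments, filesPerSegment, true)
                    + " non-compound=" + openFiles(segments, filesPerSegment, false));
        }
    }
}
```

With these assumed figures, mergeFactor 10 gives 27 segments (27 open files with the compound format, 216 without), while mergeFactor 50 gives 147 segments and 1176 open files without the compound format. That already exceeds a per-process limit of 1024, which is a common default on Linux, so check your own limit before combining a large mergeFactor with setUseCompoundFile(false).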





7. Re-use Document and Field instances

The original says:

Re-use Document and Field instances.
As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.

Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details.

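      Re-use Document and Field instances wherever you can. Lucene 2.3 added setValue(...) methods that change the value of a Field. With them, a single Field instance can serve many added documents, which lowers the garbage collection cost. It is also best to create only one Document instance and add multiple Field instances to it, but these Field objects……
For example, you might have an idField, bodyField, nameField, storedField1, and so on. After a document has been added, you change the Field values directly (for example, by calling idField.setValue(...), ……) and then add the same Document instance again.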
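To see why this saves garbage collection work, the sketch below counts object creations for both styles. It deliberately does not use Lucene: `MutableField`, `Doc` and `Sink` are hypothetical stand-ins written for this illustration, and `Sink.add` simply reads the field values at call time, the way an indexer consumes a document when it is added. Only the stand-in document and field objects are counted, not the value strings.

```java
// Hypothetical stand-ins: these are NOT Lucene's Document/Field classes.
final class ReuseSketch {
    /** Counts how many stand-in Doc and MutableField objects were constructed. */
    static long allocations = 0;

    static final class MutableField {
        final String name;
        private String value;
        MutableField(String name, String value) {
            this.name = name;
            this.value = value;
            allocations++;
        }
        void setValue(String value) { this.value = value; }
        String value() { return value; }
    }

    static final class Doc {
        final MutableField[] fields;
        Doc(MutableField... fields) {
            this.fields = fields;
            allocations++;
        }
    }

    /** Pretend index: reads every field value at the moment add() is called. */
    static final class Sink {
        long added = 0;
        String lastSnapshot = "";
        void add(Doc doc) {
            StringBuilder sb = new StringBuilder();
            for (MutableField f : doc.fields) {
                sb.append(f.name).append('=').append(f.value()).append(';');
            }
            lastSnapshot = sb.toString();
            added++;
        }
    }

    /** Naive style: a fresh Doc and fresh fields for every document. */
    static long indexNaive(int n, Sink sink) {
        allocations = 0;
        for (int i = 0; i < n; i++) {
            sink.add(new Doc(new MutableField("id", "" + i),
                             new MutableField("body", "body " + i)));
        }
        return allocations;
    }

    /** Re-use style: one Doc, one instance per field, values changed only after each add. */
    static long indexReusing(int n, Sink sink) {
        allocations = 0;
        MutableField id = new MutableField("id", "");
        MutableField body = new MutableField("body", "");
        Doc doc = new Doc(id, body);
        for (int i = 0; i < n; i++) {
            id.setValue("" + i);
            body.setValue("body " + i);
            sink.add(doc); // values are consumed here, so changing them afterwards is safe
        }
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + indexNaive(1000, new Sink()));
        System.out.println("reusing: " + indexReusing(1000, new Sink()));
    }
}
```

For 1000 documents the naive loop creates 3000 stand-in objects and the re-use loop creates 3. Note that both the document and the fields are created once, outside the loop, and each field appears only once in the document.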



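Lucene 2.3 added a method called setValue that lets you change a field's value. The benefit is that you can re-use a single Field instance throughout the whole indexing run, which greatly reduces the GC burden.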

            writerFS = new IndexWriter(dirFS, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED);
            Field f1 = new Field("f1", "", Store.YES, Index.ANALYZED);
            Field f2 = new Field("f2", "", Store.YES, Index.ANALYZED);
            for (int i = 0; i < 1000000; i++) {
                Document doc = new Document();
                f1.setValue("f1 hello doc" + i);
                f2.setValue("f2 world doc" + i);
//            writer.commit();