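Lucene wiki translation: How to improve indexing speed, part 2
- Original: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
- Navigation: Lucene-java Wiki -> 1 Overview -> 1.1 Informational -> 1.1.1 BasicsOfPerformance -> 1.1.1.4 ImproveIndexingSpeed
- Note: text in red marks passages I did not know, or was unsure, how to translate. Text in blue is my own commentary.
- Status: complete
- Continued from: Lucene wiki translation: How to improve indexing speed, part 1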
5. Use as much RAM as possible
The original says:
More RAM before flushing means Lucene writes larger segments to begin with which means less merging later. Testing in LUCENE-843 found that around 48 MB is the sweet spot for that content set, but, your application could have a different sweet spot.
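My translation: the more RAM Lucene uses before flushing, the larger the segments it writes, and larger segments mean fewer merges later. Testing in LUCENE-843 found that a buffer of about 48 MB performed best for that content set. Your application will probably have a different optimum.
For comparison, here is a more experienced translator's version:
Using more memory before a flush means Lucene produces larger segments while indexing, which in turn reduces the number of merges. In the LUCENE-843 tests, about 48 MB appeared to be a suitable value, but your program may need a different one. The best value also depends on the machine, so run your own tests and pick a balanced setting.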
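To make the "larger segments, fewer merges" reasoning concrete, here is a minimal sketch of a toy model. It is not Lucene's actual merge policy: it assumes every flush writes one segment exactly as large as the RAM buffer, and that whenever mergeFactor segments of the same level exist they are merged into one segment of the next level. The class name FlushModel and its methods are hypothetical. In Lucene itself the buffer size is set with IndexWriter.setRAMBufferSizeMB(...).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: counts flushes and merges for a given RAM buffer size. */
final class FlushModel {

    /** Returns {segmentsFlushed, merges, mbRewrittenByMerges}. */
    static long[] simulate(int totalMb, int bufferMb, int mergeFactor) {
        if (bufferMb < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("bufferMb >= 1 and mergeFactor >= 2 required");
        }
        // Each flush is counted as a full buffer, so the last one is rounded up.
        int flushes = (totalMb + bufferMb - 1) / bufferMb;
        List<Integer> perLevel = new ArrayList<>(); // segment count at each level
        long merges = 0;
        long rewritten = 0;
        for (int i = 0; i < flushes; i++) {
            int level = 0;
            long size = bufferMb; // size of one segment at the current level
            bump(perLevel, level);
            // Cascade: a full level is merged into one segment of the next level.
            while (perLevel.get(level) == mergeFactor) {
                perLevel.set(level, 0);
                merges++;
                rewritten += size * mergeFactor;
                size *= mergeFactor;
                level++;
                bump(perLevel, level);
            }
        }
        return new long[] {flushes, merges, rewritten};
    }

    private static void bump(List<Integer> perLevel, int level) {
        if (level == perLevel.size()) {
            perLevel.add(0);
        }
        perLevel.set(level, perLevel.get(level) + 1);
    }

    public static void main(String[] args) {
        for (int bufferMb : new int[] {4, 16, 48}) {
            long[] r = simulate(960, bufferMb, 10);
            System.out.printf("buffer=%d MB: %d segments flushed, %d merges, %d MB rewritten%n",
                    bufferMb, r[0], r[1], r[2]);
        }
    }
}
```

For 960 MB of index data and a mergeFactor of 10, this model reports 26 merges with a 4 MB buffer and 2 merges with a 48 MB buffer. The absolute numbers mean little; the point is only the direction of the effect, and the real sweet spot still has to be measured on your own data and hardware.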
6. Turn off the compound file format
Turn off compound file format.
Call setUseCompoundFile(false). Building the compound file format takes time during indexing (7-33% in testing for LUCENE-888). However, note that doing this will greatly increase the number of file descriptors used by indexing and by searching, so you could run out of file descriptors if mergeFactor is also large.
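The file descriptor warning can be illustrated with some rough arithmetic. The sketch below is an estimate under stated assumptions, not a description of Lucene internals: it assumes a steady state of at most mergeFactor - 1 segments on each of a few merge levels, one file per segment in compound format, and a fixed number of files per segment otherwise. The class DescriptorEstimate, the value of 3 levels, and the figure of 8 files per non-compound segment are all hypothetical illustration values. The real file count depends on the Lucene version and on features such as term vectors, and merges in progress and extra open readers add more.

```java
/** Rough estimate only: how many index files one reader may hold open. */
final class DescriptorEstimate {

    /**
     * @param mergeFactor     segments allowed per level before a merge
     * @param levels          assumed number of merge levels (hypothetical)
     * @param filesPerSegment 1 for compound format, several otherwise (hypothetical)
     */
    static long openFiles(int mergeFactor, int levels, int filesPerSegment) {
        if (mergeFactor < 2 || levels < 1 || filesPerSegment < 1) {
            throw new IllegalArgumentException("invalid parameters");
        }
        long maxSegments = (long) (mergeFactor - 1) * levels;
        return maxSegments * filesPerSegment;
    }

    public static void main(String[] args) {
        int levels = 3;
        for (int mergeFactor : new int[] {10, 50}) {
            System.out.printf("mergeFactor=%d: compound ~%d files, non-compound ~%d files%n",
                    mergeFactor,
                    openFiles(mergeFactor, levels, 1),
                    openFiles(mergeFactor, levels, 8));
        }
    }
}
```

Under these assumptions a mergeFactor of 10 gives about 27 files in compound format and about 216 without it, while a mergeFactor of 50 without compound files gives about 1176, which is already above the 1024 open-file soft limit that many Linux systems use by default.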
7. Re-use Document and Field instances
As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.
Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details.
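My translation: it is best to create a single Document instance and add the fields you need to it. Re-use the Field instances added to the document by calling the corresponding setValue method to change each field's value, then add the Document to the index again.
Note: several fields in one document cannot share a single Field instance, and a Field's value should not be changed until the document containing it has been added to the index. In other words, if you have three fields you must create three Field instances, and then re-use them as you add each subsequent Document.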
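import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.util.Version;

// writerFS, dirFS and writer are declared elsewhere in the original program.
// writer is assumed to be a second IndexWriter whose index is later copied into writerFS.
writerFS = new IndexWriter(dirFS, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED);

// One Field instance per field, created once and re-used for every document.
Field f1 = new Field("f1", "", Store.YES, Index.ANALYZED);
Field f2 = new Field("f2", "", Store.YES, Index.ANALYZED);
for (int i = 0; i < 1000000; i++) {
    // This example still creates a new Document per iteration; only the Fields are re-used.
    Document doc = new Document();
    f1.setValue("f1 hello doc" + i);
    doc.add(f1);
    f2.setValue("f2 world doc" + i);
    doc.add(f2);
    writer.addDocument(doc);
}

writer.commit();
// Copy everything indexed through writer into the file-system index.
writerFS.addIndexes(writer.getReader());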
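The two restrictions in the note above follow from the fact that a document holds references to its fields rather than copies. The sketch below shows this with small stand-in classes, so that it runs without Lucene on the classpath. ToyField, ToyDocument and ToyIndex are hypothetical and only mimic the relevant behaviour: the toy index reads field values at the moment addDocument is called.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in classes only; these are not the Lucene Field/Document/IndexWriter. */
final class FieldReuseDemo {

    static final class ToyField {
        final String name;
        String value;
        ToyField(String name, String value) { this.name = name; this.value = value; }
        void setValue(String value) { this.value = value; }
    }

    /** Holds references to its fields, not copies of them. */
    static final class ToyDocument {
        final List<ToyField> fields = new ArrayList<>();
        void add(ToyField field) { fields.add(field); }
    }

    /** Reads the current field values when a document is added. */
    static final class ToyIndex {
        final List<List<String>> docs = new ArrayList<>();
        void addDocument(ToyDocument doc) {
            List<String> snapshot = new ArrayList<>();
            for (ToyField f : doc.fields) {
                snapshot.add(f.name + "=" + f.value);
            }
            docs.add(snapshot);
        }
    }

    /** Recommended pattern: one Document, one Field instance per field, values changed per document. */
    static List<List<String>> correctReuse(int count) {
        ToyIndex index = new ToyIndex();
        ToyField id = new ToyField("id", "");
        ToyField body = new ToyField("body", "");
        ToyDocument doc = new ToyDocument();
        doc.add(id);
        doc.add(body);
        for (int i = 0; i < count; i++) {
            id.setValue(String.valueOf(i));
            body.setValue("hello " + i);
            index.addDocument(doc); // values are read here, so change them only afterwards
        }
        return index.docs;
    }

    /** Mistake: the same instance added twice, so both entries see the last value. */
    static List<String> sharedInstanceMistake() {
        ToyIndex index = new ToyIndex();
        ToyField shared = new ToyField("f", "");
        ToyDocument doc = new ToyDocument();
        shared.setValue("first");
        doc.add(shared);
        shared.setValue("second"); // overwrites "first" for both entries
        doc.add(shared);
        index.addDocument(doc);
        return index.docs.get(0);
    }

    public static void main(String[] args) {
        System.out.println(correctReuse(2));
        System.out.println(sharedInstanceMistake());
    }
}
```

In the toy model the shared instance produces two identical entries, because the first value is overwritten before the document is indexed. With one instance per field, every document gets the values set just before it was added.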