<noframes id='KT19U'>
      How to get positions from a document term vector in Lucene?

            <noframes id='HAfZr'>

                  Problem Description

                  I need to iterate over all documents in a Lucene index, and obtain the positions at which each term occurs in each document. As far as I am able to understand from the Lucene javadoc, the way to do this is to do something like this:

                  IndexReader ir = obtainIndexReader();
                  Terms tv = ir.getTermVector( doc, field );  // null if this doc has no term vector for the field
                  if( tv == null ) return;
                  TermsEnum terms = tv.iterator();
                  PostingsEnum p = null;
                  while( terms.next() != null ) {
                      p = terms.postings( p, PostingsEnum.ALL );
                      while( p.nextDoc() != PostingsEnum.NO_MORE_DOCS ) {
                          int freq = p.freq();
                          for( int i = 0; i < freq; i++ ) {
                              int pos = p.nextPosition();   // Always returns -1!!!
                              BytesRef data = p.getPayload();
                              doStuff( freq, pos, data ); // Fails miserably, of course.
                          }
                      }
                  }
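For reference, the values the loop above is expected to produce can be modeled without Lucene. The following is a minimal plain-Java sketch, not Lucene code: the class `TermPositionsSketch` and its `termPositions` method are hypothetical, and it assumes whitespace tokenization with a position increment of 1 per token (a real analyzer may differ, for example stop-word removal leaves gaps). For each term it records the positions a term vector with positions should report, so the list size corresponds to `p.freq()` and each element to one `p.nextPosition()` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: models the term -> positions data
// that a term vector with positions holds for a single document.
public class TermPositionsSketch {

    // Assumes whitespace tokenization and a position increment of 1 per token,
    // so the first token is at position 0.
    public static Map<String, List<Integer>> termPositions(String text) {
        // TreeMap keeps terms sorted, similar to the order in which TermsEnum visits them.
        Map<String, List<Integer>> result = new TreeMap<>();
        String[] tokens = text.trim().split("\\s+");
        int position = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token; skip it
            }
            result.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            position++;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> tv = termPositions("to be or not to be");
        for (Map.Entry<String, List<Integer>> e : tv.entrySet()) {
            // size() plays the role of p.freq(); each element is one p.nextPosition() result
            System.out.println(e.getKey() + " freq=" + e.getValue().size() + " positions=" + e.getValue());
        }
    }
}
```

Run on "to be or not to be", the first line printed is `be freq=2 positions=[1, 5]`, since terms are kept in sorted order. None of the positions should be -1 when positions were actually recorded.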
                  

                  However, even though (1) the index does indeed include positions on the relevant field and (2) the term vector claims to have positions (i.e.: tv.hasPositions() == true), I keep getting "-1" for all positions.
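One cause worth ruling out is the flags argument passed to `terms.postings()`. The sketch below mirrors the flag constants and the `featureRequested` check of Lucene 5.x's `PostingsEnum` as I recall them; the class name `PostingsFlagsSketch` is hypothetical and the exact bit values are an assumption to verify against the source of your Lucene version. It shows that `ALL` contains every bit of `POSITIONS`, so the flag used in the loop does request positions, whereas a bare `FREQS` request would not.

```java
// Hypothetical mirror of the flag constants in Lucene 5.x's PostingsEnum,
// reproduced for illustration only; check your Lucene version's source.
public class PostingsFlagsSketch {
    public static final short NONE = 0;
    public static final short FREQS = 1 << 3;                 // 8
    public static final short POSITIONS = FREQS | 1 << 4;     // 24, implies FREQS
    public static final short OFFSETS = POSITIONS | 1 << 5;   // 56, implies POSITIONS
    public static final short PAYLOADS = POSITIONS | 1 << 6;  // 88, implies POSITIONS
    public static final short ALL = OFFSETS | PAYLOADS;       // 120

    // A feature is requested only if all of its bits are set in flags.
    public static boolean featureRequested(int flags, short feature) {
        return (flags & feature) == feature;
    }

    public static void main(String[] args) {
        System.out.println("ALL requests positions: " + featureRequested(ALL, POSITIONS));
        System.out.println("ALL requests payloads: " + featureRequested(ALL, PAYLOADS));
        System.out.println("FREQS requests positions: " + featureRequested(FREQS, POSITIONS));
    }
}
```

Because each richer flag is built by OR-ing in the bits of the flag it depends on, requesting `ALL` implicitly requests frequencies, positions, offsets and payloads.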

                  First, am I doing something wrong? Is there an alternative way of iterating over postings on a per-document basis? Second: What is going on anyway? The index contains positions, the Terms instance returned by getTermVector claims to include positions, and I'm looking at the correct position values in Luke, yet I still get -1 when I try to access said values in my code. What gives?

                  The relevant field was configured with the following options:

                      FieldType ft = new FieldType();
                      ft.setIndexOptions( IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS );
                      ft.setStoreTermVectors( true );
                      ft.setStoreTermVectorOffsets( true );
                      ft.setStoreTermVectorPayloads( true );
                      ft.setStoreTermVectorPositions( true );
                      ft.setTokenized( true );
                      return ft;
                  

                  Recommended Answer

                  Did you set FieldType.setStoreTermVectorPositions(true) on your field type at index time? http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/document/FieldType.html#setStoreTermVectorPositions(boolean)

                    <noframes id='hCeCV'>