
Notes on Adding the Paoding Chinese Word-Segmentation Module to Regain and Modifying the Results Display

Regain modification notes
 
 
I. Adding Paoding-analysis as a Chinese word-segmentation module
 
This part is very simple: only one source file has to be changed.

Source file: src\net\sf\regain\RegainToolKit.java

import net.paoding.analysis.analyzer.PaodingAnalyzer;
import org.apache.lucene.analysis.cn.ChineseAnalyzer;

// ...

  public static Analyzer createAnalyzer(String analyzerType,
    String[] stopWordList, String[] exclusionList, String[] untokenizedFieldNames)
    throws RegainException
  {
    // ... (unchanged code)

    if (analyzerType.equalsIgnoreCase("english")) {
      analyzerClassName = StandardAnalyzer.class.getName();
    } else if (analyzerType.equalsIgnoreCase("german")) {
      analyzerClassName = GermanAnalyzer.class.getName();
    } else if (analyzerType.equalsIgnoreCase("chinese")) {
      analyzerClassName = ChineseAnalyzer.class.getName(); // Added by ping.
    } else if (analyzerType.equalsIgnoreCase("paoding")) {
      analyzerClassName = PaodingAnalyzer.class.getName(); // Added by ping.
    }

    // ... (rest of the method unchanged)
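The if/else chain above simply maps a configured analyzer type name to the fully qualified name of an analyzer class. As a hypothetical refactoring (not Regain's code), the same mapping can be kept in a lookup table, so adding another analyzer becomes a one-line change. The class `AnalyzerTypeLookup` and method `analyzerClassName` are made up for illustration; the class names are written as strings so the sketch compiles without Lucene or Paoding, and the packages for `StandardAnalyzer` and `GermanAnalyzer` are assumed from Lucene's usual layout.

```java
import java.util.Locale;
import java.util.Map;

public class AnalyzerTypeLookup {
    // Hypothetical table mirroring the if/else chain in createAnalyzer.
    // Class names are plain strings so this compiles without Lucene or Paoding.
    private static final Map<String, String> ANALYZER_CLASSES = Map.of(
        "english", "org.apache.lucene.analysis.standard.StandardAnalyzer",
        "german",  "org.apache.lucene.analysis.de.GermanAnalyzer",
        "chinese", "org.apache.lucene.analysis.cn.ChineseAnalyzer",
        "paoding", "net.paoding.analysis.analyzer.PaodingAnalyzer");

    // Case-insensitive lookup, equivalent to equalsIgnoreCase for these ASCII names.
    static String analyzerClassName(String analyzerType) {
        String className = ANALYZER_CLASSES.get(analyzerType.toLowerCase(Locale.ROOT));
        if (className == null) {
            // Assumption: unknown types are rejected here; Regain's own fallback
            // is not shown in the excerpt above.
            throw new IllegalArgumentException("Unknown analyzer type: " + analyzerType);
        }
        return className;
    }

    public static void main(String[] args) {
        // Prints net.paoding.analysis.analyzer.PaodingAnalyzer
        System.out.println(analyzerClassName("Paoding"));
    }
}
```

A table keeps the type names and class names side by side; the trade-off is that a typo in a class-name string is only caught at run time, whereas `PaodingAnalyzer.class.getName()` is checked by the compiler.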
 
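The source change touches only the one file above, but getting a complete build that actually runs requires a few other changes as well, mainly:
1. Modify the Ant build file build.xml.
2. Copy paoding-analysis.jar into the lib directory.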
 
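The build.xml changes are as follows. Only the modified fragments are excerpted; the added lines are the ones following the "Add by ping." comments.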
...
  <target name="runtime-desktop" depends="prepare-once, runtime-desktop-fast">
    <echo message="Creating the jars ..." />
    <fileset id="desktop-common-jars" dir="build/included-lib-classes/common">
      <include name="org/apache/lucene/**"/>
      <include name="org/apache/log4j/**"/>
      <include name="org/apache/regexp/**"/>
      <!-- Add by ping. -->
      <include name="net/paoding/analysis/**"/>
      <include name="paoding-*.properties"/>
      <include name="org/apache/commons/**"/>
 
...
  <target name="runtime-server" depends="prepare-once, runtime-server-fast, -web-temps">
    <jar jarfile="build/runtime/crawler/${programname.file}-crawler.jar"
         compress="false"
         index="true">
      <manifest>
        <attribute name="Main-Class" value="net.sf.regain.crawler.Main"/>
      </manifest>
      <fileset dir="build/included-lib-classes/common">
        <include name="org/apache/lucene/**"/>
        <include name="org/apache/log4j/**"/>
        <include name="org/apache/regexp/**"/>
 
        <!-- Add by ping. -->
        <include name="net/paoding/analysis/**"/>
        <include name="paoding-*.properties"/>
        <include name="org/apache/commons/**"/>
...
 
    <mkdir dir="build/runtime/search/webapps"/>
    <war destfile="build/runtime/search/webapps/${programname.file}.war"
         webxml="web/server/web-inf/web.xml">
      <classes dir="build/classes">
        <exclude name="net/sf/regain/crawler/**"/>
        <exclude name="net/sf/regain/ui/desktop/**"/>
        <exclude name="net/sf/regain/util/sharedtag/simple/**"/>
        <exclude name="net/sf/regain/util/ui/**"/>
      </classes>
      <lib dir="lib">
        <include name="lucene-*.jar"/>
        <include name="jakarta-regexp-*.jar"/>
        <include name="log4j-*.jar"/>
        <!--Add by ping.-->
        <include name="paoding-*.jar"/>
        <include name="commons-logging*.jar"/>
      </lib>
 
...
    <mkdir dir="${deploy-target.dir}/${programname.file}/WEB-INF/lib"/>
    <copy todir="${deploy-target.dir}/${programname.file}/WEB-INF/lib">
      <fileset dir="lib">
        <include name="lucene-*.jar"/>
        <include name="jakarta-regexp-*.jar"/>
        <include name="log4j-*.jar"/>
        <!--Add by ping.-->
        <include name="paoding-*.jar"/>
        <include name="commons-logging*.jar"/>
      </fileset>
    </copy>
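The build.xml changes exist only to get the Paoding classes and jars into the crawler jar and into the web application's WEB-INF/lib. One quick way to confirm that a rebuilt artifact really contains them is to try loading `PaodingAnalyzer` by name. `PaodingClasspathCheck` below is a hypothetical helper, not part of Regain; it uses only the standard `Class.forName(String, boolean, ClassLoader)` call.

```java
public class PaodingClasspathCheck {
    // Returns true if the named class can be loaded (without running its static initializers).
    // Returns false if the class is missing, or if it is present but a class it depends on
    // (for example a Lucene superclass) cannot be found.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PaodingClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String paoding = "net.paoding.analysis.analyzer.PaodingAnalyzer";
        System.out.println(paoding + " loadable: " + isOnClasspath(paoding));
    }
}
```

Run it with the rebuilt crawler jar (or the contents of WEB-INF/lib) on the classpath; `false` means either the Paoding jar or one of the jars it needs did not make it into the build.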
 
 
II. Changing the length of result snippets
 
 
1. By default, each result snippet is 100 characters long, which I find rather short, so I changed the snippet length to 300.
 
lucene\contrib\highlighter\src\java
    org.apache.lucene.search.highlight
        SimpleFragmenter.java
 
public class SimpleFragmenter implements Fragmenter
{
  private static final int DEFAULT_FRAGMENT_SIZE = 100 * 3;
  // ...

This constant defines the length of each result snippet. The default is 100 characters; I changed it to 300.
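To see what DEFAULT_FRAGMENT_SIZE controls, here is a simplified, self-contained sketch of fixed-size fragmenting. Its break rule is modeled on the one SimpleFragmenter appears to use (start a new fragment once a token's end offset reaches fragmentSize times the number of fragments so far), but it is not Lucene's code: `FragmentSizeDemo` and `fragment` are hypothetical names, and splitting on whitespace is an assumption that does not match Chinese text, which Paoding segments into words. As an alternative to patching Lucene's source, Lucene's `Highlighter` also accepts a fragmenter through `setTextFragmenter(new SimpleFragmenter(300))`; whether that fits depends on where Regain constructs its Highlighter, which these notes do not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FragmentSizeDemo {
    // Splits text into fragments of roughly fragmentSize characters, breaking only between tokens.
    // A new fragment starts when a token's end offset reaches fragmentSize * currentNumFrags.
    static List<String> fragment(String text, int fragmentSize) {
        List<String> fragments = new ArrayList<>();
        Matcher token = Pattern.compile("\\S+").matcher(text); // assumption: whitespace tokens
        int currentNumFrags = 1; // fragments started so far
        int fragStart = 0;       // offset where the current fragment begins
        while (token.find()) {
            if (token.end() >= fragmentSize * currentNumFrags) {
                currentNumFrags++;
                // Close the current fragment just before the token that crossed the boundary.
                String piece = text.substring(fragStart, token.start()).trim();
                if (!piece.isEmpty()) {
                    fragments.add(piece);
                }
                fragStart = token.start();
            }
        }
        String last = text.substring(fragStart).trim();
        if (!last.isEmpty()) {
            fragments.add(last);
        }
        return fragments;
    }

    public static void main(String[] args) {
        String text = "one two three four five six seven eight nine ten";
        // [one two, three four, five six, seven eight, nine ten]
        System.out.println(fragment(text, 10));
        // [one two three four five six, seven eight nine ten]
        System.out.println(fragment(text, 30));
    }
}
```

Tripling the fragment size yields fewer, longer fragments, which is the effect the 100 * 3 change is after.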
 
 
 
III. In addition, a few small changes to the search results page
 
1. package net.sf.regain.search.results;
SingleSearchResults.java
 
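     public void highlightHitDocument(int index)
       // ...
            resHighlSummary = highlighter.getBestFragments(tokenStream, text, 3,
                " . . .  . . . <br><span class=\"resultTag\">[Result]</span> ");
       // ...

This call defines how the highlighted summary is displayed: up to 3 of the best-matching fragments are returned, joined by the separator string, which inserts a line break and a [Result] tag between fragments.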
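The separator argument is easy to misread: it goes between fragments, not in front of each one. The following simplified sketch mimics only that final selection-and-joining step, using made-up fragments and scores. `BestFragmentsDemo`, `Fragment` and `bestFragments` are hypothetical stand-ins; Lucene's real `getBestFragments` also tokenizes and scores the text and can merge contiguous fragments, all of which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

public class BestFragmentsDemo {
    // Hypothetical stand-in for a scored highlight fragment.
    static final class Fragment {
        final String text;
        final float score;
        Fragment(String text, float score) { this.text = text; this.score = score; }
    }

    // Keeps at most maxNumFragments fragments with score > 0, best first,
    // and joins them with the separator, which appears only BETWEEN fragments.
    static String bestFragments(List<Fragment> candidates, int maxNumFragments, String separator) {
        List<Fragment> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Float.compare(b.score, a.score)); // stable sort: ties keep input order
        StringBuilder sb = new StringBuilder();
        int taken = 0;
        for (Fragment f : sorted) {
            if (taken == maxNumFragments || f.score <= 0) {
                break; // enough fragments, or only non-matching ones remain
            }
            if (taken > 0) {
                sb.append(separator);
            }
            sb.append(f.text);
            taken++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Fragment> frags = List.of(
            new Fragment("alpha", 0.5f), new Fragment("beta", 2.0f),
            new Fragment("gamma", 0f), new Fragment("delta", 1.0f),
            new Fragment("epsilon", 0.8f));
        String sep = " . . .  . . . <br><span class=\"resultTag\">[Result]</span> ";
        System.out.println(bestFragments(frags, 3, sep));
    }
}
```

With this separator the first fragment gets no [Result] tag of its own, which is presumably why search.jsp places a [Result] span in front of the summary.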
 
2. web\web\common
    search.jsp
 
      <search:list msgNoResults="<tr><td colspan='2'>{msg:noResultsFound}<br/><br/></td></tr>">
        <tr><td colspan="2">
            <search:hit_typeicon imgpath="img/ext"/> <search:hit_link/>
            <span class="hitDetails">
              (<search:msg key="relevance"/>: <search:hit_score/>)<br/>
              <span class="resultTag">[Result]</span>
              <search:hit_field field="summary"/><br/>
              <search:hit_content/>
              <search:hit_path after="<br/>" createLinks="true"/>
              <search:hit_field field="mimetype"/>&nbsp;
              <span class="hitInfo"><search:hit_url beautified="true"/> - <search:hit_size/></span><br/>
            <br/></span>
        </td></tr>
      </search:list>     
    
    This template defines the search results page and which hit fields it displays.
 
 
3. Add a display style
src\web\common
    regain.css
 
.resultTag {
 color: #0000FF;
 font-weight: bold;
}
 
4. A small cosmetic fix: the button that fetches a hit's content is labeled in German by default, so I changed the label to English.
src/net/sf/regain/search/sharedlib/hit/ContentTag.java
  protected void printEndTag(PageRequest request, PageResponse response,
    Document hit, int hitIndex)
    throws RegainException {

    String content = hit.get("content");
    if (content != null) {
      String hitNumber = Integer.toString(hitIndex + 1);
      // Button label changed from German to English
      response.print("<input type=\"button\" class=\"button\" onclick=\"return toggleMe('hit_" +
        hitNumber + "')\" value=\"Click here to get the content of hit " + hitNumber + "\">");
      // ... (rest of the method unchanged)
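Translating the label meant editing a string buried inside the concatenation. A small hypothetical helper, not part of Regain, can take the label as a parameter and escape it for use in an HTML attribute, so a translated label cannot break the markup. `ContentButtonDemo`, `contentButton` and `escapeAttribute` are illustrative names only; the markup and the 0-based to 1-based hit numbering are copied from the excerpt above.

```java
public class ContentButtonDemo {
    // Minimal escaping for a double-quoted HTML attribute value.
    static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    // Builds the toggle button printed by ContentTag. hitIndex is 0-based;
    // the number shown to the user and used in the element id is 1-based.
    static String contentButton(int hitIndex, String labelPrefix) {
        String hitNumber = Integer.toString(hitIndex + 1);
        return "<input type=\"button\" class=\"button\" onclick=\"return toggleMe('hit_"
            + hitNumber + "')\" value=\"" + escapeAttribute(labelPrefix + hitNumber) + "\">";
    }

    public static void main(String[] args) {
        System.out.println(contentButton(0, "Click here to get the content of hit "));
    }
}
```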
 
After recompiling, the results look quite good!
 