
This topic describes several frequently requested search engine improvements and changes. Most of them must be carried out by a developer or a system administrator.

...


 


...

Analyze the Search Index

...

On local machines or developer PCs, a tool such as Luke can be used to analyze the search index. You can download Luke at http://www.getopt.org/luke/. Luke is a Java program distributed as a .jar file: start it in a Java runtime environment, or right-click the .jar file and choose the open option from the context menu. Once it opens, select the search index directory to browse through all the fields.

 


Servers

On remote servers there are two ways to analyze the index. The first is to (g)zip or tar the index directory, copy it to your computer and use the method described in the previous paragraph. The second is to use a client connection to the search engine. On Windows servers a batch file can be used. On Unix servers a connection can be set up with the following command from the search engine directory: 


java -Djava.security.policy=conf/grantall_policy -jar ... lib/webmanager-searchengine-client-10.12.0.jar ... rmi://localhost:1099/indexer-wm

 


The version number in the .jar file name (10.12.0 in this example) is the XperienCentral version. The rmi:// argument is the RMI connection string, which is configured in the properties.txt file. Once the client has connected, commands can be entered. To see a list of all commands, enter ?<Enter>. Two useful commands are:

  • Search - enter a search string and see the results. More fields are displayed than with a regular query, but not all fields are displayed.
  • Urls - enter a pattern such as *News* or * to see the indexed URLs.
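
For the first approach described above (packing the index directory on the server so it can be copied to a local machine and opened in Luke), the archiving step could be scripted. The following is only a minimal sketch in Python; the function name and any paths passed to it are hypothetical and not part of XperienCentral.

```python
import tarfile
from pathlib import Path


def archive_index(index_dir: str, archive_path: str) -> Path:
    """Pack an index directory into a gzip-compressed tar archive."""
    source = Path(index_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"not a directory: {source}")
    target = Path(archive_path)
    with tarfile.open(target, "w:gz") as archive:
        # arcname stores only the directory name, not the full server path.
        archive.add(source, arcname=source.name)
    return target
```

After copying the resulting archive to your computer and unpacking it, the extracted directory can be selected in Luke as the search index directory.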

...




...

Implement a Category Search

...

Pagepath_00_name: Home
Pagepath_00_url: http://127.0.0.1:8080/web/Home.htm
Pagepath_01_name: Developerweb
Pagepath_01_url: http://127.0.0.1:8080/web/Contact.htm
Pagepath_02_name: Documentation
Pagepath_02_url: http://127.0.0.1:8080/web/Contact/Documentation.htm


The normal query is extended with an extra filter on the Pagepath_01_name field, for example: (keywords) AND Pagepath_01_name:Contact.
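
The filter pattern above can be applied programmatically before a query is sent to the search engine. The sketch below is a hypothetical helper, not an XperienCentral API. It assumes that the search engine accepts standard Lucene query syntax, so values that contain whitespace are wrapped in double quotes.

```python
def add_field_filter(keywords: str, field: str, value: str) -> str:
    """Return '(keywords) AND field:value' for a single field filter."""
    keywords = keywords.strip()
    if not keywords:
        raise ValueError("keywords must not be empty")
    # Assumption: Lucene-style syntax, where a value containing
    # whitespace has to be quoted to be treated as one phrase.
    if any(ch.isspace() for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"({keywords}) AND {field}:{value}"
```

Calling the helper with the user's keywords and the field name and value from the example above produces a query string of the same form as the one shown there.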

...

The search engine contains a configuration file called meta.txt. This file can contain extra parsers for URLs, which result in extra fields and information in the index. For example, add the following lines to the meta.txt file:


.*/Examples/.*   sitepart   examples
.*/Forum/.*      sitepart   forum

 


When the crawler encounters a URL that contains "Examples", it adds a sitepart field to the index with the value "examples". Likewise, a URL that contains "Forum" results in the value "forum" in the sitepart field. Using the sitepart field is similar to the previous method: extend the query with an additional filter, for example (keywords) AND sitepart:examples.
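
To check which extra fields a set of meta.txt rules would produce for a given URL, the matching can be simulated outside the search engine. The sketch below is an illustration only and not the crawler's actual implementation. It assumes that each rule consists of three whitespace-separated columns (pattern, field name, value) and that the pattern is a regular expression that must match the whole URL. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

META_TXT = """\
.*/Examples/.*   sitepart   examples
.*/Forum/.*      sitepart   forum
"""

Rule = Tuple[re.Pattern, str, str]


def parse_rules(text: str) -> List[Rule]:
    """Parse 'pattern field value' lines into compiled rules."""
    rules = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or incomplete lines
        pattern, field, value = parts
        rules.append((re.compile(pattern), field, value.strip()))
    return rules


def extra_fields(url: str, rules: List[Rule]) -> Dict[str, str]:
    """Return the extra index fields that the rules assign to a URL."""
    fields = {}
    for pattern, field, value in rules:
        # Assumption: a rule applies only if it matches the entire URL.
        if pattern.fullmatch(url):
            fields[field] = value
    return fields
```

With these rules, a URL such as http://127.0.0.1:8080/web/Examples/Demo.htm (a made-up address) is assigned sitepart = examples, while a URL outside both sections gets no extra field.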

 



 


...

Index an External Website

...

  1. Create a new entry in the search engine cronjob. The crontab.txt file can be extended to index one or more external websites. Each task can have its own time schedule, and the depth and valid hosts can be specified. The following cronjob indexes a local website at five past midnight and the www.gxsoftware.com website at 2 AM. For the external website, the homepage is indexed plus all linked pages up to a maximum depth of 2.


    index http://localhost:8080/web/webmanager/id=39016 1   127.0.0.1,localhost [5 0 * * *]
    index http://www.gxsoftware.com/ 2 www.gxsoftware.com [0 2 * * *]
    
     


    As an alternative to a cronjob, the external website can be indexed manually on the Search Tools tab in the Setup Tool.

  2. Change the meta.txt file to map the external website to the right search index. The queries that are executed from a normal search element filter on webid and langid. Therefore, for documents from the external website to appear in the search results of a certain website, that website's webid and langid have to be added during indexing. This can be done by extending the meta.txt file:


    http://www.gxsoftware.com/.* webid   26098
    http://www.gxsoftware.com/.* langid  42
    

     


    Documents indexed from www.gxsoftware.com will then have a valid webid and langid and will therefore be returned in the search results.
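
Before restarting the search engine, it can help to check that a new crontab.txt entry is well formed. The sketch below parses a single index entry into its parts. The field order (start URL, depth, comma-separated valid hosts, schedule between square brackets) is inferred from the example entries in step 1 and is an assumption, as is the interpretation of the schedule as five cron fields. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed layout: index <url> <depth> <host,host,...> [<cron schedule>]
ENTRY = re.compile(
    r"^index\s+(?P<url>\S+)\s+(?P<depth>\d+)\s+(?P<hosts>\S+)"
    r"\s+\[(?P<schedule>[^\]]+)\]$"
)


@dataclass
class IndexJob:
    url: str
    depth: int
    hosts: List[str]
    schedule: str


def parse_index_job(line: str) -> IndexJob:
    """Split one 'index' entry into URL, depth, valid hosts and schedule."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid index entry: {line!r}")
    schedule = match["schedule"].split()
    if len(schedule) != 5:
        raise ValueError(f"expected 5 schedule fields, got {len(schedule)}")
    return IndexJob(
        url=match["url"],
        depth=int(match["depth"]),
        hosts=match["hosts"].split(","),
        schedule=" ".join(schedule),
    )
```

For the first entry in step 1, this yields depth 1, the valid hosts 127.0.0.1 and localhost, and the schedule 5 0 * * * (five past midnight).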





...

Implement a "Best Bets" Search

...

  1. Find the current score for the query "download" by entering the query "download" in the Search Tools tab in the Setup Tool. The score is between brackets () between the position and the date, for example "(30)".
  2. Navigate to Configuration > Channel Configuration > [General] and make sure the field "Default meta keywords" is empty.
  3. Navigate to the "Download" page in the Workspace.
  4. Click [Edit] in the Properties widget and select the SEO tab.
  5. Enter the keywords "Download" and "Downloads" in the keywords field and click [Apply].
  6. Change the search engine configuration file properties.txt in the /conf directory and add a new property: factor.keyword=500, or if this parameter already exists, change the current value to 500.
  7. Restart the search engine.
  8. For best results, re-index the entire website. If the website is very large, re-index only the page by entering its URL in the Setup Tool.
  9. Navigate to the Setup Tool, go to the [Search Tools] tab and search for "download" again. The score should now be considerably higher.
  10. Depending on your requirements, you can change the presentation of the search results to reflect the score. A simple script that separates the "best bet" search result(s) from the normal search results is:


    <xsl:template match="//wm-searchresults-show">
       <xsl:variable name="normal" select="@normal" />
       <xsl:variable name="header" select="@header" />
       <xsl:variable name="showordernumbers" select="@showordernumbers = 'true'" />
       <xsl:variable name="showpath" select="@showpath = 'true'" />
       <xsl:variable name="showlead" select="@showlead = 'true'" />
       <xsl:variable name="showquery" select="@showquery" />
       <xsl:variable name="showtype" select="@showtype" />
       <xsl:variable name="searchid" select="@searchid" />
       <xsl:variable name="baseUrl" select="@baseUrl" />
       <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
       <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
       <xsl:variable name="orgkeyword" select="translate(/root/system/requestparameters/parameter[name='orgkeyword']/value,$uppercase, $lowercase)" />
       <xsl:if test="count(/root/system/searchresults) > 0">
          <xsl:choose>
             <xsl:when test="/root/system/searchresults/totalcount = 0">${helpText}</xsl:when>
             <xsl:otherwise>
                <div class="searchresults">
                   <p>
                      <xsl:if test="$header != ''">
                         <xsl:attribute name="class"><xsl:value-of select="$header" /></xsl:attribute>
                      </xsl:if>
                      <xsl:text disable-output-escaping="yes">${wmfn:escapeToHTML(showText)}&nbsp;</xsl:text>
                      <xsl:value-of select="(/root/system/searchresults/from + 1)" />
                      <xsl:text>-</xsl:text>
                      <xsl:choose>
                         <xsl:when test="(/root/system/searchresults/totalcount) &lt; (/root/system/searchresults/to)">
                            <xsl:value-of select="/root/system/searchresults/totalcount" />
                         </xsl:when>
                         <xsl:otherwise>
                            <xsl:value-of select="/root/system/searchresults/to" />
                         </xsl:otherwise>
                      </xsl:choose>
                      <xsl:text disable-output-escaping="yes"> (${wmfn:escapeToHTML(foundText)}&nbsp;</xsl:text>
                      <xsl:value-of select="/root/system/searchresults/totalcount" />
                      <xsl:text> ${wmfn:escapeToHTML(entriesText)})</xsl:text>
                   </p>
                   <p>
                      <xsl:text>${wmfn:escapeToHTML(searchOnText)} "</xsl:text>
                      <xsl:choose>
                         <xsl:when test="$showquery != ''">
                            <xsl:value-of select="$showquery" />
                         </xsl:when>
                         <xsl:otherwise>
                            <xsl:value-of select="/root/system/searchresults/query" />
                         </xsl:otherwise>
                      </xsl:choose>
                      <xsl:text>"</xsl:text>
                   </p>
                   <!-- Show navigation -->
                   <xsl:call-template name="shownav">
                      <xsl:with-param name="index">0</xsl:with-param>
                      <xsl:with-param name="max">100</xsl:with-param>
                      <xsl:with-param name="totalcount"><xsl:value-of select="/root/system/searchresults/totalcount" /></xsl:with-param>
                      <xsl:with-param name="currentfrom"><xsl:value-of select="/root/system/searchresults/from" /></xsl:with-param>
                      <xsl:with-param name="class"><xsl:value-of select="$normal" /></xsl:with-param>
                      <xsl:with-param name="searchid"><xsl:value-of select="$searchid" /></xsl:with-param>
                      <xsl:with-param name="baseUrl"><xsl:value-of select="$baseUrl" /></xsl:with-param>
                   </xsl:call-template>
                   <dl>
                      <xsl:for-each select="/root/system/searchresults/entry">
                         <xsl:variable name="authorization">
                            <xsl:call-template name="check_searchresults_readaccess">
                               <xsl:with-param name="authorizedgroups">
                                  <xsl:for-each select="meta">
                                     <xsl:if test="name = 'webusergroups'"><xsl:value-of select="value" /></xsl:if>
                                  </xsl:for-each>
                               </xsl:with-param>
                               <xsl:with-param name="loginrequired">
                                  <xsl:value-of select="meta[name = 'loginrequired']/value" />
                               </xsl:with-param>
                            </xsl:call-template>
                         </xsl:variable>
                         <xsl:if test="contains($authorization, '1')">
                            <xsl:if test="count(meta[name='keyword' and translate(value,$uppercase, $lowercase)=$orgkeyword])">
                               Recommended:<br/>
                            </xsl:if>
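
The XSLT fragment above marks a result as recommended when one of its keyword meta values, converted to lowercase, equals the lowercased orgkeyword request parameter. The same separation can be sketched outside XSLT. The example below is a simplified illustration in Python; the structure of a result entry (a dictionary with a "keywords" list) is an assumption made for this sketch and not the format that the search engine returns.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, object]


def split_best_bets(entries: List[Entry], query: str) -> Tuple[List[Entry], List[Entry]]:
    """Separate results whose keywords match the query from the rest."""
    wanted = query.strip().lower()
    best: List[Entry] = []
    normal: List[Entry] = []
    for entry in entries:
        # Case-insensitive comparison, like translate() in the XSLT.
        keywords = [str(k).lower() for k in entry.get("keywords", [])]
        if wanted in keywords:
            best.append(entry)
        else:
            normal.append(entry)
    return best, normal
```

The "best bet" list can then be rendered under a "Recommended" heading ahead of the normal results. Note that lower() also handles non-ASCII letters, whereas the translate() call in the XSLT only covers A to Z.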
    





...

Excluding Pages from the Search Index

...

Some examples of robots.txt files:


Don’t allow any search engine to index the website:

User-agent: *
Disallow: /
 


Don’t allow the XperienCentral search engine to index the website:

User-agent: gxsearchgeorge
Disallow: /

 


Don’t allow the XperienCentral search engine to index the pages with URL */web/Examples/* or the login page:

User-agent: gxsearchgeorge
Disallow: /web/Examples/
Disallow: /web/Login.html
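
To verify what a robots.txt file blocks before deploying it, the rules can be tested with the robots.txt parser from the Python standard library. This is a sketch under the assumption that the XperienCentral crawler interprets the file in the same way as that parser, which this page does not confirm. The user agent gxsearchgeorge is taken from the examples above, and the tested URLs are made up.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: gxsearchgeorge
Disallow: /web/Examples/
Disallow: /web/Login.html
"""


def is_indexable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/web/Home.htm", "/web/Examples/Demo.htm", "/web/Login.html"):
        url = "http://localhost:8080" + path
        print(path, is_indexable(ROBOTS_TXT, "gxsearchgeorge", url))
```

Because the file only contains rules for gxsearchgeorge, the same check with a different user agent reports every URL as allowed.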

 

