This topic describes several frequently requested search engine improvements and modifications. Most of them must be performed by a developer and/or a system administrator.
...
...
...
On local machines or developer PCs, a tool like Luke can be used to analyze the search index. You can download Luke at http://www.getopt.org/luke/. Luke is a Java program that can be started as a .jar file by opening it in a Java runtime environment or by right-clicking the .jar file and choosing the open option from the context menu. Once it opens, select the search index directory and browse through all the fields.
Servers
On remote servers there are two ways to analyze the index. The first is to (g)zip or tar the index directory, copy it to your computer, and use the method described in the previous paragraph. The second is to use a client connection to the search engine. On Windows servers, a batch file can be used. On Unix servers, a connection can be set up by running the following command from the search engine directory:
```
java -Djava.security.policy=conf/grantall_policy -jar lib/webmanager-searchengine-client-10.12.0.jar rmi://localhost:1099/indexer-wm
```
In this command, the version number in the jar file name (10.12.0) is the XperienCentral version, and rmi://localhost:1099/indexer-wm is the RMI connection string, which is configured in the properties.txt file. Once a connection has been made to the RMI client, commands can be entered. To see a list of all commands, enter ?<Enter>. Two useful commands are:
- Search: enter a search string and see the results. More fields are displayed than with a regular query, but not all of them.
- Urls: enter a pattern such as *News* or * to see the indexed URLs.
...
...
Implement a Category Search
...
```
pagepath_00_name : Home
pagepath_00_url  : http://127.0.0.1:8080/web/Home.htm
pagepath_01_name : Developerweb
pagepath_01_url  : http://127.0.0.1:8080/web/Contact.htm
pagepath_02_name : Documentation
pagepath_02_url  : http://127.0.0.1:8080/web/Contact/Documentation.htm
```
The normal query is extended with an extra filter on the pagepath_01_name field, for example: (keywords) AND pagepath_01_name:Contact.
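As a minimal sketch, this query extension can be done with simple string concatenation. The class and method names below are hypothetical; only the query syntax and the pagepath field name come from this section:

```java
// Hypothetical helper: builds a category-filtered query string by appending
// a filter on one of the pagepath fields described above.
public class CategoryQuery {

    // Wraps the user's keywords and appends the category filter clause.
    static String withCategory(String keywords, String category) {
        return "(" + keywords + ") AND pagepath_01_name:" + category;
    }

    public static void main(String[] args) {
        // Prints: (manual) AND pagepath_01_name:Contact
        System.out.println(withCategory("manual", "Contact"));
    }
}
```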
...
The search engine contains a configuration file called meta.txt. This file can contain extra parsers for URLs, which result in extra fields and information in the index. For example, enter the following in the meta.txt file:
```
.*/Examples/.* sitepart examples
.*/Forum/.* sitepart forum
```
When the crawler encounters a URL that contains "Examples", it adds a sitepart field to the index with the value "examples". The same happens for "Forum": this results in the value "forum" in the sitepart field. Using the sitepart field is similar to the previous method: extend the query with an additional string containing the filter, for example (keywords) AND sitepart:examples.
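The meta.txt mechanism above can be thought of as a list of (URL pattern, field, value) rules. The following Java sketch illustrates that idea; the class, method, and rule representation are illustrative and not the actual search engine implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: each meta.txt line is a URL pattern plus a field name
// and value; URLs matching the pattern get the extra field in the index.
public class MetaRules {

    // Returns the extra index fields for a URL, per the two example rules above.
    static Map<String, String> extraFields(String url) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (url.matches(".*/Examples/.*")) fields.put("sitepart", "examples");
        if (url.matches(".*/Forum/.*"))    fields.put("sitepart", "forum");
        return fields;
    }

    public static void main(String[] args) {
        // Prints: {sitepart=examples}
        System.out.println(extraFields("http://localhost:8080/web/Examples/Page.htm"));
        // Prints: {} (no rule matches)
        System.out.println(extraFields("http://localhost:8080/web/Home.htm"));
    }
}
```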
...
Index an External Website
...
Create a new entry in the search engine cronjob. The crontab.txt file can be extended to index one or more external websites. The task can have a different time schedule, and the depth and valid hosts can be specified. The following cronjob indexes a local website at 5 past midnight, and the www.gxsoftware.com website at 2 AM. The homepage is indexed plus all linked pages, with a maximum depth of 2:

```
index http://localhost:8080/web/webmanager/id=39016 1 127.0.0.1,localhost [5 0 * * *]
index http://www.gxsoftware.com/ 2 www.gx.nl [0 2 * * *]
```
As an alternative to a cronjob, the external website can be indexed manually on the Search Tools tab in the Setup Tool.
Change the meta.txt file to map the external website to the right search index. The queries that are executed from a normal search element filter on webid and langid. Therefore, to include the search results on a certain website, the website's webid and langid have to be included during indexing. This can be done by extending the meta.txt file:

```
http://www.gxsoftware.com/.* webid 26098
http://www.gxsoftware.com/.* langid 42
```

Documents indexed from www.gxsoftware.com will then have a valid webid and langid and will therefore be returned in the search results.
...
Implement a "Best Bets" Search
...
- Find the current score for the query "download" by entering the query "download" on the Search Tools tab in the Setup Tool. The score appears between parentheses between the position and the date, for example "(30)".
- Navigate to Configuration > Channel Configuration > [General] and make sure the field "Default meta keywords" is empty.
- Navigate to the "Download" page in the Workspace.
- Click [Edit] in the Properties widget and select the SEO tab.
- Enter the keywords "Download" and "Downloads" in the keywords field and click [Apply].
- Change the search engine configuration file properties.txt in the /conf directory and add a new property: factor.keyword=500. If this parameter already exists, change its value to 500.
- Restart the search engine.
- For best results, re-index the entire website; if the website is very large, re-index just the page by entering its URL in the Setup Tool.
- Navigate to the Setup Tool, go to the [Search Tools] tab and search for "download" again. The score should now be considerably higher.
Depending on your requirements, you can change the search results presentation to reflect the score. A simple script that separates the "best bet" search result(s) from the normal search results is:
```
<xsl:template match="//wm-searchresults-show">
  <xsl:variable name="normal" select="@normal" />
  <xsl:variable name="header" select="@header" />
  <xsl:variable name="showordernumbers" select="@showordernumbers = 'true'" />
  <xsl:variable name="showpath" select="@showpath = 'true'" />
  <xsl:variable name="showlead" select="@showlead = 'true'" />
  <xsl:variable name="showquery" select="@showquery" />
  <xsl:variable name="showtype" select="@showtype" />
  <xsl:variable name="searchid" select="@searchid" />
  <xsl:variable name="baseUrl" select="@baseUrl" />
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
  <xsl:variable name="orgkeyword" select="translate(/root/system/requestparameters/parameter[name='orgkeyword']/value, $uppercase, $lowercase)" />
  <xsl:if test="count(/root/system/searchresults) > 0">
    <xsl:choose>
      <xsl:when test="/root/system/searchresults/totalcount = 0">${helpText}</xsl:when>
      <xsl:otherwise>
        <div class="searchresults">
          <p>
            <xsl:if test="$header != ''">
              <xsl:attribute name="class"><xsl:value-of select="$header" /></xsl:attribute>
            </xsl:if>
            <xsl:text disable-output-escaping="yes">${wmfn:escapeToHTML(showText)} </xsl:text>
            <xsl:value-of select="(/root/system/searchresults/from + 1)" />
            <xsl:text>-</xsl:text>
            <xsl:choose>
              <xsl:when test="(/root/system/searchresults/totalcount) &lt; (/root/system/searchresults/to)">
                <xsl:value-of select="/root/system/searchresults/totalcount" />
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="/root/system/searchresults/to" />
              </xsl:otherwise>
            </xsl:choose>
            <xsl:text disable-output-escaping="yes"> (${wmfn:escapeToHTML(foundText)} </xsl:text>
            <xsl:value-of select="/root/system/searchresults/totalcount" />
            <xsl:text> ${wmfn:escapeToHTML(entriesText)})</xsl:text>
          </p>
          <p>
            <xsl:text>${wmfn:escapeToHTML(searchOnText)} "</xsl:text>
            <xsl:choose>
              <xsl:when test="$showquery != ''">
                <xsl:value-of select="$showquery" />
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="/root/system/searchresults/query" />
              </xsl:otherwise>
            </xsl:choose>
            <xsl:text>"</xsl:text>
          </p>
          <!-- Show navigation -->
          <xsl:call-template name="shownav">
            <xsl:with-param name="index">0</xsl:with-param>
            <xsl:with-param name="max">100</xsl:with-param>
            <xsl:with-param name="totalcount"><xsl:value-of select="/root/system/searchresults/totalcount" /></xsl:with-param>
            <xsl:with-param name="currentfrom"><xsl:value-of select="/root/system/searchresults/from" /></xsl:with-param>
            <xsl:with-param name="class"><xsl:value-of select="$normal" /></xsl:with-param>
            <xsl:with-param name="searchid"><xsl:value-of select="$searchid" /></xsl:with-param>
            <xsl:with-param name="baseUrl"><xsl:value-of select="$baseUrl" /></xsl:with-param>
          </xsl:call-template>
          <dl>
            <xsl:for-each select="/root/system/searchresults/entry">
              <xsl:variable name="authorization">
                <xsl:call-template name="check_searchresults_readaccess">
                  <xsl:with-param name="authorizedgroups">
                    <xsl:for-each select="meta">
                      <xsl:if test="name = 'webusergroups'"><xsl:value-of select="value" /></xsl:if>
                    </xsl:for-each>
                  </xsl:with-param>
                  <xsl:with-param name="loginrequired">
                    <xsl:value-of select="meta[name = 'loginrequired']/value" />
                  </xsl:with-param>
                </xsl:call-template>
              </xsl:variable>
              <xsl:if test="contains($authorization, '1')">
                <xsl:if test="count(meta[name='keyword' and translate(value, $uppercase, $lowercase) = $orgkeyword])">
                  Recommended:<br/>
                </xsl:if>
```
...
Excluding Pages from the Search Index
...
Some examples of robots.txt files:
Don’t allow any search engine to index the website:

```
User-agent: *
Disallow: /
```

Don’t allow the XperienCentral search engine to index the website:

```
User-agent: gxsearchgeorge
Disallow: /
```

Don’t allow the XperienCentral search engine to index the pages with URL */web/Examples/* or the login page:

```
User-agent: gxsearchgeorge
Disallow: /web/Examples/
Disallow: /web/Login.html
```
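The Disallow rules above are prefix matches: a compliant crawler skips any path that starts with a disallowed prefix for its user-agent. The following Java sketch illustrates only that simplified check for the last example above; real robots.txt processing involves more rules (e.g. Allow lines and precedence), and the class and method names are hypothetical:

```java
import java.util.List;

// Simplified robots.txt check: a path is disallowed when it starts with any
// of the Disallow prefixes for the matching user-agent.
public class RobotsCheck {

    static boolean isDisallowed(String path, List<String> disallowPrefixes) {
        return disallowPrefixes.stream().anyMatch(path::startsWith);
    }

    public static void main(String[] args) {
        // Rules for user-agent gxsearchgeorge from the last example above.
        List<String> rules = List.of("/web/Examples/", "/web/Login.html");
        System.out.println(isDisallowed("/web/Examples/Demo.htm", rules)); // true
        System.out.println(isDisallowed("/web/Home.htm", rules));         // false
    }
}
```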