Site Search
By Uwe Holz
Last Update:
Sunday, October 19, 2003
More information about web search technologies and services can be found at http://www.google.com.
A local search facility is an important feature of a well-designed website. Several tools are available for implementing a site-based search. Alternatively, big search engines such as Google offer to index the site so that their own engine can be used to search it. The drawback is that Google updates its index only monthly: changes (document updates, deletions or additions) are not reflected in search results until the next index update has been made. This article describes a WebSeller-based site search approach that does not depend on any third-party service.
The article Forumware already demonstrated how WebSeller can be tailored to set up a discussion board. This multifunctional CGI script can be used for HTML search as well.
How does it work?
All known search approaches rely on a search index: a database containing search tokens and references to all documents that should be found when a user types in a matching token.
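To make the idea concrete, here is a minimal Python sketch of such an index: a mapping from each token to the documents that contain it. The function names and the sample documents are illustrative assumptions; they are not part of WebSeller and say nothing about how WebSeller builds its index internally.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(name)
    return index


def lookup(index, token):
    """Return the sorted names of all documents that contain the token."""
    return sorted(index.get(token.lower(), set()))


# Hypothetical sample documents, used only for illustration
docs = {
    "index.html": "Welcome to the site search article",
    "forum.html": "Forumware discussion board article",
}
index = build_index(docs)
print(lookup(index, "article"))
print(lookup(index, "search"))
```

A real index would also store the position or context of each token so that a result list can show an excerpt.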
WebSeller offers native MySQL access, so it is natural to store the search index in a MySQL table. The WebSeller scripting engine comes with all functions necessary for implementing a site search:
- Index creation:
/cgi-bin/webseller.cgi?action=reindex
- Searching within a table or relation:
/cgi-bin/webseller.cgi?action=search ...
- Displaying found documents:
/cgi-bin/webseller.cgi?action=showresult ...
Since all functions are built in, only one question remains: how to use them. Index creation is a separate task that should be run after every change to the website. It can be triggered by entering the following URL into the browser:
http://<mydomain>/cgi-bin/webseller.cgi?action=reindex
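If the index should be rebuilt from a publishing script rather than by hand, the same URL can be assembled programmatically. The following Python sketch only builds the URL; the domain www.example.com and the function name are assumptions made for illustration.

```python
from urllib.parse import urlencode


def reindex_url(domain, script="/cgi-bin/webseller.cgi"):
    """Build the reindex URL for the given domain (script path as shown above)."""
    return f"http://{domain}{script}?{urlencode({'action': 'reindex'})}"


# www.example.com is a placeholder domain
url = reindex_url("www.example.com")
print(url)

# To actually trigger the reindex, the URL could be requested, for example with
# urllib.request.urlopen(url); this is left out so the sketch has no side effects.
```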
All other functions require more parameters than just the function's name (action), as the following section shows.
How does it fit into my website?
The search option is usually integrated into the main navigation bar. The following HTML source code shows how to implement the search function as an HTTP POST request:
<form method="post"
action="/cgi-bin/webseller.exe?action=Search&ID=-ID-&select=HTMLSEARCH
&response=search/resultview.html&failresponse=search/resultfail.html
&section=htmlsearch&language=eng">
<p>Site Search:<br>
<input type="text" name="searchtext" size="6">
<input type="image" src="cgi-htm/images/go.gif" value="Search">
</p>
</form>
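For testing the search without a browser, the request that this form produces can be approximated in Python. This is a sketch under assumptions: the base URL is a placeholder, the parameter names are copied from the form (with the section parameter assumed to be spelled section), and -ID- is passed through literally because the article does not say how WebSeller fills it in.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, searchtext, session_id="-ID-"):
    """Build (but do not send) a POST request equivalent to the search form."""
    query = urlencode(
        {
            "action": "Search",
            "ID": session_id,  # placeholder value, see lead-in
            "select": "HTMLSEARCH",
            "response": "search/resultview.html",
            "failresponse": "search/resultfail.html",
            "section": "htmlsearch",
            "language": "eng",
        },
        safe="/",  # keep template paths readable
    )
    # The text field travels in the POST body, as with the HTML form
    body = urlencode({"searchtext": searchtext}).encode("ascii")
    return Request(f"{base_url}?{query}", data=body, method="POST")


# Placeholder host; adjust the script name to your platform
req = build_search_request("http://www.example.com/cgi-bin/webseller.exe", "site search")
print(req.get_method(), req.full_url)
print(req.data)
```

Passing the request to urllib.request.urlopen would send it; that step is omitted here.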
The result template resultview.html, specified by the response argument, can be customized to fit the website's layout. It should contain at least the necessary WebSeller tags and placeholders, as follows:
<table>
<!---startlist--->
<tr>
<td>
<strong>-POSITION-.</strong>
</td>
<td>
<a href="-LINK-">-TITLE-</a>
<a href="-SCRIPT-HIGHLIGHTRESULT-">(Highlight result)</a>
</td>
</tr>
<tr>
<td> </td>
<td>-ITEMDESCRIPTION-<br /></td>
</tr>
<tr>
<td> </td>
<td><i>URL: </i>-LINK-</td>
</tr>
<tr>
<td> </td>
<td><i>Date: </i><!---(FROM_UNIXTIME(-PUBDATE-))--->
<i> Size: </i>-FILESIZE- Bytes
</td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<!---endlist--->
</table>
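The way such a template is filled can be pictured with a small Python sketch: the block between the startlist and endlist markers is repeated once per hit, and each -NAME- placeholder is replaced by the matching value. This is an illustration of the idea only, not WebSeller's actual implementation; the function name and the sample rows are assumptions.

```python
import html
import re

START, END = "<!---startlist--->", "<!---endlist--->"


def render(template, rows):
    """Repeat the block between the list markers once per row and fill in -NAME- placeholders."""
    head, rest = template.split(START, 1)
    block, tail = rest.split(END, 1)
    rendered = []
    for position, row in enumerate(rows, start=1):
        values = {"POSITION": str(position)}
        values.update({key: html.escape(str(value)) for key, value in row.items()})
        # Unknown placeholders are left untouched
        rendered.append(
            re.sub(r"-([A-Z_]+)-", lambda m: values.get(m.group(1), m.group(0)), block)
        )
    return head + "".join(rendered) + tail


TEMPLATE = (
    "<table>\n"
    "<!---startlist--->\n"
    '<tr><td>-POSITION-.</td><td><a href="-LINK-">-TITLE-</a></td></tr>\n'
    "<!---endlist--->\n"
    "</table>"
)
# Hypothetical search hits
rows = [
    {"LINK": "/a.html", "TITLE": "First page"},
    {"LINK": "/b.html", "TITLE": "Tips & tricks"},
]
print(render(TEMPLATE, rows))
```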
This website implements the described solution. Go ahead and type in some search tokens to see how it works.
How can it be customized?
The whole solution depends on working MySQL database access, so the MySQL access data must be known and configured. Check with your ISP, who will have to provide you with the information required for database access:
- MySQL host name
- MySQL database name
- MySQL user name
- MySQL password
If you want to test locally first, you will need a MySQL server installed on your own PC. Adjust the MySQL access data in section [MYSQL] of the setup file webseller.ini:
[MYSQL]
Host=<MySQL host name>
Database=<MySQL database name>
User=<MySQL user name>
Password=<MySQL password>
...
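Before uploading the file, it can be worth checking that all four values are actually filled in. The following Python sketch does this with the standard configparser module; the sample values are made up, and the check is an independent helper, not something WebSeller itself provides.

```python
import configparser

# Made-up sample values standing in for the real access data
SAMPLE_INI = """\
[MYSQL]
Host=localhost
Database=mysite
User=webuser
Password=secret
"""


def read_mysql_settings(ini_text):
    """Return the four [MYSQL] values, raising ValueError if one is missing or empty."""
    # interpolation=None so that '%' characters in other values are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["MYSQL"]
    required = ("Host", "Database", "User", "Password")
    missing = [key for key in required if not section.get(key)]
    if missing:
        raise ValueError("Missing [MYSQL] values: " + ", ".join(missing))
    return {key: section[key] for key in required}


settings = read_mysql_settings(SAMPLE_INI)
print(settings["Host"], settings["Database"])
```

A complete webseller.ini may contain lines that configparser does not accept, so in practice it may be necessary to pass only the [MYSQL] section to this helper.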
Index creation can be customized with the following values in section [HTMLSEARCH]:
- SEARCHTEXT - MySQL expression used for the database search within the index table
- TRASHTABLE - Table containing invalid search tokens (default: trash)
- FILEEXTENSION - List of file extensions; only files with these extensions are indexed (default: html;htm;php)
- EXCLUDE - List of subdirectories excluded from index creation (default: cgi-bin)
- CONTENT_START - Pattern within the HTML source code indicating the beginning of the relevant content (content to be indexed)
- CONTENT_STOP - Pattern within the HTML source code indicating the end of the relevant content (content to be indexed)
- HIGHCOLOR_START - Prefix (HTML source code) for token highlighting within the found page (default: <span style=background-color:#00FFFF>)
- HIGHCOLOR_STOP - Postfix (HTML source code) for token highlighting within the found page (default: </span>)
- MAX_ITEMDESCRIPTION_SIZE - Maximum length of the context displayed on the result list page (default: 255)
The quality of search results depends considerably on the values CONTENT_START and CONTENT_STOP. Strings used for navigation links (e.g. 'Home'), common captions and the like should not be indexed at all, even though they are part of every document. To keep them out of the index, the relevant content has to be marked somehow, and this is where CONTENT_START and CONTENT_STOP help. Content management systems use templates to define the layout of every page of a project. In other words, the template's content becomes part of every single document but should be invisible to search requests. It is therefore recommended to mark the beginning and end of the actual content within every HTML document, e.g.:
<HTML>
...
<BODY>
...
<!-- CITY DESK BODY START -->
... Content ...
<!-- CITY DESK BODY END -->
...
</BODY>
</HTML>
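The effect of the two markers can be illustrated with a short Python sketch that cuts out the text between them. The function name is hypothetical, and the fallback of using the whole document when a marker is missing is an assumption; the article does not say how WebSeller behaves in that case.

```python
def extract_content(html_source, start_marker, stop_marker):
    """Return only the part of the document between the two markers."""
    start = html_source.find(start_marker)
    if start == -1:
        # Assumption: without a start marker the whole document is used
        return html_source
    start += len(start_marker)
    stop = html_source.find(stop_marker, start)
    if stop == -1:
        # Assumption: without a stop marker, take everything after the start marker
        stop = len(html_source)
    return html_source[start:stop]


page = (
    "<HTML><BODY><a href='/'>Home</a>"
    "<!-- CITY DESK BODY START -->Actual article text<!-- CITY DESK BODY END -->"
    "<p>Footer</p></BODY></HTML>"
)
content = extract_content(
    page, "<!-- CITY DESK BODY START -->", "<!-- CITY DESK BODY END -->"
)
print(content)
```

Here the navigation link 'Home' and the footer stay outside the extracted text, so they would never reach the index.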
Note: My favoured content manager is CityDesk, which is why I use the HTML comments shown above to mark the relevant content. The following settings are used for this website in the configuration file webseller.ini:
[HTMLSEARCH]
SEARCHTEXT=( htmlindex.CONTENT LIKE '%-SEARCHTEXT-%' )
TRASHTABLE=trash
FILEEXTENSION=html;htm;
EXCLUDE=/cgi-bin;/cgi-htm;
CONTENT_START=<!-- CITY DESK BODY START -->
CONTENT_STOP=<!-- CITY DESK BODY END -->
HIGHCOLOR_START=<span style=background-color:#00FFFF>
HIGHCOLOR_STOP=</span>
MAX_ITEMDESCRIPTION_SIZE=255
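What HIGHCOLOR_START, HIGHCOLOR_STOP and MAX_ITEMDESCRIPTION_SIZE are meant to achieve can be sketched in a few lines of Python. The two functions are hypothetical and work on plain text only; how WebSeller itself highlights tokens inside a full HTML page is not described in this article.

```python
import re

# Values taken from the [HTMLSEARCH] section shown above
HIGHCOLOR_START = "<span style=background-color:#00FFFF>"
HIGHCOLOR_STOP = "</span>"
MAX_ITEMDESCRIPTION_SIZE = 255


def highlight(text, token):
    """Wrap every case-insensitive occurrence of token in the highlight markup."""
    if not token:
        return text
    pattern = re.compile(re.escape(token), re.IGNORECASE)
    return pattern.sub(lambda m: HIGHCOLOR_START + m.group(0) + HIGHCOLOR_STOP, text)


def describe(text, limit=MAX_ITEMDESCRIPTION_SIZE):
    """Cut the context shown in the result list down to at most limit characters."""
    return text[:limit]


print(highlight("Site Search made simple", "search"))
print(len(describe("x" * 1000)))
```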
Adjust all necessary values and call the index function first. If you receive a failure response or nothing at all, perform the following checks:
- Look for the file webseller.log, which is created in the script root directory cgi-bin. If there is no log file at all, you may have run into a permission problem and WebSeller has not been started. Check the permissions and contact your ISP. If the log file does exist, check its content for the problem.
- If the log file indicates MySQL access problems, verify the access data with your ISP.
- If the checks above do not solve your problem, send me an e-mail and attach webseller.log.
Note: Since the executable script is platform dependent, make sure to install the right version:
- Target platform Linux: /cgi-bin/webseller.cgi
- Target platform Solaris: /cgi-bin/webseller.bin
- Target platform Windows: /cgi-bin/webseller.exe
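In a deployment script, the mapping above can be kept in one place so that generated links always point to the right executable. The following Python sketch assumes the platform is given as a plain name; the dictionary keys and the function name are illustrative choices.

```python
# Script paths as listed above, keyed by an assumed platform name
SCRIPT_BY_PLATFORM = {
    "linux": "/cgi-bin/webseller.cgi",
    "solaris": "/cgi-bin/webseller.bin",
    "windows": "/cgi-bin/webseller.exe",
}


def script_url(target_platform):
    """Return the WebSeller script path for the web server's platform."""
    key = target_platform.strip().lower()
    if key not in SCRIPT_BY_PLATFORM:
        raise ValueError(f"Unknown target platform: {target_platform!r}")
    return SCRIPT_BY_PLATFORM[key]


print(script_url("Linux"))
```

Note that the relevant platform is that of the web server, not of the PC used to edit the site.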
Where can it be downloaded?
All files used to implement the HTML site search on this website are available in the Download Area. Adjust all setup values to your environment and use WebSeller version 3.1.132 (or above) to implement your own site search. If you work with CityDesk, you can download the archive file cityd-search.zip. It contains the solution described above, including all necessary files. After opening search.cty from the downloaded archive, open the variables dialog (click View > Variables) and modify the following variables accordingly:
- WSMysqlUser - MySQL user name that has access to the database
- WSMysqlDatabase - Name of the actual database
- WSMySQLPassword - The MySQL user's password
- WSMysqlHost - Usually localhost unless your ISP defines it differently
- WSScriptUrl - Name of the platform-dependent WebSeller engine (/cgi-bin/webseller.cgi, /cgi-bin/webseller.exe or /cgi-bin/webseller.bin)