[Free Framers] Search produces five identical listings

Reid Gray rgray at interactivesupercomputing.com
Thu Aug 6 07:33:56 PDT 2009


Hi Nancy,
 
Sorry this is a week late.
 
You have identified one of the common, if not classic, problems with searching: duplicates. Duplicates are often a side effect of the automatic indexing process, where the same source text has hyperlinks pointing to it from multiple places in the help project, website, or body of content you are feeding to the indexer.
 
So if the anchor tag looks like this: 
<a href="ports.html">reserved TCP port range</a>
 
...and this complete anchor tag appears two or more times with the same anchor text (here, "reserved TCP port range"), and the end user runs a keyword search on "TCP port," "port range," or something similar, then the duplicate results in your example are analogous to what that user gets back.
 
Google and the rest do a 'link cardinality' trick where they check whether the same anchor text actually points to the same place on the same page, so they can pick just one of those results to list in the top ten or so hits you get back. This is not hard to script if somebody at your site is handy with Python, Java, JavaScript, or whatever web application language makes sense. It is simple string comparison. The sequence is this:
 
Query results come back and are fed to your dedup script --> The dedup script looks for results whose links are the same. If a result's link and anchor text both match an earlier result, it deletes the duplicate record, then repeats for the next result. --> De-duplicated query results are passed to the user.
 
Just ensure that whoever writes the script gives you a switch to turn it on or off so you can check the query results in both cases to see if it is working to your satisfaction.
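
As a rough illustration of that sequence, here is a minimal Python sketch. It assumes each query result arrives as a dictionary with "href" and "text" keys; those names, the function name dedupe_results, and the enabled switch are all made up for this example and are not part of any particular search tool. It would sit between whatever returns the raw results and whatever displays them.

```python
def dedupe_results(results, enabled=True):
    """Return results with repeated (link, anchor text) pairs removed.

    results: list of dicts with hypothetical keys "href" and "text".
    enabled: the on/off switch; when False the list passes through as-is.
    """
    if not enabled:
        return list(results)

    seen = set()
    unique = []
    for result in results:
        # Collapse whitespace and ignore case in the anchor text so that
        # trivially different copies of the same link still compare equal.
        href = result["href"].strip()
        text = " ".join(result["text"].split()).casefold()
        key = (href, text)
        if key in seen:
            continue  # same link and same anchor text: drop the duplicate
        seen.add(key)
        unique.append(result)  # keep the first occurrence, preserve order
    return unique


if __name__ == "__main__":
    hits = [
        {"href": "ports.html", "text": "reserved TCP port range"},
        {"href": "ports.html", "text": "reserved TCP port range"},
        {"href": "other.html", "text": "reserved TCP port range"},
    ]
    for hit in dedupe_results(hits):
        print(hit["href"], "-", hit["text"])
```

Calling dedupe_results(hits, enabled=False) returns the raw list, so the two sets of results can be compared side by side.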
 
Cheers,
 
Reid

________________________________

From: Nancy Allison [mailto:nancy.allison4 at verizon.net]
Sent: Wed 7/29/2009 8:32 AM
To: Reid Gray
Subject: [Free Framers] Search produces five identical listings




I've used Frame and Mif2Go to create a .chm file. When I use the  Search tab to search for a term, I get five identical listings for some results, but not all. Example: Search for "debounce" produces

Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Resistor Debounce Values for First Touch Valuation
Using the Favorites Tab
Welcome to the Online Help


The first five listings refer to a lengthy topic, which uses the word "debounce" seven times, if it matters.

The second two are the H1 and subhead, in the same topic, which uses the word twice.

What is causing the multiple listing, and is there anything I can do about it?

--Nancy

