[OPEN-ILS-DEV] Relevance (again)
mrylander at gmail.com
Wed Feb 13 19:00:21 EST 2008
On Feb 13, 2008 4:30 PM, Patrick Durusau <patrick at durusau.net> wrote:
> When I first joined this list I had a question about the search
> algorithm that was never quite answered. A problem with it has come up
> I search for "Apple fruit" and got 18 "hits."
> To the immediate left I have a listing of relevant subjects, the first
> one of which is "Apples." Followed by "Fruit trees", "Fruit", then
> "Frontier and pioneer life" and then "Overland journeys to the Pacific".
> Oh, but it gets better.
> Guess what is returned if you select "Apples?" Well partner, it isn't
> Dewey 583.73 Apples.
The subject sidebar entry links create new searches, so you're in
effect broadening your search from "subject:apple fruit" to
> No, it helpfully returns 568 "hits" which starts off with Apple
> Computers, includes Appling County census results and the tenth item is
> an apple cookbook.
Among other things, the search infrastructure will stem any unadorned
terms that you enter, which turns "apples" into "apple". "Appling"
> Does that strike anyone besides myself as rather odd behavior for a
> search engine? Or perhaps I should say, a library search engine?
> Well, but opinions are going to vary on that score aren't they?
> My real question is: Where is the relevance behavior for Evergreen set
> such that I can alter it?
That depends on the version. You were testing on the production PINES
servers (it seems, as I replicated your searches and result counts
there), which is currently on 220.127.116.11 (soon to be 18.104.22.168). There are
weighting values that you can apply in 1.2 that control how much a
particular searched field is worth. So, for instance, topical
subjects could be weighted higher than corporate name subjects, which
would make the Granny Smiths float to the top, above the ][e
> That gets us past all the normative questions and to one that is purely
> technical. I want to *alter* the relevance behavior of Evergreen
> searches. Where is that done?
There are many different things that can be done to change the way
Evergreen performs searches. One could replace, or augment, the
Snowball stemmer that is used by default with a dictionary stemmer (or
a non-stemming dictionary). One could turn off stemming altogether,
and require exact word matches.
In future versions (as the plan stands, 1.4 to some degree and 2.0 to
a larger degree) one will be able to adjust the relevancy bonuses given
under certain circumstances. For instance, title searches give a
higher rank when the searched words are in the same order in both the
field and the query. Author searches give a large bonus when the
first word of both the field and the query match exactly. Bonuses are
given all around when phrases match. And, obviously, a normalized
full-query-and-field match gets a very large bonus.
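To make the idea of stacked bonuses concrete, here is a rough sketch in
Python. It is not Evergreen's code: the function names, the multiplier
values, and the way the bonuses combine are all assumptions made up for
illustration. It only mirrors the four cases described above (word
order for title searches, first-word match for author searches, phrase
match, and a normalized full match).

```python
import re

# Hypothetical multipliers, chosen only for illustration.
BONUS_SAME_ORDER = 1.5   # title search: query words appear in the same order
BONUS_FIRST_WORD = 2.0   # author search: first words match exactly
BONUS_PHRASE = 1.5       # any search: query appears as a contiguous phrase
BONUS_FULL_MATCH = 5.0   # any search: normalized query equals normalized field


def normalize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def in_same_order(query_words, field_words):
    """True if the query words occur in the field in the same relative order."""
    remaining = iter(field_words)
    return all(word in remaining for word in query_words)


def contains_phrase(query_words, field_words):
    """True if the query words occur contiguously in the field."""
    n = len(query_words)
    return any(field_words[i:i + n] == query_words
               for i in range(len(field_words) - n + 1))


def rank(query, field, search_class, base=1.0):
    """Apply the illustrative bonuses to a base score for one field."""
    q, f = normalize(query), normalize(field)
    if not q or not f:
        return 0.0
    score = base
    if search_class == "title" and in_same_order(q, f):
        score *= BONUS_SAME_ORDER
    if search_class == "author" and q[0] == f[0]:
        score *= BONUS_FIRST_WORD
    if len(q) > 1 and contains_phrase(q, f):
        score *= BONUS_PHRASE
    if q == f:
        score *= BONUS_FULL_MATCH
    return score
```

With these made-up values, a title search for "apple cookbook" against
"The Apple Cookbook" collects both the word-order and phrase bonuses,
while "cookbook apple" against the same title collects neither.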
One way to effectively turn off stemming today is to quote words and
phrases, which forces a space- and case-normalized direct match for
the quoted sections of text in the query.
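As a small illustration of that distinction, the Python sketch below
splits a query into quoted phrases, which are only space- and
case-normalized, and unadorned terms, which are stemmed. The stemmer
here is a deliberately crude stand-in that only strips a trailing "s";
it is not the Snowball algorithm, and the function names are
hypothetical rather than anything in Evergreen.

```python
import re


def normalize(text):
    """Lowercase the text and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip one trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def parse_query(query):
    """Return (exact_phrases, stemmed_terms) for a search query.

    Text inside double quotes is kept as an exact, normalized phrase.
    Everything outside quotes is split on whitespace and stemmed.
    """
    phrases = [normalize(p) for p in re.findall(r'"([^"]*)"', query)]
    remainder = re.sub(r'"[^"]*"', " ", query)
    terms = [toy_stem(word) for word in remainder.lower().split()]
    return phrases, terms
```

Under this sketch, an unquoted `apples` becomes the stemmed term
`apple`, while a quoted `"Apples"` stays as the exact phrase `apples`.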
Does that answer some of your questions?
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker at esilibrary.com
| web: http://www.esilibrary.com