elasticsearch - Primary/Replica Inconsistent Scoring

July 15, 2013

We have a cluster with 3 primary shards and 2 replicas per primary. The total doc count is the same across the primary and replica shards; however, we're getting 3 distinct scores for the same query and document. When we add the `preference=_primary` query parameter, we get consistent scores every time.

The only explanation I can think of is different df counts between the primaries and replicas. Is there an inconsistency between primary and replica shards, and how would one go about fixing this? We're using 1.4.2.

Edit: we reindexed the doctype we're querying, and there's still inconsistent scoring.
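To picture where three different scores can come from: each shard has three copies (the primary plus 2 replicas), and a search without a preference may be served by any of them. The sketch below is a hypothetical toy model, not Elasticsearch code; the `ShardCopyRouting` class, the `pick` method and the simple round-robin rotation are assumptions for illustration only. Pinning the request to the primary always lands on the same copy, which matches the consistent scores seen with `preference=_primary`.

```java
import java.util.List;

public class ShardCopyRouting {

    // One copy of a shard: the primary or one of its replicas (hypothetical model).
    record ShardCopy(String name) {}

    // Assumption: without a preference, request n is served by copy n mod copies.size().
    // With primaryOnly set (the effect of preference=_primary), copy 0 is always used.
    static ShardCopy pick(List<ShardCopy> copies, int requestNo, boolean primaryOnly) {
        if (primaryOnly) {
            return copies.get(0); // index 0 is the primary in this model
        }
        return copies.get(requestNo % copies.size());
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = List.of(
                new ShardCopy("primary"),
                new ShardCopy("replica-1"),
                new ShardCopy("replica-2"));

        for (int request = 0; request < 3; request++) {
            System.out.println("request " + request
                    + ": default -> " + pick(copies, request, false).name()
                    + ", _primary -> " + pick(copies, request, true).name());
        }
    }
}
```

If each copy holds slightly different index statistics, three copies give up to three different scores, while the primary-only route always sees one set of statistics.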

Primary and replica shards take different "paths" when it comes to segment merging. In other words, the number and size of segments can differ between them: each shard takes care of its own segments, independently of the other shards.

This matters for scoring because merging is the moment when deleted documents are actually removed. Until then, deleted documents are only marked as deleted (and filtered out of the results after the query has run). This means they can still influence how the score is calculated.
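A minimal sketch of why the visible doc count can match while the internal statistics do not, assuming a simplified model of one shard copy's segments. The `DeletesAndMerges` class, its `Segment` type and the numbers are hypothetical. It relies on two things: in Lucene an update is a delete plus a re-index, and a deleted document keeps occupying a slot (counted in `maxDoc`) until a merge rewrites its segment.

```java
import java.util.ArrayList;
import java.util.List;

public class DeletesAndMerges {

    // Hypothetical segment: maxDoc counts every slot, deleted counts slots only marked deleted.
    static final class Segment {
        final int maxDoc;
        final int deleted;

        Segment(int maxDoc, int deleted) {
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }

        int liveDocs() {
            return maxDoc - deleted;
        }
    }

    static int maxDoc(List<Segment> segs) {
        return segs.stream().mapToInt(s -> s.maxDoc).sum();
    }

    static int liveDocs(List<Segment> segs) {
        return segs.stream().mapToInt(Segment::liveDocs).sum();
    }

    // Merging rewrites the segments into one, physically dropping the deleted docs.
    static List<Segment> mergeAll(List<Segment> segs) {
        List<Segment> merged = new ArrayList<>();
        merged.add(new Segment(liveDocs(segs), 0));
        return merged;
    }

    public static void main(String[] args) {
        // Both copies saw the same operations: 1000 docs indexed, 200 of them updated,
        // so 1200 slots exist and 200 of them are marked deleted.
        List<Segment> primary = List.of(new Segment(600, 100), new Segment(600, 100));
        List<Segment> replica = List.of(new Segment(600, 100), new Segment(600, 100));

        // Suppose only the primary happens to have merged so far.
        List<Segment> primaryAfterMerge = mergeAll(primary);

        System.out.println("primary: live=" + liveDocs(primaryAfterMerge)
                + " maxDoc=" + maxDoc(primaryAfterMerge));
        System.out.println("replica: live=" + liveDocs(replica)
                + " maxDoc=" + maxDoc(replica));
    }
}
```

Both copies report 1000 live documents, which is what a doc count shows, yet only the merged primary has `maxDoc` back down to 1000.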

To be more specific: the total number of docs in the shard is used in the [idf calculation](http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/defaultsimilarity.html#idf(long, long)), together with the document frequency (docFreq):

    // Lucene 4.x DefaultSimilarity.idf(long docFreq, long numDocs)
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);

And that number of docs includes deleted documents (documents marked as deleted, to be more precise). Also take a look at this GitHub issue and Simon's comments on the same subject.
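A worked example of the idf formula quoted above, assuming hypothetical statistics for two copies of the same shard. The `IdfDrift` class and all the numbers are made up for illustration. Both copies hold 1000 live documents, 50 of which contain the term, but the unmerged replica still carries 200 deleted slots, 20 of which also contained the term, so both numDocs and docFreq are inflated on that copy.

```java
public class IdfDrift {

    // Same expression as Lucene 4.x DefaultSimilarity.idf(long docFreq, long numDocs).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Primary after a merge: only live docs remain.
        float primaryIdf = idf(50, 1000);
        // Replica before merging: deleted-but-not-merged docs still counted.
        float replicaIdf = idf(70, 1200);

        System.out.printf("primary idf = %.4f%n", primaryIdf);
        System.out.printf("replica idf = %.4f%n", replicaIdf);
    }
}
```

The same query and document therefore get a different idf, and hence a different score, depending on which copy answers.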
