function_score with script_score

Published: 2020-05-12 03:45:03 Author: 大海之中
Source: web

Combining function_score with script_score is the most flexible way to control ranking in Elasticsearch.

Example (关键字, meaning "keyword", stands in for the actual search term):

curl -XGET 'http://localhost:9200/*/*/_search?pretty&explain' -d '{
  "size" : 0,
  "query" : {
    "function_score" : {
      "query" : {
        "filtered" : {
          "filter" : {
            "bool" : {
              "must" : {
                "match" : {
                  "_all" : {
                    "query" : "关键字",
                    "type" : "boolean",
                    "operator" : "AND"
                  }
                }
              }
            }
          }
        }
      },
      "functions" : [ {
        "script_score" : {
          "params" : {
            "field" : "company_name",
            "term" : "关键字"
          },
          "script" : "_index[field][term].df()"
        }
      }, {
        "filter" : {
          "match" : {
            "company_name" : {
              "query" : "关键字",
              "type" : "boolean",
              "operator" : "AND"
            }
          }
        },
        "weight" : 2
      } ],
      "score_mode" : "sum"
    }
  },
  "aggregations" : {
    "agg" : {
      "terms" : {
        "field" : "member_id",
        "size" : 0,
        "order" : {
          "top_hit" : "desc"
        }
      },
      "aggregations" : {
        "top_hit" : {
          "max" : {
            "script" : {
              "inline" : "_score"
            }
          }
        }
      }
    }
  }
}'

Inline scripting must be enabled for this to work. The required settings in elasticsearch.yml are:

script.engine.groovy.inline.search: on

script.inline: on

script.indexed: on

script_score gives you much more flexible control over how Elasticsearch scores documents. For example:

"script_score" : {
  "params" : {
    "field" : "company_name",
    "terms" : ["关键字"]
  },
  "script" : "score = 0.0; queryLength = 0.0; docLength = 0.0; for (word in terms) { tf = _index[field][word].tf(); score = score + tf * 1.0; queryLength = queryLength + 1.0; docLength = docLength + pow(tf, 2.0); }; return (float)score /(sqrt(docLength) * sqrt(queryLength)); "
}

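Note that the company_name field must not be analyzed (tokenized); otherwise this approximation may give wrong results. My understanding is that script_score runs late, after analysis has finished, so an analyzed company_name has already been split into tokens and no longer holds the original value from the source table. This example has little practical value, but it does show how powerful function_score is.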
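To make the arithmetic of the script above easier to follow, here is a rough Python rendering of the same loop. It is only a sketch: the `term_freqs` dictionary is a hypothetical stand-in for what `_index[field][word].tf()` would return for one document, and both the zero guard and the treatment of missing terms as frequency 0 are assumptions of this sketch, not part of the Groovy script.

```python
import math

def cosine_like_score(term_freqs, terms):
    """Mirror the Groovy script: sum(tf) / (sqrt(sum(tf^2)) * sqrt(len(terms))).

    term_freqs is a hypothetical stand-in for _index[field][word].tf():
    it maps each term to its frequency in one document's field.
    """
    score = 0.0
    query_length = 0.0
    doc_length = 0.0
    for word in terms:
        # Assumption of this sketch: a missing term counts as frequency 0.
        tf = term_freqs.get(word, 0)
        score += tf * 1.0
        query_length += 1.0
        doc_length += math.pow(tf, 2.0)
    if doc_length == 0.0 or query_length == 0.0:
        # Guard added for this sketch only; the original script has no such check.
        return 0.0
    return score / (math.sqrt(doc_length) * math.sqrt(query_length))

# One query term that occurs three times in the field.
print(cosine_like_score({"关键字": 3}, ["关键字"]))  # 1.0
```

With a single query term the result is 1.0 whenever the term occurs at all, because tf / sqrt(tf^2) cancels out. That fits the remark that this particular example has little practical value; the formula only starts to discriminate between documents when several terms are passed in.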

Now change the query part of the function_score above from the filtered form to a plain match query:

"match" : {
  "_all" : {
    "query" : "关键字",
    "type" : "boolean",
    "operator" : "AND"
  }
}

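Relevance scoring then takes effect. The score is computed as follows:

score = (weight + script_score) * relevance score

With the original form, the score is simply _index[field][term].df() + weight, because under a filter every matching document has a relevance score of 1.

An example illustrates this: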

[Image: function_score with script_score]

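Here 4.09 is the relevance score.

6.09 = 2 + 4.09. The 2 is the weight, and this 4.09 comes from the configuration below: the script simply returns _score, which is the relevance score of 4.09 already mentioned.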

"script_score" : {
  "script" : "return _score"
}

So: score = (weight + script_score) * relevance score.
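The two scoring cases discussed above can be reproduced with a few lines of arithmetic. The sketch below is not Elasticsearch code. It assumes the function scores are added together (score_mode "sum", as in the query) and the sum is then multiplied by the query score, as in the formula above. The document frequency of 3 in the filter case is a made-up value for illustration.

```python
def function_score(query_score, function_scores):
    """Sum the function scores (score_mode "sum"), then multiply by the query score."""
    return sum(function_scores) * query_score

WEIGHT = 2  # the "weight" of the second function in the query

# Filter case: every matching document has a query score of 1,
# so the result is just df + weight. The df of 3 is hypothetical.
filter_case = function_score(1.0, [3, WEIGHT])

# Match case from the example: the script returns _score (4.09),
# and the relevance score of the query itself is also 4.09.
match_case = function_score(4.09, [4.09, WEIGHT])

print(filter_case)
print(round(match_case, 4))
```

In the match case the intermediate sum is 2 + 4.09 = 6.09, and the final score is 6.09 * 4.09, or roughly 24.91. The weight only contributes for documents that match the filter attached to it, which is assumed to be the case here.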

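How the relevance score itself is derived can be broken down in a similar way; please consult other references for the details.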
