Efficient way to retrieve all _ids in ElasticSearch
What is the fastest way to get all _ids of a certain index from ElasticSearch? Is it possible with a simple query? One of my indexes has around 20,000 documents.
Edit: Please also read the answer from @Aleck Landgraf.
Do you want the elasticsearch-internal _id field, or an id field from within your documents?
For the former, try:
curl http://localhost:9200/index/type/_search?pretty=true -d '
{
"query" : {
"match_all" : {}
},
"stored_fields": []
}
'
Note, 2017 update: the post originally included "fields": [], but the name has since changed and stored_fields is the new value.
The result will contain only the "metadata" of your documents:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "index",
"_type" : "type",
"_id" : "36",
"_score" : 1.0
}, {
"_index" : "index",
"_type" : "type",
"_id" : "38",
"_score" : 1.0
}, {
"_index" : "index",
"_type" : "type",
"_id" : "39",
"_score" : 1.0
}, {
"_index" : "index",
"_type" : "type",
"_id" : "34",
"_score" : 1.0
} ]
}
}
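For the latter, if you want to include a field from your document, simply add it to the fields array (on 5.x and later, use stored_fields or _source filtering instead, since "fields" is no longer supported there):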
curl http://localhost:9200/index/type/_search?pretty=true -d '
{
"query" : {
"match_all" : {}
},
"fields": ["document_field_to_be_returned"]
}
'
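To turn a metadata-only response like the one shown above into a plain list of ids, something along these lines should work. This is a minimal sketch: extract_ids is a hypothetical helper name, and the sample dict is a trimmed copy of the output above rather than a live response.

```python
from typing import Any, Dict, List


def extract_ids(response: Dict[str, Any]) -> List[str]:
    """Return the _id of every hit in a search response dict."""
    # hits.hits is the list of per-document entries in the response.
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits]


# Trimmed copy of the sample response shown above (not a live result).
sample = {
    "hits": {
        "total": 4,
        "max_score": 1.0,
        "hits": [
            {"_index": "index", "_type": "type", "_id": "36", "_score": 1.0},
            {"_index": "index", "_type": "type", "_id": "38", "_score": 1.0},
            {"_index": "index", "_type": "type", "_id": "39", "_score": 1.0},
            {"_index": "index", "_type": "type", "_id": "34", "_score": 1.0},
        ],
    }
}

print(extract_ids(sample))
```

Keep in mind that a search returns only the first 10 hits unless "size" is set in the request, so a single request like this will not cover an index of 20,000 documents by default.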
Better to use scroll and scan to get the result list so elasticsearch doesn't have to rank and sort the results.
With the elasticsearch-dsl Python lib this can be accomplished by:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
s = Search(using=es, index=ES_INDEX, doc_type=DOC_TYPE)
s = s.fields([]) # only get ids, otherwise `fields` takes a list of field names
ids = [h.meta.id for h in s.scan()]
Console log:
GET http://localhost:9200/my_index/my_doc/_search?search_type=scan&scroll=5m [status:200 request:0.003s]
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.003s]
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s]
...
Note: scroll pulls batches of results from a query and keeps the cursor open for a given amount of time (1 minute, 2 minutes, which you can update); scan disables sorting. The scan helper function returns a Python generator which can be safely iterated through.
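For readers curious what the scan helper does under the hood, the scroll loop can be sketched by hand roughly as follows. This is an illustration, not the helper's actual implementation: iter_ids_with_scroll, page_size and keep_alive are hypothetical names, the client argument is assumed to be an elasticsearch-py client with 7.x-style search, scroll and clear_scroll methods, and sorting by _doc is used here as the newer stand-in for the old scan search type.

```python
from typing import Any, Iterator


def iter_ids_with_scroll(client: Any, index: str,
                         page_size: int = 1000,
                         keep_alive: str = "1m") -> Iterator[str]:
    """Yield every _id in `index` by paging through a scroll cursor."""
    # The first request opens the cursor and returns the first batch.
    page = client.search(
        index=index,
        scroll=keep_alive,
        body={
            "query": {"match_all": {}},
            "_source": False,   # ids only, no document bodies
            "sort": ["_doc"],   # index order, so no ranking work is needed
            "size": page_size,
        },
    )
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"]
            # Each follow-up call fetches the next batch and renews the timeout.
            page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the cursor on the server rather than waiting for it to expire.
        if scroll_id is not None:
            client.clear_scroll(scroll_id=scroll_id)
```

Usage would look like ids = list(iter_ids_with_scroll(Elasticsearch(), "my_index")). In practice the scan helper is the simpler choice; the sketch only makes the batching and the cursor timeout visible.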
For elasticsearch 5.x, you can use the "_source" field.
GET /_search
{
"_source": false,
"query" : {
"term" : { "user" : "kimchy" }
}
}
"fields" has been deprecated. (Error: "The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored")
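Since the right request body depends on the server version, one way to keep client code working across both is a small builder like the sketch below. The function name ids_only_body and the es_major_version parameter are hypothetical, and the version cut-off at 5 is taken from the answers in this thread rather than checked against every release.

```python
from typing import Any, Dict


def ids_only_body(es_major_version: int) -> Dict[str, Any]:
    """Build a match_all request body that returns hit metadata only."""
    body: Dict[str, Any] = {"query": {"match_all": {}}}
    if es_major_version >= 5:
        # "fields" is rejected from 5.x on; disabling _source leaves only
        # the hit metadata such as _id in each result.
        body["_source"] = False
    else:
        # Older releases used an empty "fields" list for the same purpose.
        body["fields"] = []
    return body


print(ids_only_body(5))
print(ids_only_body(2))
```

The resulting dict can be passed as the body of a search or as the query argument of the scan helper used elsewhere in this thread.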
Another option
curl 'http://localhost:9200/index/type/_search?pretty=true&fields='
will return _index, _type, _id and _score.
You can also do it in Python, which gives you a proper list:
import elasticsearch
es = elasticsearch.Elasticsearch()
res = es.search(
index=your_index,
body={"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]})
ids = [d['_id'] for d in res['hits']['hits']]
Elaborating on the 2 answers by @Robert-Lujo and @Aleck-Landgraf (someone with the permissions can gladly move this to a comment): if you do not want to print but get everything inside a list from the returned generator, here is what I use:
from elasticsearch import Elasticsearch, helpers
es = Elasticsearch(hosts=[YOUR_ES_HOST])
# helpers.scan returns a generator over all hits, as in the other answers so far
a = helpers.scan(es, query={"query": {"match_all": {}}}, scroll='1m', index=INDEX_NAME)
IDs = [aa['_id'] for aa in a]
Inspired by @Aleck-Landgraf's answer, what worked for me was using the scan function from the standard elasticsearch Python API directly:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
es = Elasticsearch()
for dobj in scan(es,
                 query={"query": {"match_all": {}}, "fields": []},
                 index="your-index-name", doc_type="your-doc-type"):
    print(dobj["_id"], end=" ")  # Python 3 form of the original `print dobj["_id"],`
Url -> http://localhost:9200/<index>/<type>/_query
http method -> GET
Query -> {"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]}
Reference URL: https://stackoverflow.com/questions/17497075/efficient-way-to-retrieve-all-ids-in-elasticsearch