Elasticsearch plugin - Indexing documents in Elasticsearch with the Java API
We are indexing resume documents using the Elasticsearch Java API, and it works fine. When we search for a keyword, it returns an accurate response: the documents that contain the keyword.
But we want to index the documents in more depth. For example, a resume has 'skills' and 'skills month'. The skills month may be 13 months in a document. If we search for a skill and set the skill months to between 10 and 15 months in the Elasticsearch query, we want that record (document) returned.
How can we do this?
Here is the code for indexing:
IndexResponse response = client
        .prepareIndex(username, document.getType(), document.getId())
        .setSource(extractDocument(document))
        .execute()
        .actionGet();

public XContentBuilder extractDocument(Document document) throws IOException, NoSuchAlgorithmException {
    // Extract the content with Tika
    int indexedChars = 100000;
    Metadata metadata = new Metadata();
    String parsedContent;
    try {
        // Set the maximum length of strings returned by the parseToString method; -1 sets no limit
        parsedContent = tika().parseToString(
                new BytesStreamInput(Base64.decode(document.getContent().getBytes()), false),
                metadata, indexedChars);
    } catch (Throwable e) {
        logger.debug("Failed to extract [" + indexedChars + "] characters of text for [" + document.getName() + "]", e);
        System.out.println("Failed to extract [" + indexedChars + "] characters of text for [" + document.getName() + "]" + e);
        parsedContent = "";
    }

    XContentBuilder source = jsonBuilder().startObject();
    if (logger.isTraceEnabled()) {
        source.prettyPrint();
    }

    // File
    source
            .startObject(FsRiverUtil.Doc.FILE)
            .field(FsRiverUtil.Doc.File.FILENAME, document.getName())
            .field(FsRiverUtil.Doc.File.LAST_MODIFIED, new Date())
            .field(FsRiverUtil.Doc.File.INDEXING_DATE, new Date())
            .field(FsRiverUtil.Doc.File.CONTENT_TYPE, document.getContentType() != null
                    ? document.getContentType() : metadata.get(Metadata.CONTENT_TYPE))
            .field(FsRiverUtil.Doc.File.URL, "file://" + (new File(".", document.getName())).toString());
    if (metadata.get(Metadata.CONTENT_LENGTH) != null) {
        // Try the CONTENT_LENGTH reported by Tika first
        source.field(FsRiverUtil.Doc.File.FILESIZE, metadata.get(Metadata.CONTENT_LENGTH));
    } else {
        // Otherwise, use our byte[] length
        source.field(FsRiverUtil.Doc.File.FILESIZE, Base64.decode(document.getContent().getBytes()).length);
    }
    source.endObject(); // File

    // Path
    source
            .startObject(FsRiverUtil.Doc.PATH)
            .field(FsRiverUtil.Doc.Path.ENCODED, SignTool.sign("."))
            .field(FsRiverUtil.Doc.Path.ROOT, ".")
            .field(FsRiverUtil.Doc.Path.VIRTUAL, ".")
            .field(FsRiverUtil.Doc.Path.REAL, (new File(".", document.getName())).toString())
            .endObject(); // Path

    // Meta
    source
            .startObject(FsRiverUtil.Doc.META)
            .field(FsRiverUtil.Doc.Meta.AUTHOR, metadata.get(Metadata.AUTHOR))
            .field(FsRiverUtil.Doc.Meta.TITLE, metadata.get(Metadata.TITLE) != null
                    ? metadata.get(Metadata.TITLE) : document.getName())
            .field(FsRiverUtil.Doc.Meta.DATE, metadata.get(Metadata.DATE))
            .array(FsRiverUtil.Doc.Meta.KEYWORDS,
                    Strings.commaDelimitedListToStringArray(metadata.get(Metadata.KEYWORDS)))
            .endObject(); // Meta

    // Doc content
    source.field(FsRiverUtil.Doc.CONTENT, parsedContent);

    // Doc as binary attachment
    source.field(FsRiverUtil.Doc.ATTACHMENT, document.getContent());

    // End of our document
    source.endObject();
    return source;
}
The code below is used to get the search response:
QueryBuilder qb;
if (query == null || query.trim().length() <= 0) {
    qb = QueryBuilders.matchAllQuery();
} else {
    qb = QueryBuilders.queryString(query); // query is a name or a string
}

org.elasticsearch.action.search.SearchResponse searchHits = node.client()
        .prepareSearch()
        .setIndices("ankur")
        .setQuery(qb)
        .setFrom(0).setSize(1000)
        .addHighlightedField("file.filename")
        .addHighlightedField("content")
        .addHighlightedField("meta.title")
        .setHighlighterPreTags("<span class='badge badge-info'>")
        .setHighlighterPostTags("</span>")
        .addFields("*", "_source")
        .execute().actionGet();
Elasticsearch indexes every field by default in order to provide better search capabilities. Before you put JSON documents under a type, it is a good idea to define mappings (refer: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-analysis.html)
When you want to search the data by an exact keyword, you may need to skip analysis for a particular field. While a document is being indexed, the field values are analyzed and then indexed. You can tell Elasticsearch not to do this by marking the field as "not_analyzed"; the field value is then indexed as is. That way you can get better search results.
As for defining the JSON document, it helps to use a library to build the JSON. I prefer the Jackson library for parsing JSON documents. It will reduce the lines of code in your project.
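To make the mapping advice concrete for the resume case, here is a rough sketch in plain Java that only builds the JSON bodies as strings, so it runs without the Elasticsearch client. Everything named in it is an assumption and not something from the question: the type name `resume`, the field names `skills`, `name` and `months`, and the idea that the skill months have already been pulled out of the resume text into a numeric field (Tika only gives you the raw text). The `"index": "not_analyzed"` setting on a `string` field applies to the older Elasticsearch versions that the code in the question targets; newer versions use the `keyword` type instead.

```java
class ResumeMappingSketch {

    // Hypothetical names; the original post does not define any of them.
    static final String TYPE = "resume";
    static final String SKILLS = "skills";
    static final String SKILL_NAME = "name";
    static final String SKILL_MONTHS = "months";

    /** Mapping body: skill name indexed as is, skill months stored as a number. */
    static String buildMapping() {
        return "{\n"
             + "  \"" + TYPE + "\": {\n"
             + "    \"properties\": {\n"
             + "      \"" + SKILLS + "\": {\n"
             + "        \"type\": \"nested\",\n"
             + "        \"properties\": {\n"
             + "          \"" + SKILL_NAME + "\": { \"type\": \"string\", \"index\": \"not_analyzed\" },\n"
             + "          \"" + SKILL_MONTHS + "\": { \"type\": \"integer\" }\n"
             + "        }\n"
             + "      }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    /** Search body: exact skill name AND months within [minMonths, maxMonths]. */
    static String buildSkillRangeQuery(String skill, int minMonths, int maxMonths) {
        String namePath = SKILLS + "." + SKILL_NAME;
        String monthsPath = SKILLS + "." + SKILL_MONTHS;
        return "{\n"
             + "  \"query\": {\n"
             + "    \"nested\": {\n"
             + "      \"path\": \"" + SKILLS + "\",\n"
             + "      \"query\": {\n"
             + "        \"bool\": {\n"
             + "          \"must\": [\n"
             + "            { \"term\": { \"" + namePath + "\": \"" + escape(skill) + "\" } },\n"
             + "            { \"range\": { \"" + monthsPath + "\": { \"gte\": " + minMonths
             + ", \"lte\": " + maxMonths + " } } }\n"
             + "          ]\n"
             + "        }\n"
             + "      }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    /** Minimal escaping of backslashes and double quotes for a JSON string value. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(buildMapping());
        System.out.println(buildSkillRangeQuery("java", 10, 15));
    }
}
```

The `nested` type is used here on the assumption that a resume lists several skills, so that a skill name and its months are matched within the same entry. Because the name field is not analyzed, the `term` clause matches only the exact stored value, including its case.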