Indexing an attachment file in Elasticsearch

I ran the following commands to index a document in Elasticsearch.

create an index

curl -X PUT "localhost:9200/test_idx_1x"

create a mapping

curl -X PUT "localhost:9200/test_idx_1x/test_mapping_1x/_mapping" -d '{
  "test_mapping_1x": {
    "properties": {
      "my_attachments": {
        "type": "attachment"
      }
    }
  }
}'

index this document

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/4' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "test Elastic Search",
  "name": "N1"
}'

All three of these commands work fine. But when I run this command:

curl -XPOST 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "type": "attachment",
    "_content_type": "text/plain",
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
  }
}'

I receive this error message:

{
  "error": "NullPointerException[null]",
  "status": 500
}

I changed it to each of the following:

curl -XPOST 'http://localhost:9200/test_idx_1x/test_mapping_1x/1bis' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "type": "attachment",
    "_content_type": "text/plain",
    "_name": "/inf/bd/my_home_directory/test.txt"
  }
}'

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
  }
}'

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt",
    "_content_type": "text/plain"
  }
}'

Each of these produces the same error.

I then changed it to this:

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": {
    "file": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt",
    "_content_type": "text/plain",
    "content": "... base64 encoded attachment ..."
  }
}'

The error is:

{
  "error": "MapperParsingException[Failed to parse]; nested: JsonParseException[Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal character '.' (code 0x2e) in base64 content\n at [Source: [B@159b3; line: 1, column: 241]]; ",
  "status": 400
}

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
}'

I receive this error message:

{
  "error": "MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('h' (code 104)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B@1ae9565; line: 1, column: 132]]; ",
  "status": 400
}

If I type

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/1' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": "http://localhost:5984/my_test_couch_db_7/ID2/test.txt"
}'

I receive this error, which I can understand:

{
  "error": "MapperParsingException[Failed to parse]; nested: JsonParseException[Failed to decode VALUE_STRING as base64 (MIME-NO-LINEFEEDS): Illegal character ':' (code 0x3a) in base64 content\n at [Source: [B@1ffb7d4; line: 1, column: 137]]; ",
  "status": 400
}

How can I attach files to ES so that ES can index them?


Thanks for your answer. I had already installed the attachment plugin when I ran these commands. The content of the text file is already encoded in Base64, so I don't encode it again. If I don't use the file's path but pass its Base64 content directly, e.g.

curl -XPUT 'http://localhost:9200/test_idx_1x/test_mapping_1x/' -d '{
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic Search",
  "name": "N2",
  "my_attachments": "file's content string encoded in base64"
}'

everything works: I have already succeeded in posting the file and searching its content afterwards.

But if I replace the content with the file's path, it fails. So I want to know how to Base64-encode a file on the command line, within the ES indexing command itself (of course, I don't want to run a separate base64 command to encode the file before running a second command to index it in ES). Regarding your answer, do I have to install something like a Perl library to run your command?


http://es-cn.medcl.net/tutorials/2011/07/18/attachment-type-in-action.html

#!/bin/sh

# Slurp the whole file (-0777) and encode it as one unwrapped base64 string
coded=$(perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")' fn6742.pdf)
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file
curl -X POST "localhost:9200/test/attachment/" -d @json.file


First, you don't specify whether you have the attachment plugin installed. If not, you can do so with:

./bin/plugin -install mapper-attachments

You will need to restart ElasticSearch for it to load the plugin.

Then, as you do above, you map a field to have type attachment:

curl -XPUT 'http://127.0.0.1:9200/foo/?pretty=1'  -d '
{
   "mappings" : {
      "doc" : {
         "properties" : {
            "file" : {
               "type" : "attachment"
            }
         }
      }
   }
}
'

When you index a document, you need to encode the contents of your file in Base64. You can do this on the command line with the base64 utility. However, the output of base64 is typically wrapped across several lines, and a raw newline is not legal inside a JSON string, so you also need to escape the newlines, which you can do by piping the output of base64 through Perl:

curl -XPOST 'http://127.0.0.1:9200/foo/doc?pretty=1'  -d '
{
   "file" : "'"$(base64 /path/to/file | perl -pe 's/\n/\\n/g')"'"
}
'
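If Perl is not available, the same idea can be sketched with standard tools only: strip the line breaks from the base64 output instead of escaping them, and write the body to a file so that curl reads it with `-d @file`. The helper name `build_attachment_json` and the file name `body.json` are assumptions made for illustration; the `foo` index and `doc` type are the ones from the mapping above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): print a JSON body whose
# "file" field holds the base64 encoding of the file given as $1.
build_attachment_json() {
    # base64 may wrap its output; tr deletes the line breaks so the value
    # is a single-line string that is legal inside JSON.
    encoded=$(base64 < "$1" | tr -d '\n')
    printf '{"file":"%s"}\n' "$encoded"
}

# Usage (assumes the foo index and doc type created earlier):
#   build_attachment_json /path/to/file > body.json
#   curl -XPOST 'http://127.0.0.1:9200/foo/doc?pretty=1' -d @body.json
```

Writing the body to a file also avoids command-line length limits when the attachment is large.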

Now you can search your file:

curl -XGET 'http://127.0.0.1:9200/foo/doc/_search?pretty=1'  -d '
{
   "query" : {
      "text" : {
         "file" : "text to look for"
      }
   }
}
'

See ElasticSearch attachment type for more.
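The 400 responses quoted in the question ("Illegal character ':' ... in base64 content") suggest that the value of the attachment field is decoded as base64, so a URL or a file path placed in that field cannot work. A rough pre-flight check, assuming a hypothetical helper named `looks_like_base64`, could catch this before the request is sent. It only checks the alphabet, not padding placement or length.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is non-empty and contains nothing
# but characters from the standard base64 alphabet (A-Z, a-z, 0-9, +, /, =).
looks_like_base64() {
    case $1 in
        ''|*[!A-Za-z0-9+/=]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   looks_like_base64 "$value" || echo "value is not base64; encode the file first" >&2
```

A URL such as the CouchDB one in the question fails this check because of the ':' and '.' characters, which matches the characters named in the error messages.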


This is a complete shell script implementation:

file_path='/path/to/file'
# Escape the newlines that base64 inserts so the value is a legal JSON string
file=$(base64 "$file_path" | perl -pe 's/\n/\\n/g')
curl -XPUT "http://eshost.com:9200/index/type/" -d '{
    "file" : { "content" : "'"$file"'" }
}'


An alternative solution is the plugin at http://elasticwarehouse.org. You can upload a binary file using _ewupload, read the newly generated ID from the response, and update your other index with this reference.

Install plugin:

plugin -install elasticwarehouseplugin -u http://elasticwarehouse.org/elasticwarehouse/elasticsearch-elasticwarehouseplugin-1.2.2-1.7.0-with-dependencies.zip

Restart cluster, then:

curl -XPOST "http://127.0.0.1:9200/_ewupload?folder=/myfolder&filename=mybinaryfile.bin" --data-binary @mybinaryfile.bin

Sample response:

{"id":"nWvrczBcSEywHRBBBwfy2g","version":1,"created":true}
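To reuse the generated ID in another index, it first has to be pulled out of the response. A minimal sketch, assuming the response is compact JSON shaped like the sample above; `extract_upload_id` is a hypothetical helper name, not something the plugin provides.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "id" field from a compact
# JSON response passed as $1. Assumes no whitespace around the colon.
extract_upload_id() {
    printf '%s\n' "$1" | sed -n 's/.*"id":"\([^"]*\)".*/\1/p'
}

response='{"id":"nWvrczBcSEywHRBBBwfy2g","version":1,"created":true}'
extract_upload_id "$response"   # prints nWvrczBcSEywHRBBBwfy2g
```

A real JSON parser such as jq would be more robust than sed if it is available on the machine.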