Learning Solr 5 with Yida: Bulk-Indexing JSON Data
Suppose you have a batch of JSON data like this:
[ {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00}, {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00}, {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00}, {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00}, {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}, {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}, {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}, {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00}, {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00}, {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00}, {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], "price":18.00}, {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00}, {"id":"13", 
"name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00}, {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00}, {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50}, {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00}, {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50}, {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00}, {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00}, {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00} ]
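How do you import it into Solr for indexing? Solr's Web UI can handle it. The menu on the left has a Documents entry for importing documents (updating existing documents is supported too); the plural "s" signals that several documents can be imported in one go, as shown in the figure: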
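Document Type indicates the source format of your document data: JSON, XML, CSV, and so on;
Commit Within tells Solr to commit the submitted documents within the given number of milliseconds, so they become searchable no later than that;
Overwrite controls whether an incoming document replaces an already indexed document with the same unique key. Set it to false and Solr skips that check and simply appends the new document to the existing index;
Boost sets the weight of the document; the default is 1.0.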
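To make these options a bit more concrete, here is a minimal Python sketch of how they could be carried inside a JSON add command for Solr's update handler. The function name `build_add_command` and its default values are illustrative assumptions, and the sketch only builds the text; it does not talk to Solr.

```python
import json


def build_add_command(doc, commit_within_ms=1000, overwrite=True, boost=1.0):
    """Wrap one document in a JSON "add" command together with the form's options."""
    command = {
        "add": {
            "doc": doc,                        # the document itself
            "boost": boost,                    # document weight, 1.0 by default
            "overwrite": overwrite,            # replace a document with the same unique key
            "commitWithin": commit_within_ms,  # commit within this many milliseconds
        }
    }
    return json.dumps(command, ensure_ascii=False)


if __name__ == "__main__":
    doc = {"id": "1", "name": "Red Lobster", "price": 33.00}
    print(build_add_command(doc, commit_within_ms=5000, overwrite=False))
```

The resulting string is the kind of text you would paste when Document Type is set to Solr Command (raw XML or JSON).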
If you only need to import a single JSON object, simply set Document Type to JSON. Once you do, the Document(s) text box shows a sample, as shown in the figure:
You can also set Document Type to Solr Command (raw XML or JSON), but then the JSON has to follow a specific format:
{ "add": { "doc": {.......} }, "add": { "doc": {.......} }, "add": { "doc": {.......} }, "add": { "doc": {.......} }, "add": { "doc": {.......} }, ............. and so on. }
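The {.........} parts are your document objects; everything else is fixed syntax. This format makes up for the weakness of the JSON Document Type, which imports documents one at a time and is too slow for larger data sets. With the command format, a single submission can carry many documents.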
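Writing that wrapper by hand gets tedious for more than a few documents. The sketch below shows one possible way to generate it from an ordinary list of documents in Python. The helper name `build_bulk_add` is made up for illustration. Because a Python dict cannot hold the same key twice, the outer object is assembled as text.

```python
import json


def build_bulk_add(docs):
    """Return a command body with one "add" entry per document.

    The repeated "add" key cannot be produced from a Python dict, so the
    outer object is assembled as text from individually serialized documents.
    """
    entries = [
        '"add": {"doc": %s}' % json.dumps(doc, ensure_ascii=False)
        for doc in docs
    ]
    return "{\n  " + ",\n  ".join(entries) + "\n}"


if __name__ == "__main__":
    docs = [
        {"id": "1", "name": "Red Lobster", "tags": ["sea food", "sit down"]},
        {"id": "2", "name": "Red Lobster", "tags": ["sea food", "sit-down"]},
    ]
    print(build_bulk_add(docs))
```

Each document is serialized on its own, so quotes and non-ASCII characters inside field values are escaped correctly.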
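If you need to import XML data, set Document Type to XML, as shown in the figure:
What goes between the <doc></doc> tags is your XML data. It has the same drawback as the JSON Document Type: only one document per import. To import XML data in bulk, you can again set Document Type to Solr Command (raw XML or JSON), and the data should then look something like this: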
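<add> <doc> <field name="id">xxxx</field> <field name="name">xxxxxxxx</field> <field name="age">xxxxxxxx</field> </doc> <doc> <field name="id">xxxx</field> <field name="name">xxxxxxxx</field> <field name="age">xxxxxxxx</field> </doc> <doc> <field name="id">xxxx</field> <field name="name">xxxxxxxx</field> <field name="age">xxxxxxxx</field> </doc> ............ and so on </add>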
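If your data starts out as a list of records, an XML message of this kind can be generated instead of typed. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `build_xml_add` is an assumption for illustration. It writes each value as a `<field name="...">` element, the form Solr's XML update handler expects, and repeats the element for list values.

```python
import xml.etree.ElementTree as ET


def build_xml_add(docs):
    """Build an <add> message containing one <doc> per input record."""
    root = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(root, "doc")
        for name, value in doc.items():
            # A list value becomes several <field> elements with the same name.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", {"name": name})
                field.text = str(v)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_xml_add([{"id": "1", "name": "Red Lobster", "tags": ["sea food", "sit down"]}]))
```

ElementTree escapes characters such as `<` and `&` in field values, which is easy to get wrong when concatenating strings by hand.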
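To update a document, send it again inside <add> with the same unique key; with overwriting enabled, the new version replaces the old one. There is also a <delete> command, which I mentioned earlier when covering post.jar. For details, see 《跟益达学Solr5之玩转post.jar》 (Learning Solr 5 with Yida: Mastering post.jar).
OK, enough talk. Let me walk through a demonstration using JSON data. Suppose I have JSON data like this, as shown in the figure: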
First, we need to work out the fields from the JSON data and define them in our schema.xml configuration file, as shown in the figure:
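Picking the fields out by eye works for a small sample. As a rough aid, the following Python sketch scans the documents and prints candidate `<field>` lines. The helper names are hypothetical, and the type names `string` and `float` are assumptions that must match field types actually declared in your schema.xml; treat the output as a starting point to review, not a finished schema.

```python
def infer_fields(docs):
    """Collect field names, a guessed type, and whether any value is a list."""
    fields = {}
    for doc in docs:
        for name, value in doc.items():
            multi = isinstance(value, list)
            sample = value[0] if multi and value else value
            # Assumed type names; they must exist as fieldType entries in schema.xml.
            guessed = "float" if isinstance(sample, float) else "string"
            info = fields.setdefault(name, {"type": guessed, "multiValued": False})
            info["multiValued"] = info["multiValued"] or multi
    return fields


def to_schema_lines(fields):
    """Render the inferred fields as candidate <field> declarations."""
    return [
        '<field name="%s" type="%s" indexed="true" stored="true" multiValued="%s"/>'
        % (name, info["type"], str(info["multiValued"]).lower())
        for name, info in fields.items()
    ]


if __name__ == "__main__":
    docs = [{"id": "1", "name": "Red Lobster", "tags": ["sea food"], "price": 33.0}]
    for line in to_schema_lines(infer_fields(docs)):
        print(line)
```

For the sample data, this marks `tags` as multi-valued because its value is a JSON array, and guesses a floating-point type for `price`.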
Next, we need to convert the plain JSON data into a format Solr recognizes, as follows:
{ "add": { "doc": {"id":"1", "name":"Red Lobster", "city":"San Francisco, CA", "type":"Sit-down Chain", "state":"California", "tags":["sea food", "sit down"], "price":33.00} }, "add": { "doc": {"id":"2", "name":"Red Lobster", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["sea food", "sit-down"], "price":22.00} }, "add": { "doc": {"id":"3", "name":"Red Lobster", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["sea food", "sit-down"], "price":29.00} }, "add": { "doc": {"id":"4", "name":"McDonalds", "city":"San Francisco, CA", "type":"Fast Food", "state":"California", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":9.00} }, "add": { "doc": {"id":"5", "name":"McDonalds", "city":"Atlanta, GA", "type":"Fast Food", "state":"Georgia", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00} }, "add": { "doc": {"id":"6", "name":"McDonalds", "city":"New York, NY", "type":"Fast Food", "state":"New York", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00} }, "add": { "doc": {"id":"7", "name":"McDonalds", "city":"Chicago, IL", "type":"Fast Food", "state":"Illinois", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00} }, "add": { "doc": {"id":"8", "name":"McDonalds", "city":"Austin, TX", "type":"Fast Food", "state":"Texas", "tags":["fast food", "hamburgers", "coffee", "wi-fi", "breakfast"], "price":4.00} }, "add": { "doc": {"id":"9", "name":"Pizza Hut", "city":"Atlanta, GA", "type":"Sit-down Chain", "state":"Georgia", "tags":["pizza", "sit-down", "delivery"], "price":15.00} }, "add": { "doc": {"id":"10", "name":"Pizza Hut", "city":"New York, NY", "type":"Sit-down Chain", "state":"New York", "tags":["pizza", "sit-down", "delivery"], "price":24.00} }, "add": { "doc": {"id":"11", "name":"Pizza Hut", "city":"Austin, TX", "type":"Sit-down Chain", "state":"Texas", "tags":["pizza", "sit-down", "delivery"], 
"price":18.00} }, "add": { "doc": {"id":"12", "name":"Freddy's Pizza Shop", "city":"Los Angeles, CA", "type":"Local Sit-down", "state":"California", "tags":["pizza", "pasta", "sit-down"], "price":25.00} }, "add": { "doc": {"id":"13", "name":"The Iberian Pig", "city":"Atlanta, GA", "type":"Upscale", "state":"Georgia", "tags":["spanish", "tapas", "sit-down", "upscale"], "price":45.00} }, "add": { "doc": {"id":"14", "name":"Sprig", "city":"Atlanta, GA", "type":"Local Sit-down", "state":"Georgia", "tags":["sit-down", "gluten-free", "southern cuisine"], "price":15.00} }, "add": { "doc": {"id":"15", "name":"Starbucks", "city":"San Francisco, CA", "type":"Coffee Shop", "state":"California", "tags":["coffee", "breakfast"], "price":7.50} }, "add": { "doc": {"id":"16", "name":"Starbucks", "city":"Atlanta, GA", "type":"Coffee Shop", "state":"Georgia", "tags":["coffee", "breakfast"], "price":4.00} }, "add": { "doc": {"id":"17", "name":"Starbucks", "city":"New York, NY", "type":"Coffee Shop", "state":"New York", "tags":["coffee", "breakfast"], "price":6.50} }, "add": { "doc": {"id":"18", "name":"Starbucks", "city":"Chicago, IL", "type":"Coffee Shop", "state":"Illinois", "tags":["coffee", "breakfast"], "price":6.00} }, "add": { "doc": {"id":"19", "name":"Starbucks", "city":"Austin, TX", "type":"Coffee Shop", "state":"Texas", "tags":["coffee", "breakfast"], "price":5.00} }, "add": { "doc": {"id":"20", "name":"Starbucks", "city":"Greenville, SC", "type":"Coffee Shop", "state":"South Carolina", "tags":["coffee", "breakfast"], "price":3.00} } }
Then start your Tomcat and proceed as shown in the figure:
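The Web UI submits the text to Solr's update handler over HTTP, so the same step can also be scripted. The Python sketch below prepares such a request with the standard library. The base URL `http://localhost:8080/solr` and the core name `restaurants` are assumptions, so substitute the values from your own Tomcat deployment. The `send` helper is only defined, not called, so nothing is sent until you call it yourself.

```python
import urllib.parse
import urllib.request


def build_update_request(solr_base, core, body, commit=True):
    """Prepare a POST to <solr_base>/<core>/update carrying a JSON command body."""
    query = urllib.parse.urlencode({"commit": "true" if commit else "false", "wt": "json"})
    url = "%s/%s/update?%s" % (solr_base.rstrip("/"), core, query)
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the prepared request and return Solr's response text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Assumed URL and core name; adjust them before calling send(req).
    req = build_update_request("http://localhost:8080/solr", "restaurants",
                               '{"add": {"doc": {"id": "1"}}}')
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to check the URL and headers before anything reaches the index.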
After submitting, run a query, as shown in the figure:
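The query can also be issued as a plain URL against the select handler. Here is a small Python sketch that builds such a URL; the base URL and the core name `restaurants` are again assumptions, and `*:*` is the usual match-all query.

```python
import urllib.parse


def build_select_url(solr_base, core, query="*:*", rows=10):
    """Build a URL for <solr_base>/<core>/select with a URL-encoded query string."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/%s/select?%s" % (solr_base.rstrip("/"), core, params)


if __name__ == "__main__":
    # Assumed URL and core name; open the printed URL in a browser to see the results.
    print(build_select_url("http://localhost:8080/solr", "restaurants"))
    print(build_select_url("http://localhost:8080/solr", "restaurants", query='name:"Pizza Hut"'))
```

If the import worked, the match-all query should report all 20 documents in its `numFound` count.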
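Pay attention to the Document Type setting: if you choose JSON here, you will get an exception like this, as shown in the figure:
The configuration and test data for this example are in the attachment below. If you run into any problems along the way, please contact me. Java developers of all levels are also welcome to join the group to exchange ideas and learn together.
Yida's QQ: 7-3-6-0-3-1-3-0-5
Yida's QQ group: 1-0-5-0-9-8-8-0-6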