Gecco Crawler Code Analysis
程序员文章站
2022-05-05 12:29:04
...
## Ajax example
For an Ajax example, see `com.geccocrawler.gecco.demo.ajax` in the source code.
## Extensibility
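First, a Spider can be customized before and after each download: implement the `BeforeDownload` interface for pre-download logic and the `AfterDownload` interface for post-download logic, then bind the implementation to a specific SpiderBean with an annotation such as `@SpiderName("com.geccocrawler.gecco.demo.MyGithub")`.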
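To make the hook idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use Gecco: `BeforeDownloadHook`, `AfterDownloadHook` and `HookedDownloader` are hypothetical stand-ins, not Gecco's actual types, and the "download" is faked with string concatenation. The registration key plays the role of the `@SpiderName` value, so hooks only fire for the SpiderBean they are bound to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a pre-download hook (not Gecco's BeforeDownload).
interface BeforeDownloadHook {
    void process(Map<String, String> headers);
}

// Hypothetical stand-in for a post-download hook (not Gecco's AfterDownload).
interface AfterDownloadHook {
    String process(String body);
}

// Hypothetical downloader: runs the hooks registered for one SpiderBean name,
// mirroring how @SpiderName binds a hook to a single SpiderBean.
class HookedDownloader {
    private final Map<String, BeforeDownloadHook> before = new HashMap<>();
    private final Map<String, AfterDownloadHook> after = new HashMap<>();

    void registerBefore(String spiderName, BeforeDownloadHook hook) {
        before.put(spiderName, hook);
    }

    void registerAfter(String spiderName, AfterDownloadHook hook) {
        after.put(spiderName, hook);
    }

    String download(String spiderName, String url) {
        Map<String, String> headers = new HashMap<>();
        BeforeDownloadHook b = before.get(spiderName);
        if (b != null) {
            b.process(headers); // e.g. set headers before the request goes out
        }
        // Fake download: a real downloader would send an HTTP request here.
        String body = "<html>" + url + " ua=" + headers.getOrDefault("User-Agent", "default") + "</html>";
        AfterDownloadHook a = after.get(spiderName);
        return a != null ? a.process(body) : body; // post-process only if a hook is bound
    }
}
```

Hooks registered under `com.geccocrawler.gecco.demo.MyGithub` change only that bean's downloads; any other name gets the untouched body.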
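Second, some SpiderBean properties cannot be obtained through annotations alone, for example when they depend on a very complex Ajax request. For these, write a custom field renderer: implement the `CustomFieldRender` interface and annotate the property with `@FieldRenderName("CustomFieldRenderName")`.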
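A minimal sketch of name-based custom field rendering, assuming a renderer is resolved by the name given in the field's annotation. `RenderWith`, `FieldRenderer`, `RenderingEngine` and `ProductBean` are hypothetical stand-ins for `@FieldRenderName`, `CustomFieldRender` and a SpiderBean; they are not Gecco's actual types.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for @FieldRenderName: names the renderer that fills a field.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RenderWith {
    String value();
}

// Hypothetical stand-in for CustomFieldRender: computes one field's value for a bean.
interface FieldRenderer {
    Object render(Object bean, Field field);
}

// Looks up renderers by name and writes their results into annotated fields.
class RenderingEngine {
    private final Map<String, FieldRenderer> renderers = new HashMap<>();

    void register(String name, FieldRenderer renderer) {
        renderers.put(name, renderer);
    }

    void renderFields(Object bean) {
        for (Field field : bean.getClass().getDeclaredFields()) {
            RenderWith ann = field.getAnnotation(RenderWith.class);
            if (ann == null) {
                continue; // unannotated fields are not touched here
            }
            FieldRenderer renderer = renderers.get(ann.value());
            if (renderer == null) {
                throw new IllegalStateException("no renderer named " + ann.value());
            }
            try {
                field.setAccessible(true);
                field.set(bean, renderer.render(bean, field));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}

// Example bean: in a real crawler, price might only be reachable through a complex Ajax call.
class ProductBean {
    String title = "Widget";

    @RenderWith("priceRender")
    String price;
}
```

The renderer receives the whole bean, so it can combine values that were already filled in (here `title`) with data fetched by its own logic.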
Third, pipelines can be developed with Spring:
- Implement a `SpringPipelineFactory` (a `PipelineFactory` that looks pipelines up in the Spring context), for example:
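```java
import org.springframework.beans.BeansException;
import org.springframework.beans.factory.NoSuchBeanDefinitionException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Service;

import com.geccocrawler.gecco.pipeline.Pipeline;
import com.geccocrawler.gecco.pipeline.PipelineFactory;
import com.geccocrawler.gecco.spider.SpiderBean;

// Resolves Gecco pipelines by name from the Spring ApplicationContext.
@Service
public class SpringPipelineFactory implements PipelineFactory, ApplicationContextAware {

    private ApplicationContext applicationContext;

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        this.applicationContext = applicationContext;
    }

    @SuppressWarnings("unchecked")
    @Override
    public Pipeline<? extends SpiderBean> getPipeline(String name) {
        try {
            Object bean = applicationContext.getBean(name);
            // Only beans that actually implement Pipeline are returned.
            if (bean instanceof Pipeline) {
                return (Pipeline<? extends SpiderBean>) bean;
            }
        } catch (NoSuchBeanDefinitionException ex) {
            System.out.println("no such pipeline : " + name);
        }
        return null;
    }
}
```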
- Then set it on the `GeccoEngine`:
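```java
// "springPipelineFactory" is the default bean name Spring derives from the @Service class SpringPipelineFactory
@Resource(name = "springPipelineFactory")
private PipelineFactory springPipelineFactory;

GeccoEngine.create()
        .pipelineFactory(springPipelineFactory)...
```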
- A SpiderBean references a pipeline that is a Spring bean exactly as before, by listing its bean name in the `pipelines` attribute of `@Gecco`:
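```java
// Registered under Spring's default bean name "springPipeline"
@Service
public class SpringPipeline implements Pipeline<TestSpiderBean> {
    ...
}

// The pipelines attribute refers to the Spring bean by that name
@Gecco(matchUrl = "...", pipelines = "springPipeline")
public class TestSpiderBean implements HtmlBean {
    ...
}
```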
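The `SpringPipelineFactory` above only maps a pipeline name to a Spring bean. The following sketch reproduces that lookup without Spring so it can run on its own: `BeanRegistry` is a hypothetical stand-in for `ApplicationContext`, `SimplePipeline` for Gecco's `Pipeline`, and `dispatch` is an assumed illustration of feeding a bean to the pipelines named in `@Gecco(pipelines = ...)`, not Gecco's real dispatch code.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Gecco's Pipeline interface (not the real type).
interface SimplePipeline<T> {
    void process(T bean);
}

// Hypothetical bean registry standing in for Spring's ApplicationContext.
class BeanRegistry {
    private final Map<String, Object> beans = new HashMap<>();

    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    // Like getBean(name): fails loudly when no bean has that name.
    Object getBean(String name) {
        Object bean = beans.get(name);
        if (bean == null) {
            throw new IllegalArgumentException("no bean named " + name);
        }
        return bean;
    }
}

// Mirrors the SpringPipelineFactory logic: resolve a pipeline by bean name,
// returning null when the bean is missing or is not a pipeline.
class RegistryPipelineFactory {
    private final BeanRegistry registry;

    RegistryPipelineFactory(BeanRegistry registry) {
        this.registry = registry;
    }

    SimplePipeline<?> getPipeline(String name) {
        try {
            Object bean = registry.getBean(name);
            if (bean instanceof SimplePipeline) {
                return (SimplePipeline<?>) bean;
            }
        } catch (IllegalArgumentException ex) {
            System.out.println("no such pipeline : " + name);
        }
        return null;
    }

    // Hypothetical dispatch step: feed a bean to each named pipeline, skipping names that do not resolve.
    @SuppressWarnings("unchecked")
    void dispatch(Object bean, String... pipelineNames) {
        for (String name : pipelineNames) {
            SimplePipeline<Object> pipeline = (SimplePipeline<Object>) getPipeline(name);
            if (pipeline != null) {
                pipeline.process(bean);
            }
        }
    }
}
```

With `springPipeline` registered, `getPipeline("springPipeline")` returns it, while a missing name or a bean that is not a pipeline yields `null`, matching the `instanceof` check and the `NoSuchBeanDefinitionException` handler in `SpringPipelineFactory`.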