
Gecco Crawler Code Analysis

程序员文章站 (Programmer Article Site), 2022-05-05 12:29:04
...


##Ajax Example

For an Ajax example, see the com.geccocrawler.gecco.demo.ajax package in the source code.

##Extensibility Features

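1. A Spider can be customized before and after each download. Implement the BeforeDownload interface to customize what happens before a download and the AfterDownload interface to customize what happens after it, then bind the implementation to a specific SpiderBean with the annotation @SpiderName("com.geccocrawler.gecco.demo.MyGithub").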
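A minimal way to picture these hooks is a per-spider registry: the before-hook can adjust the request (for example, set a header) and the after-hook can inspect or post-process the response. The sketch below is a self-contained simulation under that assumption. `Request`, `Response`, `BeforeHook`, `AfterHook` and `DownloadHooks` are hypothetical stand-ins, not Gecco's BeforeDownload/AfterDownload types, and the "download" merely echoes the URL instead of fetching anything.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical request/response stand-ins (not Gecco's own HTTP classes).
final class Request {
    final Map<String, String> headers = new HashMap<>();
    final String url;
    Request(String url) { this.url = url; }
}

final class Response {
    String content;
    Response(String content) { this.content = content; }
}

// Hypothetical hook interfaces playing the roles of BeforeDownload / AfterDownload.
interface BeforeHook { void process(Request request); }
interface AfterHook { void process(Request request, Response response); }

final class DownloadHooks {
    // Hooks are keyed by spider name, the role @SpiderName plays in the text.
    private final Map<String, BeforeHook> before = new HashMap<>();
    private final Map<String, AfterHook> after = new HashMap<>();

    void register(String spiderName, BeforeHook b, AfterHook a) {
        if (b != null) before.put(spiderName, b);
        if (a != null) after.put(spiderName, a);
    }

    // Simulated download: run the before-hook, "fetch", then run the after-hook.
    Response download(String spiderName, Request request) {
        BeforeHook b = before.get(spiderName);
        if (b != null) b.process(request);
        // Fake fetch: the body just echoes the URL and the User-Agent header.
        Response response = new Response("GET " + request.url + " UA=" + request.headers.get("User-Agent"));
        AfterHook a = after.get(spiderName);
        if (a != null) a.process(request, response);
        return response;
    }
}

class DownloadHooksDemo {
    public static void main(String[] args) {
        DownloadHooks hooks = new DownloadHooks();
        hooks.register("com.geccocrawler.gecco.demo.MyGithub",
                req -> req.headers.put("User-Agent", "demo-agent"),
                (req, resp) -> resp.content = resp.content.toUpperCase());
        Response r = hooks.download("com.geccocrawler.gecco.demo.MyGithub",
                new Request("https://github.com"));
        System.out.println(r.content); // GET HTTPS://GITHUB.COM UA=DEMO-AGENT
    }
}
```

Keying the hooks by spider name mirrors how the text describes @SpiderName binding a hook to one SpiderBean; in this sketch, spiders with no registered hooks download unchanged.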


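2. Annotation-based rendering of SpiderBean fields sometimes cannot obtain the required data, for example when it comes from a very complex Ajax request. In that case you can write a custom field renderer: implement the CustomFieldRender interface and add the annotation @FieldRenderName("CustomFieldRenderName") to the field.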
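The custom-renderer idea can be sketched as a field annotation that names a renderer plus a registry that resolves that name at render time. This is a simplified, reflection-based illustration, assuming a renderer just returns the value to assign. `RenderWith`, `FieldRenderer`, `Renderers` and `ProductBean` are hypothetical names standing in for the roles of @FieldRenderName and CustomFieldRender, whose real Gecco signatures are not shown here, and the renderer returns a constant where a real one might issue the complex Ajax request.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotation playing the role of @FieldRenderName.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RenderWith { String value(); }

// Hypothetical renderer interface playing the role of CustomFieldRender.
interface FieldRenderer { Object render(Object bean, Field field); }

final class Renderers {
    private final Map<String, FieldRenderer> byName = new HashMap<>();

    void register(String name, FieldRenderer renderer) { byName.put(name, renderer); }

    // Fills every @RenderWith field using the renderer registered under that name.
    void renderAll(Object bean) {
        for (Field field : bean.getClass().getDeclaredFields()) {
            RenderWith ann = field.getAnnotation(RenderWith.class);
            if (ann == null) continue; // ordinary field, left as-is
            FieldRenderer renderer = byName.get(ann.value());
            if (renderer == null) {
                throw new IllegalStateException("no renderer named " + ann.value());
            }
            try {
                field.setAccessible(true);
                field.set(bean, renderer.render(bean, field));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot set " + field.getName(), e);
            }
        }
    }
}

// Example bean: "price" cannot be taken from the page, so a custom renderer supplies it.
class ProductBean {
    String title = "Widget";

    @RenderWith("CustomFieldRenderName")
    String price;
}

class RenderDemo {
    public static void main(String[] args) {
        Renderers renderers = new Renderers();
        // A real renderer might call an Ajax endpoint here; this one returns a constant.
        renderers.register("CustomFieldRenderName", (bean, field) -> "9.99");
        ProductBean bean = new ProductBean();
        renderers.renderAll(bean);
        System.out.println(bean.title + " " + bean.price); // Widget 9.99
    }
}
```

Resolving the renderer by name, rather than by class, is what lets the annotation stay a plain string; the trade-off is that a misspelled name only fails at render time, which this sketch reports with an exception.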


3. Developing pipelines with Spring

  • Implement a SpringPipelineFactory, for example:

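      import com.geccocrawler.gecco.pipeline.Pipeline;
      import com.geccocrawler.gecco.pipeline.PipelineFactory;
      import com.geccocrawler.gecco.spider.SpiderBean;
      import org.springframework.beans.BeansException;
      import org.springframework.beans.factory.NoSuchBeanDefinitionException;
      import org.springframework.context.ApplicationContext;
      import org.springframework.context.ApplicationContextAware;
      import org.springframework.stereotype.Service;
    
      // Resolves Gecco pipelines from the Spring container by bean name.
      @Service
      public class SpringPipelineFactory implements PipelineFactory, ApplicationContextAware {
    
          private ApplicationContext applicationContext;
    
          @Override
          public void setApplicationContext(ApplicationContext applicationContext)
                  throws BeansException {
              // Spring injects the container when this bean is initialized.
              this.applicationContext = applicationContext;
          }
    
          @Override
          @SuppressWarnings("unchecked")
          public Pipeline<? extends SpiderBean> getPipeline(String name) {
              try {
                  Object bean = applicationContext.getBean(name);
                  if (bean instanceof Pipeline) {
                      return (Pipeline<? extends SpiderBean>) bean;
                  }
              } catch (NoSuchBeanDefinitionException ex) {
                  System.out.println("no such pipeline : " + name);
              }
              // No bean with that name, or the bean is not a Pipeline.
              return null;
          }
      }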
    
  • Then set it on the GeccoEngine:

      @Resource(name="springPipelineFactory")
      private PipelineFactory springPipelineFactory;
    
      GeccoEngine.create().pipelineFactory(springPipelineFactory)...
    
  • Referencing a pipeline that is a Spring bean from a SpiderBean works the same way as before:

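      @Service
      public class SpringPipeline implements Pipeline<TestSpiderBean> {
          @Override
          public void process(TestSpiderBean bean) { /* ... */ }
      }
    
      @Gecco(matchUrl="...", pipelines="springPipeline")
      public class TestSpiderBean implements HtmlBean { /* ... */ }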
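Without Spring, the factory above reduces to a name-to-bean lookup that returns the bean only when it is a pipeline, and null otherwise. The sketch below simulates that lookup with a plain Map standing in for ApplicationContext so it runs with no Spring or Gecco dependency. `SimplePipeline`, `MapPipelineFactory` and `PipelineDispatcher` are hypothetical simplifications; collecting unresolved names is a choice made for this sketch, since the text only shows the factory printing a message and returning null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified pipeline: receives a parsed bean (here just a String).
interface SimplePipeline { void process(String bean); }

// Stand-in for the Spring-backed factory: a Map replaces ApplicationContext.getBean(name).
final class MapPipelineFactory {
    private final Map<String, Object> beans = new HashMap<>();

    void registerBean(String name, Object bean) { beans.put(name, bean); }

    // Mirrors getPipeline(name): return the bean only if it is a pipeline, else null.
    SimplePipeline getPipeline(String name) {
        Object bean = beans.get(name);
        return (bean instanceof SimplePipeline) ? (SimplePipeline) bean : null;
    }
}

final class PipelineDispatcher {
    // Runs each pipeline named in a bean's "pipelines" list; returns names that did not resolve.
    static List<String> dispatch(MapPipelineFactory factory, String[] names, String bean) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            SimplePipeline p = factory.getPipeline(name);
            if (p == null) {
                missing.add(name);
                continue;
            }
            p.process(bean);
        }
        return missing;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        MapPipelineFactory factory = new MapPipelineFactory();
        factory.registerBean("springPipeline", (SimplePipeline) processed::add);
        factory.registerBean("notAPipeline", "just a string bean");
        List<String> missing = PipelineDispatcher.dispatch(
                factory, new String[] {"springPipeline", "notAPipeline", "absent"}, "TestSpiderBean#1");
        System.out.println(processed); // [TestSpiderBean#1]
        System.out.println(missing);   // [notAPipeline, absent]
    }
}
```

The point of routing lookups through a factory is that the SpiderBean only carries a name ("springPipeline"); whether that name resolves to a Spring-managed bean or something else is decided entirely by the factory passed to the engine.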