Class OOSpider<T>

  • All Implemented Interfaces:
    java.lang.Runnable, Task

    public class OOSpider<T>
    extends Spider
    A spider that extracts data into page models.
    In WebMagic, a POJO that holds extraction results is called a "page model".
    You can customize a crawler by writing a page model with annotations.
    For example:
     @TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
     public class OschinaBlog {

         @ExtractBy("//title")
         private String title;

         @ExtractBy(value = "div.BlogContent", type = ExtractBy.Type.Css)
         private String content;

         @ExtractBy(value = "//div[@class='BlogTags']/a/text()", multi = true)
         private List<String> tags;
     }
     
    Then start the spider with:
     OOSpider.create(Site.me().addStartUrl("http://my.oschina.net/flashsword/blog"),
             new JsonFilePageModelPipeline(), OschinaBlog.class).run();
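    To illustrate the page-model idea in isolation, here is a minimal, self-contained sketch that does not use WebMagic at all. The names ExtractByRegex, BlogModel, and PageModelSketch are hypothetical and exist only for this illustration. The annotation takes a regular expression with one capture group instead of the XPath or CSS expressions that @ExtractBy accepts, and the reflection loop is an assumption about the general technique, not WebMagic's actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageModelSketch {

    // Hypothetical annotation (NOT WebMagic's @ExtractBy):
    // the value is a regex whose first capture group is the extracted text.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ExtractByRegex {
        String value();
    }

    // A "page model": a plain POJO whose fields declare how they are filled.
    static class BlogModel {
        @ExtractByRegex("<title>(.*?)</title>")
        String title;

        @ExtractByRegex("<div class=\"author\">(.*?)</div>")
        String author;
    }

    // Creates a model instance and fills every annotated String field
    // with the first match of its regex in the page text.
    static <T> T extract(String html, Class<T> modelClass) throws ReflectiveOperationException {
        T model = modelClass.getDeclaredConstructor().newInstance();
        for (Field field : modelClass.getDeclaredFields()) {
            ExtractByRegex rule = field.getAnnotation(ExtractByRegex.class);
            if (rule == null) {
                continue; // field is not part of the extraction rules
            }
            Matcher m = Pattern.compile(rule.value(), Pattern.DOTALL).matcher(html);
            if (m.find()) {
                field.setAccessible(true);
                field.set(model, m.group(1));
            }
        }
        return model;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><title>Hello</title><div class=\"author\">flashsword</div></html>";
        BlogModel blog = extract(html, BlogModel.class);
        System.out.println(blog.title + " by " + blog.author); // prints "Hello by flashsword"
    }
}
```

    The point of the design is that the extraction rules live on the model's fields, so adding a field to the POJO is enough to extract one more value; no separate parsing code has to change.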
     
    Since:
    0.2.0
    Author:
    code4crafter@gmail.com
    • Constructor Detail

      • OOSpider

        protected OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)
      • OOSpider

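        public OOSpider(Site site,
                        PageModelPipeline pageModelPipeline,
                        java.lang.Class... pageModels)
        Creates a spider.
        Parameters:
        site - the site configuration to crawl with
        pageModelPipeline - the pipeline that receives each extracted page model
        pageModels - the annotated page model classes to extract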