Class OOSpider<T>

java.lang.Object
us.codecraft.webmagic.Spider
us.codecraft.webmagic.model.OOSpider<T>
All Implemented Interfaces:
Runnable, Task

public class OOSpider<T> extends Spider
A spider that extracts crawled pages into page models.
In WebMagic, a POJO that holds extraction results is called a "page model".
You can customize a crawler by writing a page model with annotations.
For example:
 import java.util.List;

 import us.codecraft.webmagic.model.annotation.ExtractBy;
 import us.codecraft.webmagic.model.annotation.TargetUrl;

 @TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
 public class OschinaBlog {

     @ExtractBy("//title")
     private String title;

     @ExtractBy(value = "div.BlogContent", type = ExtractBy.Type.Css)
     private String content;

     @ExtractBy(value = "//div[@class='BlogTags']/a/text()", multi = true)
     private List<String> tags;
 }
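The annotation idea above can be illustrated without WebMagic itself. The following is a minimal, standard-library-only sketch, not WebMagic's implementation: the annotation `RegexField`, the classes `BlogPage` and `ModelExtractor`, and the use of regular expressions (instead of the XPath and CSS selectors that `@ExtractBy` supports) are hypothetical simplifications. It shows how a reflective extractor can instantiate a POJO and fill each annotated field from a page's HTML.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for @ExtractBy: holds a regex whose first
// capture group becomes the field value.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RegexField {
    String value();
}

// A toy page model: plain fields, each annotated with an extraction rule.
class BlogPage {
    @RegexField("<title>(.*?)</title>")
    String title;

    @RegexField("<div class=\"BlogContent\">(.*?)</div>")
    String content;
}

final class ModelExtractor {
    // Creates an instance of modelClass and fills every @RegexField-annotated
    // field (assumed to be a String) from the first match in html.
    // Fields whose rule does not match stay null.
    static <T> T extract(String html, Class<T> modelClass) {
        try {
            T model = modelClass.getDeclaredConstructor().newInstance();
            for (Field field : modelClass.getDeclaredFields()) {
                RegexField rule = field.getAnnotation(RegexField.class);
                if (rule == null) {
                    continue; // not part of the page model
                }
                Matcher m = Pattern.compile(rule.value(), Pattern.DOTALL).matcher(html);
                if (m.find()) {
                    field.setAccessible(true);
                    field.set(model, m.group(1));
                }
            }
            return model;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot populate " + modelClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><title>My post</title></head>"
                + "<body><div class=\"BlogContent\">Hello</div></body></html>";
        BlogPage page = extract(html, BlogPage.class);
        System.out.println(page.title + " / " + page.content); // My post / Hello
    }
}
```

Keeping the rules in annotations lets the POJO stay a plain data holder, while one generic extractor serves any number of model classes.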
 
Then start the spider with:
   OOSpider.create(Site.me().addStartUrl("http://my.oschina.net/flashsword/blog"),
           new JsonFilePageModelPipeline(), OschinaBlog.class).run();
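Conceptually, a running spider repeats a simple loop: take a queued URL, download the page, extract a page model when the URL matches the `@TargetUrl` pattern, and queue any newly found links. The sketch below imitates that loop over an in-memory map of made-up pages; `MiniCrawl` and the sample URLs are hypothetical. Assumptions: the pattern must match the whole URL, only the `<title>` is extracted, and every link is followed. This is not WebMagic's implementation, and its own scheduling and link-following rules may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MiniCrawl {
    // Same string as @TargetUrl above; "\\d+" in Java source is the regex \d+.
    // In this sketch the dots are unescaped, so each matches any character.
    static final Pattern TARGET =
            Pattern.compile("http://my.oschina.net/flashsword/blog/\\d+");
    // Naive link and title finders for the toy pages below
    static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    // Breadth-first crawl over an in-memory "web" (url -> html).
    // Returns the titles extracted from target pages, in visit order.
    static List<String> crawl(String startUrl, Map<String, String> web) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> titles = new ArrayList<>();
        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = web.get(url);
            if (html == null) {
                continue; // simulated download failure
            }
            // Only target pages produce page-model results
            if (TARGET.matcher(url).matches()) {
                Matcher t = TITLE.matcher(html);
                if (t.find()) {
                    titles.add(t.group(1));
                }
            }
            // Any page may contribute new, not-yet-seen links
            Matcher links = LINK.matcher(html);
            while (links.find()) {
                String next = links.group(1);
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        String base = "http://my.oschina.net/flashsword/blog";
        Map<String, String> web = Map.of(
                base, "<a href=\"" + base + "/1\"></a><a href=\"" + base + "/2\"></a>",
                base + "/1", "<title>Post one</title>",
                base + "/2", "<title>Post two</title>");
        System.out.println(crawl(base, web)); // [Post one, Post two]
    }
}
```

The list page itself does not match the pattern, so it only contributes links; the two article pages match and yield titles.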
 
Since:
0.2.0
Author:
code4crafter@gmail.com
  • Constructor Details

    • OOSpider

      protected OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)
    • OOSpider

      public OOSpider(PageProcessor pageProcessor)
    • OOSpider

      public OOSpider(Site site, PageModelPipeline pageModelPipeline, Class... pageModels)
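      Creates a spider that extracts the given page model classes and passes each extracted object to the pipeline.
      Parameters:
      site - the Site configuration for the crawl (for example, start URLs, user agent and sleep time)
      pageModelPipeline - the pipeline that receives each extracted page model object
      pageModels - one or more page model classes annotated with @TargetUrl, @ExtractBy, etc.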
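The `pageModelPipeline` argument is where extracted objects end up. As a hedged illustration, the sketch below uses a hypothetical, simplified interface (`ModelSink`), not WebMagic's `PageModelPipeline`, together with an in-memory implementation that collects every model it receives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a page model pipeline:
// the spider would call process() once for every extracted object.
interface ModelSink<T> {
    void process(T model);
}

// Keeps every received model in memory. The list is synchronized because
// a spider may run several worker threads that call process() concurrently.
final class CollectingSink<T> implements ModelSink<T> {
    private final List<T> items = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void process(T model) {
        items.add(model);
    }

    // Returns a snapshot copy so callers never iterate the shared list directly
    List<T> items() {
        return new ArrayList<>(items);
    }
}

final class SinkDemo {
    public static void main(String[] args) {
        CollectingSink<String> sink = new CollectingSink<>();
        // A real crawl would pass page model objects such as OschinaBlog instances
        sink.process("first post");
        sink.process("second post");
        System.out.println(sink.items()); // [first post, second post]
    }
}
```

Swapping the sink implementation (file, database, console) changes where results go without touching the page model or the site configuration, which is the role `JsonFilePageModelPipeline` plays in the example at the top of this page.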
  • Method Details