Class OOSpider<T>

  • All Implemented Interfaces:
    java.lang.Runnable, us.codecraft.webmagic.Task

    public class OOSpider<T>
    extends us.codecraft.webmagic.Spider
    A spider that extracts pages into page models.
    In webmagic, a POJO that holds the extraction result is called a "page model".
    You can customize a crawler by writing a page model with annotations.
    For example:
     @TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
     public class OschinaBlog {

         @ExtractBy("//title")
         private String title;

         @ExtractBy(value = "div.BlogContent", type = ExtractBy.Type.Css)
         private String content;

         @ExtractBy(value = "//div[@class='BlogTags']/a/text()", multi = true)
         private List<String> tags;
     }
     
    Then start the spider with:
     OOSpider.create(Site.me().addStartUrl("http://my.oschina.net/flashsword/blog"),
             new JsonFilePageModelPipeline(), OschinaBlog.class).run();
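    The idea of an annotated page model can be illustrated without the library. The following is a minimal sketch using only the JDK, not webmagic's implementation: `ExtractByRegex`, `BlogModel`, and `PageModelSketch` are hypothetical names invented for this illustration, and plain regular expressions stand in for XPath and CSS selectors.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageModelSketch {
    // Creates a model instance and fills each annotated field with group 1
    // of the first regex match. Fields with no match are left as null.
    static <T> T extract(String html, Class<T> modelClass) throws ReflectiveOperationException {
        T model = modelClass.getDeclaredConstructor().newInstance();
        for (Field field : modelClass.getDeclaredFields()) {
            ExtractByRegex rule = field.getAnnotation(ExtractByRegex.class);
            if (rule == null) {
                continue; // not an extraction target
            }
            Matcher m = Pattern.compile(rule.value(), Pattern.DOTALL).matcher(html);
            if (m.find()) {
                field.setAccessible(true);
                field.set(model, m.group(1));
            }
        }
        return model;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String html = "<html><title>Hello</title>"
                + "<div class=\"BlogContent\">Body text</div></html>";
        BlogModel blog = extract(html, BlogModel.class);
        System.out.println(blog.title + " / " + blog.content); // prints "Hello / Body text"
    }
}

// Hypothetical stand-in for an extraction annotation; NOT webmagic's @ExtractBy.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ExtractByRegex {
    String value();
}

// A "page model": a plain object whose fields declare how they are filled.
class BlogModel {
    @ExtractByRegex("<title>(.*?)</title>")
    String title;

    @ExtractByRegex("<div class=\"BlogContent\">(.*?)</div>")
    String content;
}
```

    The extraction rules live on the model class itself, so the crawling code needs no per-page logic. That is the role the real annotations play for OOSpider.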
     
    Since:
    0.2.0
    Author:
    code4crafter@gmail.com
    • Nested Class Summary

      • Nested classes/interfaces inherited from class us.codecraft.webmagic.Spider

        us.codecraft.webmagic.Spider.Status
    • Field Summary

      • Fields inherited from class us.codecraft.webmagic.Spider

        destroyWhenExit, downloader, executorService, exitWhenComplete, logger, pageProcessor, pipelines, scheduler, site, spawnUrl, startRequests, stat, STAT_INIT, STAT_RUNNING, STAT_STOPPED, threadNum, threadPool, uuid
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)
      public OOSpider(us.codecraft.webmagic.processor.PageProcessor pageProcessor)
      public OOSpider(us.codecraft.webmagic.Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)  Creates a spider.
    • Method Summary

      Modifier and Type Method Description
      OOSpider addPageModel(PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
      static OOSpider create(us.codecraft.webmagic.Site site, java.lang.Class... pageModels)
      static OOSpider create(us.codecraft.webmagic.Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
      protected us.codecraft.webmagic.pipeline.CollectorPipeline getCollectorPipeline()
      OOSpider setIsExtractLinks(boolean isExtractLinks)
      • Methods inherited from class us.codecraft.webmagic.Spider

        addPipeline, addRequest, addUrl, checkIfRunning, clearPipeline, close, create, downloader, extractAndAddRequests, get, getAll, getPageCount, getScheduler, getSite, getSpiderListeners, getStartTime, getStatus, getThreadAlive, getUUID, initComponent, isExitWhenComplete, isSpawnUrl, onError, onError, onSuccess, pipeline, run, runAsync, scheduler, setDownloader, setEmptySleepTime, setExecutorService, setExitWhenComplete, setPipelines, setScheduler, setSpawnUrl, setSpiderListeners, setUUID, sleep, start, startRequest, startUrls, stop, test, thread, thread
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • OOSpider

        protected OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)
      • OOSpider

        public OOSpider(us.codecraft.webmagic.processor.PageProcessor pageProcessor)
      • OOSpider

        public OOSpider(us.codecraft.webmagic.Site site,
                        PageModelPipeline pageModelPipeline,
                        java.lang.Class... pageModels)
        Creates a spider.
        Parameters:
        site - the site configuration used for crawling
        pageModelPipeline - the pipeline that receives each extracted page model
        pageModels - the annotated page model classes to extract
    • Method Detail

      • getCollectorPipeline

        protected us.codecraft.webmagic.pipeline.CollectorPipeline getCollectorPipeline()
        Overrides:
        getCollectorPipeline in class us.codecraft.webmagic.Spider
      • create

        public static OOSpider create(us.codecraft.webmagic.Site site,
                                      java.lang.Class... pageModels)
      • create

        public static OOSpider create(us.codecraft.webmagic.Site site,
                                      PageModelPipeline pageModelPipeline,
                                      java.lang.Class... pageModels)
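        The two create overloads can be contrasted with a short sketch. It assumes the webmagic core and extension artifacts are on the classpath and that the package names in the imports match your version. The post URL is a placeholder, and `addUrl`, `thread`, `get`, and `close` are methods inherited from Spider. The behaviour described in the comments reflects common usage and is worth verifying against your release.

```java
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.model.OOSpider;
import us.codecraft.webmagic.model.annotation.ExtractBy;
import us.codecraft.webmagic.model.annotation.TargetUrl;
import us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline;

@TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
public class OschinaBlog {

    @ExtractBy("//title")
    private String title;

    public String getTitle() {
        return title;
    }

    public static void main(String[] args) {
        // Placeholder URL for illustration; substitute a real post URL.
        String postUrl = "http://my.oschina.net/flashsword/blog/12345";

        // With a pipeline: each extracted model is handed to the pipeline,
        // here one that writes JSON files.
        OOSpider.create(Site.me(), new JsonFilePageModelPipeline(), OschinaBlog.class)
                .addUrl(postUrl)
                .thread(2)
                .run();

        // Without a pipeline: fetch a single URL and read the model directly.
        OOSpider spider = OOSpider.create(Site.me(), OschinaBlog.class);
        OschinaBlog blog = (OschinaBlog) spider.get(postUrl);
        if (blog != null) {
            System.out.println(blog.getTitle());
        }
        spider.close();
    }
}
```

        The pipeline form suits bulk crawls where results are persisted as they arrive. The form without a pipeline is convenient when the caller wants the extracted object back in code.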
      • setIsExtractLinks

        public OOSpider setIsExtractLinks(boolean isExtractLinks)
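        This page gives no description for setIsExtractLinks, so the sketch below rests on an assumption: passing false stops the spider from queuing links discovered on fetched pages, so only explicitly added URLs are crawled. It also assumes the webmagic extension module is on the classpath. `BlogTitle` is a hypothetical model and the URL is a placeholder.

```java
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.model.ConsolePageModelPipeline;
import us.codecraft.webmagic.model.OOSpider;
import us.codecraft.webmagic.model.annotation.ExtractBy;
import us.codecraft.webmagic.model.annotation.TargetUrl;

// Hypothetical page model used only for this illustration.
@TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
public class BlogTitle {

    @ExtractBy("//title")
    private String title;

    public static void main(String[] args) {
        OOSpider.create(Site.me(), new ConsolePageModelPipeline(), BlogTitle.class)
                // Assumption: false disables automatic link discovery.
                .setIsExtractLinks(false)
                // Placeholder URL; only this page should be processed.
                .addUrl("http://my.oschina.net/flashsword/blog/12345")
                .run();
    }
}
```

        Because setIsExtractLinks returns the OOSpider itself, it can be chained before inherited Spider methods such as addUrl and run.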