Package us.codecraft.webmagic.model
Class OOSpider<T>
java.lang.Object
  us.codecraft.webmagic.Spider
    us.codecraft.webmagic.model.OOSpider<T>

All Implemented Interfaces:
java.lang.Runnable, us.codecraft.webmagic.Task
public class OOSpider<T> extends us.codecraft.webmagic.Spider
A spider driven by page model extraction.
In WebMagic, a POJO that holds the extracted result is called a "page model".
You can customize a crawler by writing a page model with annotations, such as:

    @TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
    public class OschinaBlog {
        @ExtractBy("//title")
        private String title;

        @ExtractBy(value = "div.BlogContent", type = ExtractBy.Type.Css)
        private String content;

        @ExtractBy(value = "//div[@class='BlogTags']/a/text()", multi = true)
        private List<String> tags;
    }

Then start the spider with:

    OOSpider.create(Site.me().addStartUrl("http://my.oschina.net/flashsword/blog"),
            new JsonFilePageModelPipeline(), OschinaBlog.class).run();
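To make the page model idea concrete, the following is a minimal JDK-only sketch of annotation-driven extraction. It is not WebMagic's implementation: the `RegexField` annotation, the `BlogPost` class, and the `extract` helper are hypothetical names introduced for illustration, and plain regular expressions stand in for the XPath and CSS selectors that `@ExtractBy` accepts.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageModelSketch {

    // Hypothetical stand-in for an extraction annotation; this is NOT WebMagic's @ExtractBy.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RegexField {
        String value(); // regex whose first capture group becomes the field value
    }

    // A page model: a plain class whose fields declare how they are extracted.
    static class BlogPost {
        @RegexField("<title>(.*?)</title>")
        private String title;

        @RegexField("<div class=\"BlogContent\">(.*?)</div>")
        private String content;

        String getTitle() { return title; }
        String getContent() { return content; }
    }

    // Creates a model instance and fills each annotated String field from the page text.
    static <T> T extract(String html, Class<T> modelClass) throws ReflectiveOperationException {
        T model = modelClass.getDeclaredConstructor().newInstance();
        for (Field field : modelClass.getDeclaredFields()) {
            RegexField rule = field.getAnnotation(RegexField.class);
            if (rule == null) {
                continue; // fields without the annotation are left untouched
            }
            Matcher matcher = Pattern.compile(rule.value(), Pattern.DOTALL).matcher(html);
            if (matcher.find()) {
                field.setAccessible(true);
                field.set(model, matcher.group(1));
            }
        }
        return model;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Hello</title></head>"
                + "<body><div class=\"BlogContent\">First post</div></body></html>";
        BlogPost post = extract(html, BlogPost.class);
        System.out.println(post.getTitle() + " / " + post.getContent());
    }
}
```

The design point is that the model class carries its own extraction rules, so the crawler code stays generic and a new page type only needs a new annotated class.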
Since: 0.2.0
Author: code4crafter@gmail.com
Constructor Summary
Constructors (Modifier, Constructor, Description):
protected OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)
public OOSpider(us.codecraft.webmagic.processor.PageProcessor pageProcessor)
public OOSpider(us.codecraft.webmagic.Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
    Creates a spider.
-
Method Summary
Methods (Modifier and Type, Method):
OOSpider addPageModel(PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
static OOSpider create(us.codecraft.webmagic.Site site, java.lang.Class... pageModels)
static OOSpider create(us.codecraft.webmagic.Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
protected us.codecraft.webmagic.pipeline.CollectorPipeline getCollectorPipeline()
OOSpider setIsExtractLinks(boolean isExtractLinks)
-
Methods inherited from class us.codecraft.webmagic.Spider
addPipeline, addRequest, addUrl, checkIfRunning, clearPipeline, close, create, downloader, extractAndAddRequests, get, getAll, getPageCount, getScheduler, getSite, getSpiderListeners, getStartTime, getStatus, getThreadAlive, getUUID, initComponent, isExitWhenComplete, isSpawnUrl, onError, onError, onSuccess, pipeline, run, runAsync, scheduler, setDownloader, setEmptySleepTime, setExecutorService, setExitWhenComplete, setPipelines, setScheduler, setSpawnUrl, setSpiderListeners, setUUID, sleep, start, startRequest, startUrls, stop, test, thread, thread
Constructor Detail
-
OOSpider
protected OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)
-
OOSpider
public OOSpider(us.codecraft.webmagic.processor.PageProcessor pageProcessor)
-
OOSpider
public OOSpider(us.codecraft.webmagic.Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
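Creates a spider.
Parameters:
site - the site configuration for the crawl
pageModelPipeline - the pipeline that receives the extracted page model objects
pageModels - the annotated page model classes to extract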
-
-
Method Detail
-
getCollectorPipeline
protected us.codecraft.webmagic.pipeline.CollectorPipeline getCollectorPipeline()
Overrides:
getCollectorPipeline in class us.codecraft.webmagic.Spider
-
create
public static OOSpider create(us.codecraft.webmagic.Site site, java.lang.Class... pageModels)
-
create
public static OOSpider create(us.codecraft.webmagic.Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
-
addPageModel
public OOSpider addPageModel(PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
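Because this method returns OOSpider, registrations can be chained, and several page model classes can be attached to one spider. This page does not say how a fetched page is matched to one of several registered models. One plausible scheme, assuming the `@TargetUrl` pattern drives the choice, can be sketched with the JDK alone. The class `TargetUrlDispatchSketch`, its methods, the model name `BlogList`, and the sample URLs are hypothetical and are not part of WebMagic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

class TargetUrlDispatchSketch {

    // (URL pattern, model name) pairs, kept in registration order.
    private final List<Map.Entry<Pattern, String>> models = new ArrayList<>();

    // Returns this so that registrations can be chained, as with addPageModel(...).
    TargetUrlDispatchSketch register(String urlRegex, String modelName) {
        models.add(Map.entry(Pattern.compile(urlRegex), modelName));
        return this;
    }

    // Returns the first registered model whose pattern matches the whole URL.
    Optional<String> modelFor(String url) {
        for (Map.Entry<Pattern, String> entry : models) {
            if (entry.getKey().matcher(url).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TargetUrlDispatchSketch dispatch = new TargetUrlDispatchSketch()
                .register("http://my\\.oschina\\.net/flashsword/blog/\\d+", "OschinaBlog")
                .register("http://my\\.oschina\\.net/flashsword/blog", "BlogList"); // hypothetical model

        // A post URL resolves to the post model, the index URL to the list model.
        System.out.println(dispatch.modelFor("http://my.oschina.net/flashsword/blog/12345"));
        System.out.println(dispatch.modelFor("http://my.oschina.net/flashsword/blog"));
        // A URL that matches no pattern yields an empty Optional.
        System.out.println(dispatch.modelFor("http://example.com/"));
    }
}
```

Matching against the whole URL, rather than searching for a substring, keeps the index pattern from also claiming individual post URLs.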
-
setIsExtractLinks
public OOSpider setIsExtractLinks(boolean isExtractLinks)