Package us.codecraft.webmagic.model
Class OOSpider<T>
java.lang.Object
    us.codecraft.webmagic.Spider
        us.codecraft.webmagic.model.OOSpider<T>
All Implemented Interfaces:
java.lang.Runnable, Task
public class OOSpider<T> extends Spider
The spider for page-model extraction.
In WebMagic, a POJO that holds extraction results is called a "page model".
You can customize a crawler by writing a page model with annotations.
For example:
    @TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
    public class OschinaBlog {

        @ExtractBy("//title")
        private String title;

        @ExtractBy(value = "div.BlogContent", type = ExtractBy.Type.Css)
        private String content;

        @ExtractBy(value = "//div[@class='BlogTags']/a/text()", multi = true)
        private List<String> tags;
    }
Then start the spider with:
    OOSpider.create(Site.me().addStartUrl("http://my.oschina.net/flashsword/blog"),
            new JsonFilePageModelPipeline(), OschinaBlog.class).run();
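To make the annotation-driven idea concrete without depending on WebMagic, the following stdlib-only Java sketch mimics it: a class-level pattern decides whether a URL is a target page, and field-level rules fill the POJO. The annotations `@UrlPattern` and `@ExtractRegex`, the class `PageModelSketch`, and its `extract` method are hypothetical illustrations, not WebMagic API. Regular expressions stand in for the XPath and CSS selectors that `@ExtractBy` supports, and this is not a description of how `OOSpider` works internally.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageModelSketch {

    // Hypothetical stand-in for @TargetUrl: a plain regex the page URL must match.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UrlPattern {
        String value();
    }

    // Hypothetical stand-in for @ExtractBy: a regex whose first group fills the field.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ExtractRegex {
        String value();
    }

    // A toy page model in the spirit of OschinaBlog.
    @UrlPattern("http://my\\.oschina\\.net/flashsword/blog/\\d+")
    static class BlogPost {
        @ExtractRegex("<title>(.*?)</title>")
        String title;
    }

    // Returns a populated model, or null when the URL is not a target page for this model.
    static <T> T extract(Class<T> modelClass, String url, String html) {
        UrlPattern target = modelClass.getAnnotation(UrlPattern.class);
        if (target == null || !Pattern.matches(target.value(), url)) {
            return null;
        }
        try {
            T model = modelClass.getDeclaredConstructor().newInstance();
            for (Field field : modelClass.getDeclaredFields()) {
                ExtractRegex rule = field.getAnnotation(ExtractRegex.class);
                if (rule == null) {
                    continue; // field is not annotated, leave it untouched
                }
                Matcher m = Pattern.compile(rule.value(), Pattern.DOTALL).matcher(html);
                if (m.find()) {
                    field.setAccessible(true);
                    field.set(model, m.group(1));
                }
            }
            return model;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + modelClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hello WebMagic</title></head></html>";
        BlogPost post = extract(BlogPost.class, "http://my.oschina.net/flashsword/blog/123", html);
        System.out.println(post.title); // Hello WebMagic
        BlogPost skipped = extract(BlogPost.class, "http://my.oschina.net/flashsword", html);
        System.out.println(skipped == null); // true
    }
}
```

Regular expressions were chosen only to keep the sketch free of third-party HTML parsers; a real page model would use the selector types that `@ExtractBy` offers.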
Since:
0.2.0
Author:
code4crafter@gmail.com
Nested Class Summary
Nested classes/interfaces inherited from class us.codecraft.webmagic.Spider
Spider.Status

Field Summary
Fields inherited from class us.codecraft.webmagic.Spider
destroyWhenExit, downloader, executorService, exitWhenComplete, logger, pageProcessor, pipelines, scheduler, site, spawnUrl, startRequests, stat, STAT_INIT, STAT_RUNNING, STAT_STOPPED, threadNum, threadPool, uuid
Constructor Summary
Modifier     Constructor and Description
protected    OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)
public       OOSpider(PageProcessor pageProcessor)
public       OOSpider(Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
             Creates a spider.
Method Summary
Modifier and Type              Method
OOSpider                       addPageModel(PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
static OOSpider                create(Site site, java.lang.Class... pageModels)
static OOSpider                create(Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
protected CollectorPipeline    getCollectorPipeline()
OOSpider                       setIsExtractLinks(boolean isExtractLinks)

Methods inherited from class us.codecraft.webmagic.Spider
addPipeline, addRequest, addUrl, checkIfRunning, clearPipeline, close, create, downloader, extractAndAddRequests, get, getAll, getPageCount, getScheduler, getSite, getSpiderListeners, getStartTime, getStatus, getThreadAlive, getUUID, initComponent, isExitWhenComplete, isSpawnUrl, onError, onError, onSuccess, pipeline, run, runAsync, scheduler, setDownloader, setEmptySleepTime, setExecutorService, setExitWhenComplete, setPipelines, setScheduler, setSpawnUrl, setSpiderListeners, setUUID, sleep, start, startRequest, startUrls, stop, test, thread, thread
Constructor Detail

OOSpider
protected OOSpider(us.codecraft.webmagic.model.ModelPageProcessor modelPageProcessor)

OOSpider
public OOSpider(PageProcessor pageProcessor)

OOSpider
public OOSpider(Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
Creates a spider.
Parameters:
site - the Site configuration for the crawl
pageModelPipeline - the pipeline that processes the extracted page model objects
pageModels - the annotated page model classes to extract
Method Detail

getCollectorPipeline
protected CollectorPipeline getCollectorPipeline()
Overrides:
getCollectorPipeline in class Spider
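`getCollectorPipeline()` overrides a hook in `Spider`. This page does not describe it, but its name and the inherited `get` and `getAll` methods suggest a pipeline that keeps results in memory so they can be handed back to the caller. The sketch below illustrates only that idea. `ModelSink` and `InMemoryCollector` are hypothetical types, not WebMagic's `CollectorPipeline`, and the list is synchronized on the assumption that several spider threads (see the inherited `threadNum` field) may call the pipeline concurrently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CollectorSketch {

    // Hypothetical stand-in for a pipeline: receives each extracted page model.
    interface ModelSink<T> {
        void process(T model);
    }

    // Hypothetical collector: keeps every model in memory so the caller can read them after the crawl.
    static class InMemoryCollector<T> implements ModelSink<T> {
        // Synchronized because several crawler threads may call process() at the same time.
        private final List<T> collected = Collections.synchronizedList(new ArrayList<>());

        @Override
        public void process(T model) {
            collected.add(model);
        }

        // Returns a snapshot copy; copying a synchronizedList must happen while holding its lock.
        List<T> getCollected() {
            synchronized (collected) {
                return new ArrayList<>(collected);
            }
        }
    }

    public static void main(String[] args) {
        InMemoryCollector<String> collector = new InMemoryCollector<>();
        collector.process("first title");
        collector.process("second title");
        System.out.println(collector.getCollected()); // [first title, second title]
    }
}
```

Returning a copy rather than the live list keeps callers from observing, or racing with, writes made by crawler threads that are still running.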
create
public static OOSpider create(Site site, PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)

addPageModel
public OOSpider addPageModel(PageModelPipeline pageModelPipeline, java.lang.Class... pageModels)
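The signature of `addPageModel` pairs one `PageModelPipeline` with one or more page model classes and returns the `OOSpider`, which allows chained calls. A minimal sketch of that registration pattern follows, under the assumption that each extracted object is routed to the pipeline registered for its class. `ModelRegistrySketch`, `dispatch`, and the use of `Consumer<Object>` in place of `PageModelPipeline` are hypothetical and do not describe WebMagic's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ModelRegistrySketch {

    // Hypothetical routing table: page model class -> consumer that receives its instances.
    private final Map<Class<?>, Consumer<Object>> routes = new LinkedHashMap<>();

    // Fluent registration, mirroring the shape of addPageModel(pipeline, pageModels...).
    public ModelRegistrySketch addPageModel(Consumer<Object> pipeline, Class<?>... pageModels) {
        for (Class<?> model : pageModels) {
            routes.put(model, pipeline);
        }
        return this;
    }

    // Hands an extracted object to the consumer registered for its exact class.
    // Returns false when no consumer was registered for that class.
    public boolean dispatch(Object extracted) {
        Consumer<Object> pipeline = routes.get(extracted.getClass());
        if (pipeline == null) {
            return false;
        }
        pipeline.accept(extracted);
        return true;
    }

    static class Blog { }
    static class Comment { }

    public static void main(String[] args) {
        List<Object> blogs = new ArrayList<>();
        List<Object> comments = new ArrayList<>();
        ModelRegistrySketch registry = new ModelRegistrySketch()
                .addPageModel(blogs::add, Blog.class)
                .addPageModel(comments::add, Comment.class);
        registry.dispatch(new Blog());
        registry.dispatch(new Comment());
        registry.dispatch(new Comment());
        System.out.println(blogs.size() + " " + comments.size()); // 1 2
    }
}
```

Returning `this` from the registration method is what makes the chained `addPageModel(...).addPageModel(...)` style possible, matching the `OOSpider` return type in the signature above.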
setIsExtractLinks
public OOSpider setIsExtractLinks(boolean isExtractLinks)