Index
$
- $(String) - Method in class us.codecraft.webmagic.selector.HtmlNode
- $(String) - Method in class us.codecraft.webmagic.selector.PlainText
- $(String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
select list with css selector
- $(String) - Static method in class us.codecraft.webmagic.selector.Selectors
- $(String, String) - Method in class us.codecraft.webmagic.selector.HtmlNode
- $(String, String) - Method in class us.codecraft.webmagic.selector.PlainText
- $(String, String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
select list with css selector
- $(String, String) - Static method in class us.codecraft.webmagic.selector.Selectors
A
- AbstractDownloader - Class in us.codecraft.webmagic.downloader
-
Base class of downloader with some common methods.
- AbstractDownloader() - Constructor for class us.codecraft.webmagic.downloader.AbstractDownloader
- AbstractSelectable - Class in us.codecraft.webmagic.selector
- AbstractSelectable() - Constructor for class us.codecraft.webmagic.selector.AbstractSelectable
- addCookie(String, String) - Method in class us.codecraft.webmagic.Request
- addCookie(String, String) - Method in class us.codecraft.webmagic.Site
-
Add a cookie with domain Site.getDomain()
- addCookie(String, String, String) - Method in class us.codecraft.webmagic.Site
-
Add a cookie with specific domain.
- addHeader(String, String) - Method in class us.codecraft.webmagic.Request
- addHeader(String, String) - Method in class us.codecraft.webmagic.Site
-
Put an Http header for downloader.
- addPageModel(PageModelPipeline, Class...) - Method in class us.codecraft.webmagic.model.OOSpider
- addParamOption(Params, CommandLine) - Method in class us.codecraft.webmagic.scripts.config.CommandLineOption
- addParamOptionIfInCommandLine(Params, CommandLine) - Method in class us.codecraft.webmagic.scripts.config.CommandLineOption
- addPipeline(Pipeline) - Method in class us.codecraft.webmagic.Spider
-
add a pipeline for Spider
- addRequest(Request...) - Method in class us.codecraft.webmagic.Spider
-
Add urls with information to crawl.
- addSubPageProcessor(SubPageProcessor) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- addSubPipeline(SubPipeline) - Method in class us.codecraft.webmagic.handler.CompositePipeline
- addTargetRequest(String) - Method in class us.codecraft.webmagic.Page
-
add url to fetch
- addTargetRequest(Request) - Method in class us.codecraft.webmagic.Page
-
add requests to fetch
- addTargetRequests(Iterable<String>) - Method in class us.codecraft.webmagic.Page
-
add urls to fetch
- addTargetRequests(Iterable<String>, long) - Method in class us.codecraft.webmagic.Page
-
add urls to fetch
- addUrl(String...) - Method in class us.codecraft.webmagic.Spider
-
Add urls to crawl.
- AfterExtractor - Interface in us.codecraft.webmagic.model
-
Interface to be implemented by page models that need to do something after fields are extracted.
- afterProcess(Page) - Method in interface us.codecraft.webmagic.model.AfterExtractor
- afterProcess(Page) - Method in class us.codecraft.webmagic.model.samples.DianpingFtlDataScanner
- afterProcess(Page) - Method in class us.codecraft.webmagic.model.samples.OschinaAnswer
- AlexanderMcqueenGoodsProcessor - Class in us.codecraft.webmagic.samples
- AlexanderMcqueenGoodsProcessor() - Constructor for class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
- all() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- all() - Method in interface us.codecraft.webmagic.selector.Selectable
-
multi string result
- AmanzonPageProcessor - Class in us.codecraft.webmagic.samples
- AmanzonPageProcessor() - Constructor for class us.codecraft.webmagic.samples.AmanzonPageProcessor
- and(Selector...) - Static method in class us.codecraft.webmagic.selector.Selectors
- And - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
All extractors will be arranged as a pipeline.
- AndSelector - Class in us.codecraft.webmagic.selector
-
All selectors will be arranged as a pipeline.
- AndSelector(List<Selector>) - Constructor for class us.codecraft.webmagic.selector.AndSelector
- AndSelector(Selector...) - Constructor for class us.codecraft.webmagic.selector.AndSelector
- AngularJSProcessor - Class in us.codecraft.webmagic.samples
- AngularJSProcessor() - Constructor for class us.codecraft.webmagic.samples.AngularJSProcessor
- AppStore - Class in us.codecraft.webmagic.example
- AppStore() - Constructor for class us.codecraft.webmagic.example.AppStore
B
- BaiduBaike - Class in us.codecraft.webmagic.example
- BaiduBaike() - Constructor for class us.codecraft.webmagic.example.BaiduBaike
- BaiduBaikePageProcessor - Class in us.codecraft.webmagic.processor.example
- BaiduBaikePageProcessor() - Constructor for class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
- BaiduNews - Class in us.codecraft.webmagic.model.samples
- BaiduNews() - Constructor for class us.codecraft.webmagic.model.samples.BaiduNews
- BaseElementSelector - Class in us.codecraft.webmagic.selector
- BaseElementSelector() - Constructor for class us.codecraft.webmagic.selector.BaseElementSelector
- BaseSelectorUtils - Class in us.codecraft.webmagic.utils
- BaseSelectorUtils() - Constructor for class us.codecraft.webmagic.utils.BaseSelectorUtils
- basicClassDetector - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- BasicClassDetector - Interface in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter<T> - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- BasicTypeFormatter.BooleanFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.ByteFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.CharactorFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.DoubleFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.FloatFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.IntegerFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.LongFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.ShortFormatter - Class in us.codecraft.webmagic.model.formatter
- basicTypeFormatters - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- Blog - Interface in us.codecraft.webmagic.model.samples
- BloomFilterDuplicateRemover - Class in us.codecraft.webmagic.scheduler
-
BloomFilterDuplicateRemover for a huge number of urls.
- BloomFilterDuplicateRemover(int) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- BloomFilterDuplicateRemover(int, double) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- BooleanFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
- build() - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
- build() - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
- ByteFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
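The BloomFilterDuplicateRemover entry above is described only as suited to a huge number of urls. As a rough illustration of why a Bloom filter helps there, the following standalone sketch tracks seen urls in a fixed-size bit set instead of storing every url. It is not WebMagic's implementation: the class name SimpleUrlBloomFilter, the hashing scheme, and the constructor parameters are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration only; not part of WebMagic.
class SimpleUrlBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter (must be > 0)
    private final int hashCount; // number of bit positions probed per url

    SimpleUrlBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Seeded FNV-1a style hash over the UTF-8 bytes of the url.
    private static int hash(String s, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 16777619;
        }
        return h;
    }

    // Returns true if the url has possibly been seen before (false positives
    // can occur), false if it is definitely new. The url is recorded either way.
    boolean isDuplicate(String url) {
        int h1 = hash(url, 0);
        int h2 = hash(url, 0x9E3779B9);
        boolean allSet = true;
        for (int i = 0; i < hashCount; i++) {
            // Derive the i-th probe position from two base hashes.
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                allSet = false;
                bits.set(index);
            }
        }
        return allSet;
    }
}
```

Memory use is fixed by the bit-set size regardless of how many urls are added; the trade-off is that an unseen url is occasionally reported as a duplicate.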
C
- canonicalizeUrl(String, String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
-
canonicalizeUrl
Borrowed from Jsoup.
- CharactorFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
- CharsetUtils - Class in us.codecraft.webmagic.utils
- checkAndMakeParentDirecotry(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
- checkIfRunning() - Method in class us.codecraft.webmagic.Spider
- ClassUtils - Class in us.codecraft.webmagic.utils
- ClassUtils() - Constructor for class us.codecraft.webmagic.utils.ClassUtils
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
- clazz() - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
- clazz() - Method in class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
- clearPipeline() - Method in class us.codecraft.webmagic.Spider
-
clear the pipelines set
- close() - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
- close() - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- close() - Method in class us.codecraft.webmagic.Spider
- CODE_200 - Static variable in class us.codecraft.webmagic.utils.HttpConstant.StatusCode
- CollectorPageModelPipeline<T> - Class in us.codecraft.webmagic.pipeline
- CollectorPageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
- CollectorPipeline<T> - Interface in us.codecraft.webmagic.pipeline
-
Pipeline that can collect and store results.
- combine(MultiPageModel) - Method in class us.codecraft.webmagic.model.samples.News163
- combine(MultiPageModel) - Method in interface us.codecraft.webmagic.MultiPageModel
-
Combine multiPageModels into a whole object.
- ComboExtract - Annotation Type in us.codecraft.webmagic.model.annotation
-
Combo 'ExtractBy' extractor with and/or operator.
- ComboExtract.Op - Enum in us.codecraft.webmagic.model.annotation
- ComboExtract.Source - Enum in us.codecraft.webmagic.model.annotation
-
types of source for extracting.
- CommandLineOption - Class in us.codecraft.webmagic.scripts.config
- CommandLineOption(char) - Constructor for class us.codecraft.webmagic.scripts.config.CommandLineOption
- compareLong(long, long) - Static method in class us.codecraft.webmagic.utils.NumberUtils
- CompositePageProcessor - Class in us.codecraft.webmagic.handler
- CompositePageProcessor(Site) - Constructor for class us.codecraft.webmagic.handler.CompositePageProcessor
- CompositePipeline - Class in us.codecraft.webmagic.handler
- CompositePipeline() - Constructor for class us.codecraft.webmagic.handler.CompositePipeline
- configLogger(String) - Static method in class us.codecraft.webmagic.scripts.config.ConfigLogger
-
Log the config parameter.
- ConfigLogger - Class in us.codecraft.webmagic.scripts.config
- ConfigLogger() - Constructor for class us.codecraft.webmagic.scripts.config.ConfigLogger
- ConfigurablePageProcessor - Class in us.codecraft.webmagic.configurable
- ConfigurablePageProcessor(Site, List<ExtractRule>) - Constructor for class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
- CONNECT - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
- ConsolePageModelPipeline - Class in us.codecraft.webmagic.model
-
Print page model in console.
Usually used in test.
- ConsolePageModelPipeline() - Constructor for class us.codecraft.webmagic.model.ConsolePageModelPipeline
- ConsolePipeline - Class in us.codecraft.webmagic.pipeline
-
Write results in console.
Usually used in test.
- ConsolePipeline() - Constructor for class us.codecraft.webmagic.pipeline.ConsolePipeline
- ContentType() - Constructor for class us.codecraft.webmagic.model.HttpRequestBody.ContentType
- convert(String, ObjectFormatter, Logger) - Method in class us.codecraft.webmagic.model.fields.PageField
- convert(Request, Site, Proxy) - Method in class us.codecraft.webmagic.downloader.HttpUriRequestConverter
- convertHeaders(Header[]) - Static method in class us.codecraft.webmagic.utils.HttpClientUtils
- convertToRequests(Collection<String>) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- convertToUrls(Collection<Request>) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- CountableThreadPool - Class in us.codecraft.webmagic.thread
- CountableThreadPool(int) - Constructor for class us.codecraft.webmagic.thread.CountableThreadPool
- CountableThreadPool(int, ExecutorService) - Constructor for class us.codecraft.webmagic.thread.CountableThreadPool
- create(String) - Static method in class us.codecraft.webmagic.selector.Html
- create(String) - Static method in class us.codecraft.webmagic.selector.PlainText
- create(URI) - Static method in class us.codecraft.webmagic.proxy.Proxy
- create(PageProcessor) - Static method in class us.codecraft.webmagic.Spider
-
create a spider with pageProcessor.
- create(Site, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
- create(Site, PageModelPipeline, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
- css(String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- css(String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
select list with css selector
- css(String, String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- css(String, String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
select list with css selector
- Css - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- Css - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
- CssSelector - Class in us.codecraft.webmagic.selector
-
CSS selector.
- CssSelector(String) - Constructor for class us.codecraft.webmagic.selector.CssSelector
- CssSelector(String, String) - Constructor for class us.codecraft.webmagic.selector.CssSelector
- custom() - Static method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
- custom(byte[], String, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
- CustomRedirectStrategy - Class in us.codecraft.webmagic.downloader
-
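Redirect strategy implementation that supports POST 302 redirects. HttpClient's default redirect handling, httpClientBuilder.setRedirectStrategy(new LaxRedirectStrategy()), does not pass along the original request's data in the post/redirect/post case, so this class follows the redirect strategy of the SeimiCrawler project. Original source: https://github.com/zhegexiaohuozi/SeimiCrawler/blob/master/project/src/main/java/cn/wanghaomiao/seimi/http/hc/SeimiRedirectStrategy.java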
- CustomRedirectStrategy() - Constructor for class us.codecraft.webmagic.downloader.CustomRedirectStrategy
- CYCLE_TRIED_TIMES - Static variable in class us.codecraft.webmagic.Request
D
- DateFormatter - Class in us.codecraft.webmagic.model.formatter
- DateFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.DateFormatter
- DEFAULT_CLAZZ - Static variable in class us.codecraft.webmagic.utils.MultiKeyMapBase
- DEFAULT_FORMATTER - Static variable in annotation type us.codecraft.webmagic.model.annotation.Formatter
- DEFAULT_PATTERN - Static variable in class us.codecraft.webmagic.model.formatter.DateFormatter
- DefaultSource() - Constructor for class us.codecraft.webmagic.model.sources.Source.DefaultSource
- DelayQueueScheduler - Class in us.codecraft.webmagic.samples.scheduler
- DelayQueueScheduler(long, TimeUnit) - Constructor for class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
- DELETE - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
- deserializeRequest(String) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- destroyWhenExit - Variable in class us.codecraft.webmagic.Spider
- detectBasicClass(Class<?>) - Method in interface us.codecraft.webmagic.model.formatter.BasicClassDetector
- detectBasicClass(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- detectCharset(String, byte[]) - Static method in class us.codecraft.webmagic.utils.CharsetUtils
- DiandianBlogProcessor - Class in us.codecraft.webmagic.samples
- DiandianBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.DiandianBlogProcessor
- DianpingFtlDataScanner - Class in us.codecraft.webmagic.model.samples
- DianpingFtlDataScanner() - Constructor for class us.codecraft.webmagic.model.samples.DianpingFtlDataScanner
- DiaoyuwengProcessor - Class in us.codecraft.webmagic.samples
- DiaoyuwengProcessor() - Constructor for class us.codecraft.webmagic.samples.DiaoyuwengProcessor
- DISABLE_HTML_ENTITY_ESCAPE - Static variable in class us.codecraft.webmagic.selector.Html
-
Deprecated.
- DoubleFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
- DoubleKeyMap<K1, K2, V> - Class in us.codecraft.webmagic.utils
- DoubleKeyMap() - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
- DoubleKeyMap(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
- DoubleKeyMap(Map<K1, Map<K2, V>>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
- DoubleKeyMap(Map<K1, Map<K2, V>>, Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
-
init map with protoMapClass
- download(String) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
-
A simple method to download a url.
- download(String, String) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
-
A simple method to download a url.
- download(Request, Task) - Method in interface us.codecraft.webmagic.downloader.Downloader
-
Downloads web pages and stores them in a Page object.
- download(Request, Task) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
- download(Request, Task) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
- download(Request, Task) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
- downloader - Variable in class us.codecraft.webmagic.Spider
- downloader(Downloader) - Method in class us.codecraft.webmagic.Spider
-
Deprecated.
- Downloader - Interface in us.codecraft.webmagic.downloader
-
Downloader is the part that downloads web pages and stores them in a Page object.
- DuplicateRemovedScheduler - Class in us.codecraft.webmagic.scheduler
-
Remove duplicate urls and only push urls which are not duplicates.
- DuplicateRemovedScheduler() - Constructor for class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- DuplicateRemover - Interface in us.codecraft.webmagic.scheduler.component
-
Remove duplicate requests.
- DuplicateStorageRemover - Class in us.codecraft.webmagic.recover
- DuplicateStorageRemover(String) - Constructor for class us.codecraft.webmagic.recover.DuplicateStorageRemover
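The DoubleKeyMap constructors listed above take a Map<K1, Map<K2, V>>, which suggests a map of maps addressed by two keys. A minimal standalone sketch of that data structure follows. It is not WebMagic's class: the name TwoKeyMap and the put method are assumptions made for this example, and only the nested-map shape is taken from the signatures in this index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not WebMagic's DoubleKeyMap.
class TwoKeyMap<K1, K2, V> {
    // Outer map keyed by the first key; each value is an inner map
    // keyed by the second key.
    private final Map<K1, Map<K2, V>> outer = new HashMap<>();

    // Stores a value under (key1, key2), creating the inner map on demand.
    // Returns the previous value for that key pair, or null if there was none.
    V put(K1 key1, K2 key2, V value) {
        return outer.computeIfAbsent(key1, k -> new HashMap<>()).put(key2, value);
    }

    // Returns the inner map for key1, or null if key1 is unknown.
    Map<K2, V> get(K1 key1) {
        return outer.get(key1);
    }

    // Returns the value for (key1, key2), or null if either key is unknown.
    V get(K1 key1, K2 key2) {
        Map<K2, V> inner = outer.get(key1);
        return inner == null ? null : inner.get(key2);
    }
}
```

A lookup with both keys walks the outer map first and the inner map second, so a missing first key short-circuits to null without touching any inner map.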
E
- ElementSelector - Interface in us.codecraft.webmagic.selector
-
Selector(extractor) for html elements.
- encodeIllegalCharacterInUrl(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
-
Deprecated.
- equals(Object) - Method in class us.codecraft.webmagic.proxy.Proxy
- equals(Object) - Method in class us.codecraft.webmagic.Request
- equals(Object) - Method in class us.codecraft.webmagic.Site
- execute(Runnable) - Method in class us.codecraft.webmagic.thread.CountableThreadPool
- executorService - Variable in class us.codecraft.webmagic.Spider
- exitWhenComplete - Variable in class us.codecraft.webmagic.Spider
- Experimental - Annotation Type in us.codecraft.webmagic.utils
-
Marks features that are unstable.
- ExpressionType - Enum in us.codecraft.webmagic.configurable
- extractAndAddRequests(Page, boolean) - Method in class us.codecraft.webmagic.Spider
- ExtractBy - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define the extractor for field or class.
- ExtractBy.Source - Enum in us.codecraft.webmagic.model.annotation
-
types of source for extracting.
- ExtractBy.Type - Enum in us.codecraft.webmagic.model.annotation
-
types of extractor expressions
- ExtractByUrl - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define an extractor to extract data from the url of the current page.
- Extractor - Class in us.codecraft.webmagic.model
-
The object contains 'ExtractBy' information.
- Extractor(Selector, Source, boolean, boolean) - Constructor for class us.codecraft.webmagic.model.Extractor
- ExtractorUtils - Class in us.codecraft.webmagic.utils
-
Tools for annotation converting.
- ExtractorUtils() - Constructor for class us.codecraft.webmagic.utils.ExtractorUtils
- ExtractRule - Class in us.codecraft.webmagic.configurable
- ExtractRule() - Constructor for class us.codecraft.webmagic.configurable.ExtractRule
F
- F58PageProcesser - Class in us.codecraft.webmagic.samples
- F58PageProcesser() - Constructor for class us.codecraft.webmagic.samples.F58PageProcesser
- fail() - Static method in class us.codecraft.webmagic.Page
-
Deprecated.
- fail(Request) - Static method in class us.codecraft.webmagic.Page
- FieldExtractor - Class in us.codecraft.webmagic.model
-
Wrapper of field and extractor.
- FieldExtractor(Field, Selector, Source, boolean, boolean) - Constructor for class us.codecraft.webmagic.model.FieldExtractor
- FileCacheQueueScheduler - Class in us.codecraft.webmagic.scheduler
-
Store urls and cursor in files so that a Spider can resume its status after a shutdown.
- FileCacheQueueScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- FilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
-
Store result objects (page models) to files in plain format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
- FilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
-
new FilePageModelPipeline with default path "/data/webmagic/"
- FilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
- FilePersistentBase - Class in us.codecraft.webmagic.utils
-
Base object of file persistence.
- FilePersistentBase() - Constructor for class us.codecraft.webmagic.utils.FilePersistentBase
- FilePipeline - Class in us.codecraft.webmagic.pipeline
-
Store results in files.
- FilePipeline() - Constructor for class us.codecraft.webmagic.pipeline.FilePipeline
-
create a FilePipeline with default path "/data/webmagic/"
- FilePipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.FilePipeline
- fixIllegalCharacterInUrl(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- FloatFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
- form(Map<String, Object>, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
- FORM - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
- format(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- format(String) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
- format(String) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
- format(String) - Method in class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
- formatter() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
-
If there is more than one formatter for a class, specify the implementation to use.
- Formatter - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define how the result string is converted to an object for the field.
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
- from(String) - Static method in class us.codecraft.webmagic.utils.RequestUtils
- from(Proxy...) - Static method in class us.codecraft.webmagic.proxy.SimpleProxyProvider
- fromJson(String, Class<T>) - Method in class us.codecraft.webmagic.recover.MmapQueueScheduler
- fromValue(int) - Static method in enum us.codecraft.webmagic.Spider.Status
G
- get() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- get() - Method in interface us.codecraft.webmagic.selector.Selectable
-
single string result
- get(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
- get(String) - Method in class us.codecraft.webmagic.ResultItems
- get(String) - Method in class us.codecraft.webmagic.SimpleHttpClient
- get(String) - Method in class us.codecraft.webmagic.Spider
- get(String, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
- get(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- get(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- get(Page) - Method in class us.codecraft.webmagic.model.PageMapper
- get(Request) - Method in class us.codecraft.webmagic.SimpleHttpClient
- get(Request, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
- GET - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
- getAcceptStatCode() - Method in class us.codecraft.webmagic.Site
-
get acceptStatCode
- getAll() - Method in class us.codecraft.webmagic.ResultItems
- getAll(Collection<String>) - Method in class us.codecraft.webmagic.Spider
-
Download urls synchronously.
- getAll(Page) - Method in class us.codecraft.webmagic.model.PageMapper
- getAllCookies() - Method in class us.codecraft.webmagic.Site
-
get cookies of all domains
- getAllOptions() - Static method in class us.codecraft.webmagic.scripts.config.CommandLineOption
- getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepo
- getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getAuthor() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- getAuthor() - Method in class us.codecraft.webmagic.samples.GithubRepo
- getBody() - Method in class us.codecraft.webmagic.model.HttpRequestBody
- getBytes() - Method in class us.codecraft.webmagic.Page
- getCharset() - Method in class us.codecraft.webmagic.Page
- getCharset() - Method in class us.codecraft.webmagic.Request
- getCharset() - Method in class us.codecraft.webmagic.Site
-
get charset set manually
- getCharset(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- getClient(Site) - Method in class us.codecraft.webmagic.downloader.HttpClientGenerator
- getCollected() - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
- getCollected() - Method in interface us.codecraft.webmagic.pipeline.CollectorPipeline
-
Get all results collected.
- getCollected() - Method in class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
- getCollectorPipeline() - Method in class us.codecraft.webmagic.model.OOSpider
- getCollectorPipeline() - Method in class us.codecraft.webmagic.Spider
- getContent() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getContent() - Method in interface us.codecraft.webmagic.model.samples.Blog
- getContent() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
- getContent() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
- getContent() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
- getContentType() - Method in class us.codecraft.webmagic.model.HttpRequestBody
- getCookies() - Method in class us.codecraft.webmagic.Request
- getCookies() - Method in class us.codecraft.webmagic.Site
-
get cookies
- getCycleRetryTimes() - Method in class us.codecraft.webmagic.Site
-
When cycleRetryTimes is more than 0, the request will be added back to the scheduler and the download tried again.
- getDate() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getDefaultCharset() - Method in class us.codecraft.webmagic.Site
-
The default charset, used if charset detection fails.
- getDefineFile() - Method in class us.codecraft.webmagic.scripts.languages.Language
- getDescription() - Method in class us.codecraft.webmagic.example.BaiduBaike
- getDescription() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
- getDocument() - Method in class us.codecraft.webmagic.selector.Html
- getDomain() - Method in class us.codecraft.webmagic.Site
-
get domain
- getDomain(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- getDownloader() - Method in class us.codecraft.webmagic.Request
- getDuplicateRemover() - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- getElements() - Method in class us.codecraft.webmagic.selector.Html
- getElements() - Method in class us.codecraft.webmagic.selector.HtmlNode
- getEncoding() - Method in class us.codecraft.webmagic.model.HttpRequestBody
- getEngine() - Method in class us.codecraft.webmagic.scripts.ScriptEnginePool
- getEngineName() - Method in class us.codecraft.webmagic.scripts.languages.Language
- getErrorCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- getErrorPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getErrorPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getErrorPages() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getErrorPages() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getErrorUrls() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- getExpressionParams() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getExpressionType() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getExpressionValue() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getExtra(String) - Method in class us.codecraft.webmagic.Request
- getExtras() - Method in class us.codecraft.webmagic.Request
- getFieldName() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getFieldsIncludeSuperClass(Class) - Static method in class us.codecraft.webmagic.utils.ClassUtils
- getFile(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
- getFirstNoLoopbackIPAddresses() - Static method in class us.codecraft.webmagic.utils.IPUtils
- getFirstSourceText() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- getFork() - Method in class us.codecraft.webmagic.example.GithubRepo
- getFork() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getFork() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- getGatherFile() - Method in class us.codecraft.webmagic.scripts.languages.Language
- getHeaders() - Method in class us.codecraft.webmagic.Page
- getHeaders() - Method in class us.codecraft.webmagic.Request
- getHeaders() - Method in class us.codecraft.webmagic.Site
- getHost() - Method in class us.codecraft.webmagic.proxy.Proxy
- getHost(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- getHtml() - Method in class us.codecraft.webmagic.Page
-
get html content of page
- getHttpClientContext() - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
- getHttpUriRequest() - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
- getItemKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getJson() - Method in class us.codecraft.webmagic.Page
-
get json content of page
- getJsonPathStr() - Method in class us.codecraft.webmagic.selector.JsonPathSelector
- getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepo
- getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getLanguage() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- getLeftPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getLeftPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- getLeftRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.MonitorableScheduler
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getMethod() - Method in class us.codecraft.webmagic.Request
-
The http method of the request.
- getName() - Method in class us.codecraft.webmagic.example.BaiduBaike
- getName() - Method in class us.codecraft.webmagic.example.GithubRepo
- getName() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getName() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
- getName() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- getName() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getName() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getName() - Method in class us.codecraft.webmagic.samples.GithubRepo
- getOtherPages() - Method in class us.codecraft.webmagic.model.samples.News163
- getOtherPages() - Method in interface us.codecraft.webmagic.MultiPageModel
-
other pages to be extracted.
It is used to judge whether an object contains more than one page, and whether the pages of the object are all extracted.
- getPage() - Method in class us.codecraft.webmagic.model.samples.News163
- getPage() - Method in interface us.codecraft.webmagic.MultiPageModel
-
The identifier of a single page among the pages that make up one object.
- getPage(Request) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
- getPageCount() - Method in class us.codecraft.webmagic.Spider
-
Get the number of pages downloaded by the spider.
- getPageKey() - Method in class us.codecraft.webmagic.model.samples.News163
- getPageKey() - Method in interface us.codecraft.webmagic.MultiPageModel
-
Page key is the identifier for the object.
- getPagePerSecond() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getPagePerSecond() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getPassword() - Method in class us.codecraft.webmagic.proxy.Proxy
- getPath() - Method in class us.codecraft.webmagic.utils.FilePersistentBase
- getPort() - Method in class us.codecraft.webmagic.proxy.Proxy
- getPriority() - Method in class us.codecraft.webmagic.Request
- getProxy(Request, Task) - Method in interface us.codecraft.webmagic.proxy.ProxyProvider
-
Returns a proxy for the request.
- getProxy(Request, Task) - Method in class us.codecraft.webmagic.proxy.SimpleProxyProvider
- getProxy(Task) - Method in interface us.codecraft.webmagic.proxy.ProxyProvider
-
Deprecated. Use ProxyProvider.getProxy(Request, Task) instead.
- getQueueKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getRawText() - Method in class us.codecraft.webmagic.Page
- getReadme() - Method in class us.codecraft.webmagic.example.GithubRepo
- getReadme() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- getReadme() - Method in class us.codecraft.webmagic.samples.GithubRepo
- getRedirect(HttpRequest, HttpResponse, HttpContext) - Method in class us.codecraft.webmagic.downloader.CustomRedirectStrategy
- getRequest() - Method in class us.codecraft.webmagic.Page
-
get request of current page
- getRequest() - Method in class us.codecraft.webmagic.ResultItems
- getRequestBody() - Method in class us.codecraft.webmagic.Request
- getResultItems() - Method in class us.codecraft.webmagic.Page
- getRetrySleepTime() - Method in class us.codecraft.webmagic.Site
- getRetryTimes() - Method in class us.codecraft.webmagic.Site
-
Get the number of immediate retries when a download fails; 0 by default.
- getScheduler() - Method in class us.codecraft.webmagic.Spider
- getScheduler() - Method in class us.codecraft.webmagic.SpiderScheduler
- getScheme() - Method in class us.codecraft.webmagic.proxy.Proxy
- getSelector() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getSelector(ExtractBy) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
- getSelectors(ExtractBy[]) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
- getSetKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getSite() - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
- getSite() - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
- getSite() - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- getSite() - Method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
- getSite() - Method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
- getSite() - Method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
- getSite() - Method in interface us.codecraft.webmagic.processor.PageProcessor
-
Returns the site settings.
- getSite() - Method in class us.codecraft.webmagic.processor.SimplePageProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.AngularJSProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.DiandianBlogProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.F58PageProcesser
- getSite() - Method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.HuxiuProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.KaichibaProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.MamacnPageProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.MeicanProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.NjuBBSProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.QzoneBlogProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.SinaBlogProcessor
- getSite() - Method in class us.codecraft.webmagic.samples.TianyaPageProcesser
- getSite() - Method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
- getSite() - Method in class us.codecraft.webmagic.scripts.ScriptProcessor
- getSite() - Method in class us.codecraft.webmagic.Spider
- getSite() - Method in interface us.codecraft.webmagic.Task
-
site of a task
- getSleepTime() - Method in class us.codecraft.webmagic.Site
-
Get the interval between the processing of two pages.
Time unit is milliseconds.
- getSourceTexts() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- getSourceTexts() - Method in class us.codecraft.webmagic.selector.HtmlNode
- getSourceTexts() - Method in class us.codecraft.webmagic.selector.PlainText
- getSpiderListeners() - Method in class us.codecraft.webmagic.Spider
- getSpiderStatuses() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
- getSpiderStatusMBean(Spider, SpiderMonitor.MonitorSpiderListener) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
- getStar() - Method in class us.codecraft.webmagic.example.GithubRepo
- getStar() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getStar() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- getStartTime() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getStartTime() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getStartTime() - Method in class us.codecraft.webmagic.Spider
- getStatus() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getStatus() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getStatus() - Method in class us.codecraft.webmagic.Spider
-
Get the running status of the spider.
- getStatusCode() - Method in class us.codecraft.webmagic.Page
- getSuccessCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- getSuccessPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getSuccessPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getTags() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getTags() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
- getTargetRequests() - Method in class us.codecraft.webmagic.Page
- getText(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.DefaultSource
- getText(Page, String, boolean, FieldExtractor) - Method in interface us.codecraft.webmagic.model.sources.Source
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawHtml
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawText
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.SelectedHtml
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.Url
- getText(Page, String, boolean, FieldExtractor) - Static method in class us.codecraft.webmagic.model.sources.SourceTextExtractor
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.DefaultSource
- getTextList(Page, String, boolean, FieldExtractor) - Method in interface us.codecraft.webmagic.model.sources.Source
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawHtml
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawText
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.SelectedHtml
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.Url
- getThread() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getThread() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getThreadAlive() - Method in class us.codecraft.webmagic.Spider
-
Get the number of threads currently running.
- getThreadAlive() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
- getThreadNum() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
- getTimeOut() - Method in class us.codecraft.webmagic.Site
- getTitle() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getTitle() - Method in interface us.codecraft.webmagic.model.samples.Blog
- getTitle() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
- getTitle() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
- getTitle() - Method in class us.codecraft.webmagic.model.samples.OschinaBlog
- getTotalPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getTotalPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.recover.DuplicateStorageRemover
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- getTotalRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.component.DuplicateRemover
-
Get TotalRequestsCount for monitor.
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- getTotalRequestsCount(Task) - Method in interface us.codecraft.webmagic.scheduler.MonitorableScheduler
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getUrl() - Method in class us.codecraft.webmagic.example.GithubRepo
- getUrl() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getUrl() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- getUrl() - Method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
- getUrl() - Method in class us.codecraft.webmagic.Page
-
get url of current page
- getUrl() - Method in class us.codecraft.webmagic.Request
- getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
- getUserAgent() - Method in class us.codecraft.webmagic.Site
-
get user agent
- getUsername() - Method in class us.codecraft.webmagic.proxy.Proxy
- getUUID() - Method in class us.codecraft.webmagic.Spider
- getUUID() - Method in interface us.codecraft.webmagic.Task
-
unique id for a task.
- GithubRepo - Class in us.codecraft.webmagic.example
- GithubRepo - Class in us.codecraft.webmagic.model.samples
- GithubRepo - Class in us.codecraft.webmagic.samples
- GithubRepo() - Constructor for class us.codecraft.webmagic.example.GithubRepo
- GithubRepo() - Constructor for class us.codecraft.webmagic.model.samples.GithubRepo
- GithubRepo() - Constructor for class us.codecraft.webmagic.samples.GithubRepo
- GithubRepoApi - Class in us.codecraft.webmagic.example
- GithubRepoApi() - Constructor for class us.codecraft.webmagic.example.GithubRepoApi
- GithubRepoPageMapper - Class in us.codecraft.webmagic.example
- GithubRepoPageMapper() - Constructor for class us.codecraft.webmagic.example.GithubRepoPageMapper
- GithubRepoPageProcessor - Class in us.codecraft.webmagic.processor.example
- GithubRepoPageProcessor - Class in us.codecraft.webmagic.samples
- GithubRepoPageProcessor() - Constructor for class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
- GithubRepoPageProcessor() - Constructor for class us.codecraft.webmagic.samples.GithubRepoPageProcessor
H
- handleResponse(Request, String, HttpResponse, Task) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
- hasAttribute() - Method in class us.codecraft.webmagic.selector.BaseElementSelector
- hasAttribute() - Method in class us.codecraft.webmagic.selector.CssSelector
- hasAttribute() - Method in class us.codecraft.webmagic.selector.LinksSelector
- hasAttribute() - Method in class us.codecraft.webmagic.selector.XpathSelector
- hashCode() - Method in class us.codecraft.webmagic.proxy.Proxy
- hashCode() - Method in class us.codecraft.webmagic.Request
- hashCode() - Method in class us.codecraft.webmagic.Site
- HashSetDuplicateRemover - Class in us.codecraft.webmagic.scheduler.component
- HashSetDuplicateRemover() - Constructor for class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
- HasKey - Interface in us.codecraft.webmagic.model
-
Interface to be implemented by page models.
Can be used to identify a page model, or as the name of the file storing the object.
- HEAD - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
- Header() - Constructor for class us.codecraft.webmagic.utils.HttpConstant.Header
- HelpUrl - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define the 'help' url patterns for class.
- Html - Class in us.codecraft.webmagic.selector
-
Selectable html.
- Html(String) - Constructor for class us.codecraft.webmagic.selector.Html
- Html(String, String) - Constructor for class us.codecraft.webmagic.selector.Html
- Html(Document) - Constructor for class us.codecraft.webmagic.selector.Html
- HtmlNode - Class in us.codecraft.webmagic.selector
- HtmlNode() - Constructor for class us.codecraft.webmagic.selector.HtmlNode
- HtmlNode(List<Element>) - Constructor for class us.codecraft.webmagic.selector.HtmlNode
- HttpClientDownloader - Class in us.codecraft.webmagic.downloader
-
The http downloader based on HttpClient.
- HttpClientDownloader() - Constructor for class us.codecraft.webmagic.downloader.HttpClientDownloader
- HttpClientGenerator - Class in us.codecraft.webmagic.downloader
- HttpClientGenerator() - Constructor for class us.codecraft.webmagic.downloader.HttpClientGenerator
- HttpClientRequestContext - Class in us.codecraft.webmagic.downloader
- HttpClientRequestContext() - Constructor for class us.codecraft.webmagic.downloader.HttpClientRequestContext
- HttpClientUtils - Class in us.codecraft.webmagic.utils
- HttpClientUtils() - Constructor for class us.codecraft.webmagic.utils.HttpClientUtils
- HttpConstant - Class in us.codecraft.webmagic.utils
-
Some constants of the HTTP protocol.
- HttpConstant() - Constructor for class us.codecraft.webmagic.utils.HttpConstant
- HttpConstant.Header - Class in us.codecraft.webmagic.utils
- HttpConstant.Method - Class in us.codecraft.webmagic.utils
- HttpConstant.StatusCode - Class in us.codecraft.webmagic.utils
- HttpRequestBody - Class in us.codecraft.webmagic.model
- HttpRequestBody() - Constructor for class us.codecraft.webmagic.model.HttpRequestBody
- HttpRequestBody(byte[], String, String) - Constructor for class us.codecraft.webmagic.model.HttpRequestBody
- HttpRequestBody.ContentType - Class in us.codecraft.webmagic.model
- HttpUriRequestConverter - Class in us.codecraft.webmagic.downloader
- HttpUriRequestConverter() - Constructor for class us.codecraft.webmagic.downloader.HttpUriRequestConverter
- HuxiuProcessor - Class in us.codecraft.webmagic.samples
- HuxiuProcessor() - Constructor for class us.codecraft.webmagic.samples.HuxiuProcessor
I
- InfoQMiniBookProcessor - Class in us.codecraft.webmagic.samples
- InfoQMiniBookProcessor() - Constructor for class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
- Init - Enum constant in enum us.codecraft.webmagic.Spider.Status
- initComponent() - Method in class us.codecraft.webmagic.Spider
- INITIAL_CAPACITY - Static variable in class us.codecraft.webmagic.scheduler.PriorityScheduler
- initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
- initParam(String[]) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
- initParam(String[]) - Method in class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
- instance() - Static method in class us.codecraft.webmagic.monitor.SpiderMonitor
- IntegerFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
- IPUtils - Class in us.codecraft.webmagic.utils
- IPUtils() - Constructor for class us.codecraft.webmagic.utils.IPUtils
- isBinaryContent() - Method in class us.codecraft.webmagic.Request
- isDisableCookieManagement() - Method in class us.codecraft.webmagic.Site
- isDownloadSuccess() - Method in class us.codecraft.webmagic.Page
- isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.recover.DuplicateStorageRemover
- isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- isDuplicate(Request, Task) - Method in interface us.codecraft.webmagic.scheduler.component.DuplicateRemover
-
Check whether the request is duplicate.
- isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
- isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- isExitWhenComplete() - Method in class us.codecraft.webmagic.Spider
- isMulti() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- isMulti() - Method in class us.codecraft.webmagic.model.Extractor
- isNotNull() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- isNotNull() - Method in class us.codecraft.webmagic.model.Extractor
- isShutdown() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
- isSkip() - Method in class us.codecraft.webmagic.ResultItems
-
Whether to skip the result.
Result which is skipped will not be processed by Pipeline.
- isSpawnUrl() - Method in class us.codecraft.webmagic.Spider
- isUseGzip() - Method in class us.codecraft.webmagic.Site
- IteyeBlog - Class in us.codecraft.webmagic.model.samples
- IteyeBlog() - Constructor for class us.codecraft.webmagic.model.samples.IteyeBlog
- IteyeBlogProcessor - Class in us.codecraft.webmagic.samples
- IteyeBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.IteyeBlogProcessor
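The isDuplicate(Request, Task) entries above describe a check for whether a request has already been seen. A minimal sketch of that idea, assuming a plain in-memory set keyed by URL string, might look like the following. The class SeenUrlSet and its methods are hypothetical names used only for illustration; this is not the WebMagic implementation and it ignores the Task argument.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in, not a WebMagic class: remembers every URL it has seen.
class SeenUrlSet {
    // Thread-safe set, since a crawler usually checks URLs from several threads.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Returns true if the URL was already recorded; records it otherwise.
    boolean isDuplicate(String url) {
        return !seen.add(url);
    }

    // Number of distinct URLs recorded so far.
    int totalCount() {
        return seen.size();
    }

    public static void main(String[] args) {
        SeenUrlSet urls = new SeenUrlSet();
        System.out.println(urls.isDuplicate("https://example.com/a")); // false: first sighting
        System.out.println(urls.isDuplicate("https://example.com/a")); // true: seen before
        System.out.println(urls.totalCount());                         // 1
    }
}
```

Set.add returns false when the element is already present, so the check and the insert happen in one atomic step on the concurrent set.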
J
- Javascript - Class in us.codecraft.webmagic.scripts.languages
- Javascript() - Constructor for class us.codecraft.webmagic.scripts.languages.Javascript
- JaxpSelectorUtils - Class in us.codecraft.webmagic.selector
- JokejiModel - Class in us.codecraft.webmagic.model.samples
- JokejiModel() - Constructor for class us.codecraft.webmagic.model.samples.JokejiModel
- JRuby - Class in us.codecraft.webmagic.scripts.languages
- JRuby() - Constructor for class us.codecraft.webmagic.scripts.languages.JRuby
- json(String, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
- Json - Class in us.codecraft.webmagic.selector
-
parse json
- Json(String) - Constructor for class us.codecraft.webmagic.selector.Json
- Json(List<String>) - Constructor for class us.codecraft.webmagic.selector.Json
- JSON - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
- JsonFilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
-
Store results objects (page models) to files in JSON format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
- JsonFilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
-
new JsonFilePageModelPipeline with default path "/data/webmagic/"
- JsonFilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
- JsonFilePipeline - Class in us.codecraft.webmagic.pipeline
-
Store results to files in JSON format.
- JsonFilePipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
-
new JsonFilePipeline with default path "/data/webmagic/"
- JsonFilePipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
- jsonPath(String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- jsonPath(String) - Method in class us.codecraft.webmagic.selector.Json
- jsonPath(String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
extract by JSON Path expression
- JsonPath - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- JsonPath - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
- JsonPathSelector - Class in us.codecraft.webmagic.selector
-
JsonPath selector.
Used to extract content from JSON.
- JsonPathSelector(String) - Constructor for class us.codecraft.webmagic.selector.JsonPathSelector
- Jython - Class in us.codecraft.webmagic.scripts.languages
- Jython() - Constructor for class us.codecraft.webmagic.scripts.languages.Jython
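The JsonFilePageModelPipeline entry above states a file-naming rule: use the model's key when the model implements HasKey, otherwise use a SHA1 value. A minimal sketch of that rule follows. The interface Keyed, the class JsonFileNameSketch, and the choice to hash the serialized text are all assumptions made for illustration; the index does not say what input the library hashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for a model that can name itself (not the WebMagic HasKey type).
interface Keyed {
    String key();
}

class JsonFileNameSketch {
    // Hex-encoded SHA-1 digest of the given text.
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Uses the model's own key when it has one; otherwise hashes the serialized text (assumed input).
    static String fileNameFor(Object model, String serialized) {
        String base = (model instanceof Keyed) ? ((Keyed) model).key() : sha1Hex(serialized);
        return base + ".json";
    }

    public static void main(String[] args) {
        Keyed repo = () -> "webmagic-repo";
        System.out.println(fileNameFor(repo, "{}"));          // webmagic-repo.json
        System.out.println(fileNameFor(new Object(), "abc")); // a9993e364706816aba3e25717850c26c9cd0d89d.json
    }
}
```

Falling back to a digest gives every keyless object a stable, filesystem-safe name, at the cost of names that are not human-readable.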
K
- KaichibaProcessor - Class in us.codecraft.webmagic.samples
- KaichibaProcessor() - Constructor for class us.codecraft.webmagic.samples.KaichibaProcessor
- key() - Method in class us.codecraft.webmagic.example.GithubRepo
- key() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- key() - Method in interface us.codecraft.webmagic.model.HasKey
- key() - Method in class us.codecraft.webmagic.model.samples.GithubRepo
- Kr36NewsModel - Class in us.codecraft.webmagic.model.samples
- Kr36NewsModel() - Constructor for class us.codecraft.webmagic.model.samples.Kr36NewsModel
L
- language(Language) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
- Language - Class in us.codecraft.webmagic.scripts.languages
- Language(String, String, String) - Constructor for class us.codecraft.webmagic.scripts.languages.Language
- LevelLimitScheduler - Class in us.codecraft.webmagic.samples.scheduler
- LevelLimitScheduler(int) - Constructor for class us.codecraft.webmagic.samples.scheduler.LevelLimitScheduler
- links() - Method in class us.codecraft.webmagic.selector.HtmlNode
- links() - Method in class us.codecraft.webmagic.selector.PlainText
- links() - Method in interface us.codecraft.webmagic.selector.Selectable
-
select all links
- LinksSelector - Class in us.codecraft.webmagic.selector
-
Links selector based on jsoup.
- LinksSelector() - Constructor for class us.codecraft.webmagic.selector.LinksSelector
- logger - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
- logger - Variable in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- logger - Variable in class us.codecraft.webmagic.Spider
- LongFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
M
- main(String[]) - Static method in class us.codecraft.webmagic.example.AppStore
- main(String[]) - Static method in class us.codecraft.webmagic.example.BaiduBaike
- main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepo
- main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoApi
- main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoPageMapper
- main(String[]) - Static method in class us.codecraft.webmagic.example.MonitorExample
- main(String[]) - Static method in class us.codecraft.webmagic.example.OschinaBlog
- main(String...) - Static method in class us.codecraft.webmagic.example.PatternProcessorExample
- main(String[]) - Static method in class us.codecraft.webmagic.main.QuickStarter
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.BaiduNews
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.DianpingFtlDataScanner
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.GithubRepo
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.IteyeBlog
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.JokejiModel
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.Kr36NewsModel
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.News163
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.OschinaAnswer
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.OschinaBlog
- main(String[]) - Static method in class us.codecraft.webmagic.model.samples.QQMeishi
- main(String[]) - Static method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.recover.RecoverSample
- main(String[]) - Static method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.AngularJSProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.F58PageProcesser
- main(String[]) - Static method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.HuxiuProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.KaichibaProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.MamacnPageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.MeicanProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.NjuBBSProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.SinaBlogProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
- main(String[]) - Static method in class us.codecraft.webmagic.scripts.ScriptConsole
- MamacnPageProcessor - Class in us.codecraft.webmagic.samples
- MamacnPageProcessor() - Constructor for class us.codecraft.webmagic.samples.MamacnPageProcessor
- match() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- match() - Method in interface us.codecraft.webmagic.selector.Selectable
-
whether the selection produced any result
- match(Request) - Method in class us.codecraft.webmagic.handler.PatternRequestMatcher
- match(Request) - Method in interface us.codecraft.webmagic.handler.RequestMatcher
-
Check whether to process the page.
Please DO NOT change page status in this method.
- me() - Static method in class us.codecraft.webmagic.Site
-
create a new Site
- MeicanProcessor - Class in us.codecraft.webmagic.samples
- MeicanProcessor() - Constructor for class us.codecraft.webmagic.samples.MeicanProcessor
- Method() - Constructor for class us.codecraft.webmagic.utils.HttpConstant.Method
- MmapQueueScheduler - Class in us.codecraft.webmagic.recover
- MmapQueueScheduler(DuplicateRemover, String) - Constructor for class us.codecraft.webmagic.recover.MmapQueueScheduler
- MonitorableScheduler - Interface in us.codecraft.webmagic.scheduler
-
The scheduler whose requests can be counted for monitor.
- MonitorExample - Class in us.codecraft.webmagic.example
- MonitorExample() - Constructor for class us.codecraft.webmagic.example.MonitorExample
- monitorSpiderListener - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
- MonitorSpiderListener() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- multi - Variable in class us.codecraft.webmagic.model.Extractor
- multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
Deprecated. Since 0.4.2
- multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Deprecated. Since 0.4.2
- multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
-
Deprecated. Since 0.4.2
- MultiKeyMapBase - Class in us.codecraft.webmagic.utils
-
multi-key map, some basic objects
- MultiKeyMapBase() - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
- MultiKeyMapBase(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
- MultiPageModel - Interface in us.codecraft.webmagic
-
Extract an object that spans more than one page, such as news and articles.
- MultiPagePipeline - Class in us.codecraft.webmagic.pipeline
-
A pipeline that combines the results from more than one page.
Used for news and articles containing more than one web page.
- MultiPagePipeline() - Constructor for class us.codecraft.webmagic.pipeline.MultiPagePipeline
- MULTIPART - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
- MultipleField - Class in us.codecraft.webmagic.model.fields
- MultipleField(List<String>) - Constructor for class us.codecraft.webmagic.model.fields.MultipleField
N
- newArrayList(T...) - Static method in class us.codecraft.webmagic.utils.WMCollections
- newHashSet(T...) - Static method in class us.codecraft.webmagic.utils.WMCollections
- newInstance(String) - Static method in class us.codecraft.webmagic.selector.Xpath2Selector
- newMap() - Method in class us.codecraft.webmagic.utils.MultiKeyMapBase
- News163 - Class in us.codecraft.webmagic.model.samples
- News163() - Constructor for class us.codecraft.webmagic.model.samples.News163
- NjuBBSProcessor - Class in us.codecraft.webmagic.samples
- NjuBBSProcessor() - Constructor for class us.codecraft.webmagic.samples.NjuBBSProcessor
- NO - Enum constant in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
- NodeListToArrayList(NodeList) - Static method in class us.codecraft.webmagic.selector.JaxpSelectorUtils
- nodes() - Method in class us.codecraft.webmagic.selector.HtmlNode
- nodes() - Method in class us.codecraft.webmagic.selector.PlainText
- nodes() - Method in interface us.codecraft.webmagic.selector.Selectable
-
get all nodes
- NodeSelector - Interface in us.codecraft.webmagic.selector
-
Selector(extractor) for html node.
- nodesToStrings(List<Node>) - Static method in class us.codecraft.webmagic.selector.JaxpSelectorUtils
- nodeToString(Node) - Static method in class us.codecraft.webmagic.selector.JaxpSelectorUtils
- noNeedToRemoveDuplicate(Request) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- notNull - Variable in class us.codecraft.webmagic.model.Extractor
- notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
- notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
- notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
-
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
- NumberUtils - Class in us.codecraft.webmagic.utils
- NumberUtils() - Constructor for class us.codecraft.webmagic.utils.NumberUtils
O
- ObjectFormatter<T> - Interface in us.codecraft.webmagic.model.formatter
- ObjectFormatterBuilder - Class in us.codecraft.webmagic.model.formatter
- ObjectFormatterBuilder() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
- ObjectFormatters - Class in us.codecraft.webmagic.model.formatter
- ObjectFormatters() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatters
- OneFilePipeline - Class in us.codecraft.webmagic.samples.pipeline
- OneFilePipeline() - Constructor for class us.codecraft.webmagic.samples.pipeline.OneFilePipeline
- OneFilePipeline(String) - Constructor for class us.codecraft.webmagic.samples.pipeline.OneFilePipeline
- onError(Page, Task, Throwable) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
- onError(Request) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
-
Deprecated. Use AbstractDownloader.onError(Page, Task, Throwable) instead.
- onError(Request) - Method in class us.codecraft.webmagic.Spider
-
Deprecated. Use Spider.onError(Request, Exception) instead.
- onError(Request) - Method in interface us.codecraft.webmagic.SpiderListener
-
Deprecated. Use SpiderListener.onError(Request, Exception) instead.
- onError(Request, Exception) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- onError(Request, Exception) - Method in class us.codecraft.webmagic.Spider
- onError(Request, Exception) - Method in interface us.codecraft.webmagic.SpiderListener
- onError(Request, Task, Throwable) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
-
Deprecated. Use AbstractDownloader.onError(Page, Task, Throwable) instead.
- onSuccess(Page, Task) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
- onSuccess(Request) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
-
Deprecated. Use AbstractDownloader.onSuccess(Page, Task) instead.
- onSuccess(Request) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- onSuccess(Request) - Method in class us.codecraft.webmagic.Spider
- onSuccess(Request) - Method in interface us.codecraft.webmagic.SpiderListener
- onSuccess(Request, Task) - Method in class us.codecraft.webmagic.downloader.AbstractDownloader
-
Deprecated. Use AbstractDownloader.onSuccess(Page, Task) instead.
- OOSpider<T> - Class in us.codecraft.webmagic.model
-
The spider for page model extractor.
In webmagic, a POJO containing the extracted result is called a "page model".
- OOSpider(ModelPageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
- OOSpider(PageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
- OOSpider(Site, PageModelPipeline, Class...) - Constructor for class us.codecraft.webmagic.model.OOSpider
-
create a spider
- op() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
Combining operation of extractors.
- operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.MultipleField
- operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.PageField
- operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.SingleField
- or(Selector...) - Static method in class us.codecraft.webmagic.selector.Selectors
- Or - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
All extractors will extract separately,
and their results will be combined as the final result.
- OrSelector - Class in us.codecraft.webmagic.selector
-
All extractors will extract separately,
and their results will be combined as the final result.
- OrSelector(List<Selector>) - Constructor for class us.codecraft.webmagic.selector.OrSelector
- OrSelector(Selector...) - Constructor for class us.codecraft.webmagic.selector.OrSelector
- OschinaAnswer - Class in us.codecraft.webmagic.model.samples
- OschinaAnswer() - Constructor for class us.codecraft.webmagic.model.samples.OschinaAnswer
- OschinaBlog - Class in us.codecraft.webmagic.example
- OschinaBlog - Class in us.codecraft.webmagic.model.samples
- OschinaBlog() - Constructor for class us.codecraft.webmagic.example.OschinaBlog
- OschinaBlog() - Constructor for class us.codecraft.webmagic.model.samples.OschinaBlog
P
- Page - Class in us.codecraft.webmagic
-
Object storing extracted results and urls to fetch.
Not thread safe.
Main methods:
Page.getUrl(): get url of current page
Page.getHtml(): get content of current page
Page.putField(String, Object): save extracted result
Page.getResultItems(): get extracted results to be used in Pipeline
Page.addTargetRequests(Iterable), Page.addTargetRequest(String): add urls to fetch
- Page() - Constructor for class us.codecraft.webmagic.Page
- PageField - Class in us.codecraft.webmagic.model.fields
- PageField() - Constructor for class us.codecraft.webmagic.model.fields.PageField
- PageMapper<T> - Class in us.codecraft.webmagic.model
- PageMapper(Class<T>) - Constructor for class us.codecraft.webmagic.model.PageMapper
- PageModelPipeline<T> - Interface in us.codecraft.webmagic.pipeline
-
Implement PageModelPipeline to persist your page model.
- pageProcessor - Variable in class us.codecraft.webmagic.Spider
- PageProcessor - Interface in us.codecraft.webmagic.processor
-
Interface to be implemented to customize a crawler.
- Params - Class in us.codecraft.webmagic.scripts
- Params() - Constructor for class us.codecraft.webmagic.scripts.Params
- parse(String) - Static method in class us.codecraft.webmagic.selector.Xpath2Selector
- path - Variable in class us.codecraft.webmagic.utils.FilePersistentBase
- PATH_SEPERATOR - Static variable in class us.codecraft.webmagic.utils.FilePersistentBase
- pattern - Variable in class us.codecraft.webmagic.handler.PatternRequestMatcher
-
match pattern.
- PatternProcessor - Class in us.codecraft.webmagic.handler
- PatternProcessor(String) - Constructor for class us.codecraft.webmagic.handler.PatternProcessor
- PatternProcessorExample - Class in us.codecraft.webmagic.example
-
Created with IntelliJ IDEA.
- PatternProcessorExample() - Constructor for class us.codecraft.webmagic.example.PatternProcessorExample
- PatternRequestMatcher - Class in us.codecraft.webmagic.handler
-
Created with IntelliJ IDEA.
- PatternRequestMatcher(String) - Constructor for class us.codecraft.webmagic.handler.PatternRequestMatcher
- PhantomJSDownloader - Class in us.codecraft.webmagic.downloader
-
This downloader is used to download pages that need JavaScript rendering.
- PhantomJSDownloader() - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
- PhantomJSDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
Adds a new constructor that supports a custom phantomjs command.
- PhantomJSDownloader(String, String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
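Adds a constructor that supports a custom crawl.js path, because when another project depends on this jar, runtime.exec() cannot use the crawl.js inside the jar when executing the phantomjs command.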
- PhantomJSPageProcessor - Class in us.codecraft.webmagic.samples
-
Created by dolphineor on 2014-11-21.
- PhantomJSPageProcessor() - Constructor for class us.codecraft.webmagic.samples.PhantomJSPageProcessor
- pipeline(Pipeline) - Method in class us.codecraft.webmagic.Spider
-
Deprecated.
- Pipeline - Interface in us.codecraft.webmagic.pipeline
-
Pipeline is the persistence and offline processing part of the crawler.
The Pipeline interface can be implemented to customize how results are persisted.
- pipelines - Variable in class us.codecraft.webmagic.Spider
- PlainText - Class in us.codecraft.webmagic.selector
-
Selectable plain text.
Cannot be selected by XPath or CSS Selector.
- PlainText(String) - Constructor for class us.codecraft.webmagic.selector.PlainText
- PlainText(List<String>) - Constructor for class us.codecraft.webmagic.selector.PlainText
- poll(Spider) - Method in class us.codecraft.webmagic.SpiderScheduler
- poll(Task) - Method in class us.codecraft.webmagic.recover.MmapQueueScheduler
- poll(Task) - Method in class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- poll(Task) - Method in interface us.codecraft.webmagic.scheduler.Scheduler
-
get a url to crawl
- pool - Variable in class us.codecraft.webmagic.scheduler.RedisScheduler
- POST - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
- preParse(String) - Static method in class us.codecraft.webmagic.utils.BaseSelectorUtils
-
Jsoup/HtmlCleaner could not parse "tr" or "td" tag directly https://stackoverflow.com/questions/63607740/jsoup-couldnt-parse-tr-tag
- PriorityScheduler - Class in us.codecraft.webmagic.scheduler
-
Priority scheduler.
- PriorityScheduler() - Constructor for class us.codecraft.webmagic.scheduler.PriorityScheduler
- process(Object, Task) - Method in class us.codecraft.webmagic.model.ConsolePageModelPipeline
- process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.FilePageModelPipeline
- process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
- process(ScriptEngine, String, String, Page) - Method in class us.codecraft.webmagic.scripts.languages.Javascript
- process(ScriptEngine, String, String, Page) - Method in class us.codecraft.webmagic.scripts.languages.JRuby
- process(ScriptEngine, String, String, Page) - Method in class us.codecraft.webmagic.scripts.languages.Jython
- process(ScriptEngine, String, String, Page) - Method in class us.codecraft.webmagic.scripts.languages.Language
- process(T, Task) - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
- process(T, Task) - Method in interface us.codecraft.webmagic.pipeline.PageModelPipeline
- process(Page) - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
- process(Page) - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
- process(Page) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- process(Page) - Method in class us.codecraft.webmagic.processor.example.BaiduBaikePageProcessor
- process(Page) - Method in class us.codecraft.webmagic.processor.example.GithubRepoPageProcessor
- process(Page) - Method in class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
- process(Page) - Method in interface us.codecraft.webmagic.processor.PageProcessor
-
Processes the page, extract URLs to fetch, extract the data and store.
- process(Page) - Method in class us.codecraft.webmagic.processor.SimplePageProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.AmanzonPageProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.AngularJSProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.DiandianBlogProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.DiaoyuwengProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.F58PageProcesser
- process(Page) - Method in class us.codecraft.webmagic.samples.GithubRepoPageProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.HuxiuProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.InfoQMiniBookProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.IteyeBlogProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.KaichibaProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.MamacnPageProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.MeicanProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.NjuBBSProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.PhantomJSPageProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.QzoneBlogProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.SinaBlogProcessor
- process(Page) - Method in class us.codecraft.webmagic.samples.TianyaPageProcesser
- process(Page) - Method in class us.codecraft.webmagic.samples.ZhihuPageProcessor
- process(Page) - Method in class us.codecraft.webmagic.scripts.ScriptProcessor
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.handler.CompositePipeline
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.ConsolePipeline
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.FilePipeline
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePipeline
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.MultiPagePipeline
- process(ResultItems, Task) - Method in interface us.codecraft.webmagic.pipeline.Pipeline
-
Process extracted results.
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.samples.pipeline.OneFilePipeline
- processPage(Page) - Method in interface us.codecraft.webmagic.handler.SubPageProcessor
-
process the page, extract urls to fetch, extract the data and store
- processResult(ResultItems, Task) - Method in interface us.codecraft.webmagic.handler.SubPipeline
-
process the page, extract urls to fetch, extract the data and store
- Proxy - Class in us.codecraft.webmagic.proxy
- Proxy(String, int) - Constructor for class us.codecraft.webmagic.proxy.Proxy
- Proxy(String, int, String) - Constructor for class us.codecraft.webmagic.proxy.Proxy
- Proxy(String, int, String, String) - Constructor for class us.codecraft.webmagic.proxy.Proxy
- ProxyProvider - Interface in us.codecraft.webmagic.proxy
-
Proxy provider.
- ProxyUtils - Class in us.codecraft.webmagic.utils
-
Pooled Proxy Object
- ProxyUtils() - Constructor for class us.codecraft.webmagic.utils.ProxyUtils
- push(Request, Spider) - Method in class us.codecraft.webmagic.SpiderScheduler
- push(Request, Task) - Method in class us.codecraft.webmagic.samples.scheduler.DelayQueueScheduler
- push(Request, Task) - Method in class us.codecraft.webmagic.samples.scheduler.LevelLimitScheduler
- push(Request, Task) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- push(Request, Task) - Method in interface us.codecraft.webmagic.scheduler.Scheduler
-
add a url to fetch
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.recover.MmapQueueScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.PriorityScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.QueueScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- put(Class<? extends ObjectFormatter>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
- put(String, T) - Method in class us.codecraft.webmagic.ResultItems
- put(K1, Map<K2, V>) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- put(K1, K2, V) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- PUT - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
- putExtra(String, T) - Method in class us.codecraft.webmagic.Request
- putField(String, Object) - Method in class us.codecraft.webmagic.Page
-
store extract results
Q
- QQMeishi - Class in us.codecraft.webmagic.model.samples
- QQMeishi() - Constructor for class us.codecraft.webmagic.model.samples.QQMeishi
- QueueScheduler - Class in us.codecraft.webmagic.scheduler
-
Basic Scheduler implementation.
Store urls to fetch in LinkedBlockingQueue and remove duplicate urls by HashMap.
- QueueScheduler() - Constructor for class us.codecraft.webmagic.scheduler.QueueScheduler
- QueueScheduler(int) - Constructor for class us.codecraft.webmagic.scheduler.QueueScheduler
-
Creates a QueueScheduler with the given (fixed) capacity.
- QuickStarter - Class in us.codecraft.webmagic.main
- QuickStarter() - Constructor for class us.codecraft.webmagic.main.QuickStarter
- QzoneBlogProcessor - Class in us.codecraft.webmagic.samples
- QzoneBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.QzoneBlogProcessor
R
- RawHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
extract from the raw html
- RawHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
extract from the raw html
- RawHtml() - Constructor for class us.codecraft.webmagic.model.sources.Source.RawHtml
- RawText - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
- RawText() - Constructor for class us.codecraft.webmagic.model.sources.Source.RawText
- rebuildBloomFilter() - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- RecoverSample - Class in us.codecraft.webmagic.recover
- RecoverSample() - Constructor for class us.codecraft.webmagic.recover.RecoverSample
- RedisPriorityScheduler - Class in us.codecraft.webmagic.scheduler
-
the redis scheduler with priority
- RedisPriorityScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- RedisPriorityScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- RedisScheduler - Class in us.codecraft.webmagic.scheduler
-
Use Redis as url scheduler for distributed crawlers.
- RedisScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
- RedisScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
- REFERER - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Header
- regex(String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- regex(String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
select list with regex, default group is group 1
- regex(String) - Static method in class us.codecraft.webmagic.selector.Selectors
- regex(String, int) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- regex(String, int) - Method in interface us.codecraft.webmagic.selector.Selectable
-
select list with regex
- regex(String, int) - Static method in class us.codecraft.webmagic.selector.Selectors
- Regex - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- Regex - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
- RegexSelector - Class in us.codecraft.webmagic.selector
-
Selector in regex.
- RegexSelector(String) - Constructor for class us.codecraft.webmagic.selector.RegexSelector
-
Create a RegexSelector.
- RegexSelector(String, int) - Constructor for class us.codecraft.webmagic.selector.RegexSelector
- register(Spider...) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
-
Register spider for monitor.
- registerMBean(SpiderStatusMXBean) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
- release(ScriptEngine) - Method in class us.codecraft.webmagic.scripts.ScriptEnginePool
- remove(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- remove(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- removePadding(String) - Method in class us.codecraft.webmagic.selector.Json
-
remove padding for JSONP
- removePort(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- removeProtocol(String) - Static method in class us.codecraft.webmagic.utils.UrlUtils
- replace(String, String) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- replace(String, String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
replace with regex
- ReplacePipeline - Class in us.codecraft.webmagic.samples.pipeline
- ReplacePipeline() - Constructor for class us.codecraft.webmagic.samples.pipeline.ReplacePipeline
- ReplaceSelector - Class in us.codecraft.webmagic.selector
-
Replace selector.
- ReplaceSelector(String, String) - Constructor for class us.codecraft.webmagic.selector.ReplaceSelector
- Request - Class in us.codecraft.webmagic
-
Object containing the url to crawl.
It also carries some additional information.
- Request() - Constructor for class us.codecraft.webmagic.Request
- Request(String) - Constructor for class us.codecraft.webmagic.Request
- RequestMatcher - Interface in us.codecraft.webmagic.handler
- RequestMatcher.MatchOther - Enum in us.codecraft.webmagic.handler
- RequestUtils - Class in us.codecraft.webmagic.utils
- RequestUtils() - Constructor for class us.codecraft.webmagic.utils.RequestUtils
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.recover.DuplicateStorageRemover
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- resetDuplicateCheck(Task) - Method in interface us.codecraft.webmagic.scheduler.component.DuplicateRemover
-
Reset duplicate check.
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.component.HashSetDuplicateRemover
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- ResultItems - Class in us.codecraft.webmagic
-
Object containing the extracted results.
It is contained in Page and will be processed in the pipeline.
- ResultItems() - Constructor for class us.codecraft.webmagic.ResultItems
- ResultItemsCollectorPipeline - Class in us.codecraft.webmagic.pipeline
- ResultItemsCollectorPipeline() - Constructor for class us.codecraft.webmagic.pipeline.ResultItemsCollectorPipeline
- returnProxy(Proxy, Page, Task) - Method in interface us.codecraft.webmagic.proxy.ProxyProvider
-
Return the proxy to the provider when a download completes.
- returnProxy(Proxy, Page, Task) - Method in class us.codecraft.webmagic.proxy.SimpleProxyProvider
- run() - Method in class us.codecraft.webmagic.Spider
- runAsync() - Method in class us.codecraft.webmagic.Spider
- Running - Enum constant in enum us.codecraft.webmagic.Spider.Status
S
- scheduler - Variable in class us.codecraft.webmagic.Spider
- scheduler(Scheduler) - Method in class us.codecraft.webmagic.Spider
-
Deprecated.
- Scheduler - Interface in us.codecraft.webmagic.scheduler
-
Scheduler is the part responsible for url management.
You can implement the Scheduler interface to manage urls to fetch and to remove duplicate urls.
- script(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
- ScriptConsole - Class in us.codecraft.webmagic.scripts
- ScriptConsole() - Constructor for class us.codecraft.webmagic.scripts.ScriptConsole
- ScriptEnginePool - Class in us.codecraft.webmagic.scripts
- ScriptEnginePool(Language, int) - Constructor for class us.codecraft.webmagic.scripts.ScriptEnginePool
- scriptFromClassPathFile(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
- scriptFromFile(String) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
- ScriptProcessor - Class in us.codecraft.webmagic.scripts
- ScriptProcessor(Language, String, int) - Constructor for class us.codecraft.webmagic.scripts.ScriptProcessor
- ScriptProcessorBuilder - Class in us.codecraft.webmagic.scripts
- select(String) - Method in class us.codecraft.webmagic.selector.AndSelector
- select(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
- select(String) - Method in class us.codecraft.webmagic.selector.JsonPathSelector
- select(String) - Method in class us.codecraft.webmagic.selector.OrSelector
- select(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
- select(String) - Method in class us.codecraft.webmagic.selector.ReplaceSelector
- select(String) - Method in interface us.codecraft.webmagic.selector.Selector
-
Extract a single result in text.
If there is more than one result, only the first will be chosen.
- select(String) - Method in class us.codecraft.webmagic.selector.SmartContentSelector
- select(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- select(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
- select(Element) - Method in interface us.codecraft.webmagic.selector.ElementSelector
-
Extract a single result in text.
If there is more than one result, only the first will be chosen.
- select(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
- select(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
- select(Node) - Method in interface us.codecraft.webmagic.selector.NodeSelector
-
Extract a single result in text.
If there is more than one result, only the first will be chosen.
- select(Node) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- select(Selector) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- select(Selector) - Method in class us.codecraft.webmagic.selector.HtmlNode
- select(Selector) - Method in interface us.codecraft.webmagic.selector.Selectable
-
extract by custom selector
- select(Selector, List<String>) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- Selectable - Interface in us.codecraft.webmagic.selector
-
Selectable text.
- selectDocument(Selector) - Method in class us.codecraft.webmagic.selector.Html
- selectDocumentForList(Selector) - Method in class us.codecraft.webmagic.selector.Html
- SelectedHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
extract from the content extracted by class extractor
- SelectedHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
extract from the content extracted by class extractor
- SelectedHtml() - Constructor for class us.codecraft.webmagic.model.sources.Source.SelectedHtml
- selectElement(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
- selectElement(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
- selectElements(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
- selectElements(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
- selectElements(BaseElementSelector) - Method in class us.codecraft.webmagic.selector.HtmlNode
-
select elements
- selectGroup(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
- selectGroupList(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
- selectList(String) - Method in class us.codecraft.webmagic.selector.AndSelector
- selectList(String) - Method in class us.codecraft.webmagic.selector.BaseElementSelector
- selectList(String) - Method in class us.codecraft.webmagic.selector.JsonPathSelector
- selectList(String) - Method in class us.codecraft.webmagic.selector.OrSelector
- selectList(String) - Method in class us.codecraft.webmagic.selector.RegexSelector
- selectList(String) - Method in class us.codecraft.webmagic.selector.ReplaceSelector
- selectList(String) - Method in interface us.codecraft.webmagic.selector.Selector
-
Extract all results in text.
- selectList(String) - Method in class us.codecraft.webmagic.selector.SmartContentSelector
- selectList(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- selectList(Element) - Method in class us.codecraft.webmagic.selector.CssSelector
- selectList(Element) - Method in interface us.codecraft.webmagic.selector.ElementSelector
-
Extract all results in text.
- selectList(Element) - Method in class us.codecraft.webmagic.selector.LinksSelector
- selectList(Element) - Method in class us.codecraft.webmagic.selector.XpathSelector
- selectList(Node) - Method in interface us.codecraft.webmagic.selector.NodeSelector
-
Extract all results in text.
- selectList(Node) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- selectList(Selector) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- selectList(Selector) - Method in class us.codecraft.webmagic.selector.HtmlNode
- selectList(Selector) - Method in interface us.codecraft.webmagic.selector.Selectable
-
extract by custom selector
- selectList(Selector, List<String>) - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- selectNode(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- selectNode(Node) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- selectNodes(String) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- selectNodes(Node) - Method in class us.codecraft.webmagic.selector.Xpath2Selector
- selector - Variable in class us.codecraft.webmagic.model.Extractor
- Selector - Interface in us.codecraft.webmagic.selector
-
Selector(extractor) for text.
- Selectors - Class in us.codecraft.webmagic.selector
-
Convenient methods for selectors.
- Selectors() - Constructor for class us.codecraft.webmagic.selector.Selectors
- SeleniumDownloader - Class in us.codecraft.webmagic.downloader.selenium
-
Uses Selenium to drive a browser for rendering. Currently only Chrome is supported.
Requires the Selenium driver to be downloaded.
- SeleniumDownloader() - Constructor for class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
-
Constructor without any field.
- SeleniumDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
-
Create a new instance.
- serializeRequest(Request) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- setAcceptStatCode(Set<Integer>) - Method in class us.codecraft.webmagic.Site
-
Set acceptStatCode.
When the status code of the HTTP response is in acceptStatCodes, the response is processed.
{200} by default.
It does not have to be set.
- setAuthor(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
- setBinaryContent(boolean) - Method in class us.codecraft.webmagic.Request
- setBody(byte[]) - Method in class us.codecraft.webmagic.model.HttpRequestBody
- setBytes(byte[]) - Method in class us.codecraft.webmagic.Page
- setCharset(String) - Method in class us.codecraft.webmagic.Page
- setCharset(String) - Method in class us.codecraft.webmagic.Request
- setCharset(String) - Method in class us.codecraft.webmagic.Site
-
Set charset of page manually.
When charset is not set or set to null, it can be auto-detected from the HTTP header.
- setContentType(String) - Method in class us.codecraft.webmagic.model.HttpRequestBody
- setCycleRetryTimes(int) - Method in class us.codecraft.webmagic.Site
-
Set the number of cycle retries (cycleRetryTimes) when a download fails; 0 by default.
- setDefaultCharset(String) - Method in class us.codecraft.webmagic.Site
-
Set default charset of page.
- setDisableCookieManagement(boolean) - Method in class us.codecraft.webmagic.Site
-
Downloader is supposed to store response cookie.
- setDomain(String) - Method in class us.codecraft.webmagic.Site
-
set the domain of site.
- setDownloader(Downloader) - Method in class us.codecraft.webmagic.Request
- setDownloader(Downloader) - Method in class us.codecraft.webmagic.Spider
-
set the downloader of spider
- setDownloadSuccess(boolean) - Method in class us.codecraft.webmagic.Page
- setDuplicateRemover(DuplicateRemover) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- setEmptySleepTime(long) - Method in class us.codecraft.webmagic.Spider
-
Set wait time when no url is polled.
- setEncoding(String) - Method in class us.codecraft.webmagic.model.HttpRequestBody
- setExecutorService(ExecutorService) - Method in class us.codecraft.webmagic.Spider
- setExecutorService(ExecutorService) - Method in class us.codecraft.webmagic.thread.CountableThreadPool
- setExitWhenComplete(boolean) - Method in class us.codecraft.webmagic.Spider
-
Exit when complete.
- setExpressionParams(String[]) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setExpressionType(ExpressionType) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setExpressionValue(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setExtras(Map<String, Object>) - Method in class us.codecraft.webmagic.Request
- setField(Object, FieldExtractor, Object) - Method in class us.codecraft.webmagic.model.fields.PageField
- setField(Field) - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
- setFieldName(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setHeaders(Map<String, List<String>>) - Method in class us.codecraft.webmagic.Page
- setHtml(Html) - Method in class us.codecraft.webmagic.Page
-
Deprecated. Since 0.4.0 the HTML is parsed only on the first call to Page.getHtml(), so use Page.setRawText(String) instead.
- setHttpClientContext(HttpClientContext) - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
- setHttpUriRequest(HttpUriRequest) - Method in class us.codecraft.webmagic.downloader.HttpClientRequestContext
- setHttpUriRequestConverter(HttpUriRequestConverter) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
- setIsExtractLinks(boolean) - Method in class us.codecraft.webmagic.model.OOSpider
- setLanguagefromArg(String) - Method in class us.codecraft.webmagic.scripts.Params
- setMethod(String) - Method in class us.codecraft.webmagic.Request
- setMulti(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setName(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
- setNotNull(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setPath(String) - Method in class us.codecraft.webmagic.utils.FilePersistentBase
- setPipelines(List<Pipeline>) - Method in class us.codecraft.webmagic.Spider
-
set pipelines for Spider
- setPoolSize(int) - Method in class us.codecraft.webmagic.downloader.HttpClientGenerator
- setPriority(long) - Method in class us.codecraft.webmagic.Request
-
Set the priority of the request for sorting.
Needs a scheduler that supports priority.
- setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
- setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.SimpleHttpClient
- setRawText(String) - Method in class us.codecraft.webmagic.Page
- setReadme(String) - Method in class us.codecraft.webmagic.samples.GithubRepo
- setRequest(Request) - Method in class us.codecraft.webmagic.Page
- setRequest(Request) - Method in class us.codecraft.webmagic.ResultItems
- setRequestBody(HttpRequestBody) - Method in class us.codecraft.webmagic.Request
- setRetrySleepTime(int) - Method in class us.codecraft.webmagic.Site
-
Set the sleep time between retries when a download fails; 1000 by default.
- setRetryTimes(int) - Method in class us.codecraft.webmagic.Site
-
Set the number of retries when a download fails; 0 by default.
- setScheduler(Scheduler) - Method in class us.codecraft.webmagic.Spider
-
set scheduler for Spider
- setScheduler(Scheduler) - Method in class us.codecraft.webmagic.SpiderScheduler
- setScheme(String) - Method in class us.codecraft.webmagic.proxy.Proxy
- setSelector(Selector) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setSite(Site) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- setSkip(boolean) - Method in class us.codecraft.webmagic.Page
- setSkip(boolean) - Method in class us.codecraft.webmagic.ResultItems
-
Set whether to skip the result.
A result that is skipped will not be processed by Pipeline.
- setSleepTime(int) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
-
Set the sleep time to wait for the page to load successfully.
- setSleepTime(int) - Method in class us.codecraft.webmagic.Site
-
Set the interval between the processing of two pages.
Time unit is milliseconds.
- setSpawnUrl(boolean) - Method in class us.codecraft.webmagic.Spider
-
Whether to add extracted urls to the download queue.
When true, extracted urls are added for download; when false, only the seed urls are downloaded.
- setSpiderListeners(List<SpiderListener>) - Method in class us.codecraft.webmagic.Spider
- setStatusCode(int) - Method in class us.codecraft.webmagic.Page
- setSubPageProcessors(SubPageProcessor...) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- setSubPipeline(SubPipeline...) - Method in class us.codecraft.webmagic.handler.CompositePipeline
- setThread(int) - Method in interface us.codecraft.webmagic.downloader.Downloader
-
Tell the downloader how many threads the spider used.
- setThread(int) - Method in class us.codecraft.webmagic.downloader.HttpClientDownloader
- setThread(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
- setThread(int) - Method in class us.codecraft.webmagic.downloader.selenium.SeleniumDownloader
- setTimeOut(int) - Method in class us.codecraft.webmagic.Site
-
set timeout for downloader in ms
- setUrl(String) - Method in class us.codecraft.webmagic.Request
- setUrl(Selectable) - Method in class us.codecraft.webmagic.Page
- setUseGzip(boolean) - Method in class us.codecraft.webmagic.Site
-
Whether to use gzip.
- setUserAgent(String) - Method in class us.codecraft.webmagic.Site
-
set user agent
- setUUID(String) - Method in class us.codecraft.webmagic.Spider
-
Set a uuid for the spider.
The default uuid is the domain of the site.
- ShortFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
- shouldReserved(Request) - Method in class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
- shutdown() - Method in class us.codecraft.webmagic.thread.CountableThreadPool
- signalNewUrl() - Method in class us.codecraft.webmagic.SpiderScheduler
- SimpleHttpClient - Class in us.codecraft.webmagic
- SimpleHttpClient() - Constructor for class us.codecraft.webmagic.SimpleHttpClient
- SimpleHttpClient(Site) - Constructor for class us.codecraft.webmagic.SimpleHttpClient
- SimplePageProcessor - Class in us.codecraft.webmagic.processor
-
A simple PageProcessor.
- SimplePageProcessor(String) - Constructor for class us.codecraft.webmagic.processor.SimplePageProcessor
- SimpleProxyProvider - Class in us.codecraft.webmagic.proxy
-
A simple ProxyProvider.
- SimpleProxyProvider(List<Proxy>) - Constructor for class us.codecraft.webmagic.proxy.SimpleProxyProvider
- SinaBlogProcessor - Class in us.codecraft.webmagic.samples
- SinaBlogProcessor() - Constructor for class us.codecraft.webmagic.samples.SinaBlogProcessor
- SingleField - Class in us.codecraft.webmagic.model.fields
- SingleField(String) - Constructor for class us.codecraft.webmagic.model.fields.SingleField
- site - Variable in class us.codecraft.webmagic.Spider
- Site - Class in us.codecraft.webmagic
-
Object containing the settings for a crawler.
- Site() - Constructor for class us.codecraft.webmagic.Site
- sleep(int) - Method in class us.codecraft.webmagic.Spider
- smartContent() - Method in class us.codecraft.webmagic.selector.HtmlNode
- smartContent() - Static method in class us.codecraft.webmagic.selector.Selectors
- SmartContentSelector - Class in us.codecraft.webmagic.selector
-
Borrowed from https://code.google.com/p/cx-extractor/
- SmartContentSelector() - Constructor for class us.codecraft.webmagic.selector.SmartContentSelector
- source - Variable in class us.codecraft.webmagic.model.Extractor
- source() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
The source for extracting.
- source() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
The source for extracting.
- Source - Interface in us.codecraft.webmagic.model.sources
- Source.DefaultSource - Class in us.codecraft.webmagic.model.sources
- Source.RawHtml - Class in us.codecraft.webmagic.model.sources
- Source.RawText - Class in us.codecraft.webmagic.model.sources
- Source.SelectedHtml - Class in us.codecraft.webmagic.model.sources
- Source.Url - Class in us.codecraft.webmagic.model.sources
- sourceRegion() - Element in annotation type us.codecraft.webmagic.model.annotation.HelpUrl
-
Define the region for url extracting.
- sourceRegion() - Element in annotation type us.codecraft.webmagic.model.annotation.TargetUrl
-
Define the region for url extracting.
- SourceTextExtractor - Class in us.codecraft.webmagic.model.sources
- SourceTextExtractor() - Constructor for class us.codecraft.webmagic.model.sources.SourceTextExtractor
- sourceTexts - Variable in class us.codecraft.webmagic.selector.PlainText
- spawnUrl - Variable in class us.codecraft.webmagic.Spider
- spider - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
- Spider - Class in us.codecraft.webmagic
-
Entrance of a crawler.
A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline.
Every module is a field of Spider.
- Spider(PageProcessor) - Constructor for class us.codecraft.webmagic.Spider
-
create a spider with pageProcessor.
- Spider.Status - Enum in us.codecraft.webmagic
- SpiderListener - Interface in us.codecraft.webmagic
-
Listener of Spider on page processing.
- SpiderMonitor - Class in us.codecraft.webmagic.monitor
- SpiderMonitor() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor
- SpiderMonitor.MonitorSpiderListener - Class in us.codecraft.webmagic.monitor
- SpiderScheduler - Class in us.codecraft.webmagic
- SpiderScheduler(Scheduler) - Constructor for class us.codecraft.webmagic.SpiderScheduler
- SpiderStatus - Class in us.codecraft.webmagic.monitor
- SpiderStatus(Spider, SpiderMonitor.MonitorSpiderListener) - Constructor for class us.codecraft.webmagic.monitor.SpiderStatus
- SpiderStatusMXBean - Interface in us.codecraft.webmagic.monitor
- start() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- start() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- start() - Method in class us.codecraft.webmagic.Spider
- startRequest(List<Request>) - Method in class us.codecraft.webmagic.Spider
-
Set startUrls of Spider.
Takes priority over the startUrls of Site.
- startRequests - Variable in class us.codecraft.webmagic.Spider
- startUrls(List<String>) - Method in class us.codecraft.webmagic.Spider
-
Set startUrls of Spider.
Takes priority over the startUrls of Site.
- stat - Variable in class us.codecraft.webmagic.Spider
- STAT_INIT - Static variable in class us.codecraft.webmagic.Spider
- STAT_RUNNING - Static variable in class us.codecraft.webmagic.Spider
- STAT_STOPPED - Static variable in class us.codecraft.webmagic.Spider
- StatusCode() - Constructor for class us.codecraft.webmagic.utils.HttpConstant.StatusCode
- stop() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- stop() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- stop() - Method in class us.codecraft.webmagic.Spider
- Stopped - Enum constant in enum us.codecraft.webmagic.Spider.Status
- stopWhenComplete() - Method in class us.codecraft.webmagic.Spider
-
Stop when all tasks in the queue are completed and all worker threads are also completed
- StringTemplateFormatter - Class in us.codecraft.webmagic.samples.formatter
- StringTemplateFormatter() - Constructor for class us.codecraft.webmagic.samples.formatter.StringTemplateFormatter
- subClazz() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
-
Specify the class of the field, or the class of the elements when the field is a collection.
- SubPageProcessor - Interface in us.codecraft.webmagic.handler
- SubPipeline - Interface in us.codecraft.webmagic.handler
T
- TargetUrl - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define the url patterns for class.
- Task - Interface in us.codecraft.webmagic
-
Interface for identifying different tasks.
- test(String...) - Method in class us.codecraft.webmagic.Spider
-
Process specific urls without url discovering.
- thread(int) - Method in class us.codecraft.webmagic.scripts.ScriptProcessorBuilder
- thread(int) - Method in class us.codecraft.webmagic.Spider
-
Start with more than one thread.
- thread(ExecutorService, int) - Method in class us.codecraft.webmagic.Spider
-
Start with more than one thread.
- threadNum - Variable in class us.codecraft.webmagic.Spider
- threadPool - Variable in class us.codecraft.webmagic.Spider
- TianyaPageProcesser - Class in us.codecraft.webmagic.samples
- TianyaPageProcesser() - Constructor for class us.codecraft.webmagic.samples.TianyaPageProcesser
- toJson(Object) - Method in class us.codecraft.webmagic.recover.MmapQueueScheduler
- toList(Class<T>) - Method in class us.codecraft.webmagic.selector.Json
- toObject(Class<T>) - Method in class us.codecraft.webmagic.selector.Json
- toString() - Method in class us.codecraft.webmagic.example.BaiduBaike
- toString() - Method in class us.codecraft.webmagic.example.GithubRepo
- toString() - Method in class us.codecraft.webmagic.model.samples.BaiduNews
- toString() - Method in class us.codecraft.webmagic.model.samples.IteyeBlog
- toString() - Method in class us.codecraft.webmagic.model.samples.News163
- toString() - Method in class us.codecraft.webmagic.Page
- toString() - Method in class us.codecraft.webmagic.proxy.Proxy
- toString() - Method in class us.codecraft.webmagic.Request
- toString() - Method in class us.codecraft.webmagic.ResultItems
- toString() - Method in class us.codecraft.webmagic.selector.AbstractSelectable
- toString() - Method in class us.codecraft.webmagic.selector.RegexSelector
- toString() - Method in class us.codecraft.webmagic.selector.ReplaceSelector
- toString() - Method in interface us.codecraft.webmagic.selector.Selectable
-
single string result
- toString() - Method in class us.codecraft.webmagic.Site
- toTask() - Method in class us.codecraft.webmagic.Site
- toURI() - Method in class us.codecraft.webmagic.proxy.Proxy
- TRACE - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Method
- type() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Extractor type, support XPath, CSS Selector and regex.
U
- Url() - Constructor for class us.codecraft.webmagic.model.sources.Source.Url
- URL_LIST - Static variable in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
- URL_LIST - Static variable in class us.codecraft.webmagic.samples.SinaBlogProcessor
- URL_POST - Static variable in class us.codecraft.webmagic.samples.AlexanderMcqueenGoodsProcessor
- URL_POST - Static variable in class us.codecraft.webmagic.samples.SinaBlogProcessor
- UrlUtils - Class in us.codecraft.webmagic.utils
-
url and html utils.
- UrlUtils() - Constructor for class us.codecraft.webmagic.utils.UrlUtils
- us.codecraft.webmagic - package us.codecraft.webmagic
-
Main class "Spider" and models.
- us.codecraft.webmagic.configurable - package us.codecraft.webmagic.configurable
- us.codecraft.webmagic.downloader - package us.codecraft.webmagic.downloader
-
Downloader is the part that downloads web pages and stores them in a Page object.
- us.codecraft.webmagic.downloader.selenium - package us.codecraft.webmagic.downloader.selenium
- us.codecraft.webmagic.example - package us.codecraft.webmagic.example
- us.codecraft.webmagic.handler - package us.codecraft.webmagic.handler
- us.codecraft.webmagic.main - package us.codecraft.webmagic.main
- us.codecraft.webmagic.model - package us.codecraft.webmagic.model
-
Page model and annotations used to customize a crawler.
- us.codecraft.webmagic.model.annotation - package us.codecraft.webmagic.model.annotation
-
Annotations for defining an extractor.
- us.codecraft.webmagic.model.fields - package us.codecraft.webmagic.model.fields
- us.codecraft.webmagic.model.formatter - package us.codecraft.webmagic.model.formatter
- us.codecraft.webmagic.model.samples - package us.codecraft.webmagic.model.samples
- us.codecraft.webmagic.model.sources - package us.codecraft.webmagic.model.sources
- us.codecraft.webmagic.monitor - package us.codecraft.webmagic.monitor
- us.codecraft.webmagic.pipeline - package us.codecraft.webmagic.pipeline
-
Pipeline is the persistent and offline process part of crawler.
- us.codecraft.webmagic.processor - package us.codecraft.webmagic.processor
-
PageProcessor is the customizable part of a crawler for a specific site.
- us.codecraft.webmagic.processor.example - package us.codecraft.webmagic.processor.example
- us.codecraft.webmagic.proxy - package us.codecraft.webmagic.proxy
- us.codecraft.webmagic.recover - package us.codecraft.webmagic.recover
- us.codecraft.webmagic.samples - package us.codecraft.webmagic.samples
- us.codecraft.webmagic.samples.formatter - package us.codecraft.webmagic.samples.formatter
- us.codecraft.webmagic.samples.pipeline - package us.codecraft.webmagic.samples.pipeline
- us.codecraft.webmagic.samples.scheduler - package us.codecraft.webmagic.samples.scheduler
- us.codecraft.webmagic.scheduler - package us.codecraft.webmagic.scheduler
-
Scheduler is the part of url management.
- us.codecraft.webmagic.scheduler.component - package us.codecraft.webmagic.scheduler.component
-
Component of scheduler.
- us.codecraft.webmagic.scripts - package us.codecraft.webmagic.scripts
- us.codecraft.webmagic.scripts.config - package us.codecraft.webmagic.scripts.config
- us.codecraft.webmagic.scripts.languages - package us.codecraft.webmagic.scripts.languages
- us.codecraft.webmagic.selector - package us.codecraft.webmagic.selector
-
Selectors for page extraction.
- us.codecraft.webmagic.thread - package us.codecraft.webmagic.thread
- us.codecraft.webmagic.utils - package us.codecraft.webmagic.utils
-
Static utils of webmagic.
- USER_AGENT - Static variable in class us.codecraft.webmagic.utils.HttpConstant.Header
- uuid - Variable in class us.codecraft.webmagic.Spider
V
- validateProxy(Proxy) - Static method in class us.codecraft.webmagic.utils.ProxyUtils
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
The extractors to be combined.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Extractor expression, support XPath, CSS Selector and regex.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
-
Extractor expression, only regex can be used
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
-
Set formatter params.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.HelpUrl
-
The url patterns to crawl.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.TargetUrl
-
The url patterns for class.
Uses regular expressions with some changes:
"." stands for the literal character "." instead of "any character".
- valueOf(String) - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.Spider.Status
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.Spider.Status
-
Returns an array containing the constants of this enum type, in the order they are declared.
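The TargetUrl value() entry above says that "." in a url pattern stands for a literal dot rather than "any character". The following is a minimal sketch of that one rule, using only java.util.regex. The class TargetUrlPatternSketch and its compile helper are hypothetical names introduced here for illustration and are not part of WebMagic; the library's real conversion may apply further changes that this index does not describe.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of WebMagic.
class TargetUrlPatternSketch {

    // Assumption: the only rewrite applied is escaping "." so that it
    // matches a literal dot; everything else is passed to the regex engine as-is.
    static Pattern compile(String urlPattern) {
        // String.replace does literal (non-regex) replacement of every ".".
        return Pattern.compile(urlPattern.replace(".", "\\."));
    }

    public static void main(String[] args) {
        Pattern p = compile("https://github.com/\\w+/\\w+");
        // The dot in "github.com" must be a real dot.
        System.out.println(p.matcher("https://github.com/code4craft/webmagic").matches());
        // With a plain regex the "X" would match "."; after escaping it does not.
        System.out.println(p.matcher("https://githubXcom/code4craft/webmagic").matches());
    }
}
```

Under these assumptions the first check matches and the second does not, which is the behaviour the entry describes for ".".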
W
- waitNewUrl(CountableThreadPool, long) - Method in class us.codecraft.webmagic.SpiderScheduler
- WMCollections - Class in us.codecraft.webmagic.utils
- WMCollections() - Constructor for class us.codecraft.webmagic.utils.WMCollections
X
- xml(String, String) - Static method in class us.codecraft.webmagic.model.HttpRequestBody
- XML - Static variable in class us.codecraft.webmagic.model.HttpRequestBody.ContentType
- xpath(String) - Method in class us.codecraft.webmagic.selector.HtmlNode
- xpath(String) - Method in class us.codecraft.webmagic.selector.PlainText
- xpath(String) - Method in interface us.codecraft.webmagic.selector.Selectable
-
select list with xpath
- xpath(String) - Static method in class us.codecraft.webmagic.selector.Selectors
- XPath - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- XPath - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
- Xpath2Selector - Class in us.codecraft.webmagic.selector
-
Selector supporting XPath 2.0. Wraps HtmlCleaner and Saxon HE.
- Xpath2Selector(String) - Constructor for class us.codecraft.webmagic.selector.Xpath2Selector
- XpathSelector - Class in us.codecraft.webmagic.selector
-
XPath selector based on Xsoup.
- XpathSelector(String) - Constructor for class us.codecraft.webmagic.selector.XpathSelector
- xsoup(String) - Static method in class us.codecraft.webmagic.selector.Selectors
-
Deprecated.
Y
- YES - Enum constant in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
Z
- ZhihuPageProcessor - Class in us.codecraft.webmagic.processor.example
- ZhihuPageProcessor - Class in us.codecraft.webmagic.samples
- ZhihuPageProcessor() - Constructor for class us.codecraft.webmagic.processor.example.ZhihuPageProcessor
- ZhihuPageProcessor() - Constructor for class us.codecraft.webmagic.samples.ZhihuPageProcessor
- ZipCodePageProcessor - Class in us.codecraft.webmagic.samples.scheduler
- ZipCodePageProcessor() - Constructor for class us.codecraft.webmagic.samples.scheduler.ZipCodePageProcessor