Index
A
- addPageModel(PageModelPipeline, Class...) - Method in class us.codecraft.webmagic.model.OOSpider
- addSubPageProcessor(SubPageProcessor) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- addSubPipeline(SubPipeline) - Method in class us.codecraft.webmagic.handler.CompositePipeline
- AfterExtractor - Interface in us.codecraft.webmagic.model
-
Interface to be implemented by page models that need to do something after fields are extracted.
- afterProcess(Page) - Method in interface us.codecraft.webmagic.model.AfterExtractor
- And - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
All extractors will be arranged as a pipeline.
- AppStore - Class in us.codecraft.webmagic.example
- AppStore() - Constructor for class us.codecraft.webmagic.example.AppStore
B
- BaiduBaike - Class in us.codecraft.webmagic.example
- BaiduBaike() - Constructor for class us.codecraft.webmagic.example.BaiduBaike
- basicClassDetector - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- BasicClassDetector - Interface in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter<T> - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- BasicTypeFormatter.BooleanFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.ByteFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.CharactorFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.DoubleFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.FloatFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.IntegerFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.LongFormatter - Class in us.codecraft.webmagic.model.formatter
- BasicTypeFormatter.ShortFormatter - Class in us.codecraft.webmagic.model.formatter
- basicTypeFormatters - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- BloomFilterDuplicateRemover - Class in us.codecraft.webmagic.scheduler
-
BloomFilterDuplicateRemover for a huge number of URLs.
- BloomFilterDuplicateRemover(int) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- BloomFilterDuplicateRemover(int, double) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- BooleanFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
- build() - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
- ByteFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
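The BloomFilterDuplicateRemover entry above describes duplicate removal for a huge number of URLs. As a rough illustration of the underlying idea only, here is a minimal, self-contained Bloom filter sketch. It is not WebMagic's implementation: the class name `UrlBloomFilterSketch` is hypothetical, and reading the `(int, double)` constructor arguments as an expected URL count and a target false-positive rate is an assumption, since the index does not name the parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of Bloom-filter URL de-duplication; not the WebMagic class. */
final class UrlBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** Assumed meaning: expected number of URLs and target false-positive probability. */
    UrlBloomFilterSketch(int expectedInsertions, double fpp) {
        double ln2 = Math.log(2);
        // Standard sizing formulas: m = -n ln p / (ln 2)^2, k = (m / n) ln 2
        this.numBits = Math.max(1, (int) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2)));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedInsertions * ln2));
        this.bits = new BitSet(numBits);
    }

    /** Returns true if the URL was possibly seen before; otherwise records it and returns false. */
    boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811c9dc5);
        int h2 = fnv1a(data, 0x01000193);
        boolean seen = true;
        for (int i = 0; i < numHashes; i++) {
            // Double hashing: derive the i-th bit position from two base hashes.
            int index = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(index)) {
                seen = false;
                bits.set(index);
            }
        }
        return seen;
    }

    /** FNV-1a style hash with a caller-supplied seed. */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}
```

A filter like this never misses a URL it has already recorded, but it may occasionally report an unseen URL as a duplicate; that trade-off is what keeps memory use small for very large crawls.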
C
- CharactorFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
- ClassUtils - Class in us.codecraft.webmagic.utils
- ClassUtils() - Constructor for class us.codecraft.webmagic.utils.ClassUtils
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
- clazz() - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
- clazz() - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
- close() - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- CollectorPageModelPipeline<T> - Class in us.codecraft.webmagic.pipeline
- CollectorPageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
- combine(MultiPageModel) - Method in interface us.codecraft.webmagic.MultiPageModel
-
Combine multiPageModels into a whole object.
- ComboExtract - Annotation Type in us.codecraft.webmagic.model.annotation
-
Combo 'ExtractBy' extractor with and/or operator.
- ComboExtract.Op - Enum in us.codecraft.webmagic.model.annotation
- ComboExtract.Source - Enum in us.codecraft.webmagic.model.annotation
-
types of source for extracting.
- CompositePageProcessor - Class in us.codecraft.webmagic.handler
- CompositePageProcessor(Site) - Constructor for class us.codecraft.webmagic.handler.CompositePageProcessor
- CompositePipeline - Class in us.codecraft.webmagic.handler
- CompositePipeline() - Constructor for class us.codecraft.webmagic.handler.CompositePipeline
- ConfigurablePageProcessor - Class in us.codecraft.webmagic.configurable
- ConfigurablePageProcessor(Site, List<ExtractRule>) - Constructor for class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
- ConsolePageModelPipeline - Class in us.codecraft.webmagic.model
-
Print page model in console.
Usually used in test.
- ConsolePageModelPipeline() - Constructor for class us.codecraft.webmagic.model.ConsolePageModelPipeline
- convert(String, ObjectFormatter, Logger) - Method in class us.codecraft.webmagic.model.fields.PageField
- create(Site, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
- create(Site, PageModelPipeline, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
- Css - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- Css - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
D
- DateFormatter - Class in us.codecraft.webmagic.model.formatter
- DateFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.DateFormatter
- DEFAULT_CLAZZ - Static variable in class us.codecraft.webmagic.utils.MultiKeyMapBase
- DEFAULT_FORMATTER - Static variable in annotation type us.codecraft.webmagic.model.annotation.Formatter
- DEFAULT_PATTERN - Static variable in class us.codecraft.webmagic.model.formatter.DateFormatter
- DefaultSource() - Constructor for class us.codecraft.webmagic.model.sources.Source.DefaultSource
- deserializeRequest(String) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- detectBasicClass(Class<?>) - Method in interface us.codecraft.webmagic.model.formatter.BasicClassDetector
- detectBasicClass(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- DoubleFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
- DoubleKeyMap<K1, K2, V> - Class in us.codecraft.webmagic.utils
- DoubleKeyMap() - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
- DoubleKeyMap(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
- DoubleKeyMap(Map<K1, Map<K2, V>>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
- DoubleKeyMap(Map<K1, Map<K2, V>>, Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
-
init map with protoMapClass
- download(Request, Task) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
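The DoubleKeyMap constructors above accept a `Map<K1, Map<K2, V>>`, which suggests a map of maps addressed by two keys. The following is a minimal sketch of that data structure under that assumption; `TwoKeyMapSketch` is a hypothetical name and the code is not taken from WebMagic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-key map backed by nested maps; not the WebMagic class. */
final class TwoKeyMapSketch<K1, K2, V> {
    private final Map<K1, Map<K2, V>> outer = new HashMap<>();

    /** Stores a value under (k1, k2) and returns the previous value, or null if none. */
    V put(K1 k1, K2 k2, V value) {
        // Create the inner map lazily the first time k1 is used.
        return outer.computeIfAbsent(k1, k -> new HashMap<>()).put(k2, value);
    }

    /** Returns the value stored under (k1, k2), or null if either key is absent. */
    V get(K1 k1, K2 k2) {
        Map<K2, V> inner = outer.get(k1);
        return inner == null ? null : inner.get(k2);
    }

    /** Returns the inner map for k1, or null if k1 is absent. */
    Map<K2, V> get(K1 k1) {
        return outer.get(k1);
    }
}
```

For example, after `put("site", "page", 1)`, a call to `get("site", "page")` returns `1`, while `get("site", "other")` and `get("missing", "page")` both return `null`.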
E
- ExpressionType - Enum in us.codecraft.webmagic.configurable
- ExtractBy - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define the extractor for field or class.
- ExtractBy.Source - Enum in us.codecraft.webmagic.model.annotation
-
types of source for extracting.
- ExtractBy.Type - Enum in us.codecraft.webmagic.model.annotation
-
types of extractor expressions
- ExtractByUrl - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define an extractor to extract data from the URL of the current page.
- Extractor - Class in us.codecraft.webmagic.model
-
The object contains 'ExtractBy' information.
- Extractor(Selector, Source, boolean, boolean) - Constructor for class us.codecraft.webmagic.model.Extractor
- ExtractorUtils - Class in us.codecraft.webmagic.utils
-
Tools for annotation converting.
- ExtractorUtils() - Constructor for class us.codecraft.webmagic.utils.ExtractorUtils
- ExtractRule - Class in us.codecraft.webmagic.configurable
- ExtractRule() - Constructor for class us.codecraft.webmagic.configurable.ExtractRule
F
- FieldExtractor - Class in us.codecraft.webmagic.model
-
Wrapper of field and extractor.
- FieldExtractor(Field, Selector, Source, boolean, boolean) - Constructor for class us.codecraft.webmagic.model.FieldExtractor
- FileCacheQueueScheduler - Class in us.codecraft.webmagic.scheduler
-
Store URLs and cursor in files so that a Spider can resume its state after a shutdown.
- FileCacheQueueScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- FilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
-
Store results objects (page models) to files in plain format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
- FilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
-
new FilePageModelPipeline with default path "/data/webmagic/"
- FilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
- FloatFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
- format(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- format(String) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
- format(String) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
- formatter() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
-
If there is more than one formatter for a class, specify the implementation to use.
- Formatter - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define how the result string is converted to an object for the field.
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
- formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
- from(String) - Static method in class us.codecraft.webmagic.utils.RequestUtils
G
- get(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
- get(String) - Method in class us.codecraft.webmagic.SimpleHttpClient
- get(String, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
- get(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- get(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- get(Page) - Method in class us.codecraft.webmagic.model.PageMapper
- get(Request) - Method in class us.codecraft.webmagic.SimpleHttpClient
- get(Request, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
- getAll(Page) - Method in class us.codecraft.webmagic.model.PageMapper
- getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepo
- getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getCollected() - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
- getCollectorPipeline() - Method in class us.codecraft.webmagic.model.OOSpider
- getContent() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getDate() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getDescription() - Method in class us.codecraft.webmagic.example.BaiduBaike
- getErrorCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- getErrorPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getErrorPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getErrorPages() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getErrorPages() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getErrorUrls() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- getExpressionParams() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getExpressionType() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getExpressionValue() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getFieldName() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getFieldsIncludeSuperClass(Class) - Static method in class us.codecraft.webmagic.utils.ClassUtils
- getFirstNoLoopbackIPAddresses() - Static method in class us.codecraft.webmagic.utils.IPUtils
- getFork() - Method in class us.codecraft.webmagic.example.GithubRepo
- getFork() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getItemKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepo
- getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getLeftPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getLeftPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getName() - Method in class us.codecraft.webmagic.example.BaiduBaike
- getName() - Method in class us.codecraft.webmagic.example.GithubRepo
- getName() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getName() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getName() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getOtherPages() - Method in interface us.codecraft.webmagic.MultiPageModel
-
other pages to be extracted.
It is used to judge whether an object contains more than one page, and whether the pages of the object are all extracted.
- getPage() - Method in interface us.codecraft.webmagic.MultiPageModel
-
page is the identifier of a page in pages for one object.
- getPage(Request) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
- getPageKey() - Method in interface us.codecraft.webmagic.MultiPageModel
-
Page key is the identifier for the object.
- getPagePerSecond() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getPagePerSecond() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getQueueKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getReadme() - Method in class us.codecraft.webmagic.example.GithubRepo
- getSelector() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- getSelector(ExtractBy) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
- getSelectors(ExtractBy[]) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
- getSetKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getSite() - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
- getSite() - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
- getSite() - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- getSpiderStatuses() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
- getSpiderStatusMBean(Spider, SpiderMonitor.MonitorSpiderListener) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
- getStar() - Method in class us.codecraft.webmagic.example.GithubRepo
- getStar() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getStartTime() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getStartTime() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getStatus() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getStatus() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getSuccessCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- getSuccessPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getSuccessPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getTags() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.DefaultSource
- getText(Page, String, boolean, FieldExtractor) - Method in interface us.codecraft.webmagic.model.sources.Source
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawHtml
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawText
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.SelectedHtml
- getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.Url
- getText(Page, String, boolean, FieldExtractor) - Static method in class us.codecraft.webmagic.model.sources.SourceTextExtractor
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.DefaultSource
- getTextList(Page, String, boolean, FieldExtractor) - Method in interface us.codecraft.webmagic.model.sources.Source
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawHtml
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawText
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.SelectedHtml
- getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.Url
- getThread() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getThread() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getTitle() - Method in class us.codecraft.webmagic.example.OschinaBlog
- getTotalPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- getTotalPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- getUrl() - Method in class us.codecraft.webmagic.example.GithubRepo
- getUrl() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- GithubRepo - Class in us.codecraft.webmagic.example
- GithubRepo() - Constructor for class us.codecraft.webmagic.example.GithubRepo
- GithubRepoApi - Class in us.codecraft.webmagic.example
- GithubRepoApi() - Constructor for class us.codecraft.webmagic.example.GithubRepoApi
- GithubRepoPageMapper - Class in us.codecraft.webmagic.example
- GithubRepoPageMapper() - Constructor for class us.codecraft.webmagic.example.GithubRepoPageMapper
H
- HasKey - Interface in us.codecraft.webmagic.model
-
Interface to be implemented by page models.
Can be used to identify a page model, or be used as name of file storing the object.
- HelpUrl - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define the 'help' url patterns for class.
I
- initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
- initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
- initParam(String[]) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
- instance() - Static method in class us.codecraft.webmagic.monitor.SpiderMonitor
- IntegerFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
- IPUtils - Class in us.codecraft.webmagic.utils
- IPUtils() - Constructor for class us.codecraft.webmagic.utils.IPUtils
- isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- isMulti() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- isMulti() - Method in class us.codecraft.webmagic.model.Extractor
- isNotNull() - Method in class us.codecraft.webmagic.configurable.ExtractRule
- isNotNull() - Method in class us.codecraft.webmagic.model.Extractor
J
- JsonFilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
-
Store results objects (page models) to files in JSON format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
- JsonFilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
-
new JsonFilePageModelPipeline with default path "/data/webmagic/"
- JsonFilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
- JsonFilePipeline - Class in us.codecraft.webmagic.pipeline
-
Store results to files in JSON format.
- JsonFilePipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
-
new JsonFilePipeline with default path "/data/webmagic/"
- JsonFilePipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
- JsonPath - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- JsonPath - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
K
- key() - Method in class us.codecraft.webmagic.example.GithubRepo
- key() - Method in class us.codecraft.webmagic.example.GithubRepoApi
- key() - Method in interface us.codecraft.webmagic.model.HasKey
L
- logger - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
- LongFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
M
- main(String[]) - Static method in class us.codecraft.webmagic.example.AppStore
- main(String[]) - Static method in class us.codecraft.webmagic.example.BaiduBaike
- main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepo
- main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoApi
- main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoPageMapper
- main(String[]) - Static method in class us.codecraft.webmagic.example.MonitorExample
- main(String[]) - Static method in class us.codecraft.webmagic.example.OschinaBlog
- main(String...) - Static method in class us.codecraft.webmagic.example.PatternProcessorExample
- match(Request) - Method in class us.codecraft.webmagic.handler.PatternRequestMatcher
- match(Request) - Method in interface us.codecraft.webmagic.handler.RequestMatcher
-
Check whether to process the page.
Please DO NOT change page status in this method.
- MonitorExample - Class in us.codecraft.webmagic.example
- MonitorExample() - Constructor for class us.codecraft.webmagic.example.MonitorExample
- monitorSpiderListener - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
- MonitorSpiderListener() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- multi - Variable in class us.codecraft.webmagic.model.Extractor
- multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
Deprecated. since 0.4.2
- multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Deprecated. since 0.4.2
- multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
-
Deprecated. since 0.4.2
- MultiKeyMapBase - Class in us.codecraft.webmagic.utils
-
multi-key map, some basic objects
- MultiKeyMapBase() - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
- MultiKeyMapBase(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
- MultiPageModel - Interface in us.codecraft.webmagic
-
Extract an object that spans more than one page, such as news and articles.
- MultiPagePipeline - Class in us.codecraft.webmagic.pipeline
-
A pipeline that combines the results from more than one page.
Used for news and articles containing more than one web page.
- MultiPagePipeline() - Constructor for class us.codecraft.webmagic.pipeline.MultiPagePipeline
- MultipleField - Class in us.codecraft.webmagic.model.fields
- MultipleField(List<String>) - Constructor for class us.codecraft.webmagic.model.fields.MultipleField
N
- newMap() - Method in class us.codecraft.webmagic.utils.MultiKeyMapBase
- NO - Enum constant in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
- notNull - Variable in class us.codecraft.webmagic.model.Extractor
- notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
- notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
- notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
-
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
O
- ObjectFormatter<T> - Interface in us.codecraft.webmagic.model.formatter
- ObjectFormatterBuilder - Class in us.codecraft.webmagic.model.formatter
- ObjectFormatterBuilder() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
- ObjectFormatters - Class in us.codecraft.webmagic.model.formatter
- ObjectFormatters() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatters
- onError(Request, Exception) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- onSuccess(Request) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
- OOSpider<T> - Class in us.codecraft.webmagic.model
-
The spider for page model extractor.
In WebMagic, a POJO containing the extraction result is called a "page model".
- OOSpider(ModelPageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
- OOSpider(PageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
- OOSpider(Site, PageModelPipeline, Class...) - Constructor for class us.codecraft.webmagic.model.OOSpider
-
create a spider
- op() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
Combining operation of extractors.
- operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.MultipleField
- operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.PageField
- operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.SingleField
- Or - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
All extractors will do extracting separately,
and the results of the extractors will be combined as the final result.
- OschinaBlog - Class in us.codecraft.webmagic.example
- OschinaBlog() - Constructor for class us.codecraft.webmagic.example.OschinaBlog
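The ComboExtract.Op entries describe two ways of combining extractors: with And, the extractors are arranged as a pipeline; with Or, each extractor runs separately and the results are combined. The sketch below illustrates that difference with plain functions. It is an assumption-laden illustration, not WebMagic code: `ComboSketch`, `and`, and `or` are hypothetical names, and extractors are modelled simply as functions from a string to a list of strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of And/Or extractor combination; not the WebMagic API. */
final class ComboSketch {
    /** "And": extractors form a pipeline, each one consuming the previous one's output. */
    static List<String> and(String input, List<Function<String, List<String>>> extractors) {
        List<String> current = Collections.singletonList(input);
        for (Function<String, List<String>> extractor : extractors) {
            List<String> next = new ArrayList<>();
            for (String s : current) {
                next.addAll(extractor.apply(s));
            }
            current = next;
        }
        return current;
    }

    /** "Or": every extractor runs on the original input and the results are concatenated. */
    static List<String> or(String input, List<Function<String, List<String>>> extractors) {
        List<String> result = new ArrayList<>();
        for (Function<String, List<String>> extractor : extractors) {
            result.addAll(extractor.apply(input));
        }
        return result;
    }
}
```

With one extractor that splits on commas and another that upper-cases its input, `and("a,b", ...)` yields `[A, B]`, whereas `or("a,b", ...)` yields `[a, b, A,B]`, because the second extractor sees the original string rather than the split pieces.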
P
- PageField - Class in us.codecraft.webmagic.model.fields
- PageField() - Constructor for class us.codecraft.webmagic.model.fields.PageField
- PageMapper<T> - Class in us.codecraft.webmagic.model
- PageMapper(Class<T>) - Constructor for class us.codecraft.webmagic.model.PageMapper
- PageModelPipeline<T> - Interface in us.codecraft.webmagic.pipeline
-
Implement PageModelPipeline to persist your page models.
- pattern - Variable in class us.codecraft.webmagic.handler.PatternRequestMatcher
-
match pattern.
- PatternProcessor - Class in us.codecraft.webmagic.handler
- PatternProcessor(String) - Constructor for class us.codecraft.webmagic.handler.PatternProcessor
- PatternProcessorExample - Class in us.codecraft.webmagic.example
- PatternProcessorExample() - Constructor for class us.codecraft.webmagic.example.PatternProcessorExample
- PatternRequestMatcher - Class in us.codecraft.webmagic.handler
- PatternRequestMatcher(String) - Constructor for class us.codecraft.webmagic.handler.PatternRequestMatcher
- PhantomJSDownloader - Class in us.codecraft.webmagic.downloader
-
This downloader is used to download pages that need JavaScript rendering.
- PhantomJSDownloader() - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
- PhantomJSDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
Adds a new constructor that supports a custom phantomjs command.
- PhantomJSDownloader(String, String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
-
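Adds a constructor that supports a custom crawl.js path, because when another project depends on this jar, the phantomjs command run via runtime.exec() cannot use the crawl.js inside the jar.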
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- pool - Variable in class us.codecraft.webmagic.scheduler.RedisScheduler
- process(Object, Task) - Method in class us.codecraft.webmagic.model.ConsolePageModelPipeline
- process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.FilePageModelPipeline
- process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
- process(T, Task) - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
- process(T, Task) - Method in interface us.codecraft.webmagic.pipeline.PageModelPipeline
- process(Page) - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
- process(Page) - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
- process(Page) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.handler.CompositePipeline
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePipeline
- process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.MultiPagePipeline
- processPage(Page) - Method in interface us.codecraft.webmagic.handler.SubPageProcessor
-
process the page, extract urls to fetch, extract the data and store
- processResult(ResultItems, Task) - Method in interface us.codecraft.webmagic.handler.SubPipeline
-
process the page, extract urls to fetch, extract the data and store
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
- put(Class<? extends ObjectFormatter>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
- put(K1, Map<K2, V>) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- put(K1, K2, V) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
R
- RawHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
extract from the raw html
- RawHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
extract from the raw html
- RawHtml() - Constructor for class us.codecraft.webmagic.model.sources.Source.RawHtml
- RawText - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
- RawText() - Constructor for class us.codecraft.webmagic.model.sources.Source.RawText
- rebuildBloomFilter() - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- RedisPriorityScheduler - Class in us.codecraft.webmagic.scheduler
-
the redis scheduler with priority
- RedisPriorityScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- RedisPriorityScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- RedisScheduler - Class in us.codecraft.webmagic.scheduler
-
Use Redis as url scheduler for distributed crawlers.
- RedisScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
- RedisScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
- Regex - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- Regex - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
- register(Spider...) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
-
Register spider for monitor.
- registerMBean(SpiderStatusMXBean) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
- remove(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- remove(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
- RequestMatcher - Interface in us.codecraft.webmagic.handler
- RequestMatcher.MatchOther - Enum in us.codecraft.webmagic.handler
- RequestUtils - Class in us.codecraft.webmagic.utils
- RequestUtils() - Constructor for class us.codecraft.webmagic.utils.RequestUtils
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
- resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
S
- SelectedHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
extract from the content extracted by class extractor
- SelectedHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
extract from the content extracted by class extractor
- SelectedHtml() - Constructor for class us.codecraft.webmagic.model.sources.Source.SelectedHtml
- selector - Variable in class us.codecraft.webmagic.model.Extractor
- serializeRequest(Request) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
- setExpressionParams(String[]) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setExpressionType(ExpressionType) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setExpressionValue(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setField(Object, FieldExtractor, Object) - Method in class us.codecraft.webmagic.model.fields.PageField
- setField(Field) - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
- setFieldName(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setIsExtractLinks(boolean) - Method in class us.codecraft.webmagic.model.OOSpider
- setMulti(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setNotNull(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.SimpleHttpClient
- setSelector(Selector) - Method in class us.codecraft.webmagic.configurable.ExtractRule
- setSite(Site) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- setSubPageProcessors(SubPageProcessor...) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
- setSubPipeline(SubPipeline...) - Method in class us.codecraft.webmagic.handler.CompositePipeline
- setThread(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
- ShortFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
- SimpleHttpClient - Class in us.codecraft.webmagic
- SimpleHttpClient() - Constructor for class us.codecraft.webmagic.SimpleHttpClient
- SimpleHttpClient(Site) - Constructor for class us.codecraft.webmagic.SimpleHttpClient
- SingleField - Class in us.codecraft.webmagic.model.fields
- SingleField(String) - Constructor for class us.codecraft.webmagic.model.fields.SingleField
- source - Variable in class us.codecraft.webmagic.model.Extractor
- source() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
The source for extracting.
- source() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
The source for extracting.
- Source - Interface in us.codecraft.webmagic.model.sources
- Source.DefaultSource - Class in us.codecraft.webmagic.model.sources
- Source.RawHtml - Class in us.codecraft.webmagic.model.sources
- Source.RawText - Class in us.codecraft.webmagic.model.sources
- Source.SelectedHtml - Class in us.codecraft.webmagic.model.sources
- Source.Url - Class in us.codecraft.webmagic.model.sources
- sourceRegion() - Element in annotation type us.codecraft.webmagic.model.annotation.HelpUrl
-
Define the region for url extracting.
- sourceRegion() - Element in annotation type us.codecraft.webmagic.model.annotation.TargetUrl
-
Define the region for url extracting.
- SourceTextExtractor - Class in us.codecraft.webmagic.model.sources
- SourceTextExtractor() - Constructor for class us.codecraft.webmagic.model.sources.SourceTextExtractor
- spider - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
- SpiderMonitor - Class in us.codecraft.webmagic.monitor
- SpiderMonitor() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor
- SpiderMonitor.MonitorSpiderListener - Class in us.codecraft.webmagic.monitor
- SpiderStatus - Class in us.codecraft.webmagic.monitor
- SpiderStatus(Spider, SpiderMonitor.MonitorSpiderListener) - Constructor for class us.codecraft.webmagic.monitor.SpiderStatus
- SpiderStatusMXBean - Interface in us.codecraft.webmagic.monitor
- start() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- start() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- stop() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
- stop() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
- subClazz() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
-
Specifies the class of the field, or the class of the elements if the field is a collection.
- SubPageProcessor - Interface in us.codecraft.webmagic.handler
- SubPipeline - Interface in us.codecraft.webmagic.handler
T
- TargetUrl - Annotation Type in us.codecraft.webmagic.model.annotation
-
Define the url patterns for class.
- toString() - Method in class us.codecraft.webmagic.example.BaiduBaike
- toString() - Method in class us.codecraft.webmagic.example.GithubRepo
- type() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Extractor type; supports XPath, CSS Selector and regex.
U
- Url() - Constructor for class us.codecraft.webmagic.model.sources.Source.Url
- us.codecraft.webmagic - package us.codecraft.webmagic
- us.codecraft.webmagic.configurable - package us.codecraft.webmagic.configurable
- us.codecraft.webmagic.downloader - package us.codecraft.webmagic.downloader
- us.codecraft.webmagic.example - package us.codecraft.webmagic.example
- us.codecraft.webmagic.handler - package us.codecraft.webmagic.handler
- us.codecraft.webmagic.model - package us.codecraft.webmagic.model
-
Page model and annotations used to customize a crawler.
- us.codecraft.webmagic.model.annotation - package us.codecraft.webmagic.model.annotation
-
Annotations for defining an extractor.
- us.codecraft.webmagic.model.fields - package us.codecraft.webmagic.model.fields
- us.codecraft.webmagic.model.formatter - package us.codecraft.webmagic.model.formatter
- us.codecraft.webmagic.model.sources - package us.codecraft.webmagic.model.sources
- us.codecraft.webmagic.monitor - package us.codecraft.webmagic.monitor
- us.codecraft.webmagic.pipeline - package us.codecraft.webmagic.pipeline
- us.codecraft.webmagic.scheduler - package us.codecraft.webmagic.scheduler
- us.codecraft.webmagic.utils - package us.codecraft.webmagic.utils
V
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
-
The extractors to be combined.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
-
Extractor expression; supports XPath, CSS Selector and regex.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
-
Extractor expression, only regex can be used
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
-
Set formatter params.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.HelpUrl
-
The url patterns to crawl.
- value() - Element in annotation type us.codecraft.webmagic.model.annotation.TargetUrl
-
The url patterns for class.
Use a regex expression with some changes:
"." stands for the literal character "." instead of "any character".
- valueOf(String) - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
-
Returns an array containing the constants of this enum type, in the order they are declared.
X
- XPath - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
- XPath - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
Y
- YES - Enum constant in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther