Index


A

addPageModel(PageModelPipeline, Class...) - Method in class us.codecraft.webmagic.model.OOSpider
 
addSubPageProcessor(SubPageProcessor) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
addSubPipeline(SubPipeline) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
AfterExtractor - Interface in us.codecraft.webmagic.model
Interface to be implemented by page models that need to do something after fields are extracted.
afterProcess(Page) - Method in interface us.codecraft.webmagic.model.AfterExtractor
 
And - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
All extractors will be arranged as a pipeline.
AppStore - Class in us.codecraft.webmagic.example
 
AppStore() - Constructor for class us.codecraft.webmagic.example.AppStore
 

B

BaiduBaike - Class in us.codecraft.webmagic.example
 
BaiduBaike() - Constructor for class us.codecraft.webmagic.example.BaiduBaike
 
basicClassDetector - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
BasicClassDetector - Interface in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter<T> - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
BasicTypeFormatter.BooleanFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.ByteFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.CharactorFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.DoubleFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.FloatFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.IntegerFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.LongFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.ShortFormatter - Class in us.codecraft.webmagic.model.formatter
 
basicTypeFormatters - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
BloomFilterDuplicateRemover - Class in us.codecraft.webmagic.scheduler
Duplicate remover based on a Bloom filter, intended for a huge number of URLs.
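As a rough illustration of how a Bloom filter can remove duplicate URLs in bounded memory, here is a minimal, self-contained sketch. It is not webmagic's implementation. The class name `BloomSketch` and the parameter names `expectedInsertions` and `fpp` (target false-positive rate) are assumptions, chosen to mirror the `(int)` and `(int, double)` constructor shapes listed in this index.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Illustrative only: a tiny Bloom filter, not the library's implementation.
class BloomSketch {
    private final BitSet bits;
    private final int size;   // number of bits (m)
    private final int hashes; // number of hash functions (k)

    // expectedInsertions and fpp are assumed meanings for the two parameters.
    BloomSketch(int expectedInsertions, double fpp) {
        // Standard sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
        double ln2 = Math.log(2);
        this.size = Math.max(1,
                (int) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2)));
        this.hashes = Math.max(1,
                (int) Math.round((double) size / expectedInsertions * ln2));
        this.bits = new BitSet(size);
    }

    // Returns true if the URL was possibly seen before;
    // otherwise records it and returns false.
    boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // second hash, forced odd
        boolean seen = true;
        for (int i = 0; i < hashes; i++) {
            // Double hashing: derive the i-th bit index from h1 and h2.
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                seen = false;
                bits.set(index);
            }
        }
        return seen;
    }

    // FNV-1a style 32-bit hash with a configurable starting value.
    private static int fnv1a(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

A Bloom filter never reports a URL it has seen as new, but it may occasionally report a new URL as a duplicate. That trade-off is why it suits very large crawls where skipping a few pages is acceptable.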
BloomFilterDuplicateRemover(int) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
BloomFilterDuplicateRemover(int, double) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
BooleanFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
build() - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
ByteFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
 

C

CharactorFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
ClassUtils - Class in us.codecraft.webmagic.utils
 
ClassUtils() - Constructor for class us.codecraft.webmagic.utils.ClassUtils
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
clazz() - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
close() - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
CollectorPageModelPipeline<T> - Class in us.codecraft.webmagic.pipeline
 
CollectorPageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
combine(MultiPageModel) - Method in interface us.codecraft.webmagic.MultiPageModel
Combine multiPageModels into a whole object.
ComboExtract - Annotation Type in us.codecraft.webmagic.model.annotation
Combo 'ExtractBy' extractor with and/or operator.
ComboExtract.Op - Enum in us.codecraft.webmagic.model.annotation
 
ComboExtract.Source - Enum in us.codecraft.webmagic.model.annotation
types of source for extracting.
CompositePageProcessor - Class in us.codecraft.webmagic.handler
 
CompositePageProcessor(Site) - Constructor for class us.codecraft.webmagic.handler.CompositePageProcessor
 
CompositePipeline - Class in us.codecraft.webmagic.handler
 
CompositePipeline() - Constructor for class us.codecraft.webmagic.handler.CompositePipeline
 
ConfigurablePageProcessor - Class in us.codecraft.webmagic.configurable
 
ConfigurablePageProcessor(Site, List<ExtractRule>) - Constructor for class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
ConsolePageModelPipeline - Class in us.codecraft.webmagic.model
Print page model in console.
Usually used in test.
ConsolePageModelPipeline() - Constructor for class us.codecraft.webmagic.model.ConsolePageModelPipeline
 
convert(String, ObjectFormatter, Logger) - Method in class us.codecraft.webmagic.model.fields.PageField
 
create(Site, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
 
create(Site, PageModelPipeline, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
 
Css - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
 
Css - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
 

D

DateFormatter - Class in us.codecraft.webmagic.model.formatter
 
DateFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.DateFormatter
 
DEFAULT_CLAZZ - Static variable in class us.codecraft.webmagic.utils.MultiKeyMapBase
 
DEFAULT_FORMATTER - Static variable in annotation type us.codecraft.webmagic.model.annotation.Formatter
 
DEFAULT_PATTERN - Static variable in class us.codecraft.webmagic.model.formatter.DateFormatter
 
DefaultSource() - Constructor for class us.codecraft.webmagic.model.sources.Source.DefaultSource
 
deserializeRequest(String) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
detectBasicClass(Class<?>) - Method in interface us.codecraft.webmagic.model.formatter.BasicClassDetector
 
detectBasicClass(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
DoubleFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
DoubleKeyMap<K1,K2,V> - Class in us.codecraft.webmagic.utils
 
DoubleKeyMap() - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Map<K1, Map<K2, V>>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Map<K1, Map<K2, V>>, Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
init map with protoMapClass
download(Request, Task) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 

E

ExpressionType - Enum in us.codecraft.webmagic.configurable
 
ExtractBy - Annotation Type in us.codecraft.webmagic.model.annotation
Define the extractor for field or class.
ExtractBy.Source - Enum in us.codecraft.webmagic.model.annotation
types of source for extracting.
ExtractBy.Type - Enum in us.codecraft.webmagic.model.annotation
types of extractor expressions
ExtractByUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define an extractor to extract data from the URL of the current page.
Extractor - Class in us.codecraft.webmagic.model
The object contains 'ExtractBy' information.
Extractor(Selector, Source, boolean, boolean) - Constructor for class us.codecraft.webmagic.model.Extractor
 
ExtractorUtils - Class in us.codecraft.webmagic.utils
Tools for annotation converting.
ExtractorUtils() - Constructor for class us.codecraft.webmagic.utils.ExtractorUtils
 
ExtractRule - Class in us.codecraft.webmagic.configurable
 
ExtractRule() - Constructor for class us.codecraft.webmagic.configurable.ExtractRule
 

F

FieldExtractor - Class in us.codecraft.webmagic.model
Wrapper of field and extractor.
FieldExtractor(Field, Selector, Source, boolean, boolean) - Constructor for class us.codecraft.webmagic.model.FieldExtractor
 
FileCacheQueueScheduler - Class in us.codecraft.webmagic.scheduler
Store URLs and the cursor in files so that a Spider can resume its state after a shutdown.
FileCacheQueueScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
FilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
Store result objects (page models) to files in plain format.
Use model.key() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
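The file naming rule described above can be sketched as follows. This is a hedged, self-contained illustration, not the pipeline's actual code. `FileNameSketch` and its nested `HasKey` are local stand-ins (the real `HasKey` is listed in this index with a `key()` method), and hashing the model's `toString()` is an assumption, since the index does not say what the SHA1 is computed over.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileNameSketch {
    // Local stand-in for the HasKey interface listed in this index.
    interface HasKey {
        String key();
    }

    // Picks the model's own key when available, otherwise a SHA-1 hex digest.
    static String fileNameFor(Object model) {
        if (model instanceof HasKey) {
            return ((HasKey) model).key();
        }
        // Assumption: hash the model's string form; the source does not specify this.
        return sha1Hex(String.valueOf(model));
    }

    // Hex-encoded SHA-1 digest of the UTF-8 bytes of the text.
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```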
FilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
new FilePageModelPipeline with default path "/data/webmagic/"
FilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
 
FloatFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
format(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
format(String) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
format(String) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
formatter() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
If there is more than one formatter for a class, specify the implementation to use.
Formatter - Annotation Type in us.codecraft.webmagic.model.annotation
Define how the result string is converted to an object for the field.
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
from(String) - Static method in class us.codecraft.webmagic.utils.RequestUtils
 

G

get(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
get(String) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(String, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
get(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
get(Page) - Method in class us.codecraft.webmagic.model.PageMapper
 
get(Request) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(Request, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
getAll(Page) - Method in class us.codecraft.webmagic.model.PageMapper
 
getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getCollected() - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
getCollectorPipeline() - Method in class us.codecraft.webmagic.model.OOSpider
 
getContent() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getDate() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getDescription() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
getErrorCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getErrorPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getErrorPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getErrorPages() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getErrorPages() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getErrorUrls() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getExpressionParams() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getExpressionType() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getExpressionValue() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getFieldName() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getFieldsIncludeSuperClass(Class) - Static method in class us.codecraft.webmagic.utils.ClassUtils
 
getFirstNoLoopbackIPAddresses() - Static method in class us.codecraft.webmagic.utils.IPUtils
 
getFork() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getFork() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getItemKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getLeftPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getLeftPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getName() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
getName() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getName() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getName() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getName() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getOtherPages() - Method in interface us.codecraft.webmagic.MultiPageModel
other pages to be extracted.
It is used to judge whether an object contains more than one page, and whether the pages of the object are all extracted.
getPage() - Method in interface us.codecraft.webmagic.MultiPageModel
The page value identifies one page among the pages that make up a single object.
getPage(Request) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
getPageKey() - Method in interface us.codecraft.webmagic.MultiPageModel
Page key is the identifier for the object.
getPagePerSecond() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getPagePerSecond() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getQueueKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getReadme() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getSelector() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getSelector(ExtractBy) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
 
getSelectors(ExtractBy[]) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
 
getSetKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getSite() - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
getSite() - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
getSite() - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
getSpiderStatuses() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
getSpiderStatusMBean(Spider, SpiderMonitor.MonitorSpiderListener) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
getStar() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getStar() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getStartTime() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getStartTime() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getStatus() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getStatus() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getSuccessCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getSuccessPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getSuccessPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTags() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.DefaultSource
 
getText(Page, String, boolean, FieldExtractor) - Method in interface us.codecraft.webmagic.model.sources.Source
 
getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawHtml
 
getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawText
 
getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.SelectedHtml
 
getText(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.Url
 
getText(Page, String, boolean, FieldExtractor) - Static method in class us.codecraft.webmagic.model.sources.SourceTextExtractor
 
getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.DefaultSource
 
getTextList(Page, String, boolean, FieldExtractor) - Method in interface us.codecraft.webmagic.model.sources.Source
 
getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawHtml
 
getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.RawText
 
getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.SelectedHtml
 
getTextList(Page, String, boolean, FieldExtractor) - Method in class us.codecraft.webmagic.model.sources.Source.Url
 
getThread() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getThread() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTitle() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getTotalPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getTotalPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getUrl() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getUrl() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
GithubRepo - Class in us.codecraft.webmagic.example
 
GithubRepo() - Constructor for class us.codecraft.webmagic.example.GithubRepo
 
GithubRepoApi - Class in us.codecraft.webmagic.example
 
GithubRepoApi() - Constructor for class us.codecraft.webmagic.example.GithubRepoApi
 
GithubRepoPageMapper - Class in us.codecraft.webmagic.example
 
GithubRepoPageMapper() - Constructor for class us.codecraft.webmagic.example.GithubRepoPageMapper
 

H

HasKey - Interface in us.codecraft.webmagic.model
Interface to be implemented by page models.
Can be used to identify a page model, or be used as name of file storing the object.
HelpUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define the 'help' url patterns for class.

I

initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
initParam(String[]) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
instance() - Static method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
IntegerFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
IPUtils - Class in us.codecraft.webmagic.utils
 
IPUtils() - Constructor for class us.codecraft.webmagic.utils.IPUtils
 
isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
isMulti() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
isMulti() - Method in class us.codecraft.webmagic.model.Extractor
 
isNotNull() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
isNotNull() - Method in class us.codecraft.webmagic.model.Extractor
 

J

JsonFilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
Store result objects (page models) to files in JSON format.
Use model.key() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
JsonFilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
new JsonFilePageModelPipeline with default path "/data/webmagic/"
JsonFilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
 
JsonFilePipeline - Class in us.codecraft.webmagic.pipeline
Store results to files in JSON format.
JsonFilePipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
new JsonFilePipeline with default path "/data/webmagic/"
JsonFilePipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
 
JsonPath - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
 
JsonPath - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
 

K

key() - Method in class us.codecraft.webmagic.example.GithubRepo
 
key() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
key() - Method in interface us.codecraft.webmagic.model.HasKey
 

L

logger - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
LongFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 

M

main(String[]) - Static method in class us.codecraft.webmagic.example.AppStore
 
main(String[]) - Static method in class us.codecraft.webmagic.example.BaiduBaike
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepo
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoApi
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
main(String[]) - Static method in class us.codecraft.webmagic.example.MonitorExample
 
main(String[]) - Static method in class us.codecraft.webmagic.example.OschinaBlog
 
main(String...) - Static method in class us.codecraft.webmagic.example.PatternProcessorExample
 
match(Request) - Method in class us.codecraft.webmagic.handler.PatternRequestMatcher
 
match(Request) - Method in interface us.codecraft.webmagic.handler.RequestMatcher
Check whether to process the page.

Please DO NOT change page status in this method.
MonitorExample - Class in us.codecraft.webmagic.example
 
MonitorExample() - Constructor for class us.codecraft.webmagic.example.MonitorExample
 
monitorSpiderListener - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
MonitorSpiderListener() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
multi - Variable in class us.codecraft.webmagic.model.Extractor
 
multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
Deprecated.
since 0.4.2
multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
Deprecated.
since 0.4.2
multi() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
Deprecated.
since 0.4.2
MultiKeyMapBase - Class in us.codecraft.webmagic.utils
multi-key map, some basic objects
MultiKeyMapBase() - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
 
MultiKeyMapBase(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
 
MultiPageModel - Interface in us.codecraft.webmagic
Extract an object that spans more than one page, such as news and articles.
MultiPagePipeline - Class in us.codecraft.webmagic.pipeline
A pipeline that combines the results from more than one page.
Used for news and articles containing more than one web page.
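To make the multi-page idea concrete, the sketch below collects the parts of one object until every page has arrived and then merges them in page order. It is a simplified illustration, not the pipeline's real logic. `MultiPageSketch`, `PagePart` and `Article` are hypothetical names, and the method signatures in `PagePart` are assumed from the four `MultiPageModel` methods listed in this index.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultiPageSketch {
    // Local stand-in mirroring the MultiPageModel methods (signatures assumed).
    interface PagePart {
        String getPageKey();                 // identifies the whole object
        String getPage();                    // identifies this page
        Collection<String> getOtherPages();  // the other pages of the object
        PagePart combine(PagePart other);    // merge another page into this one
    }

    // Hypothetical page model: an article whose text is split across pages.
    static final class Article implements PagePart {
        final String key;
        final String page;
        final List<String> others;
        final String text;

        Article(String key, String page, List<String> others, String text) {
            this.key = key;
            this.page = page;
            this.others = others;
            this.text = text;
        }

        public String getPageKey() { return key; }
        public String getPage() { return page; }
        public Collection<String> getOtherPages() { return others; }

        public PagePart combine(PagePart other) {
            Article next = (Article) other;
            return new Article(key, page, others, text + next.text);
        }
    }

    // pageKey -> (page id -> part); TreeMap sorts page ids as strings.
    private final Map<String, TreeMap<String, PagePart>> pending = new HashMap<>();

    // Returns the combined object once all pages are present, otherwise null.
    PagePart offer(PagePart part) {
        TreeMap<String, PagePart> parts =
                pending.computeIfAbsent(part.getPageKey(), k -> new TreeMap<>());
        parts.put(part.getPage(), part);
        for (String other : part.getOtherPages()) {
            if (!parts.containsKey(other)) {
                return null; // still waiting for at least one page
            }
        }
        pending.remove(part.getPageKey());
        PagePart whole = null;
        for (PagePart p : parts.values()) {
            whole = (whole == null) ? p : whole.combine(p);
        }
        return whole;
    }
}
```

Because page ids are sorted as strings here, "10" would sort before "2". A real implementation would need a numeric or otherwise explicit ordering.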
MultiPagePipeline() - Constructor for class us.codecraft.webmagic.pipeline.MultiPagePipeline
 
MultipleField - Class in us.codecraft.webmagic.model.fields
 
MultipleField(List<String>) - Constructor for class us.codecraft.webmagic.model.fields.MultipleField
 

N

newMap() - Method in class us.codecraft.webmagic.utils.MultiKeyMapBase
 
NO - Enum constant in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
 
notNull - Variable in class us.codecraft.webmagic.model.Extractor
 
notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
notNull() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
Define whether the field can be null.
If set to 'true' and the extractor gets no result, the entire class will be discarded.
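The notNull behaviour described in the entries above can be pictured with this small, self-contained sketch. It does not use the real annotations. `NotNullSketch` and `Rule` are hypothetical names, and a plain `Function<String, String>` stands in for an extractor.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class NotNullSketch {
    // Hypothetical stand-in for one annotated field.
    static final class Rule {
        final String field;
        final Function<String, String> extractor; // returns null when nothing matches
        final boolean notNull;

        Rule(String field, Function<String, String> extractor, boolean notNull) {
            this.field = field;
            this.extractor = extractor;
            this.notNull = notNull;
        }
    }

    // Returns the extracted fields, or null when a notNull field has no result.
    static Map<String, String> extract(String page, List<Rule> rules) {
        Map<String, String> model = new LinkedHashMap<>();
        for (Rule rule : rules) {
            String value = rule.extractor.apply(page);
            if (value == null) {
                if (rule.notNull) {
                    return null; // discard the whole object
                }
                continue; // optional field: leave it unset
            }
            model.put(rule.field, value);
        }
        return model;
    }
}
```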

O

ObjectFormatter<T> - Interface in us.codecraft.webmagic.model.formatter
 
ObjectFormatterBuilder - Class in us.codecraft.webmagic.model.formatter
 
ObjectFormatterBuilder() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
ObjectFormatters - Class in us.codecraft.webmagic.model.formatter
 
ObjectFormatters() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
onError(Request, Exception) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
onSuccess(Request) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
OOSpider<T> - Class in us.codecraft.webmagic.model
The spider for page model extractor.
In webmagic, we call a POJO containing extract result as "page model".
OOSpider(ModelPageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
 
OOSpider(PageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
 
OOSpider(Site, PageModelPipeline, Class...) - Constructor for class us.codecraft.webmagic.model.OOSpider
create a spider
op() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
Combining operation of extractors.
operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.MultipleField
 
operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.PageField
 
operation(Object, FieldExtractor, Logger) - Method in class us.codecraft.webmagic.model.fields.SingleField
 
Or - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
All extractors will do extracting separately,
and the results of extractors will be combined as the final result.
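The difference between the And and Or operations can be sketched with plain functions. This is an illustration of the two descriptions in this index only, not the library's selector code. `ComboSketch` is a hypothetical name, and each extractor is modelled as a `Function<String, String>` that returns null when it finds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ComboSketch {
    // "And": extractors form a pipeline; each consumes the previous output.
    static String and(String input, List<Function<String, String>> extractors) {
        String current = input;
        for (Function<String, String> extractor : extractors) {
            if (current == null) {
                return null; // an earlier stage found nothing
            }
            current = extractor.apply(current);
        }
        return current;
    }

    // "Or": each extractor runs on the original input; non-null results are collected.
    static List<String> or(String input, List<Function<String, String>> extractors) {
        List<String> results = new ArrayList<>();
        for (Function<String, String> extractor : extractors) {
            String result = extractor.apply(input);
            if (result != null) {
                results.add(result);
            }
        }
        return results;
    }
}
```

With the same two extractors, `and` yields a single narrowed result while `or` yields one result per extractor that matched.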
OschinaBlog - Class in us.codecraft.webmagic.example
 
OschinaBlog() - Constructor for class us.codecraft.webmagic.example.OschinaBlog
 

P

PageField - Class in us.codecraft.webmagic.model.fields
 
PageField() - Constructor for class us.codecraft.webmagic.model.fields.PageField
 
PageMapper<T> - Class in us.codecraft.webmagic.model
 
PageMapper(Class<T>) - Constructor for class us.codecraft.webmagic.model.PageMapper
 
PageModelPipeline<T> - Interface in us.codecraft.webmagic.pipeline
Implement PageModelPipeline to persist your page model.
pattern - Variable in class us.codecraft.webmagic.handler.PatternRequestMatcher
match pattern.
PatternProcessor - Class in us.codecraft.webmagic.handler
 
PatternProcessor(String) - Constructor for class us.codecraft.webmagic.handler.PatternProcessor
 
PatternProcessorExample - Class in us.codecraft.webmagic.example
PatternProcessorExample() - Constructor for class us.codecraft.webmagic.example.PatternProcessorExample
 
PatternRequestMatcher - Class in us.codecraft.webmagic.handler
PatternRequestMatcher(String) - Constructor for class us.codecraft.webmagic.handler.PatternRequestMatcher
 
PhantomJSDownloader - Class in us.codecraft.webmagic.downloader
This downloader is used to download pages that require JavaScript rendering.
PhantomJSDownloader() - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
PhantomJSDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
Adds a new constructor that supports a custom phantomjs command.
PhantomJSDownloader(String, String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
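Adds a constructor that supports a custom crawl.js path, because when another project depends on this jar, runtime.exec() cannot use the crawl.js packaged inside the jar when it runs the phantomjs command.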
poll(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
pool - Variable in class us.codecraft.webmagic.scheduler.RedisScheduler
 
process(Object, Task) - Method in class us.codecraft.webmagic.model.ConsolePageModelPipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.FilePageModelPipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
 
process(T, Task) - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
process(T, Task) - Method in interface us.codecraft.webmagic.pipeline.PageModelPipeline
 
process(Page) - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
process(Page) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.MultiPagePipeline
 
processPage(Page) - Method in interface us.codecraft.webmagic.handler.SubPageProcessor
process the page, extract urls to fetch, extract the data and store
processResult(ResultItems, Task) - Method in interface us.codecraft.webmagic.handler.SubPipeline
process the page, extract urls to fetch, extract the data and store
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
put(Class<? extends ObjectFormatter>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
put(K1, Map<K2, V>) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
put(K1, K2, V) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 

R

RawHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
Extract from the raw HTML.
RawHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
Extract from the raw HTML.
RawHtml() - Constructor for class us.codecraft.webmagic.model.sources.Source.RawHtml
 
RawText - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
 
RawText() - Constructor for class us.codecraft.webmagic.model.sources.Source.RawText
 
rebuildBloomFilter() - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
RedisPriorityScheduler - Class in us.codecraft.webmagic.scheduler
The Redis scheduler with priority support.
RedisPriorityScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
RedisPriorityScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
RedisScheduler - Class in us.codecraft.webmagic.scheduler
Use Redis as url scheduler for distributed crawlers.
RedisScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
 
RedisScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
 
Regex - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
 
Regex - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
 
register(Spider...) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
Register spider for monitor.
registerMBean(SpiderStatusMXBean) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
remove(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
remove(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
RequestMatcher - Interface in us.codecraft.webmagic.handler
 
RequestMatcher.MatchOther - Enum in us.codecraft.webmagic.handler
 
RequestUtils - Class in us.codecraft.webmagic.utils
 
RequestUtils() - Constructor for class us.codecraft.webmagic.utils.RequestUtils
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 

S

SelectedHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
extract from the content extracted by class extractor
SelectedHtml - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
extract from the content extracted by class extractor
SelectedHtml() - Constructor for class us.codecraft.webmagic.model.sources.Source.SelectedHtml
 
selector - Variable in class us.codecraft.webmagic.model.Extractor
 
serializeRequest(Request) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
setExpressionParams(String[]) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setExpressionType(ExpressionType) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setExpressionValue(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setField(Object, FieldExtractor, Object) - Method in class us.codecraft.webmagic.model.fields.PageField
 
setField(Field) - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
setFieldName(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setIsExtractLinks(boolean) - Method in class us.codecraft.webmagic.model.OOSpider
 
setMulti(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setNotNull(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
setSelector(Selector) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setSite(Site) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
setSubPageProcessors(SubPageProcessor...) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
setSubPipeline(SubPipeline...) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
setThread(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
ShortFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
SimpleHttpClient - Class in us.codecraft.webmagic
 
SimpleHttpClient() - Constructor for class us.codecraft.webmagic.SimpleHttpClient
 
SimpleHttpClient(Site) - Constructor for class us.codecraft.webmagic.SimpleHttpClient
 
SingleField - Class in us.codecraft.webmagic.model.fields
 
SingleField(String) - Constructor for class us.codecraft.webmagic.model.fields.SingleField
 
source - Variable in class us.codecraft.webmagic.model.Extractor
 
source() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
The source for extracting.
source() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
The source for extracting.
Source - Interface in us.codecraft.webmagic.model.sources
 
Source.DefaultSource - Class in us.codecraft.webmagic.model.sources
 
Source.RawHtml - Class in us.codecraft.webmagic.model.sources
 
Source.RawText - Class in us.codecraft.webmagic.model.sources
 
Source.SelectedHtml - Class in us.codecraft.webmagic.model.sources
 
Source.Url - Class in us.codecraft.webmagic.model.sources
 
sourceRegion() - Element in annotation type us.codecraft.webmagic.model.annotation.HelpUrl
Define the region for url extracting.
sourceRegion() - Element in annotation type us.codecraft.webmagic.model.annotation.TargetUrl
Define the region for url extracting.
SourceTextExtractor - Class in us.codecraft.webmagic.model.sources
 
SourceTextExtractor() - Constructor for class us.codecraft.webmagic.model.sources.SourceTextExtractor
 
spider - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
SpiderMonitor - Class in us.codecraft.webmagic.monitor
 
SpiderMonitor() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor
 
SpiderMonitor.MonitorSpiderListener - Class in us.codecraft.webmagic.monitor
 
SpiderStatus - Class in us.codecraft.webmagic.monitor
 
SpiderStatus(Spider, SpiderMonitor.MonitorSpiderListener) - Constructor for class us.codecraft.webmagic.monitor.SpiderStatus
 
SpiderStatusMXBean - Interface in us.codecraft.webmagic.monitor
 
start() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
start() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
stop() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
stop() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
subClazz() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
Specify the class of the field, or the class of the elements in the collection for the field.
SubPageProcessor - Interface in us.codecraft.webmagic.handler
 
SubPipeline - Interface in us.codecraft.webmagic.handler
 

T

TargetUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define the url patterns for class.
toString() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
toString() - Method in class us.codecraft.webmagic.example.GithubRepo
 
type() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
Extractor type; supports XPath, CSS selector, and regex.

U

Url() - Constructor for class us.codecraft.webmagic.model.sources.Source.Url
 
us.codecraft.webmagic - package us.codecraft.webmagic
 
us.codecraft.webmagic.configurable - package us.codecraft.webmagic.configurable
 
us.codecraft.webmagic.downloader - package us.codecraft.webmagic.downloader
 
us.codecraft.webmagic.example - package us.codecraft.webmagic.example
 
us.codecraft.webmagic.handler - package us.codecraft.webmagic.handler
 
us.codecraft.webmagic.model - package us.codecraft.webmagic.model
Page model and annotations used to customize a crawler.
us.codecraft.webmagic.model.annotation - package us.codecraft.webmagic.model.annotation
Annotations for defining an extractor.
us.codecraft.webmagic.model.fields - package us.codecraft.webmagic.model.fields
 
us.codecraft.webmagic.model.formatter - package us.codecraft.webmagic.model.formatter
 
us.codecraft.webmagic.model.sources - package us.codecraft.webmagic.model.sources
 
us.codecraft.webmagic.monitor - package us.codecraft.webmagic.monitor
 
us.codecraft.webmagic.pipeline - package us.codecraft.webmagic.pipeline
 
us.codecraft.webmagic.scheduler - package us.codecraft.webmagic.scheduler
 
us.codecraft.webmagic.utils - package us.codecraft.webmagic.utils
 

V

value() - Element in annotation type us.codecraft.webmagic.model.annotation.ComboExtract
The extractors to be combined.
value() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractBy
Extractor expression; supports XPath, CSS selector, and regex.
value() - Element in annotation type us.codecraft.webmagic.model.annotation.ExtractByUrl
Extractor expression; only regex can be used.
value() - Element in annotation type us.codecraft.webmagic.model.annotation.Formatter
Set formatter params.
value() - Element in annotation type us.codecraft.webmagic.model.annotation.HelpUrl
The url patterns to crawl.
value() - Element in annotation type us.codecraft.webmagic.model.annotation.TargetUrl
The url patterns for class.
Uses regular expressions with some changes:
"." stands for the literal character "." instead of "any character".
valueOf(String) - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
Returns the enum constant of this type with the specified name.
values() - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
Returns an array containing the constants of this enum type, in the order they are declared.

X

XPath - Enum constant in enum us.codecraft.webmagic.configurable.ExpressionType
 
XPath - Enum constant in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
 

Y

YES - Enum constant in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
 