All Classes
Class: Description
AfterExtractor: Interface to be implemented by page models that need to do something after their fields are extracted.
AppStore
BaiduBaike
BasicClassDetector
BasicTypeFormatter<T>
BasicTypeFormatter.BooleanFormatter
BasicTypeFormatter.ByteFormatter
BasicTypeFormatter.CharactorFormatter
BasicTypeFormatter.DoubleFormatter
BasicTypeFormatter.FloatFormatter
BasicTypeFormatter.IntegerFormatter
BasicTypeFormatter.LongFormatter
BasicTypeFormatter.ShortFormatter
BloomFilterDuplicateRemover: Duplicate remover based on a Bloom filter, for a huge number of URLs.
ClassUtils
CollectorPageModelPipeline<T>
ComboExtract: Combined 'ExtractBy' extractor with and/or operator.
ComboExtract.Op
ComboExtract.Source: Types of source for extracting.
CompositePageProcessor
CompositePipeline
ConfigurablePageProcessor
ConsolePageModelPipeline: Prints the page model to the console. Usually used in tests.
DateFormatter
DoubleKeyMap<K1,K2,V>
ExpressionType
ExtractBy: Defines the extractor for a field or class.
ExtractBy.Source: Types of source for extracting.
ExtractBy.Type: Types of extractor expressions.
ExtractByUrl: Defines an extractor that extracts data from the URL of the current page.
Extractor: The object that contains 'ExtractBy' information.
ExtractorUtils: Tools for converting annotations.
ExtractRule
FieldExtractor: Wrapper of a field and its extractor.
FileCacheQueueScheduler: Stores URLs and the cursor in files so that a Spider can resume its state after a shutdown.
FilePageModelPipeline: Stores result objects (page models) to files in plain format. Uses model.getKey() as the file name if the model implements HasKey; otherwise uses SHA1 as the file name.
Formatter: Defines how the result string is converted to an object for a field.
GithubRepo
GithubRepoApi
GithubRepoPageMapper
HasKey: Interface to be implemented by page models. Can be used to identify a page model, or as the name of the file storing the object.
HelpUrl: Defines the 'help' URL patterns for a class.
IPUtils
JsonFilePageModelPipeline: Stores result objects (page models) to files in JSON format. Uses model.getKey() as the file name if the model implements HasKey; otherwise uses SHA1 as the file name.
JsonFilePipeline: Stores results to files in JSON format.
MonitorExample
MultiKeyMapBase: Multi-key map, some basic objects.
MultiPageModel: Extracts an object that spans more than one page, such as news and articles.
MultiPagePipeline: A pipeline that combines the results from more than one page. Used for news and articles that span more than one web page.
MultipleField
ObjectFormatter<T>
ObjectFormatterBuilder
ObjectFormatters
OOSpider<T>: The spider for page model extraction. In webmagic, a POJO containing the extracted result is called a "page model".
OschinaBlog
PageField
PageMapper<T>
PageModelPipeline<T>: Implement PageModelPipeline to persist your page model.
PatternProcessor
PatternProcessorExample
PatternRequestMatcher
PhantomJSDownloader: Downloader for pages that need their JavaScript rendered.
RedisPriorityScheduler: The Redis scheduler with priority.
RedisScheduler: Uses Redis as the URL scheduler for distributed crawlers.
RequestMatcher
RequestMatcher.MatchOther
RequestUtils
SimpleHttpClient
SingleField
Source
Source.DefaultSource
Source.RawHtml
Source.RawText
Source.SelectedHtml
Source.Url
SourceTextExtractor
SpiderMonitor
SpiderStatus
SpiderStatusMXBean
SubPageProcessor
SubPipeline
TargetUrl: Defines the URL patterns for a class.
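
The file-naming rule described for FilePageModelPipeline and JsonFilePageModelPipeline (use the model's key if it implements HasKey, otherwise fall back to SHA1) can be illustrated with a minimal, standalone sketch. Everything here is an assumption for illustration: the `HasKey` interface below is a local stand-in whose `getKey()` method follows the wording of the descriptions above, not the library's own interface, and hashing the model's `toString()` is only one plausible choice of hash input. The class name `FileNameSketch` and the method `fileNameFor` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileNameSketch {

    // Local stand-in for the HasKey idea; NOT the library's interface.
    public interface HasKey {
        String getKey();
    }

    // Pick a file name: the model's own key when available,
    // otherwise a SHA-1 hex digest of its string form (an assumption).
    public static String fileNameFor(Object model) {
        if (model instanceof HasKey) {
            return ((HasKey) model).getKey();
        }
        return sha1Hex(String.valueOf(model));
    }

    // Hex-encoded SHA-1 digest of a UTF-8 string.
    public static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        HasKey keyed = () -> "repo-1";
        System.out.println(fileNameFor(keyed));       // key is used directly
        System.out.println(fileNameFor("some page")); // falls back to the digest
    }
}
```

A keyed model gives stable, human-readable file names, while the digest fallback keeps names deterministic for models that expose no key.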