All Classes and Interfaces (webmagic-extension 1.0.3-SNAPSHOT API)

Class

Description

Interface to be implemented by page models that need to do something after fields are extracted.

BasicClassDetector

BasicTypeFormatter<T>

BasicTypeFormatter.BooleanFormatter

BasicTypeFormatter.ByteFormatter

BasicTypeFormatter.CharactorFormatter

BasicTypeFormatter.DoubleFormatter

BasicTypeFormatter.FloatFormatter

BasicTypeFormatter.IntegerFormatter

BasicTypeFormatter.LongFormatter

BasicTypeFormatter.ShortFormatter

BloomFilterDuplicateRemover

Bloom-filter-based duplicate remover for huge numbers of URLs.
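
To illustrate why a Bloom filter suits very large URL sets, here is a minimal sketch of Bloom-filter duplicate detection. It is not the library's implementation: the class name `BloomFilterSketch`, the method `isDuplicate`, the seeded FNV-1a-style hash, and the sizing parameters are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of Bloom-filter URL deduplication; not webmagic's class. */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /**
     * Returns true if the URL was possibly seen before (false positives can occur),
     * false if it is definitely new. The URL is recorded in either case.
     */
    boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x9747b28c);
        int h2 = hash(data, 0x85ebca6b);
        boolean seen = true;
        for (int i = 0; i < hashCount; i++) {
            // Double hashing: derive the i-th bit position from two base hashes.
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                seen = false;
                bits.set(index);
            }
        }
        return seen;
    }

    /** Seeded FNV-1a-style hash, chosen here only for brevity. */
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}
```

The trade-off is memory for certainty: the filter stores only bits, never the URLs themselves, so a small fraction of new URLs may be wrongly reported as duplicates, while a URL that was recorded is never reported as new.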

CollectorPageModelPipeline<T>

Combo 'ExtractBy' extractor with and/or operator.

ComboExtract.Op

ComboExtract.Source

Types of source for extraction.

CompositePageProcessor

CompositePipeline

ConfigurablePageProcessor

ConsolePageModelPipeline

Print the page model to the console.
Usually used in tests.

DoubleKeyMap<K1,K2,V>

Define the extractor for a field or class.

ExtractBy.Source

Types of source for extraction.

Types of extractor expressions.

Define an extractor to extract data from the URL of the current page.

The object contains 'ExtractBy' information.

Tools for annotation converting.

Wrapper of field and extractor.

FileCacheQueueScheduler

Store URLs and the cursor in files so that a Spider can resume its state after a shutdown.

FilePageModelPipeline

Store result objects (page models) to files in plain format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.
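
The file-naming rule described above can be sketched as follows. This is only an illustration under stated assumptions: `FileNameSketch`, the nested `Keyed` interface (a stand-in for the key interface mentioned above), and the choice to hash `toString()` of the model are hypothetical, since the description does not say what input the SHA-1 digest is computed over.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of "use the key if present, otherwise a SHA-1 digest". */
final class FileNameSketch {

    /** Stand-in for the key interface; the real interface may differ. */
    interface Keyed {
        String getKey();
    }

    /** Chooses a file name for a page model. */
    static String fileNameFor(Object model) {
        if (model instanceof Keyed) {
            return ((Keyed) model).getKey();
        }
        // Assumption: hash the model's string form when no key is available.
        return sha1Hex(model.toString());
    }

    /** Returns the SHA-1 digest of the text as 40 lowercase hex characters. */
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%040x", new BigInteger(1, hash));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

A keyed model gives stable, human-readable file names, while the digest fallback guarantees a fixed-length name that is safe for any file system.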

Define how the result string is converted to an object for a field.

GithubRepoPageMapper

Interface to be implemented by page models.
Can be used to identify a page model, or as the name of the file storing the object.

Define the 'help' URL patterns for a class.

JsonFilePageModelPipeline

Store result objects (page models) to files in JSON format.
Use model.getKey() as file name if the model implements HasKey.
Otherwise use SHA1 as file name.

JsonFilePipeline

Store results to files in JSON format.

MultiKeyMapBase

Multi-key map, some basic objects.

Extract an object that spans more than one page, such as news and articles.

MultiPagePipeline

A pipeline that combines the results from more than one page.
Used for news and articles containing more than one web page.

ObjectFormatter<T>

ObjectFormatterBuilder

ObjectFormatters

The spider for the page model extractor.
In webmagic, a POJO containing the extraction result is called a "page model".

PageModelPipeline<T>

Implement PageModelPipeline to persist your page model.

PatternProcessor

PatternProcessorExample

PatternRequestMatcher

PhantomJSDownloader

This downloader is used to download pages that require JavaScript rendering.

RedisPriorityScheduler

The Redis scheduler with priority.

Use Redis as the URL scheduler for distributed crawlers.

RequestMatcher.MatchOther

SimpleHttpClient

Source.DefaultSource

Source.SelectedHtml

SourceTextExtractor

SpiderStatusMXBean

SubPageProcessor

Define the URL patterns for a class.