Package us.codecraft.webmagic.pipeline
Pipeline is the persistence and offline-processing part of the crawler.
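The following sketch shows the shape of a custom pipeline: one `process` call per crawled page, which receives that page's extracted fields and the running task. So that it compiles without the library, it declares simplified stand-ins for WebMagic's `ResultItems`, `Task`, and `Pipeline`. The stand-in methods (`put`, `getAll`, `getUUID`, `process`) are modeled on the WebMagic API but should be treated as assumptions, and `InMemoryPipeline` is a hypothetical name. With WebMagic on the classpath you would implement `us.codecraft.webmagic.pipeline.Pipeline` directly instead of declaring these types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in (assumption) for WebMagic's ResultItems: the fields extracted from one page.
class ResultItems {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    ResultItems put(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    Map<String, Object> getAll() {
        return fields;
    }
}

// Simplified stand-in (assumption) for WebMagic's Task: identifies the running crawl.
interface Task {
    String getUUID();
}

// Stand-in with the same shape as WebMagic's Pipeline: called once per processed page.
interface Pipeline {
    void process(ResultItems resultItems, Task task);
}

// Hypothetical custom pipeline: keeps every page's fields in memory,
// tagged with the UUID of the task that produced them.
class InMemoryPipeline implements Pipeline {
    private final List<Map<String, Object>> rows = new ArrayList<>();

    @Override
    public synchronized void process(ResultItems resultItems, Task task) {
        // Copy the fields so later changes to resultItems do not leak into stored rows.
        Map<String, Object> row = new LinkedHashMap<>(resultItems.getAll());
        row.put("taskUUID", task.getUUID());
        rows.add(row);
    }

    // Returns a read-only snapshot of everything stored so far.
    synchronized List<Map<String, Object>> rows() {
        return Collections.unmodifiableList(new ArrayList<>(rows));
    }
}

class InMemoryPipelineDemo {
    public static void main(String[] args) {
        InMemoryPipeline pipeline = new InMemoryPipeline();
        Task task = () -> "demo-task";
        pipeline.process(new ResultItems().put("title", "Hello"), task);
        System.out.println(pipeline.rows()); // [{title=Hello, taskUUID=demo-task}]
    }
}
```

The methods are `synchronized` because a crawler may run several worker threads that hand results to the same pipeline instance; a pipeline that writes to an already thread-safe sink could drop the locking.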
Interface Summary

Interface               Description
CollectorPipeline<T>    Pipeline that can collect and store results.
PageModelPipeline<T>    Implement PageModelPipeline to persist your page model.
Pipeline                Pipeline is the persistence and offline-processing part of the crawler.
                        Implement the Pipeline interface to customize how results are persisted.
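A page-model pipeline receives one typed object per page instead of a generic field map. The sketch below combines that idea with collection: it keeps every model in memory and exposes the list through `getCollected()`. `Task`, `PageModelPipeline<T>`, `Article`, and `CollectingModelPipeline<T>` are simplified or hypothetical stand-ins. The `process(T t, Task task)` shape is modeled on WebMagic's PageModelPipeline, and the accessor name is borrowed from the collector pipelines, but both exact signatures are assumptions here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in (assumption) for WebMagic's Task.
interface Task {
    String getUUID();
}

// Stand-in modeled on PageModelPipeline<T>: receives one typed page model per page.
interface PageModelPipeline<T> {
    void process(T t, Task task);
}

// Hypothetical page model; in a real crawler this would be the class pages are mapped to.
class Article {
    final String url;
    final String title;

    Article(String url, String title) {
        this.url = url;
        this.title = title;
    }
}

// Hypothetical collecting pipeline: stores every non-null model it receives.
class CollectingModelPipeline<T> implements PageModelPipeline<T> {
    private final List<T> collected = new ArrayList<>();

    @Override
    public synchronized void process(T t, Task task) {
        if (t != null) {
            collected.add(t);
        }
    }

    // Returns a copy so callers cannot modify the internal list.
    synchronized List<T> getCollected() {
        return new ArrayList<>(collected);
    }
}

class ModelPipelineDemo {
    public static void main(String[] args) {
        CollectingModelPipeline<Article> pipeline = new CollectingModelPipeline<>();
        Task task = () -> "demo-task";
        pipeline.process(new Article("https://example.com/a", "First"), task);
        pipeline.process(new Article("https://example.com/b", "Second"), task);
        for (Article a : pipeline.getCollected()) {
            System.out.println(a.title + " <- " + a.url);
        }
    }
}
```

Collecting in memory suits tests and small crawls; for large crawls a page-model pipeline would more typically write each model out as it arrives.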
Class Summary

Class                           Description
CollectorPageModelPipeline<T>
ConsolePipeline                 Write results to the console. Usually used in tests.
FilePageModelPipeline           Store result objects (page models) in files in plain format.
                                Use model.getKey() as the file name if the model implements HasKey;
                                otherwise use a SHA-1 hash as the file name.
FilePipeline                    Store results in files.
JsonFilePageModelPipeline       Store result objects (page models) in files in JSON format.
                                Use model.getKey() as the file name if the model implements HasKey;
                                otherwise use a SHA-1 hash as the file name.
JsonFilePipeline                Store results in files in JSON format.
MultiPagePipeline               A pipeline that combines the results of more than one page.
                                Used for news and articles that span more than one web page.
ResultItemsCollectorPipeline
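The file-naming rule described for FilePageModelPipeline and JsonFilePageModelPipeline can be sketched as follows. `HasKey` here is a stand-in declared with the `getKey()` accessor named in the summary, and `ModelFileNames` is a hypothetical helper. The summary does not say what the SHA-1 hash is computed over, so this sketch assumes the model's `toString()` text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Stand-in for HasKey, using the getKey() accessor named in the class summary.
interface HasKey {
    String getKey();
}

// Hypothetical helper implementing the naming rule: key if available, else SHA-1.
final class ModelFileNames {
    private ModelFileNames() {}

    // ASSUMPTION: for unkeyed models the digest is taken over String.valueOf(model).
    static String fileNameFor(Object model) {
        if (model instanceof HasKey) {
            return ((HasKey) model).getKey();
        }
        return sha1Hex(String.valueOf(model));
    }

    // Lowercase hexadecimal SHA-1 digest of the UTF-8 bytes of text.
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}

class FileNameDemo {
    public static void main(String[] args) {
        HasKey keyed = () -> "article-42";
        System.out.println(ModelFileNames.fileNameFor(keyed)); // article-42
        System.out.println(ModelFileNames.fileNameFor("abc")); // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

Keyed models get stable, readable file names. Unkeyed models still get a deterministic name as long as their `toString()` is deterministic for equal content, so the same model maps to the same file on every run.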