Package us.codecraft.webmagic.pipeline
Pipeline is the persistence and offline-processing part of the crawler.
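To make that role concrete, here is a minimal sketch of a custom pipeline. In WebMagic, a pipeline's process(ResultItems, Task) method is called once for each processed page, receiving the fields extracted from that page. The types below (ResultItems, Task, Pipeline) are simplified stand-ins written for this sketch, not the library's actual classes, which carry more state. InMemoryPipeline is a hypothetical example class that keeps every result in memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point (declared first so `java File.java` runs it).
class PipelineDemo {
    public static void main(String[] args) {
        Task task = () -> "example-task";
        InMemoryPipeline pipeline = new InMemoryPipeline();
        pipeline.process(new ResultItems().put("title", "Hello").put("author", "Ann"), task);
        pipeline.process(new ResultItems().put("title", "World"), task);
        // Prints the two stored result maps in insertion order.
        System.out.println(pipeline.getStored());
    }
}

// Simplified stand-in for WebMagic's ResultItems: the fields extracted from one page.
class ResultItems {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    ResultItems put(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    Map<String, Object> getAll() {
        return fields;
    }
}

// Simplified stand-in for WebMagic's Task (the real one also exposes the Site).
interface Task {
    String getUUID();
}

// Simplified stand-in for the Pipeline interface: called once per processed page.
interface Pipeline {
    void process(ResultItems resultItems, Task task);
}

// Hypothetical custom pipeline that keeps all results in memory, e.g. for tests.
class InMemoryPipeline implements Pipeline {
    private final List<Map<String, Object>> stored = new ArrayList<>();

    @Override
    public void process(ResultItems resultItems, Task task) {
        // Copy the map so later changes to resultItems do not alter stored data.
        stored.add(new LinkedHashMap<>(resultItems.getAll()));
    }

    List<Map<String, Object>> getStored() {
        return stored;
    }
}
```

Swapping the body of process() for a database insert or a file write is all it takes to change where results go. The crawling and extraction stages stay untouched.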
Class                       Description
CollectorPipeline           Pipeline that can collect and store results.
ConsolePipeline             Writes results to the console. Usually used in tests.
FilePageModelPipeline       Stores result objects (page models) to files in plain format.
                            Uses model.getKey() as the file name if the model implements HasKey;
                            otherwise uses a SHA1 hash as the file name.
FilePipeline                Stores results in files.
JsonFilePageModelPipeline   Stores result objects (page models) to files in JSON format.
                            Uses model.getKey() as the file name if the model implements HasKey;
                            otherwise uses a SHA1 hash as the file name.
JsonFilePipeline            Stores results to files in JSON format.
MultiPagePipeline           Combines the results of more than one page.
                            Used for news items and articles that span more than one web page.
PageModelPipeline           Implement PageModelPipeline to persist your page models.
Pipeline                    Pipeline is the persistence and offline-processing part of the crawler.
                            Implement the Pipeline interface to customize how results are persisted.
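The file-naming rule used by the page-model file pipelines (use the model's key if it implements HasKey, otherwise a SHA1 hash) can be sketched as follows. HasKey, Article, and FileNames are hypothetical stand-ins written for this sketch; the getKey() name follows the summary text. What gets hashed is not stated, so hashing model.toString() is an assumption here, not necessarily what the library does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Demo entry point (declared first so `java File.java` runs it).
class FileNameDemo {
    public static void main(String[] args) {
        // A keyed model uses its key as the file name.
        System.out.println(FileNames.fileNameFor(new Article("news-42"), ".json"));
        // An unkeyed model falls back to a SHA-1 hex digest.
        System.out.println(FileNames.fileNameFor("abc", ".json"));
    }
}

// Hypothetical stand-in for the HasKey marker interface.
interface HasKey {
    String getKey();
}

// Hypothetical page model that supplies its own key.
class Article implements HasKey {
    private final String id;

    Article(String id) {
        this.id = id;
    }

    @Override
    public String getKey() {
        return id;
    }
}

final class FileNames {
    private FileNames() {}

    // Key if the model implements HasKey; otherwise SHA-1 of model.toString() (assumed input).
    static String fileNameFor(Object model, String extension) {
        if (model instanceof HasKey) {
            return ((HasKey) model).getKey() + extension;
        }
        return sha1Hex(model.toString()) + extension;
    }

    // Lowercase hex SHA-1 digest of the UTF-8 bytes of text.
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }
}
```

The demo prints news-42.json, then a9993e364706816aba3e25717850c26c9cd0d89d.json (the standard SHA-1 digest of "abc"). A content hash gives unkeyed models stable file names, though two models with identical string forms would map to the same file.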