Package us.codecraft.webmagic.selector
Selectors for page extraction. The core API is the interface Selectable, and the internal core is the interface Selector.
Class descriptions:
All selectors will be arranged as a pipeline.
CSS selector.
Selector (extractor) for HTML elements.
Selectable HTML.
Parses JSON.
JsonPath selector. Used to extract content from JSON.
Links selector based on jsoup.
Selector (extractor) for an HTML node.
All extractors will extract separately, and the results of the extractors will be combined as the final result.
Selectable plain text. Cannot be selected by XPath or CSS selector.
Selector in regex.
Replace selector.
Selectable text.
Selector (extractor) for text.
Convenient methods for selectors.
Borrowed from https://code.google.com/p/cx-extractor/
Selector supporting XPath 2.0. Wraps HtmlCleaner and Saxon HE.
XPath selector based on Xsoup.
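
Two of the descriptions above differ only in how selectors are combined: as a pipeline, or run separately with the results merged. The following is a minimal sketch of that difference using only the Java standard library. It is not the package's implementation: SelectorSketch, TextSelector, regex, pipeline and combined are hypothetical names introduced for illustration, and the assumption that a selector maps a text to a list of matches is ours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectorSketch {

    // Hypothetical stand-in for a selector: text in, list of matches out.
    public interface TextSelector {
        List<String> selectList(String text);
    }

    // Regex-backed selector that returns the given capture group of every match.
    public static TextSelector regex(String pattern, int group) {
        Pattern compiled = Pattern.compile(pattern);
        return text -> {
            List<String> results = new ArrayList<>();
            Matcher matcher = compiled.matcher(text);
            while (matcher.find()) {
                results.add(matcher.group(group));
            }
            return results;
        };
    }

    // Pipeline: each selector runs on every result produced by the previous one.
    public static TextSelector pipeline(TextSelector... selectors) {
        return text -> {
            List<String> current = new ArrayList<>();
            current.add(text);
            for (TextSelector selector : selectors) {
                List<String> next = new ArrayList<>();
                for (String item : current) {
                    next.addAll(selector.selectList(item));
                }
                current = next;
            }
            return current;
        };
    }

    // Separate extraction: each selector runs on the original text,
    // and the results are concatenated in selector order.
    public static TextSelector combined(TextSelector... selectors) {
        return text -> {
            List<String> results = new ArrayList<>();
            for (TextSelector selector : selectors) {
                results.addAll(selector.selectList(text));
            }
            return results;
        };
    }

    public static void main(String[] args) {
        String html = "<a href=\"/a.html\">A</a><a href=\"/b.pdf\">B</a>";
        TextSelector hrefs = regex("href=\"([^\"]+)\"", 1);
        TextSelector htmlOnly = regex("^(.*\\.html)$", 1);
        TextSelector labels = regex(">([^<]+)<", 1);

        // Pipeline: extract hrefs, then keep only those ending in .html.
        System.out.println(pipeline(hrefs, htmlOnly).selectList(html)); // [/a.html]
        // Separate: hrefs and link labels, merged into one list.
        System.out.println(combined(hrefs, labels).selectList(html));   // [/a.html, /b.pdf, A, B]
    }
}
```

In the pipeline case the second selector never sees the original document, only what the first one produced; in the combined case every selector sees the original document. That is the practical distinction to keep in mind when choosing between the two.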