Package us.codecraft.webmagic.selector
Selectors for page extraction. The core API is the interface Selectable, and the internal core is the interface Selector.
Interface Summary

Interface             Description
ElementSelector       Selector (extractor) for HTML elements.
NodeSelector          Selector (extractor) for HTML nodes.
Selectable            Selectable text.
Selector              Selector (extractor) for text.

Class Summary

Class                 Description
AbstractSelectable
AndSelector           All selectors are arranged as a pipeline.
BaseElementSelector
CssSelector           CSS selector.
Html                  Selectable HTML.
HtmlNode
JaxpSelectorUtils
Json                  Parses JSON.
JsonPathSelector      JsonPath selector. Used to extract content from JSON.
LinksSelector         Links selector based on jsoup.
OrSelector            All extractors run separately, and their results are combined as the final result.
PlainText             Selectable plain text. Cannot be selected by XPath or CSS selector.
RegexSelector         Selector in regex.
ReplaceSelector       Replace selector.
Selectors             Convenient methods for selectors.
SmartContentSelector  Borrowed from https://code.google.com/p/cx-extractor/
Xpath2Selector        Selector supporting XPath 2.0. Wraps HtmlCleaner and Saxon HE.
XpathSelector         XPath selector based on Xsoup.
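
The difference between the AndSelector and OrSelector descriptions above (a pipeline versus separate extraction with combined results) may be easier to see in a small sketch. The code below does not use this package: TextSelector, regex, and, and or are hypothetical stand-ins written only for illustration, and the real classes may differ in details such as null handling or single-result selection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectorCompositionSketch {

    /** Hypothetical stand-in for a text selector: maps input text to extracted strings. */
    public interface TextSelector {
        List<String> selectList(String text);
    }

    /** Returns capture group 1 of every match of the pattern (pattern must define group 1). */
    public static TextSelector regex(String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return text -> {
            List<String> results = new ArrayList<>();
            Matcher matcher = compiled.matcher(text);
            while (matcher.find()) {
                results.add(matcher.group(1));
            }
            return results;
        };
    }

    /** Pipeline ("and"): each selector runs on every result of the previous one. */
    public static TextSelector and(TextSelector... selectors) {
        return text -> {
            List<String> current = new ArrayList<>();
            current.add(text);
            for (TextSelector selector : selectors) {
                List<String> next = new ArrayList<>();
                for (String item : current) {
                    next.addAll(selector.selectList(item));
                }
                current = next;
            }
            return current;
        };
    }

    /** Union ("or"): every selector runs on the original text; results are concatenated. */
    public static TextSelector or(TextSelector... selectors) {
        return text -> {
            List<String> results = new ArrayList<>();
            for (TextSelector selector : selectors) {
                results.addAll(selector.selectList(text));
            }
            return results;
        };
    }

    public static void main(String[] args) {
        String html = "<a href=\"/a\">A</a><b>bold</b><a href=\"/b\">B</a>";

        // Pipeline: first isolate the opening <a> tags, then pull href out of each tag.
        TextSelector linkHrefs = and(regex("(<a [^>]*>)"), regex("href=\"([^\"]*)\""));
        System.out.println(linkHrefs.selectList(html)); // prints [/a, /b]

        // Union: link text and bold text are extracted independently, then combined.
        TextSelector linkOrBoldText = or(regex("<a [^>]*>([^<]*)</a>"), regex("<b>([^<]*)</b>"));
        System.out.println(linkOrBoldText.selectList(html)); // prints [A, B, bold]
    }
}
```

In the pipeline case the second selector only ever sees the output of the first, so it narrows the result. In the union case each selector sees the full input, so the result widens and keeps the order in which the selectors were given.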