Package us.codecraft.webmagic
Class Page
java.lang.Object
    us.codecraft.webmagic.Page
public class Page extends java.lang.Object
Object storing extracted results and URLs to fetch.
Not thread-safe.
Main methods:
getUrl()
get the URL of the current page
getHtml()
get the content of the current page
putField(String, Object)
save an extracted result
getResultItems()
get the extracted results, to be used in Pipeline
addTargetRequests(Iterable)
addTargetRequest(String)
add URLs to fetch
- Since:
- 0.1.0
- Author:
- code4crafter@gmail.com
- See Also:
Downloader, PageProcessor
-
-
Constructor Summary
| Constructor | Description |
| --- | --- |
| Page() | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | addTargetRequest(java.lang.String requestString) | Add a URL to fetch. |
| void | addTargetRequest(Request request) | Add a request to fetch. |
| void | addTargetRequests(java.lang.Iterable<java.lang.String> requests) | Add URLs to fetch. |
| void | addTargetRequests(java.lang.Iterable<java.lang.String> requests, long priority) | Add URLs to fetch. |
| static Page | fail() | Deprecated. Use fail(Request) instead. |
| static Page | fail(Request request) | |
| byte[] | getBytes() | |
| java.lang.String | getCharset() | |
| java.util.Map<java.lang.String,java.util.List<java.lang.String>> | getHeaders() | |
| Html | getHtml() | Get the HTML content of the page. |
| Json | getJson() | Get the JSON content of the page. |
| java.lang.String | getRawText() | |
| Request | getRequest() | Get the request of the current page. |
| ResultItems | getResultItems() | |
| int | getStatusCode() | |
| java.util.List<Request> | getTargetRequests() | |
| Selectable | getUrl() | Get the URL of the current page. |
| boolean | isDownloadSuccess() | |
| void | putField(java.lang.String key, java.lang.Object field) | Store extracted results. |
| void | setBytes(byte[] bytes) | |
| void | setCharset(java.lang.String charset) | |
| void | setDownloadSuccess(boolean downloadSuccess) | |
| void | setHeaders(java.util.Map<java.lang.String,java.util.List<java.lang.String>> headers) | |
| void | setHtml(Html html) | Deprecated since 0.4.0. The HTML is parsed only on the first call to getHtml(), so use setRawText(String) instead. |
| Page | setRawText(java.lang.String rawText) | |
| void | setRequest(Request request) | |
| Page | setSkip(boolean skip) | |
| void | setStatusCode(int statusCode) | |
| void | setUrl(Selectable url) | |
| java.lang.String | toString() | |
-
-
-
Method Detail
-
fail
@Deprecated public static Page fail()
Deprecated. Use fail(Request) instead.
- Returns:
- the page.
-
fail
public static Page fail(Request request)
- Parameters:
request - the Request.
- Returns:
- the page.
- Since:
- 0.10.0
-
setSkip
public Page setSkip(boolean skip)
-
putField
public void putField(java.lang.String key, java.lang.Object field)
Store extracted results.
- Parameters:
key - key
field - field
-
getHtml
public Html getHtml()
Get the HTML content of the page.
- Returns:
- html
-
getJson
public Json getJson()
Get the JSON content of the page.
- Returns:
- json
- Since:
- 0.5.0
-
setHtml
@Deprecated public void setHtml(Html html)
Deprecated since 0.4.0. The HTML is parsed only on the first call to getHtml(), so use setRawText(String) instead.
- Parameters:
html - html
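The deprecation note above describes lazy parsing: raw text is stored as-is and only parsed the first time the parsed view is requested. A minimal sketch of that pattern follows. `LazyPageSketch`, `getTitle()` and `getParseCount()` are hypothetical stand-ins invented for illustration; this is not WebMagic's `Page` implementation, and the toy "parser" only pulls out a `<title>` element.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in, NOT WebMagic's Page. It only illustrates lazy
 * parsing: raw text is stored cheaply and parsed on the first read.
 */
final class LazyPageSketch {
    private String rawText;      // raw body, as a downloader would set it
    private String parsedTitle;  // cached parsed view, built on demand
    private int parseCount;      // how many times parsing actually ran

    /** Stores the raw text and returns this, so calls can be chained. */
    LazyPageSketch setRawText(String rawText) {
        this.rawText = Objects.requireNonNull(rawText, "rawText");
        // Dropping the cached view here is a choice of this sketch.
        this.parsedTitle = null;
        return this;
    }

    /** Parses on the first call only; later calls reuse the cached value. */
    String getTitle() {
        if (rawText == null) {
            throw new IllegalStateException("setRawText must be called first");
        }
        if (parsedTitle == null) {
            parsedTitle = parseTitle(rawText);
            parseCount++;
        }
        return parsedTitle;
    }

    int getParseCount() {
        return parseCount;
    }

    /** Toy parser: text between <title> and </title>, or "" if absent. */
    private static String parseTitle(String text) {
        int start = text.indexOf("<title>");
        if (start < 0) {
            return "";
        }
        start += "<title>".length();
        int end = text.indexOf("</title>", start);
        return end < 0 ? "" : text.substring(start, end);
    }
}
```

With this shape, a caller that never reads the parsed view never pays for parsing, which is presumably why the reference steers callers toward setRawText(String) rather than handing over a pre-built Html object.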
-
getTargetRequests
public java.util.List<Request> getTargetRequests()
-
addTargetRequests
public void addTargetRequests(java.lang.Iterable<java.lang.String> requests)
Add URLs to fetch.
- Parameters:
requests - requests
-
addTargetRequests
public void addTargetRequests(java.lang.Iterable<java.lang.String> requests, long priority)
Add URLs to fetch.
- Parameters:
requests - requests
priority - priority
-
addTargetRequest
public void addTargetRequest(java.lang.String requestString)
Add a URL to fetch.
- Parameters:
requestString - requestString
-
addTargetRequest
public void addTargetRequest(Request request)
Add a request to fetch.
- Parameters:
request - request
-
getUrl
public Selectable getUrl()
Get the URL of the current page.
- Returns:
- url of current page
-
setUrl
public void setUrl(Selectable url)
-
getRequest
public Request getRequest()
Get the request of the current page.
- Returns:
- request
-
setRequest
public void setRequest(Request request)
-
getResultItems
public ResultItems getResultItems()
-
getStatusCode
public int getStatusCode()
-
setStatusCode
public void setStatusCode(int statusCode)
-
getRawText
public java.lang.String getRawText()
-
setRawText
public Page setRawText(java.lang.String rawText)
-
getHeaders
public java.util.Map<java.lang.String,java.util.List<java.lang.String>> getHeaders()
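Because the return type maps each header name to a list of values, reading a single header takes a little care. The sketch below shows one way to do it. `HeaderLookup` is a hypothetical helper, not part of WebMagic, and matching names case-insensitively is a defensive assumption, since this reference does not say how header names are cased.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of WebMagic) for a Map<String, List<String>> of headers. */
final class HeaderLookup {
    private HeaderLookup() {
    }

    /** Returns all values for a header, matching the name case-insensitively. */
    static List<String> values(Map<String, List<String>> headers, String name) {
        if (headers == null || name == null) {
            return Collections.emptyList();
        }
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            // equalsIgnoreCase returns false for a null key, so no extra check is needed.
            if (name.equalsIgnoreCase(entry.getKey()) && entry.getValue() != null) {
                return entry.getValue();
            }
        }
        return Collections.emptyList();
    }

    /** Returns the first value of a header, or null when the header is absent. */
    static String first(Map<String, List<String>> headers, String name) {
        List<String> found = values(headers, name);
        return found.isEmpty() ? null : found.get(0);
    }
}
```

A caller could pass the map returned by getHeaders() directly, for example `HeaderLookup.first(page.getHeaders(), "Content-Type")`, where `page` is assumed to be a Page instance.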
-
setHeaders
public void setHeaders(java.util.Map<java.lang.String,java.util.List<java.lang.String>> headers)
-
isDownloadSuccess
public boolean isDownloadSuccess()
-
setDownloadSuccess
public void setDownloadSuccess(boolean downloadSuccess)
-
getBytes
public byte[] getBytes()
-
setBytes
public void setBytes(byte[] bytes)
-
getCharset
public java.lang.String getCharset()
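This reference gives no description for getBytes() or getCharset(). Assuming they expose the raw response body and the name of its character set, the two can be combined to decode the body by hand. The sketch below is stdlib-only; `BodyDecoder` is a hypothetical helper, and falling back to UTF-8 for a missing or unknown charset name is a choice of this sketch, not documented WebMagic behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper (not part of WebMagic) that decodes a body with a named charset. */
final class BodyDecoder {
    private BodyDecoder() {
    }

    /**
     * Decodes bytes using charsetName. Falls back to UTF-8 when the name is
     * null, blank, malformed, or unsupported. Returns null for null bytes.
     */
    static String decode(byte[] bytes, String charsetName) {
        if (bytes == null) {
            return null;
        }
        Charset charset = StandardCharsets.UTF_8;
        if (charsetName != null && !charsetName.trim().isEmpty()) {
            try {
                charset = Charset.forName(charsetName.trim());
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Unknown or malformed name: keep the UTF-8 fallback.
            }
        }
        return new String(bytes, charset);
    }
}
```

A call such as `BodyDecoder.decode(page.getBytes(), page.getCharset())` would then be one way to obtain text, with `page` assumed to be a Page instance; whether getRawText() already does the equivalent is not stated here.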
-
setCharset
public void setCharset(java.lang.String charset)
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-