Package us.codecraft.webmagic.downloader
Class PhantomJSDownloader
java.lang.Object
    us.codecraft.webmagic.downloader.AbstractDownloader
        us.codecraft.webmagic.downloader.PhantomJSDownloader
All Implemented Interfaces:
Downloader
public class PhantomJSDownloader extends AbstractDownloader
This downloader is used to download pages that need JavaScript to be rendered.
Version:
0.5.3
Author:
dolphineor@gmail.com
Constructor Summary
Constructors
PhantomJSDownloader()
PhantomJSDownloader(java.lang.String phantomJsCommand)
    Adds a constructor that supports a custom phantomjs command.
PhantomJSDownloader(java.lang.String phantomJsCommand, java.lang.String crawlJsPath)
    Adds a constructor that supports a custom crawl.js path. This is needed because, when another project depends on this jar, the phantomjs command executed through runtime.exec() cannot use the crawl.js bundled inside the jar.
Method Summary
Methods (Modifier and Type, Method, Description)
Page download(Request request, Task task)
    Downloads web pages and stores them in a Page object.
protected java.lang.String getPage(Request request)
void setThread(int threadNum)
    Tells the downloader how many threads the spider uses.
Constructor Detail
PhantomJSDownloader
public PhantomJSDownloader()
PhantomJSDownloader
public PhantomJSDownloader(java.lang.String phantomJsCommand)
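Adds a constructor that supports a custom phantomjs command. Examples:
    phantomjs.exe : for Windows environments
    phantomjs --ignore-ssl-errors=yes : ignores certain errors when the crawled URL uses HTTPS
    /usr/local/bin/phantomjs : an absolute path to the command, which avoids an IOException caused by system environment variables (such as PATH)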
Parameters:
phantomJsCommand - the phantomjs command to run
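The examples above pass either a bare executable or an executable followed by options. A minimal sketch of how such a command string could be combined with the script path and the target URL into an argument list, assuming the downloader runs "<phantomJsCommand> <crawlJsPath> <url>" (consistent with crawl.js reading the URL from system.args[1], since system.args[0] is the script itself). PhantomCommand and buildCommand are hypothetical names, not part of WebMagic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhantomCommand {
    /**
     * Hypothetical helper (not WebMagic API): builds the argument list for one crawl,
     * assuming the process is started as "<phantomJsCommand> <crawlJsPath> <url>".
     */
    static List<String> buildCommand(String phantomJsCommand, String crawlJsPath, String url) {
        List<String> cmd = new ArrayList<>();
        // The command may carry options, e.g. "phantomjs --ignore-ssl-errors=yes",
        // so split it on whitespace. Executable paths containing spaces are not handled here.
        cmd.addAll(Arrays.asList(phantomJsCommand.trim().split("\\s+")));
        cmd.add(crawlJsPath); // becomes system.args[0] inside crawl.js
        cmd.add(url);         // becomes system.args[1] inside crawl.js
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("phantomjs --ignore-ssl-errors=yes", "crawl.js", "https://example.com"));
    }
}
```

Returning a list rather than one concatenated string lets the arguments be handed to ProcessBuilder unchanged, so a URL is never re-split by a shell.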
PhantomJSDownloader
public PhantomJSDownloader(java.lang.String phantomJsCommand, java.lang.String crawlJsPath)
Adds a constructor that supports a custom crawl.js path. This is needed because, when another project depends on this jar, the phantomjs command executed through runtime.exec() cannot use the crawl.js bundled inside the jar.
crawl.js start --
    var system = require('system');
    var url = system.args[1];
    var page = require('webpage').create();
    page.settings.loadImages = false;
    page.settings.resourceTimeout = 5000;
    page.open(url, function (status) {
        if (status != 'success') {
            console.log("HTTP request failed!");
        } else {
            console.log(page.content);
        }
        page.close();
        phantom.exit();
    });
-- crawl.js end
In your own project, you can copy the JavaScript above into a file and use it. Example:
    new PhantomJSDownloader("/your/path/phantomjs", "/your/path/crawl.js");
Parameters:
phantomJsCommand - the phantomjs command to run
crawlJsPath - the file system path to the crawl.js script
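Because an external phantomjs process cannot read a script packed inside a jar, one possible workaround is to copy the script to a real temporary file and pass that file's path as crawlJsPath. A minimal sketch, assuming the caller supplies the script as an InputStream (for example from getResourceAsStream). CrawlJsExtractor and materialize are hypothetical names, not part of WebMagic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CrawlJsExtractor {
    /**
     * Hypothetical helper (not WebMagic API): copies a script stream into a temporary
     * .js file so its path can be handed to an external phantomjs process.
     */
    static Path materialize(InputStream script) throws IOException {
        Path tmp = Files.createTempFile("crawl", ".js");
        tmp.toFile().deleteOnExit(); // remove the copy when the JVM exits
        try (InputStream in = script) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        String js = "console.log('hi'); phantom.exit();";
        Path p = materialize(new ByteArrayInputStream(js.getBytes(StandardCharsets.UTF_8)));
        System.out.println(p + " -> " + Files.readString(p));
    }
}
```

Note that getResourceAsStream returns null when the resource is missing, so a caller would check for that before calling materialize; the resulting path could then be used as in new PhantomJSDownloader("phantomjs", path.toString()).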
Method Detail
download
public Page download(Request request, Task task)
Description copied from interface: Downloader
Downloads web pages and stores them in a Page object.
Parameters:
request - request
task - task
Returns:
page
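crawl.js prints either the rendered page content or the single line "HTTP request failed!", so whatever builds the Page has to tell the two apart. A minimal sketch of that check, assuming the captured stdout is available as a string; CrawlOutput and toHtml are hypothetical names, and this is not WebMagic's actual download implementation.

```java
import java.util.Optional;

public class CrawlOutput {
    // Line printed by crawl.js when page.open() does not report "success".
    static final String FAILURE_MARKER = "HTTP request failed!";

    /**
     * Hypothetical helper (not WebMagic API): returns the page HTML, or Optional.empty()
     * when the output is missing, blank, or is exactly the crawl.js failure line.
     */
    static Optional<String> toHtml(String stdout) {
        if (stdout == null || stdout.isBlank()) {
            return Optional.empty();
        }
        // Compare the whole trimmed output, so a page whose body merely contains
        // the marker text is not mistaken for a failure.
        if (stdout.trim().equals(FAILURE_MARKER)) {
            return Optional.empty();
        }
        return Optional.of(stdout);
    }

    public static void main(String[] args) {
        System.out.println(toHtml("<html><body>ok</body></html>\n").isPresent());
        System.out.println(toHtml("HTTP request failed!\n").isPresent());
    }
}
```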
setThread
public void setThread(int threadNum)
Description copied from interface: Downloader
Tells the downloader how many threads the spider uses.
Parameters:
threadNum - number of threads
getPage
protected java.lang.String getPage(Request request) throws java.lang.Exception
Throws:
java.lang.Exception
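getPage is documented only by its signature. A minimal sketch of the kind of work such a method can do, assuming it starts the phantomjs process for one URL and collects everything the script writes to stdout. ProcessCapture and runAndCapture are hypothetical names, "echo" stands in for phantomjs in the demo, and UTF-8 output is assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ProcessCapture {
    /**
     * Hypothetical sketch (not WebMagic's code): runs an external command and returns
     * its stdout, one '\n'-terminated line per line read.
     */
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Discard stderr so warnings do not mix with the HTML and a full stderr
        // pipe cannot block the child process.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("command exited with status " + exit);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // With phantomjs the list would look like [phantomjs, /your/path/crawl.js, <url>].
        System.out.print(runAndCapture(List.of("echo", "hello")));
    }
}
```

Reading stdout to the end before calling waitFor() matters: if the parent waited first, a large page could fill the pipe buffer and both processes would block.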