Package us.codecraft.webmagic.downloader
Class PhantomJSDownloader
- java.lang.Object
-
- us.codecraft.webmagic.downloader.AbstractDownloader
-
- us.codecraft.webmagic.downloader.PhantomJSDownloader
-
- All Implemented Interfaces:
us.codecraft.webmagic.downloader.Downloader
public class PhantomJSDownloader extends us.codecraft.webmagic.downloader.AbstractDownloader
This downloader is used to download pages that require JavaScript rendering.
- Version:
- 0.5.3
- Author:
- dolphineor@gmail.com
-
-
Constructor Summary
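Constructors
Constructor Description
PhantomJSDownloader()
PhantomJSDownloader(java.lang.String phantomJsCommand)
Adds a constructor that supports a custom phantomjs command.
PhantomJSDownloader(java.lang.String phantomJsCommand, java.lang.String crawlJsPath)
Adds a constructor that supports a custom crawl.js path, because when another project depends on this jar, the phantomjs command run through runtime.exec() cannot use the crawl.js packaged inside the jar.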
-
Method Summary
Modifier and Type Method Description
us.codecraft.webmagic.Page
download(us.codecraft.webmagic.Request request, us.codecraft.webmagic.Task task)
protected java.lang.String
getPage(us.codecraft.webmagic.Request request)
void
setThread(int threadNum)
-
-
-
Constructor Detail
-
PhantomJSDownloader
public PhantomJSDownloader()
-
PhantomJSDownloader
public PhantomJSDownloader(java.lang.String phantomJsCommand)
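Adds a constructor that supports a custom phantomjs command.
example:
phantomjs.exe (supports Windows environments)
phantomjs --ignore-ssl-errors=yes (ignores some errors when the crawled address is https)
/usr/local/bin/phantomjs (absolute path of the command, which avoids an IOException caused by system environment variables)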
- Parameters:
phantomJsCommand
- phantomJsCommand
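The examples above show that phantomJsCommand may carry flags as well as the executable name. This page does not say how the downloader combines that string with the script path and the URL, so the following is only a minimal sketch of one way to turn such a command into an argument list for java.lang.ProcessBuilder. The class PhantomCommandSketch and the method buildCommand are hypothetical names and are not part of WebMagic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhantomCommandSketch {

    // Hypothetical helper (not a WebMagic API): splits the configured command on
    // whitespace, then appends the script path and the target URL as arguments.
    // Assumption: the executable path itself contains no spaces.
    public static List<String> buildCommand(String phantomJsCommand, String crawlJsPath, String url) {
        List<String> args = new ArrayList<>(Arrays.asList(phantomJsCommand.trim().split("\\s+")));
        args.add(crawlJsPath);
        args.add(url);
        return args;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "phantomjs --ignore-ssl-errors=yes", "/your/path/crawl.js", "https://example.com");
        // The list could be handed to new ProcessBuilder(cmd) to start the process.
        System.out.println(cmd);
    }
}
```

Splitting on whitespace keeps flags such as --ignore-ssl-errors=yes as separate arguments, but it would break an executable path that contains spaces, so an absolute path without spaces is the safer choice here.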
-
PhantomJSDownloader
public PhantomJSDownloader(java.lang.String phantomJsCommand, java.lang.String crawlJsPath)
Adds a constructor that supports a custom crawl.js path. When another project depends on this jar, the phantomjs command run through runtime.exec() cannot use the crawl.js packaged inside the jar.
crawl.js start --
var system = require('system');
var url = system.args[1];
var page = require('webpage').create();
page.settings.loadImages = false;
page.settings.resourceTimeout = 5000;
page.open(url, function (status) {
    if (status != 'success') {
        console.log("HTTP request failed!");
    } else {
        console.log(page.content);
    }
    page.close();
    phantom.exit();
});
-- crawl.js end
In a real project you can copy the JS code above and use it.
example: new PhantomJSDownloader("/your/path/phantomjs", "/your/path/crawl.js");
- Parameters:
phantomJsCommand
- phantomJsCommand
crawlJsPath
- crawlJsPath
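Since the script has to exist as a real file outside the jar, one option is to write it to disk at startup and pass the resulting path as crawlJsPath. The sketch below does this with a temporary file. The class CrawlJsFileSketch and the method writeCrawlJs are hypothetical names; only the script text is taken from the documentation above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CrawlJsFileSketch {

    // The crawl.js script quoted in the constructor documentation.
    public static final String CRAWL_JS = String.join("\n",
            "var system = require('system');",
            "var url = system.args[1];",
            "var page = require('webpage').create();",
            "page.settings.loadImages = false;",
            "page.settings.resourceTimeout = 5000;",
            "page.open(url, function (status) {",
            "    if (status != 'success') {",
            "        console.log(\"HTTP request failed!\");",
            "    } else {",
            "        console.log(page.content);",
            "    }",
            "    page.close();",
            "    phantom.exit();",
            "});",
            "");

    // Hypothetical helper: writes the script to a temporary file and returns its path.
    public static Path writeCrawlJs() {
        try {
            Path script = Files.createTempFile("crawl", ".js");
            Files.write(script, CRAWL_JS.getBytes(StandardCharsets.UTF_8));
            script.toFile().deleteOnExit(); // remove the file when the JVM exits
            return script;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path script = writeCrawlJs();
        // The path could then be passed as crawlJsPath, for example:
        // new PhantomJSDownloader("/your/path/phantomjs", script.toString());
        System.out.println(script.toAbsolutePath());
    }
}
```

A temporary file keeps the example short. A long-running crawler might prefer a fixed location that is written once and reused.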
-
-
Method Detail
-
download
public us.codecraft.webmagic.Page download(us.codecraft.webmagic.Request request, us.codecraft.webmagic.Task task)
-
setThread
public void setThread(int threadNum)
-
getPage
protected java.lang.String getPage(us.codecraft.webmagic.Request request) throws java.lang.Exception
- Throws:
java.lang.Exception
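The body of getPage is not documented on this page. Because crawl.js prints page.content to standard output, a plausible reading is that the method returns whatever the phantomjs process writes there. The sketch below shows that reading step in isolation, under that assumption. The class ProcessOutputSketch and the method readAll are hypothetical names, and an in-memory stream stands in for Process.getInputStream() so the example runs without phantomjs installed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ProcessOutputSketch {

    // Hypothetical helper: reads a stream line by line and returns the text,
    // with each line terminated by '\n'. Assumes the output is UTF-8 encoded.
    public static String readAll(InputStream in) {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // Stand-in for process.getInputStream() of a running phantomjs process.
        InputStream fake = new ByteArrayInputStream(
                "<html>\n<body>rendered</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(fake));
    }
}
```

Note that with the crawl.js shown earlier, a failed load prints "HTTP request failed!" to the same stream, so a caller would need to distinguish that message from real page content.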
-
-