Class UrlUtils

java.lang.Object
us.codecraft.webmagic.utils.UrlUtils

public class UrlUtils extends Object
URL and HTML utilities.
Since:
0.1.0
Author:
code4crafter@gmail.com
  • Constructor Details

    • UrlUtils

      public UrlUtils()
  • Method Details

    • canonicalizeUrl

      public static String canonicalizeUrl(String url, String refer)
      Canonicalizes a URL.
      Borrowed from Jsoup.
      Parameters:
      url - the URL to canonicalize
      refer - the referring URL
      Returns:
      the canonicalized URL
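
As a rough illustration of what canonicalizing a link against its referring page can look like, the sketch below resolves a possibly relative URL against a base URL with java.net.URI. It is not the WebMagic or Jsoup implementation; the class name CanonicalizeSketch, the helper resolve, the fall-back behavior, and the example.com URLs are all assumptions made for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CanonicalizeSketch {

    // Hypothetical helper (not part of UrlUtils): resolves a possibly
    // relative link against the URL of the page it was found on.
    public static String resolve(String url, String refer) {
        try {
            URI base = new URI(refer);
            // URI.resolve merges the relative reference with the base
            // and normalizes "." and ".." path segments.
            return base.resolve(url).toString();
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Assumption: return the input unchanged when it cannot be parsed.
            return url;
        }
    }

    public static void main(String[] args) {
        // Relative path resolved against the referring page
        System.out.println(resolve("../b/page.html", "http://example.com/a/index.html"));
        // Already absolute URL is returned as-is
        System.out.println(resolve("http://other.example/x", "http://example.com/"));
    }
}
```

With the inputs shown, the first call yields http://example.com/b/page.html and the second returns the absolute URL unchanged.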
    • encodeIllegalCharacterInUrl

      public static String encodeIllegalCharacterInUrl(String url)
      Deprecated.
      Parameters:
      url - the URL to encode
      Returns:
      the new URL
    • fixIllegalCharacterInUrl

      public static String fixIllegalCharacterInUrl(String url)
    • getHost

      public static String getHost(String url)
    • removeProtocol

      public static String removeProtocol(String url)
    • getDomain

      public static String getDomain(String url)
    • removePort

      public static String removePort(String domain)
    • convertToRequests

      public static List<Request> convertToRequests(Collection<String> urls)
    • convertToUrls

      public static List<String> convertToUrls(Collection<Request> requests)
    • getCharset

      public static String getCharset(String contentType)
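
The page does not describe getCharset beyond its signature. Assuming it extracts the charset parameter from a Content-Type header value, a minimal sketch of that idea might look like the following. The class name CharsetSketch, the method charsetOf, the regular expression, and the null return for a missing charset are assumptions for illustration, not the library's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetSketch {

    // Matches "charset=UTF-8" as well as quoted forms such as charset="gbk".
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the charset named in a Content-Type
    // value, or null when none is present (an assumed convention).
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));   // UTF-8
        System.out.println(charsetOf("text/html; charset=\"gbk\"")); // gbk
        System.out.println(charsetOf("text/html"));                  // null
    }
}
```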