
I'm trying to parse a JavaScript-generated resource on the server side for SEO. I'm following the example Google provides for using HtmlUnit on a Java-based server.

We're currently hosted on App Engine, and whenever I call

final WebClient webClient = new WebClient();

I get the following exception. Does anyone have any ideas?

java.lang.ArrayStoreException: com.gargoylesoftware.htmlunit.httpclient.HtmlUnitDomainHandler
    at com.gargoylesoftware.htmlunit.httpclient.HtmlUnitBrowserCompatCookieSpec.<init>(HtmlUnitBrowserCompatCookieSpec.java:101)
    at com.gargoylesoftware.htmlunit.CookieManager.<init>(CookieManager.java:56)
    at com.gargoylesoftware.htmlunit.WebClient.<init>(WebClient.java:141)
    at com.gargoylesoftware.htmlunit.WebClient.<init>(WebClient.java:202)
    at filters.CrawlServlet.doFilter(CrawlServlet.java:38)

2 Answers


  1. I tested HtmlUnit 2.16 on App Engine and it works for me.

    I used a sample project, copied the 2.16 jars to war/WEB-INF/lib, and added the following servlet:

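    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    @SuppressWarnings("serial")
    public class GuestbookServlet extends HttpServlet {
        @Override
        public void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("text/plain");
            // try-with-resources closes the WebClient once the page has been read
            try (WebClient webClient = new WebClient()) {
                final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
                // Write the page title back as the plain-text response
                resp.getWriter().println(page.getTitleText());
            }
        }
    }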
    
  2. This is most likely an httpclient version dependency problem. With HtmlUnit 2.16 you should use httpclient 4.4.1.
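
    If you want to check which httpclient jar actually ends up on the classpath, a small helper like the one below may help. It is only a sketch: ClasspathCheck, locationOf and locationOfName are made-up names, org.apache.http.client.HttpClient is used as a representative httpclient class, and I am assuming the sandbox lets you read a class's protection domain.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of HtmlUnit or httpclient.
final class ClasspathCheck {

    // Returns the jar or directory a class was loaded from, if the JVM exposes it.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    // Same lookup by class name; reports a failure instead of throwing.
    static String locationOfName(String className) {
        try {
            return locationOf(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // WebClient appears in the stack trace in the question; HttpClient is an
        // assumed representative class from the httpclient jar.
        String[] names = {
            "com.gargoylesoftware.htmlunit.WebClient",
            "org.apache.http.client.HttpClient"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOfName(name));
        }
    }
}
```

    If the httpclient class resolves to a jar other than the 4.4.1 one you copied into war/WEB-INF/lib, an older copy is probably shadowing it, which would be consistent with the ArrayStoreException in the question.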
