Short Description
To check HTML, we parse it into an internal (DOM-like) representation. For this task we use the jsoup HTML parser, an open-source parser without external dependencies.
Iteration
Facade

Problem

Details

To check HTML, we parse it into an internal (DOM-like) representation. For this task we use the jsoup HTML parser, an open-source parser without external dependencies.

To quote from the jsoup website:

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.

Relevance

Check HTML programmatically by using an existing API that provides accessor and finder methods for the DOM tree of the file(s) to be checked.

Problem Constraints

Requirements

  • few dependencies, so the HtmlSC binary stays as small as possible.
  • accessor and finder methods to find images, links and link-targets within the DOM tree.

Alternatives

  • HttpUnit: a testing framework for web applications and websites. Its main focus is web testing, and it pulls in a large number of dependencies.
  • jsoup: a plain HTML parser without any dependencies (!) and a rich API to access all HTML elements in DOM-like syntax.
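
To illustrate the accessor and finder requirement, here is a minimal sketch of how images, links and link targets could be located with jsoup's selector API. The class name `JsoupFinderSketch` and its methods are hypothetical and are not taken from the HtmlSC code base. The sketch also assumes that link targets are identified by `id` attributes only, which is a simplification.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, for illustration only (not part of HtmlSC).
public class JsoupFinderSketch {

    // Returns the src attribute of every <img> element that has one.
    static List<String> imageSources(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("img[src]").eachAttr("src");
    }

    // Returns the href attribute of every <a> element that has one.
    static List<String> linkHrefs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("a[href]").eachAttr("href");
    }

    // Returns every id attribute in the document.
    // Assumption: only ids count as targets of local "#..." links.
    static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("[id]").eachAttr("id");
    }

    // Returns local links ("#name") that have no element with a matching id.
    static List<String> brokenLocalLinks(String html) {
        Set<String> targets = new HashSet<>(linkTargets(html));
        List<String> broken = new ArrayList<>();
        for (String href : linkHrefs(html)) {
            if (href.startsWith("#") && !targets.contains(href.substring(1))) {
                broken.add(href);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<h1 id=\"intro\">Intro</h1>"
                + "<img src=\"logo.png\">"
                + "<a href=\"#intro\">ok</a>"
                + "<a href=\"#missing\">broken</a>"
                + "</body></html>";

        System.out.println("images:  " + imageSources(html));
        System.out.println("links:   " + linkHrefs(html));
        System.out.println("targets: " + linkTargets(html));
        System.out.println("broken:  " + brokenLocalLinks(html));
    }
}
```

Each finder is a single selector call, so a check built on top of it needs nothing beyond the jsoup jar, which fits the small-binary requirement.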

Resources
