Problem
Details
To check HTML, we parse it into an internal (DOM-like) representation. For this task we use the jsoup HTML parser, an open-source parser without external dependencies.
To quote from the jsoup website:
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
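As a rough illustration of the parsing step described above, the following minimal sketch turns an HTML string into jsoup's DOM-like `Document`. The class name `ParseSketch`, the helper `parseHtml`, and the sample markup are hypothetical and only serve this example; this is not HtmlSC's actual code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseSketch {

    // Parses an HTML string into jsoup's DOM-like Document.
    // (Hypothetical helper, named for this example only.)
    static Document parseHtml(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        String html = "<html><head><title>Sample</title></head>"
                + "<body><p>Hello</p></body></html>";

        Document doc = parseHtml(html);

        // The parsed document can be queried like a DOM tree.
        System.out.println(doc.title());            // page title
        System.out.println(doc.select("p").size()); // number of <p> elements
    }
}
```

For files on disk, jsoup also offers `Jsoup.parse(File, String charsetName)`, which would probably be the more natural entry point when checking generated HTML files.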
Relevance
Check HTML programmatically by using an existing API that provides access and finder methods to the DOM tree of the file(s) to be checked.
Problem Constraints
Requirements
- few dependencies, so the HtmlSC binary stays as small as possible.
- accessor and finder methods to find images, links and link-targets within the DOM tree.
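The finder requirement above could be met with jsoup's CSS-style selectors. The sketch below is one possible approach under stated assumptions: the class `FinderSketch`, its helper methods, and the sample markup are hypothetical, and treating every element with an `id` attribute (plus named anchors) as a link target is an assumption of this example, not a statement about how HtmlSC defines link targets.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class FinderSketch {

    // Collects the value of one attribute from every element matching the CSS query.
    static List<String> attrValues(Document doc, String cssQuery, String attribute) {
        List<String> values = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            values.add(element.attr(attribute));
        }
        return values;
    }

    // Image sources: all <img> elements that carry a src attribute.
    static List<String> imageSources(Document doc) {
        return attrValues(doc, "img[src]", "src");
    }

    // Links: all <a> elements that carry an href attribute.
    static List<String> linkHrefs(Document doc) {
        return attrValues(doc, "a[href]", "href");
    }

    // Link targets (assumption): any element with an id, plus named anchors.
    static List<String> linkTargets(Document doc) {
        List<String> targets = attrValues(doc, "[id]", "id");
        targets.addAll(attrValues(doc, "a[name]", "name"));
        return targets;
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        Document doc = Jsoup.parse(
                "<img src=\"logo.png\">"
                + "<a href=\"#intro\">Intro</a>"
                + "<h2 id=\"intro\">Intro</h2>");

        System.out.println(imageSources(doc));
        System.out.println(linkHrefs(doc));
        System.out.println(linkTargets(doc));
    }
}
```

With lists like these, a checker could, for example, compare every local `#fragment` link against the collected targets to detect broken cross-references.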
Alternatives
- HttpUnit: a testing framework for web applications and websites. Its main focus is web testing, and it brings in a large number of dependencies.
- jsoup: a plain HTML parser without any dependencies (!) and a rich API to access all HTML elements in DOM-like syntax.