- Created by Anton Kronseder, last modified by Robert Reiner on 02. Mar 2015
Short Description | Documents all important design decisions and their reasons. |
---|---|
Name | Design Decisions |
Iteration | Filled |
Design Decisions
HTML Parsing with jsoup
Problem
Details
To check HTML we parse it into an internal (DOM-like) representation. For this task we use the jsoup HTML parser, an open-source parser without external dependencies.
To quote from the jsoup website:
"jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods."
Relevance
Check HTML programmatically by using an existing API that provides access and finder methods to the DOM tree of the file(s) to be checked.
Problem Constraints
Requirements
- few dependencies, so the HtmlSC binary stays as small as possible.
- accessor and finder methods to find images, links and link-targets within the DOM tree.
Alternatives
- HTTPUnit: a testing framework for web applications and websites. Its main focus is web testing, and it brings in a large number of dependencies.
- jsoup: a plain HTML parser without any dependencies (!) and a rich API to access all HTML elements in DOM-like syntax.
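The finder methods named in the requirements can be sketched with jsoup's CSS-style selectors. This is a minimal illustration, not HtmlSC's actual encapsulation: the class `HtmlPageSketch` and its method names are hypothetical, and treating every element with an `id` attribute as a link target is an assumption made for this example. It needs the jsoup library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical wrapper around a parsed page; not part of HtmlSC.
final class HtmlPageSketch {

    private final Document document;

    HtmlPageSketch(String html) {
        // Jsoup.parse builds a DOM-like Document from an HTML string.
        this.document = Jsoup.parse(html);
    }

    // Values of the src attribute of all <img> elements that have one.
    List<String> imageSources() {
        return attributeValues("img[src]", "src");
    }

    // Values of the href attribute of all <a> elements that have one.
    List<String> linkTargets() {
        return attributeValues("a[href]", "href");
    }

    // Assumption: any element carrying an id can serve as a link target.
    List<String> anchorIds() {
        return attributeValues("[id]", "id");
    }

    private List<String> attributeValues(String cssQuery, String attribute) {
        List<String> values = new ArrayList<>();
        for (Element element : document.select(cssQuery)) {
            values.add(element.attr(attribute));
        }
        return values;
    }
}
```

A checker built on such a wrapper could, for instance, compare each local `#fragment` returned by `linkTargets()` against the list from `anchorIds()`.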
Resources
- Find details on how HtmlSC implements HTML parsing in the HTML encapsulation concept.
Checking of external links postponed
Problem
Details
In the current {revision} we won’t check external links. These checks have been postponed to later versions.
String Similarity Checking with Jaro-Winkler-Distance
Problem
Details
The small java-string-similarity library (by Ralph Allen Rice) contains implementations of several similarity-calculation algorithms. As it is not available as a public binary, we use its sources instead, primarily:
net.ricecode.similarity.JaroWinklerStrategyTest
net.ricecode.similarity.JaroWinklerStrategy
The actual implementation of the similarity comparison has been postponed to a later release of HtmlSC.
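To make the idea concrete, here is a self-contained sketch of the textbook Jaro-Winkler similarity. It is not the code of the java-string-similarity library and not HtmlSC's implementation. The class name `JaroWinklerSketch`, the prefix scale of 0.1, and the prefix limit of 4 characters are the customary defaults, assumed here for illustration.

```java
// Illustrative sketch only; names and constants are assumptions.
final class JaroWinklerSketch {

    private static final double PREFIX_SCALE = 0.1; // customary Winkler scaling factor
    private static final int MAX_PREFIX = 4;        // customary prefix length limit

    private JaroWinklerSketch() {}

    // Jaro similarity in [0, 1]; 1.0 means identical strings.
    static double jaro(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        if (a.isEmpty() || b.isEmpty()) return 0.0;

        // Characters count as matching only within this distance of each other.
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];

        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(b.length() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) continue;
            while (!bMatched[j]) j++;
            if (a.charAt(i) != b.charAt(j)) outOfOrder++;
            j++;
        }

        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / a.length() + m / b.length() + (m - transpositions) / m) / 3.0;
    }

    // Jaro similarity boosted for strings that share a common prefix.
    static double jaroWinkler(String a, String b) {
        double jaro = jaro(a, b);
        int limit = Math.min(MAX_PREFIX, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return jaro + prefix * PREFIX_SCALE * (1.0 - jaro);
    }
}
```

For the classic pair "MARTHA" and "MARHTA", `jaroWinkler` yields roughly 0.961. Identical strings score 1.0, and strings without any common character score 0.0.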