HTML Parser - The Bio One

HTML and XML parsing for the masses

Project Description

HTML Parser - The Bio One is a minimalistic open source HTML parsing library implemented in Java 5.0. Our goal is not just to build a practically usable HTML parser; many such programs already exist on the Internet. Instead, we aim to make HTML and XML parsing simple and easy to understand, so the document object model is kept as simple as possible. This makes the parser suitable for educational purposes and as a base for a custom parser.

License

This software is licensed under the terms of the GNU General Public License.

Features

Requirements

For users
For developers

Download

You can download the HTML parser's source, binaries and tests here.

CVS

The source code repository information is available here and the repository can be browsed via HTTP here.

Documentation

The project's Javadocs can be accessed here.

Quickstart Example

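import java.util.ArrayList;

import com.bioinformatixx.htmlparser.*;
import com.bioinformatixx.htmlparser.dom.*;

class Main
{
	public static void main(String[] args)
	{
		// Parse a small document; the result is the list of top-level nodes.
		Parser parser = new Parser("<html><body>Hello, World!</body></html>");
		ArrayList<SimpleNode> rootElements = parser.parseHtml();

		// Walk down: <html> is the first root, <body> is its first child.
		Node html = (Node) rootElements.get(0);
		Node body = (Node) html.getChildren().get(0);

		// The only child of <body> is the text node holding "Hello, World!".
		TextNode bodyInnerText = (TextNode) body.getChildren().get(0);
		System.out.println(bodyInnerText.getText());
	}
}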
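
The Quickstart example above reaches the text by indexing into the tree at known positions. For documents whose shape is not known in advance, a recursive walk is the usual approach. The following is only a minimal sketch of that idea and does not use the library itself: the nested SimpleNode, Node and TextNode classes are hypothetical stand-ins that mirror just the calls shown in the Quickstart (getChildren() and getText()), and the names TextCollector, collectText and add are assumptions made for illustration. The real classes may differ, so check the javadocs before adapting it.

```java
import java.util.ArrayList;
import java.util.List;

public class TextCollector {

    // Hypothetical stand-ins for the parser's DOM classes (NOT the library's code).
    static class SimpleNode {
    }

    static class TextNode extends SimpleNode {
        private final String text;

        TextNode(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }

    static class Node extends SimpleNode {
        private final String name;
        private final List<SimpleNode> children = new ArrayList<SimpleNode>();

        Node(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        // Appends a child and returns this node so calls can be chained.
        Node add(SimpleNode child) {
            children.add(child);
            return this;
        }

        List<SimpleNode> getChildren() {
            return children;
        }
    }

    // Concatenates the text of every text node below (and including) the given node,
    // in document order.
    static String collectText(SimpleNode node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(SimpleNode node, StringBuilder out) {
        if (node instanceof TextNode) {
            out.append(((TextNode) node).getText());
        } else if (node instanceof Node) {
            for (SimpleNode child : ((Node) node).getChildren()) {
                collect(child, out);
            }
        }
    }

    public static void main(String[] args) {
        // Hand-built tree equivalent to <html><body>Hello, <b>World!</b></body></html>
        Node body = new Node("body")
                .add(new TextNode("Hello, "))
                .add(new Node("b").add(new TextNode("World!")));
        Node html = new Node("html").add(body);

        System.out.println(collectText(html));
    }
}
```

With the real parser, the same walk would presumably start from each element of the list returned by parseHtml() instead of a hand-built tree.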

Contributing

If you are interested in the development of HTML Parser - The Bio One, the source code is available in the project's CVS tree.
To contribute source code, you must first be added to the project's list of active developers. Please contact the project administrator to request this.
Development and testing are done with the Eclipse IDE and the JUnit test framework.

Contacts

Project administrator - v_bachvarov (at-no-spam) users.sourceforge.net

Links

Project summary page at SourceForge.net
Project download section
Java web site
JUnit web site
The Eclipse project