NutchWAX
Added by Aaron Binns, last edited by Aaron Binns on Jul 28, 2008

Welcome to NutchWAX!

NutchWAX is a set of add-ons to Nutch for indexing and searching archived web data.

These add-ons are developed and maintained by the Internet Archive Web Team in conjunction with a broad community of contributors, partners and end-users.

The name "NutchWAX" stands for "Nutch (W)eb (A)rchive e(X)tensions".

Since NutchWAX is a set of add-ons to Nutch, you should already be familiar with Nutch before using NutchWAX.

NutchWAX 0.12.x

The latest and greatest version of NutchWAX is 0.12.x. This release is a significant re-write of NutchWAX compared to the 0.10 release. The impetus for the re-write of NutchWAX was two-fold:

  • Catch up to the latest versions of Nutch and Hadoop. The 0.10 release was tied to older versions of Nutch and Hadoop.
  • Refactor the NutchWAX add-ons to use the Nutch plugin framework and public APIs, rather than copying, pasting and editing code from Nutch internal classes.

NutchWAX 0.12.x is bundled as a Nutch contrib package, with the goal that eventually all of the NutchWAX plugins and extensions will be integrated into mainline Nutch.

Website changes

Thus far, the focus has been on completing the 0.12 release, in particular the source code and bundled documentation. In the coming weeks the NutchWAX-related web pages will be overhauled. We will likely add a blog to this wiki page with regular updates and notes regarding ongoing NutchWAX development.

So, bookmark this as your NutchWAX homepage.

Download, build and install

At this time, NutchWAX 0.12 is available only via the SourceForge-hosted Subversion server. For details on how to download, build and install NutchWAX 0.12, see:

http://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/nutchwax/archive/INSTALL.txt

To get the source for the 0.12.1 release, check out:

http://archive-access.svn.sourceforge.net/svnroot/archive-access/tags/nutchwax-0_12_1
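As a rough illustration, the checkout can be scripted. The sketch below assumes a Subversion command-line client (`svn`) is installed and on the PATH; the target directory name `nutchwax-0.12.1` and the helper names are made up for this example and are not part of NutchWAX. It amounts to running `svn checkout <URL> <directory>` by hand.

```python
import subprocess

# Tag URL as given above for the 0.12.1 release.
TAG_URL = ("http://archive-access.svn.sourceforge.net/svnroot/"
           "archive-access/tags/nutchwax-0_12_1")


def checkout_command(url=TAG_URL, dest="nutchwax-0.12.1"):
    """Build the argument list for `svn checkout URL DEST`.

    `dest` is an assumed local directory name; any name will do.
    """
    return ["svn", "checkout", url, dest]


def checkout(url=TAG_URL, dest="nutchwax-0.12.1"):
    """Run the checkout; raises CalledProcessError if svn fails."""
    subprocess.run(checkout_command(url, dest), check=True)
```

Calling `checkout()` places the source in the chosen directory; from there, follow the INSTALL.txt linked above to build and install.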

Discussion & Mailing List

Participate in the NutchWAX discussion by subscribing to the archive-access-discuss mailing list at:

http://lists.sourceforge.net/lists/listinfo/archive-access-discuss
