How to develop Apache Nutch’s plugin (1) extension-points list

I want to replace Apache Nutch’s LanguageIdentifier with our Language Detection library, so I’m summarizing what I have researched while developing.

extension-points

“Extension-points” are the places where a plugin can hook into Nutch, and they are interfaces that extend org.apache.nutch.plugin.Pluggable.
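To make that relationship concrete, here is a minimal, self-contained sketch of the pattern in Java. It does not use the real Nutch classes: `Pluggable` below is an empty stand-in, and `LanguageGuesser`, `AsciiGuesser` and `extensionsOf` are hypothetical names invented for this illustration. A real Nutch plugin is also declared in a plugin.xml file and loaded by Nutch itself, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtensionPointSketch {

    /** Empty stand-in for org.apache.nutch.plugin.Pluggable (not the real class). */
    interface Pluggable {}

    /** Hypothetical extension point: an interface that extends Pluggable. */
    interface LanguageGuesser extends Pluggable {
        String guess(String text);
    }

    /** Hypothetical plugin: one implementation of the extension point. */
    static class AsciiGuesser implements LanguageGuesser {
        @Override
        public String guess(String text) {
            // Toy rule for illustration only: pure ASCII text is labeled "en".
            return text.chars().allMatch(c -> c < 128) ? "en" : "unknown";
        }
    }

    /** Picks out the registered plugins that implement the given extension point. */
    static <T extends Pluggable> List<T> extensionsOf(Class<T> point, List<Pluggable> plugins) {
        List<T> found = new ArrayList<>();
        for (Pluggable p : plugins) {
            if (point.isInstance(p)) {
                found.add(point.cast(p));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Pluggable> plugins = new ArrayList<>();
        plugins.add(new AsciiGuesser());
        for (LanguageGuesser g : extensionsOf(LanguageGuesser.class, plugins)) {
            System.out.println(g.guess("hello world"));
        }
    }
}
```

The point of the sketch is only the shape: the host application asks for "everything that implements this interface" and never names a concrete plugin class.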

list of extension-points

Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
No documentation. A summarizer of search results?

Nutch Protocol (org.apache.nutch.protocol.Protocol)
Provides a way of fetching content from a URL for each protocol (http, ftp and so on).

Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
(I still don’t know what “segment merging” is.)

Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
Extracts words from documents.

Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
Adds fields to documents (equivalent to an abstracted IndexingFilter?)

HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
Adds metadata to the parsed HTML document.

Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
Translates a query (adds metadata).

Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
Provides various output formats for search results.

Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
Converts URLs into a normal form (e.g. http://www.cnn.com into cnn.com ?)

Nutch URL Filter (org.apache.nutch.net.URLFilter)
Restricts which URLs are fetched (to exclude spam pages and so on).

Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
I understand it only vaguely. Groups search results that share the same domain or have similar properties (like Google?).

Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
Adds metadata to the document’s fields for indexing.

Nutch Content Parser (org.apache.nutch.parse.Parser)
Extracts text data from content.

Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
Manipulates documents’ score variables.

Ontology Model Loader (org.apache.nutch.ontology.Ontology)
No documentation.
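As an illustration of two of the simpler extension points in the list, URL Normalizer and URL Filter, here is a small self-contained Java sketch of how a normalizer and a filter might be chained. The `Normalizer` and `Filter` interfaces are simplified stand-ins and not the real org.apache.nutch.net signatures; the convention that a filter returns null to drop a URL, the normalization rules, and the blocked host name are all assumptions made for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collections;
import java.util.Set;

public class UrlPluginSketch {

    /** Simplified stand-in for a URL normalizer extension point (hypothetical signature). */
    interface Normalizer {
        String normalize(String url) throws URISyntaxException;
    }

    /** Simplified stand-in for a URL filter extension point (hypothetical signature). */
    interface Filter {
        /** Returns the URL to keep it, or null to drop it (assumed convention). */
        String filter(String url);
    }

    /** Lowercases scheme and host, drops the fragment, and uses "/" for an empty path. */
    static class BasicNormalizer implements Normalizer {
        @Override
        public String normalize(String url) throws URISyntaxException {
            URI u = new URI(url);
            String scheme = u.getScheme() == null ? null : u.getScheme().toLowerCase();
            String host = u.getHost() == null ? null : u.getHost().toLowerCase();
            String path = (u.getPath() == null || u.getPath().isEmpty()) ? "/" : u.getPath();
            return new URI(scheme, null, host, u.getPort(), path, u.getQuery(), null).toString();
        }
    }

    /** Drops URLs whose host is in a block list. */
    static class BlockedHostFilter implements Filter {
        private final Set<String> blockedHosts;

        BlockedHostFilter(Set<String> blockedHosts) {
            this.blockedHosts = blockedHosts;
        }

        @Override
        public String filter(String url) {
            try {
                String host = new URI(url).getHost();
                return (host != null && blockedHosts.contains(host)) ? null : url;
            } catch (URISyntaxException e) {
                return null; // unparseable URLs are dropped in this sketch
            }
        }
    }

    /** Normalizes first, then filters; returns null when the URL should not be fetched. */
    static String process(String url, Normalizer n, Filter f) {
        try {
            return f.filter(n.normalize(url));
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Normalizer n = new BasicNormalizer();
        Filter f = new BlockedHostFilter(Collections.singleton("spam.example.com"));
        System.out.println(process("HTTP://WWW.CNN.COM/index.html#top", n, f));
        System.out.println(process("http://Spam.Example.com/buy", n, f));
    }
}
```

Normalizing before filtering means the block list only has to contain lowercase host names, which is why the second URL in `main` is rejected despite its mixed case.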

