How to develop an Apache Nutch plugin (2): plugin.xml

To enable a plugin, place it under $NUTCH_HOME/plugins/ and add its name to conf/nutch-site.xml.

$NUTCH_HOME/plugins/
    [plugin name]/
        plugin.xml       plugin information xml file
        [some name].jar  plugin implemented jar file

The “plugin.includes” property in conf/nutch-site.xml lists the names of the enabled plugins. Copy the default plugin names from conf/nutch-default.xml into it, then add your own plugin names.

Example: Default enabled plugins

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js|tika)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
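
To enable one more plugin, append its name to this value, separated by `|` (the value is a regular expression over plugin names). The following is a sketch of such an entry in conf/nutch-site.xml. It assumes the plugin is named language-identifier, matching the jar name used in this post; replace it with the name of your own plugin directory.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Default list from above, with "|language-identifier" appended.
         The appended name is an assumption for illustration. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|tika)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier</value>
  </property>
</configuration>
```

Because nutch-site.xml overrides nutch-default.xml, leaving a default plugin out of this value disables it, so keep the whole default list unless you mean to drop something.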

How to prepare plugin.xml

“plugin.xml” declares the name of the jar file that implements the plugin, together with its extension points, classes and parameters.
An extension is an implementation of an extension point, and a plugin can have one or more extensions.

The name of the implementing jar file is written in the plugin/runtime node.

   <runtime>
      <library name="language-identifier.jar">
         <export name="*"/>
      </library>
   </runtime>

Information about each extension is written in a plugin/extension node.

   <extension id="org.apache.nutch.analysis.lang.LanguageQueryFilter"
              name="Nutch Language Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="LanguageQueryFilter"
                      class="org.apache.nutch.analysis.lang.LanguageQueryFilter">
        <parameter name="raw-fields" value="lang"/>
      </implementation>
   </extension>
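
Putting the two fragments together, a whole plugin.xml might look like the sketch below. The root plugin element's attributes (id, name, version, provider-name) and the requires/import node are not covered in this post. Their values here are assumptions modeled on the plugins bundled with Nutch, so check them against a plugin.xml in your own Nutch distribution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root attribute values are illustrative assumptions. -->
<plugin id="language-identifier"
        name="Language Identification Parser/Filter"
        version="1.0.0"
        provider-name="nutch.org">

   <!-- Jar file that contains the plugin classes. -->
   <runtime>
      <library name="language-identifier.jar">
         <export name="*"/>
      </library>
   </runtime>

   <!-- Assumed dependency on the plugin that defines the extension points. -->
   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <!-- One extension node per implemented extension point. -->
   <extension id="org.apache.nutch.analysis.lang.LanguageQueryFilter"
              name="Nutch Language Query Filter"
              point="org.apache.nutch.searcher.QueryFilter">
      <implementation id="LanguageQueryFilter"
                      class="org.apache.nutch.analysis.lang.LanguageQueryFilter">
         <parameter name="raw-fields" value="lang"/>
      </implementation>
   </extension>

</plugin>
```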

The nodes and attributes are as follows.

extension@id
The id of the extension, which is normally the same as the class name.
extension@point
The name of the extension point, which is also the name of an interface.
extension/implementation@id
The id of the extension implementation. (I don’t yet know how it differs from extension@id…)
extension/implementation@class
The class name of the extension implementation.
extension/implementation/parameter
Parameters of the extension. Nutch’s plugin framework can read them, but the plugin itself cannot. If your plugin needs to handle its own parameters, write them in conf/nutch-site.xml.
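
For example, a plugin-specific setting could be declared as an ordinary property in conf/nutch-site.xml. The property name and value below are hypothetical, chosen only for illustration. The sketch also assumes that your plugin class receives Nutch's configuration object and looks the property up by name.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical plugin setting: the name and value are assumptions. -->
  <property>
    <name>myplugin.example.option</name>
    <value>some-value</value>
    <description>Illustrative option read by the plugin at runtime.</description>
  </property>
</configuration>
```

Prefixing the property name with the plugin name is only a convention assumed here to avoid clashes with Nutch's own properties.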