Package | Description |
---|---|
org.archive.crawler.frontier | |
org.archive.crawler.postprocessor | |
org.archive.crawler.reporting | |
org.archive.modules | The beginnings of a refactored settings framework. |
org.archive.modules.extractor | |
org.archive.modules.forms | |
org.archive.modules.net | |
Class | Description |
---|---|
ExtractorParameters | Bean interface for parameters consulted by multiple Extractors, and thus provided by some shared object. |
Class | Description |
---|---|
Link | Link represents one discovered "edge" of the web graph: the source URI, the destination URI, and the type of reference (represented by the context in which it was found). |
Class | Description |
---|---|
UriErrorLoggerModule | |
Class | Description |
---|---|
Hop | The kind of "hop" from one URI to another. |
Link | Link represents one discovered "edge" of the web graph: the source URI, the destination URI, and the type of reference (represented by the context in which it was found). |
LinkContext | The context of link discovery. |
Class | Description |
---|---|
ContentExtractor | Extracts links from the fetched content of a URI, as opposed to its headers. |
ContentExtractorTestBase | Abstract base class for unit testing ContentExtractor implementations. |
Extractor | Extracts links from fetched URIs. |
ExtractorHTML | Basic link extraction from an HTML content body, using regular expressions. |
ExtractorJS | Processes JavaScript files for strings that are likely to be crawlable URIs. |
ExtractorMultipleRegex.GroupList | |
ExtractorMultipleRegex.MatchList | |
ExtractorParameters | Bean interface for parameters consulted by multiple Extractors, and thus provided by some shared object. |
Hop | The kind of "hop" from one URI to another. |
HTMLLinkContext | XPath-like context for HTML-discovered URIs. |
Link | Link represents one discovered "edge" of the web graph: the source URI, the destination URI, and the type of reference (represented by the context in which it was found). |
LinkContext | The context of link discovery. |
StringExtractorTestBase.TestData | |
UriErrorLoggerModule | |
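
As a rough illustration of the ideas described in the table above (a link as a graph edge with a source, a destination, and an XPath-like discovery context, found by regular-expression extraction from HTML), here is a minimal Java sketch. It does not use or reproduce the Heritrix API: the names `LinkSketch`, `Edge`, and `extract`, the `tag/@attribute` context format, and the regular expression itself are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; these are NOT the Heritrix classes listed above.
class LinkSketch {

    /** One edge of the web graph: source URI, destination URI, discovery context. */
    static final class Edge {
        final String source;
        final String destination;
        final String context; // assumed XPath-like form, e.g. "a/@href"

        Edge(String source, String destination, String context) {
            this.source = source;
            this.destination = destination;
            this.context = context;
        }

        @Override
        public String toString() {
            return source + " -> " + destination + " [" + context + "]";
        }
    }

    // Deliberately simple pattern: the first double-quoted href or src
    // attribute of each tag. Single-quoted and unquoted values are ignored.
    private static final Pattern ATTR = Pattern.compile(
            "<(\\w+)[^>]*?\\s(href|src)\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    /** Returns one Edge per matching attribute found in the HTML text. */
    static List<Edge> extract(String sourceUri, CharSequence html) {
        List<Edge> edges = new ArrayList<>();
        Matcher m = ATTR.matcher(html);
        while (m.find()) {
            String context = m.group(1).toLowerCase(Locale.ROOT)
                    + "/@" + m.group(2).toLowerCase(Locale.ROOT);
            edges.add(new Edge(sourceUri, m.group(3), context));
        }
        return edges;
    }
}
```

Calling `LinkSketch.extract("http://example.com/", html)` returns the edges in document order; destinations are left exactly as written in the markup, so resolving relative URIs against the source would be a separate step.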
Class | Description |
---|---|
Extractor | Extracts links from fetched URIs. |
UriErrorLoggerModule | |
Class | Description |
---|---|
TempDirProvider | |
Copyright © 2003-2014 Internet Archive. All Rights Reserved.