Uses of Package org.archive.modules.net

Packages that use org.archive.modules.net:

Package | Description
---|---
org.archive.crawler.framework |
org.archive.crawler.frontier |
org.archive.crawler.postprocessor |
org.archive.crawler.prefetch |
org.archive.crawler.reporting |
org.archive.modules | The beginnings of a refactored settings framework.
org.archive.modules.credential | Contains HTML form login and basic and digest credentials used by Heritrix when logging into sites.
org.archive.modules.deciderules |
org.archive.modules.fetcher |
org.archive.modules.net |
org.archive.modules.writer |

Classes in org.archive.modules.net used by org.archive.crawler.framework:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.
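
The ServerCache gives all processors a single shared view of per-server and per-host crawl state. Below is a minimal lookup sketch, assuming the Heritrix 3 API with DefaultServerCache as the in-memory implementation and the getServerFor/getHostFor accessors; verify the names against your Heritrix version:

```java
import org.archive.modules.net.CrawlHost;
import org.archive.modules.net.CrawlServer;
import org.archive.modules.net.DefaultServerCache;
import org.archive.modules.net.ServerCache;

public class ServerCacheExample {
    public static void main(String[] args) {
        // In-memory implementation; real crawls typically use a
        // disk-backed subclass, but the lookup API is the same.
        ServerCache cache = new DefaultServerCache();

        // CrawlServer entries are keyed by host:port, CrawlHost entries
        // by hostname only; entries are created on first lookup.
        CrawlServer server = cache.getServerFor("example.com:443");
        CrawlHost host = cache.getHostFor("example.com");

        System.out.println(server.getName());   // example.com:443
        System.out.println(host.getHostName()); // example.com
    }
}
```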

Classes in org.archive.modules.net used by org.archive.crawler.frontier:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Classes in org.archive.modules.net used by org.archive.crawler.postprocessor:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Classes in org.archive.modules.net used by org.archive.crawler.prefetch:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Classes in org.archive.modules.net used by org.archive.crawler.reporting:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Classes in org.archive.modules.net used by org.archive.modules:

Class | Description
---|---
RobotsPolicy | Represents the strategy used by the crawler for determining how robots.txt files will be honored.
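
Concrete policies subclass RobotsPolicy to decide, per URI, whether fetched robots.txt rules are enforced. A hypothetical permissive policy is sketched below against the two abstract methods assumed from the Heritrix 3 API (allows and obeyMetaRobotsNofollow); check the exact signatures in your version:

```java
import org.archive.modules.CrawlURI;
import org.archive.modules.net.RobotsPolicy;
import org.archive.modules.net.Robotstxt;

// Hypothetical example policy: robots.txt is still fetched and parsed,
// but its directives never block a URI from being crawled.
public class AlwaysAllowRobotsPolicy extends RobotsPolicy {
    @Override
    public boolean allows(String userAgent, CrawlURI curi, Robotstxt robotstxt) {
        return true; // every URI passes, regardless of robots.txt directives
    }

    @Override
    public boolean obeyMetaRobotsNofollow() {
        return false; // also ignore <meta name="robots" content="nofollow">
    }
}
```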

Classes in org.archive.modules.net used by org.archive.modules.credential:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Classes in org.archive.modules.net used by org.archive.modules.deciderules:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Classes in org.archive.modules.net used by org.archive.modules.fetcher:

Class | Description
---|---
CrawlHost | Represents a single remote "host".
CrawlServer | Represents a single remote "server".
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Classes in org.archive.modules.net used by org.archive.modules.net:

Class | Description
---|---
CrawlHost | Represents a single remote "host".
CrawlServer | Represents a single remote "server".
RobotsDirectives | Represents the directives that apply to a user-agent (or set of user-agents).
RobotsPolicy | Represents the strategy used by the crawler for determining how robots.txt files will be honored.
Robotstxt | Utility class for parsing and representing 'robots.txt' format directives, into a list of named user-agents and a map from user-agents to RobotsDirectives.
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.
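
Robotstxt and RobotsDirectives together turn robots.txt text into queryable per-user-agent rules. A minimal parsing sketch follows, assuming the Heritrix 3 constructor taking a BufferedReader and the getDirectivesFor/allows/getCrawlDelay accessors; confirm against your version:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import org.archive.modules.net.RobotsDirectives;
import org.archive.modules.net.Robotstxt;

public class RobotstxtExample {
    public static void main(String[] args) throws Exception {
        String robots =
            "User-agent: *\n" +
            "Disallow: /private/\n" +
            "Crawl-delay: 5\n";

        // Parse into named user-agent records plus the wildcard record.
        Robotstxt rt = new Robotstxt(new BufferedReader(new StringReader(robots)));

        // No section names "heritrix", so this falls back to the "*" record.
        RobotsDirectives d = rt.getDirectivesFor("heritrix");

        System.out.println(d.allows("/private/secret.html")); // false
        System.out.println(d.allows("/index.html"));          // true
        System.out.println(d.getCrawlDelay());                // 5.0
    }
}
```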

Classes in org.archive.modules.net used by org.archive.modules.writer:

Class | Description
---|---
ServerCache | Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.

Copyright © 2003-2014 Internet Archive. All Rights Reserved.