public class TextSeedModule extends SeedModule implements org.archive.io.ReadSource
Modifier and Type | Field and Description |
---|---|
protected int | blockAwaitingSeedLines - Number of lines of seeds-source to read on initial load before proceeding with crawl. |
protected org.archive.io.ReadSource | textSource - Text from which to extract seeds. |
Fields inherited from class SeedModule: seedListeners, sourceTagSeeds
Constructor and Description |
---|
TextSeedModule() |
Modifier and Type | Method and Description |
---|---|
void | actOn(File f) - Treat the given file as a source of additional seeds, announcing to SeedListeners. |
void | addSeed(CrawlURI curi) - Add a new seed to scope. |
void | announceSeeds() - Announce all seeds from configured source to SeedListeners (including nonseed lines mixed in). |
protected void | announceSeeds(CountDownLatch latchOrNull) |
protected void | announceSeedsFromReader(BufferedReader reader, CountDownLatch latchOrNull) - Announce all seeds (and nonseed possible-directive lines) from the given Reader. |
int | getBlockAwaitingSeedLines() |
org.archive.io.ReadSource | getTextSource() |
protected void | nonseedLine(String line) - Handle a read line that is not a seed, but may still have meaning to seed-consumers (such as scoping beans). |
Reader | obtainReader() |
protected void | seedLine(String uri) - Handle a read line that is probably a seed. |
void | setBlockAwaitingSeedLines(int blockAwaitingSeedLines) |
void | setTextSource(org.archive.io.ReadSource seedsSource) |
Methods inherited from class SeedModule: addSeedListener, getSeedListeners, getSourceTagSeeds, publishAddedSeed, publishConcludedSeedBatch, publishNonSeedLine, setSeedListeners, setSourceTagSeeds
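The split between seedLine and nonseedLine described in the method summary can be illustrated with a small standalone sketch. It does not use the Heritrix classes: the class name `SeedLineDispatchSketch`, the `announceFrom` helper, and the classification rule (skip blank lines and `#` comments, treat a line starting with a letter or digit as a probable seed) are assumptions made for illustration, not the documented behavior of TextSeedModule.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a seed module; not part of Heritrix. */
public class SeedLineDispatchSketch {
    /** Lines that were treated as probable seeds. */
    final List<String> seeds = new ArrayList<>();
    /** Lines that were passed on as possible directives. */
    final List<String> nonseeds = new ArrayList<>();

    /** Reads every line and routes it to seedLine or nonseedLine. */
    void announceFrom(BufferedReader reader) {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String s = line.trim();
                // Assumption: blank lines and '#' comment lines carry no meaning.
                if (s.isEmpty() || s.startsWith("#")) {
                    continue;
                }
                // Assumption: a leading letter or digit marks a probable URI.
                if (Character.isLetterOrDigit(s.charAt(0))) {
                    seedLine(s);
                } else {
                    nonseedLine(s);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Handles a line that is probably a seed. */
    void seedLine(String uri) {
        seeds.add(uri);
    }

    /** Handles a line that is not a seed but may matter to seed consumers. */
    void nonseedLine(String line) {
        nonseeds.add(line);
    }

    public static void main(String[] args) {
        String text = "# a comment\nhttp://example.com/\n\n+http://example.org/extra\n";
        SeedLineDispatchSketch sketch = new SeedLineDispatchSketch();
        sketch.announceFrom(new BufferedReader(new StringReader(text)));
        System.out.println("seeds: " + sketch.seeds);
        System.out.println("nonseeds: " + sketch.nonseeds);
    }
}
```

In the real module the two handlers would notify the registered SeedListeners rather than fill local lists; the lists here only make the routing visible.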
protected org.archive.io.ReadSource textSource
protected int blockAwaitingSeedLines
public org.archive.io.ReadSource getTextSource()
public void setTextSource(org.archive.io.ReadSource seedsSource)
public int getBlockAwaitingSeedLines()
public void setBlockAwaitingSeedLines(int blockAwaitingSeedLines)
public void announceSeeds()
announceSeeds in class SeedModule
See also: SeedModule.announceSeeds()
protected void announceSeeds(CountDownLatch latchOrNull)
protected void announceSeedsFromReader(BufferedReader reader, CountDownLatch latchOrNull)
reader - source of seed/directive lines
latchOrNull - if non-null, countDown() is called on it after each line, allowing another thread to proceed after a configurable number of lines have been processed
protected void seedLine(String uri)
uri - String seed-containing line
protected void nonseedLine(String line)
line - String nonseed line (a possible directive)
public void actOn(File f)
actOn in class SeedModule
See also: SeedModule.actOn(java.io.File)
public void addSeed(CrawlURI curi)
This method is *not* sufficient to get the new seed scheduled in the Frontier for crawling -- it only affects the Scope's seed record (and decisions which flow from seeds).
addSeed in class SeedModule
curi - CrawlURI to add
public Reader obtainReader()
obtainReader in interface org.archive.io.ReadSource
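The interaction between blockAwaitingSeedLines and the latchOrNull parameter of announceSeedsFromReader can be sketched with a plain `java.util.concurrent.CountDownLatch`. This is a minimal standalone illustration under stated assumptions, not the Heritrix implementation: the class `BlockAwaitingSketch`, the helpers `readLines` and `loadAndWait`, the five-second timeout, and the step that drains the latch when the source has fewer lines than the threshold are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of latch-gated seed loading; not part of Heritrix. */
public class BlockAwaitingSketch {

    /** Reads every line, counting the latch down once per line when a latch is given. */
    static int readLines(BufferedReader reader, CountDownLatch latchOrNull) {
        int count = 0;
        try {
            while (reader.readLine() != null) {
                count++; // a real module would handle the line here
                if (latchOrNull != null) {
                    latchOrNull.countDown();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /**
     * Starts reading on a background thread and blocks the caller until
     * blockAwaitingLines lines have been processed or the source is exhausted.
     * Returns true if the caller was released before the timeout.
     */
    static boolean loadAndWait(String text, int blockAwaitingLines) {
        CountDownLatch latch = new CountDownLatch(blockAwaitingLines);
        Thread loader = new Thread(() -> {
            try {
                readLines(new BufferedReader(new StringReader(text)), latch);
            } finally {
                // Assumption: release the waiter even when the source is
                // shorter than the configured threshold.
                while (latch.getCount() > 0) {
                    latch.countDown();
                }
            }
        });
        loader.start();
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "http://example.com/\nhttp://example.org/\nhttp://example.net/\n";
        // The caller resumes once the first two lines are read; the third
        // line may still be processed in the background.
        System.out.println("released: " + loadAndWait(text, 2));
    }
}
```

Passing `null` as the latch, as the `latchOrNull` name suggests, simply reads all lines on the calling thread with no early release.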
Copyright © 2003-2014 Internet Archive. All Rights Reserved.