public class FetchFTP extends Processor
Modifier and Type | Class and Description |
---|---|
class | FetchFTP.SocketFactoryWithTimeout: A SocketFactory much like DefaultSocketFactory, except that the createSocket() methods that open connections support a connect timeout. |
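The idea of a socket factory with a connect timeout can be sketched with the standard library alone. This is a minimal illustration, not the Heritrix implementation: the class name `TimeoutSocketFactory` and its constructor parameter are hypothetical. It creates an unconnected socket first and then calls `Socket.connect(SocketAddress, int)`, which accepts a timeout in milliseconds.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.SocketFactory;

// Hypothetical stand-in for illustration; not FetchFTP.SocketFactoryWithTimeout itself.
class TimeoutSocketFactory extends SocketFactory {
    private final int connectTimeoutMs;

    TimeoutSocketFactory(int connectTimeoutMs) {
        this.connectTimeoutMs = connectTimeoutMs;
    }

    @Override
    public Socket createSocket() {
        // Unconnected socket: no connection is opened, so no timeout applies.
        return new Socket();
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return connect(new Socket(), new InetSocketAddress(host, port));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return connect(new Socket(), new InetSocketAddress(host, port));
    }

    @Override
    public Socket createSocket(String host, int port,
            InetAddress localHost, int localPort) throws IOException {
        Socket socket = new Socket();
        socket.bind(new InetSocketAddress(localHost, localPort));
        return connect(socket, new InetSocketAddress(host, port));
    }

    @Override
    public Socket createSocket(InetAddress address, int port,
            InetAddress localAddress, int localPort) throws IOException {
        Socket socket = new Socket();
        socket.bind(new InetSocketAddress(localAddress, localPort));
        return connect(socket, new InetSocketAddress(address, port));
    }

    // Connects with the configured timeout; closes the socket if the attempt fails.
    private Socket connect(Socket socket, InetSocketAddress remote) throws IOException {
        try {
            socket.connect(remote, connectTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```

A factory like this could be handed to any client that accepts a `javax.net.SocketFactory`; whether FetchFTP wires its factory in the same way is not stated on this page.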
Modifier and Type | Field and Description |
---|---|
protected String | digestAlgorithm: Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content bodies. |
protected FetchFTP.SocketFactoryWithTimeout | socketFactory |
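An on-the-fly digest, as described for `digestAlgorithm`, means the hash is updated while the content is being read rather than in a second pass. The sketch below shows one way to do that with `java.security.DigestInputStream`; the class `OnTheFlyDigest` and its methods are hypothetical names for illustration, and this page does not say how FetchFTP itself computes the digest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of FetchFTP.
final class OnTheFlyDigest {

    // Copies body to sink and returns the digest of every byte that passed through.
    static byte[] copyAndDigest(InputStream body, OutputStream sink, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm); // e.g. "MD5" or "SHA-1"
        try (DigestInputStream in = new DigestInputStream(body, digest)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n); // the digest is updated as a side effect of read()
            }
        }
        return digest.digest();
    }

    // Renders a digest as lowercase hexadecimal.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The algorithm name is passed straight to `MessageDigest.getInstance`, so any name supported by the installed security providers would work.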
Constructor and Description |
---|
FetchFTP(): Constructs a new FetchFTP. |
Modifier and Type | Method and Description |
---|---|
String | getDigestAlgorithm() |
boolean | getDigestContent() |
boolean | getExtractFromDirs(): Returns the extract.from.dirs attribute for this FetchFTP and the given curi. |
boolean | getExtractParent(): Returns the extract.parent attribute for this FetchFTP and the given curi. |
int | getMaxFetchKBSec() |
long | getMaxLengthBytes() |
String | getPassword() |
int | getSoTimeoutMs() |
int | getTimeoutSeconds() |
String | getUsername() |
protected void | innerProcess(CrawlURI curi): Processes the given URI. |
void | setDigestAlgorithm(String digestAlgorithm) |
void | setDigestContent(boolean digest) |
void | setExtractFromDirs(boolean extractFromDirs) |
void | setExtractParent(boolean extractParent) |
void | setMaxFetchKBSec(int rate) |
void | setMaxLengthBytes(long timeout) |
void | setPassword(String pw) |
void | setSoTimeoutMs(int timeout) |
void | setTimeoutSeconds(int timeout) |
void | setUsername(String username) |
protected boolean | shouldProcess(CrawlURI curi): Determines whether the given uri should be processed by this processor. |
Methods inherited from class Processor: doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson
protected String digestAlgorithm
protected FetchFTP.SocketFactoryWithTimeout socketFactory
public String getUsername()
public void setUsername(String username)
public String getPassword()
public void setPassword(String pw)
public boolean getExtractFromDirs()
Returns the extract.from.dirs attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: extract.from.dirs
public void setExtractFromDirs(boolean extractFromDirs)
public boolean getExtractParent()
Returns the extract.parent attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: extract.parent
public void setExtractParent(boolean extractParent)
public boolean getDigestContent()
public void setDigestContent(boolean digest)
public String getDigestAlgorithm()
public void setDigestAlgorithm(String digestAlgorithm)
public long getMaxLengthBytes()
public void setMaxLengthBytes(long timeout)
public int getMaxFetchKBSec()
public void setMaxFetchKBSec(int rate)
public int getTimeoutSeconds()
public void setTimeoutSeconds(int timeout)
public int getSoTimeoutMs()
public void setSoTimeoutMs(int timeout)
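protected boolean shouldProcess(CrawlURI curi)
Determines whether the given uri should be processed by this processor.
Specified by: shouldProcess in class Processor
Parameters: curi - the URI to test
protected void innerProcess(CrawlURI curi) throws InterruptedException
Processes the given URI.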
If the connection is successful, an attempt will be made to CD to the path specified in the URI. If the remote CD command succeeds, then it is assumed that the URI represents a directory. If the CD command fails, then it is assumed that the URI represents a file.
For directories, the directory listing will be fetched using the FTP LIST command, and saved to the HttpRecorder. If the extract.from.dirs attribute is set to true, then the files in the fetched list will be added to the curi as extracted FTP links. (It was easier to do that here, rather than writing a separate FTPExtractor.)
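Turning the names in a directory listing into FTP links amounts to appending each name to the directory's URI. The sketch below is an assumption-laden illustration: `ListingLinks` and `childLinks` are hypothetical names, the listing is assumed to be already parsed into plain file names, and skipping the `.` and `..` entries is a choice made here, not something this page specifies.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of FetchFTP.
final class ListingLinks {

    // Builds one child URI per listed name under the given directory URI.
    static List<URI> childLinks(URI dir, List<String> names) throws URISyntaxException {
        String path = dir.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (!path.endsWith("/")) {
            path += "/"; // treat the URI as a directory even without a trailing slash
        }
        List<URI> links = new ArrayList<>();
        for (String name : names) {
            if (name.equals(".") || name.equals("..")) {
                continue; // assumption: self and parent entries are not child links
            }
            // The multi-argument constructor percent-encodes illegal characters such as spaces.
            links.add(new URI(dir.getScheme(), dir.getAuthority(), path + name, null, null));
        }
        return links;
    }
}
```

For example, a directory `ftp://example.org/pub` with entries `a.txt` and `my file.txt` would yield `ftp://example.org/pub/a.txt` and `ftp://example.org/pub/my%20file.txt`.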
For files, the file will be fetched using the FTP RETR command, and saved to the HttpRecorder.
All transfers, including directory listings, use binary mode. Local passive mode is always used, to play well with firewalls.
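The directory-or-file decision described above can be sketched as follows. Everything named here is hypothetical: `FtpSession` is a minimal interface invented for this example so that it runs without an FTP library, and `FtpFetchSketch` is not the FetchFTP code. The order in which the sketch selects passive and binary mode is arbitrary.

```java
import java.io.IOException;

// Hypothetical abstraction over an FTP client, invented for this sketch.
interface FtpSession {
    void enterLocalPassiveMode();
    void useBinaryMode() throws IOException;
    boolean changeWorkingDirectory(String path) throws IOException; // true if the CD succeeded
    byte[] list() throws IOException;                 // FTP LIST on the current directory
    byte[] retrieve(String path) throws IOException;  // FTP RETR for the given path
}

// Hypothetical sketch of the fetch flow; not the Heritrix implementation.
final class FtpFetchSketch {

    // Outcome of one fetch: whether the path was a directory, and the bytes received.
    static final class Result {
        final boolean directory;
        final byte[] body;

        Result(boolean directory, byte[] body) {
            this.directory = directory;
            this.body = body;
        }
    }

    static Result fetch(FtpSession ftp, String path) throws IOException {
        ftp.enterLocalPassiveMode(); // passive mode, to play well with firewalls
        ftp.useBinaryMode();         // binary transfer for files and listings alike

        if (ftp.changeWorkingDirectory(path)) {
            // CD succeeded: assume the path is a directory and fetch its listing.
            return new Result(true, ftp.list());
        }
        // CD failed: assume the path is a file and retrieve it.
        return new Result(false, ftp.retrieve(path));
    }
}
```

Using the CD outcome as the test avoids parsing server-specific listing formats just to learn whether a path is a directory, at the cost of one extra command per fetch.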
Specified by: innerProcess in class Processor
Parameters: curi - the curi to process
Throws: InterruptedException - if the thread is interrupted during processing
Copyright © 2003-2014 Internet Archive. All Rights Reserved.