public abstract class FPMergeUriUniqFilter extends Object implements UriUniqFilter
Modifier and Type | Class and Description |
---|---|
class | FPMergeUriUniqFilter.PendingItem: Represents a long fingerprint and (possibly) its corresponding CrawlURI, awaiting the next merge in a 'pending' state. |
Nested classes/interfaces inherited from interface UriUniqFilter: UriUniqFilter.CrawlUriReceiver
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_PENDING |
static long | FLUSH_DELAY_FACTOR |
protected int | maxPending: size at which to force flush of pending items |
protected long | mergeDupAtLast |
protected long | mergeDuplicateCount |
protected long | nextFlushAllowableAfter: time-based throttle on flush-merge operations |
protected long | pendDupAtLast |
protected long | pendDuplicateCount |
protected TreeSet<FPMergeUriUniqFilter.PendingItem> | pendingSet: items awaiting merge. TODO: consider only sorting just pre-merge. TODO: consider using a fastutil long->Object class. TODO: consider actually writing items to disk file, as in Najork/Heydon |
protected PrintWriter | profileLog |
protected ArrayLongFPCache | quickCache: cache of most recently seen FPs |
protected long | quickDupAtLast |
protected long | quickDuplicateCount |
protected UriUniqFilter.CrawlUriReceiver | receiver |
Constructor and Description |
---|
FPMergeUriUniqFilter() |
Modifier and Type | Method and Description |
---|---|
void | add(String key, CrawlURI value): Add given uri, if not already present. |
void | addForce(String key, CrawlURI value): Add given uri, all the way through to underlying destination, even if already present. |
protected abstract void | addNewFp(long fp): Add an FP (which may be an old or new FP) to the new complete list. |
void | addNow(String key, CrawlURI value): Immediately add uri. |
protected abstract it.unimi.dsi.fastutil.longs.LongIterator | beginFpMerge(): Begin merging pending candidates with complete list. |
void | close(): Close down any allocated resources. |
static long | createFp(CharSequence key): Create a fingerprint from the given key |
protected abstract void | finishFpMerge(): Complete the merge of candidate and previously-known FPs (closing files/iterators as appropriate). |
long | flush(): Perform a merge of all 'pending' items to the overall fingerprint list. |
void | forget(String key, CrawlURI value): Forget item was seen |
void | note(String key): Note item as seen, without passing through to receiver. |
protected void | pend(long fp, CrawlURI value): Place the given FP/CrawlURI pair into the pending set, awaiting a merge to determine if it's actually accepted. |
long | pending(): Count of items added, but not yet filtered in or out. |
protected void | profileLog(String key) |
long | requestFlush(): Request that any pending items be added/dropped. |
void | setDestination(UriUniqFilter.CrawlUriReceiver receiver): Receiver of uniq URIs. |
void | setMaxPending(int max) |
void | setProfileLog(File logfile): Set a File to receive a log for replay profiling. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface UriUniqFilter: count
protected UriUniqFilter.CrawlUriReceiver receiver
protected PrintWriter profileLog
protected long quickDuplicateCount
protected long quickDupAtLast
protected long pendDuplicateCount
protected long pendDupAtLast
protected long mergeDuplicateCount
protected long mergeDupAtLast
protected TreeSet<FPMergeUriUniqFilter.PendingItem> pendingSet
protected int maxPending
public static final int DEFAULT_MAX_PENDING
protected long nextFlushAllowableAfter
public static final long FLUSH_DELAY_FACTOR
protected ArrayLongFPCache quickCache
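The quickCache field is described only as a cache of the most recently seen FPs. As a rough illustration of why such a cache can short-circuit duplicates but cannot replace the merge, here is a minimal sketch of a direct-mapped fingerprint cache. The class `RecentFpCacheSketch` and its methods are hypothetical and are not claimed to match how ArrayLongFPCache is implemented.

```java
// Hypothetical direct-mapped cache of recently seen fingerprints.
// Illustration only; not the ArrayLongFPCache implementation.
class RecentFpCacheSketch {
    private final long[] slots;   // one fingerprint per slot
    private final boolean[] used; // whether a slot holds a value yet

    RecentFpCacheSketch(int capacity) {
        slots = new long[capacity];
        used = new boolean[capacity];
    }

    // Map a fingerprint (possibly negative) to a slot index.
    private int slotFor(long fp) {
        return (int) Math.floorMod(fp, (long) slots.length);
    }

    // Record fp. Returns true if fp was not cached, false if it was
    // already present. A new fp may evict an older one in the same slot.
    boolean add(long fp) {
        int i = slotFor(fp);
        if (used[i] && slots[i] == fp) {
            return false;
        }
        slots[i] = fp;
        used[i] = true;
        return true;
    }

    boolean contains(long fp) {
        int i = slotFor(fp);
        return used[i] && slots[i] == fp;
    }

    public static void main(String[] args) {
        RecentFpCacheSketch cache = new RecentFpCacheSketch(4);
        System.out.println(cache.add(1L)); // first sighting
        System.out.println(cache.add(1L)); // repeat sighting
        System.out.println(cache.add(5L)); // shares a slot with 1 when capacity is 4
        System.out.println(cache.contains(1L));
    }
}
```

A hit in a cache like this is a definite duplicate, while a miss says nothing, because an entry may have been evicted. That is why items that miss would still need to be checked against the complete list.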
public void setMaxPending(int max)
public long pending()
Specified by: pending in interface UriUniqFilter
public void setDestination(UriUniqFilter.CrawlUriReceiver receiver)
Specified by: setDestination in interface UriUniqFilter
receiver
- Object that will be passed items. Must implement HasUriReceiver interface.
protected void profileLog(String key)
public void add(String key, CrawlURI value)
Specified by: add in interface UriUniqFilter
key
- Usually a canonicalized version of value. This is the key used for lookups, forgets, and insertions on the already-included list.
value
- item to add.
protected void pend(long fp, CrawlURI value)
fp
- long fingerprint
value
- CrawlURI or null, if fp only needs merging (as when CrawlURI was already forced in)
public static long createFp(CharSequence key)
key
- CharSequence (URI) to fingerprint
public void addNow(String key, CrawlURI value)
Specified by: addNow in interface UriUniqFilter
key
- Usually a canonicalized version of the URI. This is the key used for lookups, forgets, and insertions on the already-included list.
value
- item to add.
public void addForce(String key, CrawlURI value)
Specified by: addForce in interface UriUniqFilter
key
- Usually a canonicalized version of the URI. This is the key used for lookups, forgets, and insertions on the already-included list.
value
- item to add.
public void note(String key)
Specified by: note in interface UriUniqFilter
key
- Usually a canonicalized version of a URI. This is the key used for lookups, forgets, and insertions on the already-included list.
public void forget(String key, CrawlURI value)
Specified by: forget in interface UriUniqFilter
key
- Usually a canonicalized version of a URI. This is the key used for lookups, forgets, and insertions on the already-included list.
value
- item to forget.
public long requestFlush()
Specified by: requestFlush in interface UriUniqFilter
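The summary describes add() as deferring the accept/reject decision: items sit in a pending set until a flush, and requestFlush() asks that pending items be added or dropped. A minimal sketch of that flow follows, under loud assumptions: `PendingFilterSketch` is a hypothetical stand-in, a `Consumer<String>` plays the role of the CrawlUriReceiver, a `HashSet` stands in for the complete fingerprint list, `String.hashCode()` is a toy fingerprint, and returning the number of items passed on from flush() is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the pend-then-flush flow; not the Heritrix class.
class PendingFilterSketch {
    private final Set<Long> known = new HashSet<>();               // stands in for the complete FP list
    private final TreeMap<Long, String> pending = new TreeMap<>(); // FP -> item, kept sorted by FP
    private final Consumer<String> receiver;                       // stands in for CrawlUriReceiver
    private final int maxPending;                                  // size at which a flush is forced

    PendingFilterSketch(Consumer<String> receiver, int maxPending) {
        this.receiver = receiver;
        this.maxPending = maxPending;
    }

    // Toy fingerprint, for illustration only.
    static long fingerprint(CharSequence key) {
        return key.toString().hashCode();
    }

    // Queue the item; nothing reaches the receiver until a flush.
    void add(String key, String item) {
        pending.putIfAbsent(fingerprint(key), item);
        if (pending.size() >= maxPending) {
            flush();
        }
    }

    // Count of items added, but not yet filtered in or out.
    long pending() {
        return pending.size();
    }

    long requestFlush() {
        return flush();
    }

    // Pass novel items to the receiver, drop duplicates, empty the pending set.
    long flush() {
        long passed = 0;
        for (Map.Entry<Long, String> e : pending.entrySet()) {
            if (known.add(e.getKey())) {
                receiver.accept(e.getValue());
                passed++;
            }
        }
        pending.clear();
        return passed;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        PendingFilterSketch filter = new PendingFilterSketch(out::add, 100);
        filter.add("http://a/", "A");
        filter.add("http://b/", "B");
        System.out.println("pending before flush: " + filter.pending());
        System.out.println("passed on by flush: " + filter.requestFlush());
        System.out.println("received: " + out);
    }
}
```

The real class keeps a TreeSet of PendingItem objects and also applies a time-based throttle (nextFlushAllowableAfter), which this sketch leaves out.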
public long flush()
protected abstract it.unimi.dsi.fastutil.longs.LongIterator beginFpMerge()
protected abstract void addNewFp(long fp)
fp
- the FP to add
protected abstract void finishFpMerge()
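The three abstract hooks (beginFpMerge, addNewFp, finishFpMerge) suggest a sequential merge of the sorted pending fingerprints with the sorted complete list, writing a new complete list as it goes. The sketch below shows one way such a merge could be driven, entirely in memory. It is an assumption-laden illustration: `InMemoryMergeSketch` is hypothetical, `java.util.Iterator<Long>` replaces the fastutil LongIterator to stay dependency-free, and the flush loop is a plausible reading of the method descriptions rather than the class's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory illustration of a three-hook merge; not Heritrix code.
class InMemoryMergeSketch {
    private List<Long> complete = new ArrayList<>(); // sorted list of known FPs
    private List<Long> rebuilt;                       // new complete list under construction

    // Hook 1: open the old list for sequential reading and start a new list.
    Iterator<Long> beginFpMerge() {
        rebuilt = new ArrayList<>();
        return complete.iterator();
    }

    // Hook 2: append one FP (old or new) to the new complete list.
    void addNewFp(long fp) {
        rebuilt.add(fp);
    }

    // Hook 3: swap the new list in place of the old one.
    void finishFpMerge() {
        complete = rebuilt;
        rebuilt = null;
    }

    // Merge sorted pending FPs into the sorted complete list.
    // Returns the FPs that were not previously known.
    List<Long> flush(TreeSet<Long> pending) {
        List<Long> novel = new ArrayList<>();
        Iterator<Long> oldFps = beginFpMerge();
        Long old = oldFps.hasNext() ? oldFps.next() : null;
        for (long candidate : pending) {
            // Copy across every old FP smaller than the candidate.
            while (old != null && old < candidate) {
                addNewFp(old);
                old = oldFps.hasNext() ? oldFps.next() : null;
            }
            if (old != null && old == candidate) {
                continue; // duplicate: the old FP is copied on a later step
            }
            addNewFp(candidate);
            novel.add(candidate);
        }
        // Copy any old FPs larger than every candidate.
        while (old != null) {
            addNewFp(old);
            old = oldFps.hasNext() ? oldFps.next() : null;
        }
        finishFpMerge();
        pending.clear();
        return novel;
    }

    List<Long> snapshot() {
        return new ArrayList<>(complete);
    }

    public static void main(String[] args) {
        InMemoryMergeSketch merger = new InMemoryMergeSketch();
        System.out.println(merger.flush(new TreeSet<>(Arrays.asList(5L, 1L, 9L))));
        System.out.println(merger.flush(new TreeSet<>(Arrays.asList(5L, 3L, 12L))));
        System.out.println(merger.snapshot());
    }
}
```

Because both inputs are read in ascending order and the output is only appended to, the same three hooks could be backed by sequential files instead of lists, which matches the "closing files/iterators as appropriate" wording for finishFpMerge().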
public void close()
Specified by: close in interface UriUniqFilter
public void setProfileLog(File logfile)
Specified by: setProfileLog in interface UriUniqFilter
Copyright © 2003-2014 Internet Archive. All Rights Reserved.