Package | Description |
---|---|
org.archive.modules.extractor | |
Modifier and Type | Class and Description |
---|---|
class | AggressiveExtractorHTML: Extended version of ExtractorHTML with more aggressive JavaScript link extraction, in which JavaScript code is parsed first with the general HTML tag regex and then with the speculative JavaScript link regex. |
class | JerichoExtractorHTML: Improved link extraction from an HTML content body using the jericho-html parser. |
Copyright © 2003-2014 Internet Archive. All Rights Reserved.