public abstract class BaseParser
extends java.lang.Object
implements java.io.Serializable
Constructor and Description |
---|
BaseParser(ParserPolicy policy) |
Modifier and Type | Method and Description |
---|---|
protected java.lang.String | getCharset(FetchedDatum datum) Extracts the encoding from the content type. If a charset is returned, it is a valid, normalized charset name that is supported on this platform. |
protected java.net.URL | getContentLocation(FetchedDatum fetchedDatum) Determines the base URL to use when resolving relative URLs. |
protected java.lang.String | getLanguage(FetchedDatum fetchedDatum, java.lang.String charset) Extracts the language from the first explicit header. |
ParserPolicy | getParserPolicy() |
abstract ParsedDatum | parse(FetchedDatum fetchedDatum) |
public BaseParser(ParserPolicy policy)
public ParserPolicy getParserPolicy()
public abstract ParsedDatum parse(FetchedDatum fetchedDatum) throws java.lang.Exception
Throws: java.lang.Exception
protected java.lang.String getCharset(FetchedDatum datum)
Parameters: datum
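The summary for getCharset says it extracts the encoding from the content type and returns only names that are valid, normalized, and supported on the platform. A minimal sketch of that idea using only the JDK follows. The class CharsetSketch and the method charsetFromContentType are hypothetical names, and the code works on a raw header string rather than a FetchedDatum; it is not taken from the Bixo source.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration; not the Bixo implementation.
final class CharsetSketch {

    // Returns the canonical charset name found in a Content-Type value,
    // or null if none is present or the name is not supported here.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by semicolons.
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (!p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            String name = p.substring("charset=".length()).trim();
            // Strip optional surrounding double quotes.
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            if (name.isEmpty()) {
                return null;
            }
            try {
                // Charset.forName validates the name and resolves aliases;
                // name() then yields the canonical (normalized) form.
                return Charset.forName(name).name();
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                return null;
            }
        }
        return null;
    }
}
```

Returning null for an unknown name, rather than passing it through, matches the documented guarantee that any returned charset is usable on the current platform.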
protected java.lang.String getLanguage(FetchedDatum fetchedDatum, java.lang.String charset)
Parameters: fetchedDatum, charset
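The summary for getLanguage says only that the language comes from the first explicit header. As a rough illustration, the sketch below assumes that header is a Content-Language style value (a comma-separated list of language tags), which the source does not state. LanguageSketch and firstLanguage are hypothetical names, and the charset parameter of the real method is not modeled here.

```java
// Hypothetical helper for illustration; not the Bixo implementation.
final class LanguageSketch {

    // Returns the first non-empty language tag in a comma-separated
    // header value, or null if the header is absent or empty.
    static String firstLanguage(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        for (String tag : headerValue.split(",")) {
            String t = tag.trim();
            if (!t.isEmpty()) {
                return t;
            }
        }
        return null;
    }
}
```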
protected java.net.URL getContentLocation(FetchedDatum fetchedDatum) throws java.net.MalformedURLException
Parameters: fetchedDatum
Throws: java.net.MalformedURLException
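getContentLocation is described as choosing the base URL for resolving relative URLs. One plausible reading, sketched below, is that a Content-Location header (when present) is resolved against the fetched URL and the result becomes the base; otherwise the fetched URL itself is used. That rule is an assumption, as are the names BaseUrlSketch, baseUrl, and resolve. The sketch takes plain strings instead of a FetchedDatum and uses java.net.URI for resolution.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper for illustration; not the Bixo implementation.
final class BaseUrlSketch {

    // Picks the base URL: the Content-Location value resolved against the
    // fetched URL if the header is present, else the fetched URL itself.
    static URL baseUrl(String fetchedUrl, String contentLocation)
            throws MalformedURLException {
        URI fetched = URI.create(fetchedUrl);
        if (contentLocation == null || contentLocation.trim().isEmpty()) {
            return fetched.toURL();
        }
        return fetched.resolve(contentLocation.trim()).toURL();
    }

    // Resolves a link found in the document against the chosen base.
    static URL resolve(URL base, String link) throws MalformedURLException {
        return URI.create(base.toString()).resolve(link).toURL();
    }
}
```

Note that URI.create throws IllegalArgumentException for syntactically invalid input, so a caller that wants only MalformedURLException, as in the signature above, would need to catch and translate it.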
Copyright © 2012 Bixo Labs