- generateScore(String) - Method in class bixo.operations.BaseScoreGenerator
-
- generateScore(String, String, GroupedUrlDatum) - Method in class bixo.operations.BaseScoreGenerator
-
- generateScore(String, String, String) - Method in class bixo.operations.BaseScoreGenerator
-
Return score for URL, based on domain & URL path
- generateScore(String, String, String) - Method in class bixo.operations.FixedScoreGenerator
-
- get(ScoredUrlDatum) - Method in class bixo.fetcher.BaseFetcher
-
- get(ScoredUrlDatum) - Method in class bixo.fetcher.LoggingFetcher
-
- get(ScoredUrlDatum) - Method in class bixo.fetcher.SimpleHttpFetcher
-
- get() - Method in class bixo.hadoop.DiskBytesWritable
-
Get the data from the BytesWritable.
- getAbortReason() - Method in exception bixo.exceptions.AbortedFetchException
-
- getAcceptedIssuers() - Method in class bixo.fetcher.DummyX509TrustManager
-
- getAcceptEncoding() - Method in class bixo.fetcher.SimpleHttpFetcher
-
Return the current value used for the ACCEPT-ENCODING request parameter.
- getAcceptLanguage() - Method in class bixo.config.FetcherPolicy
-
- getActiveCount() - Method in class bixo.utils.ThreadedExecutor
-
Return number of active threads
- getAgentName() - Method in class bixo.config.UserAgent
-
- getAll(String) - Method in class bixo.datum.HttpHeaders
-
- getAnchor() - Method in class bixo.datum.Outlink
-
- getBaseUrl() - Method in class bixo.datum.ContentDatum
-
Return the original URL - use the UrlDatum support for this.
- getBaseUrl() - Method in class bixo.fetcher.FetchedResult
-
- getBooleanProperty(String) - Method in class bixo.config.BixoPlatform
-
- getBytes() - Method in class bixo.datum.ContentBytes
-
- getCapacity() - Method in class bixo.hadoop.DiskBytesWritable
-
Get the capacity, which is the maximum size that could be handled without
resizing the backing storage.
- getCause() - Method in exception bixo.exceptions.BaseFetchException
-
- getCharset(FetchedDatum) - Method in class bixo.parser.BaseParser
-
Extract encoding from content-type.
If a charset is returned, then it's a valid/normalized charset name that's
supported on this platform.
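
The note above on returning only a valid, normalized, platform-supported charset name can be illustrated with the standard `java.nio.charset.Charset` API. This is a minimal sketch, not Bixo's implementation: the class `CharsetSketch` and the method `charsetFromContentType` are hypothetical names, and the parsing of the Content-Type value is deliberately simplified.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration only; it is not part of Bixo.
class CharsetSketch {

    /**
     * Returns the canonical name of the charset declared in a Content-Type
     * value, or null when no charset is declared or the JVM does not support it.
     */
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by semicolons.
        for (String param : contentType.split(";")) {
            String trimmed = param.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = trimmed.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    // forName() resolves aliases; name() gives the canonical form.
                    return Charset.forName(name).name();
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return null;
                }
            }
        }
        return null;
    }
}
```

Returning null for an unsupported name lets a caller fall back to content sniffing or a default encoding instead of failing while decoding.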
- getCharsetFromContentType(String) - Static method in class bixo.utils.HttpUtils
-
- getConf() - Method in class bixo.hadoop.HadoopConfigured
-
- getConnectionTimeout() - Method in class bixo.fetcher.SimpleHttpFetcher
-
- getContent() - Method in class bixo.fetcher.FetchedResult
-
- getContent() - Method in class bixo.parser.BaseContentExtractor
-
- getContent() - Method in class bixo.parser.BoilerpipeContentExtractor
-
getContent returns the boilerpipe extracted text.
- getContent() - Method in class bixo.parser.HtmlContentExtractor
-
- getContent() - Method in class bixo.parser.SimpleContentExtractor
-
- getContentBytes() - Method in class bixo.datum.ContentDatum
-
- getContentBytes() - Method in class bixo.datum.FetchedDatum
-
- getContentLength() - Method in class bixo.datum.ContentDatum
-
- getContentLength() - Method in class bixo.datum.FetchedDatum
-
- getContentLocation(FetchedDatum) - Method in class bixo.parser.BaseParser
-
Figure out the right base URL to use, for when we need to resolve relative URLs.
- getContentTailPipe() - Method in class bixo.pipes.FetchPipe
-
- getContentType() - Method in class bixo.datum.ContentDatum
-
- getContentType() - Method in class bixo.datum.FetchedDatum
-
- getContentType() - Method in class bixo.fetcher.FetchedResult
-
- getCrawlDelay() - Method in class bixo.config.FakeUserFetcherPolicy
-
- getCrawlDelay() - Method in class bixo.config.FetcherPolicy
-
Deprecated.
- getCrawlDelay() - Method in class bixo.robots.BaseRobotRules
-
- getCrawlDelayFromKey(String) - Static method in class bixo.utils.GroupingKey
-
- getCrawlEndTime() - Method in class bixo.config.FetcherPolicy
-
- getDefaultCrawlDelay() - Method in class bixo.config.BaseFetchJobPolicy
-
- getDefaultCrawlDelay() - Method in class bixo.config.FakeUserFetcherPolicy
-
- getDefaultCrawlDelay() - Method in class bixo.config.FetcherPolicy
-
Deprecated.
- getDefaultLogDir() - Method in class bixo.config.BixoPlatform
-
- getDefaultMaxContentSize() - Method in class bixo.fetcher.BaseFetcher
-
- getDomain() - Method in class bixo.utils.DomainInfo
-
- getDomainFromKey(String) - Static method in class bixo.utils.GroupingKey
-
- getException() - Method in class bixo.datum.StatusDatum
-
- getExpanded() - Method in class bixo.utils.EncodingUtils.ExpandedResult
-
- getFetchDelay() - Method in class bixo.config.BaseFetchJobPolicy.FetchSetInfo
-
- getFetchDelay() - Method in class bixo.datum.FetchSetDatum
-
- getFetchedUrl() - Method in class bixo.datum.ContentDatum
-
- getFetchedUrl() - Method in class bixo.datum.FetchedDatum
-
- getFetchedUrl() - Method in class bixo.fetcher.FetchedResult
-
- getFetcherMode() - Method in class bixo.config.FetcherPolicy
-
- getFetcherPolicy() - Method in class bixo.fetcher.BaseFetcher
-
- getFetchRequest(long, long, int) - Method in class bixo.config.AdaptiveFetcherPolicy
-
- getFetchRequest(long, long, int) - Method in class bixo.config.FakeUserFetcherPolicy
-
- getFetchTime() - Method in class bixo.datum.FetchedDatum
-
- getFetchTime() - Method in class bixo.datum.FetchSetDatum
-
- getFetchTime() - Method in class bixo.fetcher.FetchedResult
-
- getFileSystem() - Method in class bixo.hadoop.HadoopConfigured
-
- getFileSystem(URI) - Method in class bixo.hadoop.HadoopConfigured
-
- getFileSystem(String) - Method in class bixo.hadoop.HadoopConfigured
-
If the path is a valid URI, we look up the file system based on the URI;
otherwise, we use the configured file system.
- getFirst(String) - Method in class bixo.datum.HttpHeaders
-
- getGroupingField() - Static method in class bixo.datum.FetchSetDatum
-
- getGroupingField() - Static method in class bixo.datum.GroupedUrlDatum
-
- getGroupingKey() - Method in class bixo.datum.FetchSetDatum
-
- getGroupingKey(UrlDatum) - Method in class bixo.operations.BaseGroupGenerator
-
Return key used to group URL into one queue
- getGroupingRef() - Method in class bixo.datum.FetchSetDatum
-
- getGroupKey() - Method in class bixo.datum.GroupedUrlDatum
-
- getHeaders() - Method in class bixo.datum.ContentDatum
-
- getHeaders() - Method in class bixo.datum.FetchedDatum
-
- getHeaders() - Method in class bixo.datum.StatusDatum
-
- getHeaders() - Method in class bixo.fetcher.FetchedResult
-
- getHostAddress() - Method in class bixo.datum.ContentDatum
-
- getHostAddress() - Method in class bixo.datum.FetchedDatum
-
- getHostAddress() - Method in class bixo.datum.ParsedDatum
-
- getHostAddress() - Method in class bixo.datum.StatusDatum
-
- getHostAddress() - Method in class bixo.fetcher.FetchedResult
-
- getHostAddress() - Method in class bixo.utils.DomainInfo
-
- getHttpHeaders() - Method in exception bixo.exceptions.HttpFetchException
-
- getHttpStatus() - Method in exception bixo.exceptions.HttpFetchException
-
- getHttpVersion() - Method in class bixo.fetcher.SimpleHttpFetcher
-
- getIntProperty(String) - Method in class bixo.config.BixoPlatform
-
- getLanguage() - Method in class bixo.datum.ParsedDatum
-
- getLanguage(FetchedDatum, String) - Method in class bixo.parser.BaseParser
-
Extract language from (first) explicit header
- getLength() - Method in class bixo.datum.ContentBytes
-
- getLinkAttributeTypes() - Method in class bixo.config.ParserPolicy
-
- getLinkAttributeTypes() - Method in class bixo.parser.BaseLinkExtractor
-
- getLinks() - Method in class bixo.parser.BaseLinkExtractor
-
- getLinks() - Method in class bixo.parser.NullLinkExtractor
-
- getLinks() - Method in class bixo.parser.SimpleLinkExtractor
-
- getLinkTags() - Method in class bixo.config.ParserPolicy
-
- getLinkTags() - Method in class bixo.parser.BaseLinkExtractor
-
- getLocalizedMessage() - Method in exception bixo.exceptions.BaseFetchException
-
- getLogDir() - Method in class bixo.config.BixoPlatform
-
- getMaxConnectionsPerHost() - Method in class bixo.config.FetcherPolicy
-
- getMaxContentSize() - Method in class bixo.config.FetcherPolicy
-
Deprecated.
- getMaxContentSize(String) - Method in class bixo.fetcher.BaseFetcher
-
- getMaxFetchTime() - Static method in class bixo.robots.RobotUtils
-
- getMaxParseDuration() - Method in class bixo.config.ParserPolicy
-
- getMaxRedirects() - Method in class bixo.config.FetcherPolicy
-
- getMaxRequestsPerConnection() - Method in class bixo.config.AdaptiveFetcherPolicy
-
- getMaxRequestsPerConnection() - Method in class bixo.config.FetcherPolicy
-
- getMaxRetryCount() - Method in class bixo.fetcher.SimpleHttpFetcher
-
- getMaxThreads() - Method in class bixo.fetcher.BaseFetcher
-
- getMaxUrls() - Method in class bixo.config.FetcherPolicy
-
Calculate the maximum number of URLs that could be fetched in the remaining time.
- getMaxUrlsPerServer(ScoredUrlDatum) - Method in class bixo.config.DefaultFetchJobPolicy
-
Return max URLs per fetch job for the server indicated by the URL in the given ScoredUrlDatum.
- getMaxUrlsPerSet(ScoredUrlDatum) - Method in class bixo.config.DefaultFetchJobPolicy
-
Return max URLs per fetch set for the server indicated by the URL in the given ScoredUrlDatum.
- getMessage() - Method in exception bixo.exceptions.BaseFetchException
-
- getMetadata() - Method in class bixo.datum.UrlAndMetadata
-
- getMimeTypeFromContentType(String) - Static method in class bixo.utils.HttpUtils
-
- getMinPageFetchInterval() - Method in class bixo.config.FetcherPolicy
-
- getMinResponseRate() - Method in class bixo.config.FetcherPolicy
-
Return the minimum response rate.
- getNames() - Method in class bixo.datum.HttpHeaders
-
- getNewBaseUrl() - Method in class bixo.datum.FetchedDatum
-
- getNewBaseUrl() - Method in class bixo.fetcher.FetchedResult
-
- getNextRequestTime() - Method in class bixo.fetcher.FetchRequest
-
- getNumRedirects() - Method in class bixo.datum.FetchedDatum
-
- getNumRedirects() - Method in class bixo.fetcher.FetchedResult
-
- getNumReduceTasks() - Method in class bixo.config.BixoPlatform
-
- getNumUrls() - Method in class bixo.fetcher.FetchRequest
-
- getNumWarnings() - Method in class bixo.robots.SimpleRobotRulesParser
-
- getOutlinks() - Method in class bixo.datum.ParsedDatum
-
- getParsedMeta() - Method in class bixo.datum.ParsedDatum
-
- getParsedText() - Method in class bixo.datum.ParsedDatum
-
- getParsedTextField() - Static method in class bixo.datum.ParsedDatum
-
- getParserPolicy() - Method in class bixo.parser.BaseParser
-
- getPayload() - Method in class bixo.fetcher.FetchedResult
-
- getPlatformType() - Method in class bixo.config.BixoPlatform
-
- getPLD(String) - Static method in class bixo.utils.DomainNames
-
Extract the PLD (paid-level domain) from the hostname.
- getPLD(URL) - Static method in class bixo.utils.DomainNames
-
Extract the PLD (paid-level domain) from the URL.
- getProcess() - Method in interface bixo.fetcher.IFetchMgr
-
- getProcess() - Method in class bixo.operations.FetchBuffer
-
- getProperty(String) - Method in class bixo.config.BixoPlatform
-
- getProtocolAndDomain() - Method in class bixo.utils.DomainInfo
-
- getRandomLinks(int) - Method in class bixo.utils.DmozLinks
-
- getReason() - Method in exception bixo.exceptions.RedirectFetchException
-
- getRedirectedUrl() - Method in exception bixo.exceptions.RedirectFetchException
-
- getRedirectMode() - Method in class bixo.config.FetcherPolicy
-
- getRelAttributes() - Method in class bixo.datum.Outlink
-
- getRequestTimeout() - Method in class bixo.config.FetcherPolicy
-
- getResponseRate() - Method in class bixo.datum.FetchedDatum
-
- getResponseRate() - Method in class bixo.fetcher.FetchedResult
-
- getRobotRules(BaseFetcher, BaseRobotsParser, URL) - Static method in class bixo.robots.RobotUtils
-
Externally visible, static method for use in tools and for testing.
- getScore() - Method in class bixo.datum.ScoredUrlDatum
-
- getSitemaps() - Method in class bixo.robots.BaseRobotRules
-
- getSize() - Method in class bixo.hadoop.DiskBytesWritable
-
Get the current size of the buffer.
- getSocketTimeout() - Method in class bixo.fetcher.SimpleHttpFetcher
-
- getSortingField() - Static method in class bixo.datum.FetchSetDatum
-
- getSortingField() - Static method in class bixo.datum.ScoredUrlDatum
-
- getSortKey() - Method in class bixo.config.BaseFetchJobPolicy.FetchSetInfo
-
- getStatus() - Method in class bixo.datum.ScoredUrlDatum
-
- getStatus() - Method in class bixo.datum.StatusDatum
-
- getStatusTailPipe() - Method in class bixo.pipes.FetchPipe
-
- getStatusTime() - Method in class bixo.datum.StatusDatum
-
- getSuperDomain(String) - Static method in class bixo.utils.DomainNames
-
Extract the domain immediately containing this subdomain.
- getTailPipe() - Method in class bixo.pipes.ParsePipe
-
- getTempDir() - Method in class bixo.config.BixoPlatform
-
- getTikaParser() - Method in class bixo.parser.SimpleParser
-
- getTitle() - Method in class bixo.datum.ParsedDatum
-
- getToUrl() - Method in class bixo.datum.Outlink
-
- getUrl() - Method in class bixo.datum.FetchedDatum
-
Return the original base URL.
- getUrl() - Method in class bixo.datum.ParsedDatum
-
- getUrl() - Method in class bixo.datum.StatusDatum
-
- getUrl() - Method in class bixo.datum.UrlAndMetadata
-
- getUrl() - Method in class bixo.datum.UrlDatum
-
- getUrl() - Method in exception bixo.exceptions.BaseFetchException
-
- getUrls() - Method in class bixo.config.BaseFetchJobPolicy.FetchSetInfo
-
- getUrls() - Method in class bixo.datum.FetchSetDatum
-
- getUserAgent() - Method in class bixo.fetcher.BaseFetcher
-
- getUserAgentString() - Method in class bixo.config.UserAgent
-
- getValidMimeTypes() - Method in class bixo.config.FetcherPolicy
-
- GroupedUrlDatum - Class in bixo.datum
-
- GroupedUrlDatum() - Constructor for class bixo.datum.GroupedUrlDatum
-
- GroupedUrlDatum(Fields) - Constructor for class bixo.datum.GroupedUrlDatum
-
- GroupedUrlDatum(Fields, Tuple) - Constructor for class bixo.datum.GroupedUrlDatum
-
- GroupedUrlDatum(TupleEntry) - Constructor for class bixo.datum.GroupedUrlDatum
-
- GroupedUrlDatum(String, String) - Constructor for class bixo.datum.GroupedUrlDatum
-
- GroupedUrlDatum(Fields, String, String) - Constructor for class bixo.datum.GroupedUrlDatum
-
- GroupedUrlDatum(UrlDatum, String) - Constructor for class bixo.datum.GroupedUrlDatum
-
- GroupFunction - Class in bixo.operations
-
- GroupFunction(BaseGroupGenerator) - Constructor for class bixo.operations.GroupFunction
-
- GroupingKey - Class in bixo.utils
-
- GroupingKey() - Constructor for class bixo.utils.GroupingKey
-
- valueOf(String) - Static method in enum bixo.config.BixoPlatform.Platform
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.config.FetcherPolicy.FetcherMode
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.config.FetcherPolicy.RedirectMode
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.datum.UrlStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.exceptions.AbortedFetchReason
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.exceptions.RedirectFetchException.RedirectExceptionReason
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.hadoop.FetchCounters
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.hadoop.ImportCounters
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.parser.ParserCounters
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum bixo.robots.SimpleRobotRules.RobotRulesMode
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum bixo.config.BixoPlatform.Platform
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.config.FetcherPolicy.FetcherMode
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.config.FetcherPolicy.RedirectMode
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.datum.UrlStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.exceptions.AbortedFetchReason
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.exceptions.RedirectFetchException.RedirectExceptionReason
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.hadoop.FetchCounters
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.hadoop.ImportCounters
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.parser.ParserCounters
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum bixo.robots.SimpleRobotRules.RobotRulesMode
-
Returns an array containing the constants of this enum type, in
the order they are declared.
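
The `valueOf(String)` and `values()` entries above are the methods the Java compiler generates for every enum type. The sketch below shows how they behave using a hypothetical enum `Mode`; its constants are invented for illustration and are not taken from any Bixo enum.

```java
import java.util.Arrays;

// Hypothetical example; Mode and its constants are not part of Bixo.
class EnumLookupSketch {

    enum Mode { FAST, POLITE, OFFLINE }

    /**
     * Looks up a constant by its exact name. valueOf() is case-sensitive and
     * throws IllegalArgumentException for an unknown name (and
     * NullPointerException for null), so both map to the fallback here.
     */
    static Mode parseOrDefault(String name, Mode fallback) {
        try {
            return Mode.valueOf(name);
        } catch (IllegalArgumentException | NullPointerException e) {
            return fallback;
        }
    }

    /** Lists the constants in declaration order, as returned by values(). */
    static String declaredNames() {
        return Arrays.toString(Mode.values());
    }
}
```

Wrapping `valueOf` this way is one option when names come from configuration files or user input, where an unknown value should not abort the job.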