A B C D E F G H I L M N O P R S T U V W _ 

A

abort() - Method in class bixo.fetcher.BaseFetcher
 
abort() - Method in class bixo.fetcher.LoggingFetcher
 
abort() - Method in class bixo.fetcher.SimpleHttpFetcher
 
AbortedFetchException - Exception in bixo.exceptions
 
AbortedFetchException() - Constructor for exception bixo.exceptions.AbortedFetchException
 
AbortedFetchException(String, AbortedFetchReason) - Constructor for exception bixo.exceptions.AbortedFetchException
 
AbortedFetchException(String, String, AbortedFetchReason) - Constructor for exception bixo.exceptions.AbortedFetchException
 
AbortedFetchReason - Enum in bixo.exceptions
 
ACCEPT - Static variable in interface bixo.fetcher.HttpHeaderNames
 
ACCEPT_CHARSET - Static variable in interface bixo.fetcher.HttpHeaderNames
 
ACCEPT_ENCODING - Static variable in interface bixo.fetcher.HttpHeaderNames
 
ACCEPT_LANGUAGE - Static variable in interface bixo.fetcher.HttpHeaderNames
 
AdaptiveFetcherPolicy - Class in bixo.config
 
AdaptiveFetcherPolicy(long, long) - Constructor for class bixo.config.AdaptiveFetcherPolicy
 
AdaptiveFetcherPolicy(int, int, long, long) - Constructor for class bixo.config.AdaptiveFetcherPolicy
 
add(String, String) - Method in class bixo.datum.HttpHeaders
 
add(Fields, String...) - Static method in class bixo.utils.FieldUtils
 
addContent(char[], int, int) - Method in class bixo.parser.BaseContentExtractor
 
addContent(char) - Method in class bixo.parser.BaseContentExtractor
 
addContent(char[], int, int) - Method in class bixo.parser.SimpleContentExtractor
 
addContent(char) - Method in class bixo.parser.SimpleContentExtractor
 
addLink(Outlink) - Method in class bixo.parser.BaseLinkExtractor
 
addLink(Outlink) - Method in class bixo.parser.SimpleLinkExtractor
 
addRule(String, boolean) - Method in class bixo.robots.SimpleRobotRules
 
addSitemap(String) - Method in class bixo.robots.BaseRobotRules
 
addValidMimeType(String) - Method in class bixo.config.FetcherPolicy
 
addValidMimeTypes(Set<String>) - Method in class bixo.config.FetcherPolicy
 
ALL_LINK_ATTRIBUTE_TYPES - Static variable in class bixo.parser.BaseLinkExtractor
 
ALL_LINK_TAGS - Static variable in class bixo.parser.BaseLinkExtractor
 
append(byte[]) - Method in class bixo.hadoop.DiskBytesWritable
 

B

BASE_URL_FN - Static variable in class bixo.datum.ContentDatum
 
BaseContentExtractor - Class in bixo.parser
 
BaseContentExtractor() - Constructor for class bixo.parser.BaseContentExtractor
 
BaseFetcher - Class in bixo.fetcher
 
BaseFetcher(int, FetcherPolicy, UserAgent) - Constructor for class bixo.fetcher.BaseFetcher
 
BaseFetchException - Exception in bixo.exceptions
 
BaseFetchException() - Constructor for exception bixo.exceptions.BaseFetchException
 
BaseFetchException(String) - Constructor for exception bixo.exceptions.BaseFetchException
 
BaseFetchException(String, String) - Constructor for exception bixo.exceptions.BaseFetchException
 
BaseFetchException(String, Exception) - Constructor for exception bixo.exceptions.BaseFetchException
 
BaseFetchException(String, String, Exception) - Constructor for exception bixo.exceptions.BaseFetchException
 
BaseFetchJobPolicy - Class in bixo.config
 
BaseFetchJobPolicy() - Constructor for class bixo.config.BaseFetchJobPolicy
 
BaseFetchJobPolicy.FetchSetInfo - Class in bixo.config
 
BaseFetchJobPolicy.FetchSetInfo(List<ScoredUrlDatum>, long, long, boolean) - Constructor for class bixo.config.BaseFetchJobPolicy.FetchSetInfo
 
BaseGroupGenerator - Class in bixo.operations
 
BaseGroupGenerator() - Constructor for class bixo.operations.BaseGroupGenerator
 
BaseLinkExtractor - Class in bixo.parser
 
BaseLinkExtractor() - Constructor for class bixo.parser.BaseLinkExtractor
 
BaseParser - Class in bixo.parser
 
BaseParser(ParserPolicy) - Constructor for class bixo.parser.BaseParser
 
BaseRobotRules - Class in bixo.robots
Result of parsing a single robots.txt file: a set of rules and a crawl-delay.
BaseRobotRules() - Constructor for class bixo.robots.BaseRobotRules
 
BaseRobotsParser - Class in bixo.robots
 
BaseRobotsParser() - Constructor for class bixo.robots.BaseRobotsParser
 
BaseScoreGenerator - Class in bixo.operations
 
BaseScoreGenerator() - Constructor for class bixo.operations.BaseScoreGenerator
 
BaseUrlFilter - Class in bixo.urls
Filter URLs.
BaseUrlFilter() - Constructor for class bixo.urls.BaseUrlFilter
 
BaseUrlNormalizer - Class in bixo.urls
 
BaseUrlNormalizer() - Constructor for class bixo.urls.BaseUrlNormalizer
 
BaseUrlValidator - Class in bixo.urls
Validate URLs.
BaseUrlValidator() - Constructor for class bixo.urls.BaseUrlValidator
 
bixo.config - package bixo.config
 
bixo.datum - package bixo.datum
 
bixo.exceptions - package bixo.exceptions
 
bixo.fetcher - package bixo.fetcher
 
bixo.hadoop - package bixo.hadoop
 
bixo.operations - package bixo.operations
 
bixo.parser - package bixo.parser
 
bixo.pipes - package bixo.pipes
 
bixo.robots - package bixo.robots
 
bixo.urls - package bixo.urls
 
bixo.utils - package bixo.utils
 
BIXO_IT_AGENT - Static variable in class bixo.utils.ConfigUtils
 
BIXO_TEST_AGENT - Static variable in class bixo.utils.ConfigUtils
 
BIXO_TOOL_AGENT - Static variable in class bixo.utils.ConfigUtils
 
BixoPlatform - Class in bixo.config
 
BixoPlatform(Class, BixoPlatform.Platform) - Constructor for class bixo.config.BixoPlatform
 
BixoPlatform(Class, BixoPlatform.Platform, Level) - Constructor for class bixo.config.BixoPlatform
 
BixoPlatform(Class, Configuration) - Constructor for class bixo.config.BixoPlatform
 
BixoPlatform(Class, JobConf) - Constructor for class bixo.config.BixoPlatform
 
BixoPlatform(Class, JobConf, Level) - Constructor for class bixo.config.BixoPlatform
 
BixoPlatform.Platform - Enum in bixo.config
 
BLOCKED_GROUPING_KEY - Static variable in class bixo.utils.GroupingKey
 
BoilerpipeContentExtractor - Class in bixo.parser
A content extractor that returns Boilerpipe-cleaned content.
BoilerpipeContentExtractor() - Constructor for class bixo.parser.BoilerpipeContentExtractor
Defaults to using DefaultExtractor when setting up the BoilerpipeContentHandler
BoilerpipeContentExtractor(Class<? extends ExtractorBase>) - Constructor for class bixo.parser.BoilerpipeContentExtractor
BoilerpipeExtractor doesn't implement Serializable, but a caller can work around this limitation by specifying the BoilerpipeExtractor class to use with the BoilerpipeContentHandler (this would work for most extractors; it won't work for KeepEverythingWithMinKWordsExtractor which takes a parameter).
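
The class-based workaround described for BoilerpipeContentExtractor(Class<? extends ExtractorBase>) can be illustrated with a small, generic Java sketch. It is not Bixo's code: ByClassHolder, PlainExtractor and roundTrip are hypothetical names made up for this illustration. The idea is to serialize only the Class object and create the instance on demand through its no-argument constructor, which is also why a type that requires a constructor parameter cannot be handled this way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a helper that is NOT Serializable
// but does have a no-argument constructor.
class PlainExtractor {
    String extract(String text) {
        return text.trim();
    }
}

// Illustrative holder (an assumption, not part of Bixo): it serializes only
// the Class token and rebuilds the helper lazily after deserialization.
final class ByClassHolder<T> implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Class<? extends T> type;
    private transient T instance; // skipped by Java serialization

    ByClassHolder(Class<? extends T> type) {
        this.type = type;
    }

    T get() {
        if (instance == null) {
            try {
                // Works only when the type has an accessible no-arg constructor.
                instance = type.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("No usable no-arg constructor: " + type, e);
            }
        }
        return instance;
    }

    // Serialize and deserialize a value in memory, as a job framework might.
    @SuppressWarnings("unchecked")
    static <S extends Serializable> S roundTrip(S value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (S) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After ByClassHolder.roundTrip(new ByClassHolder<>(PlainExtractor.class)), calling get() on the copy builds a fresh PlainExtractor even though PlainExtractor itself was never serialized.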

C

calcMaxUrls() - Method in class bixo.config.FetcherPolicy
 
characters(char[], int, int) - Method in class bixo.parser.BaseContentExtractor
 
characters(char[], int, int) - Method in class bixo.parser.BaseLinkExtractor
 
characters(char[], int, int) - Method in class bixo.parser.BoilerpipeContentExtractor
 
characters(char[], int, int) - Method in class bixo.parser.HtmlContentExtractor
 
characters(char[], int, int) - Method in class bixo.parser.NullLinkExtractor
 
checkClientTrusted(X509Certificate[], String) - Method in class bixo.fetcher.DummyX509TrustManager
 
checkServerTrusted(X509Certificate[], String) - Method in class bixo.fetcher.DummyX509TrustManager
 
cleanup(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.FetchBuffer
 
cleanup(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.FilterAndScoreByUrlAndRobots
 
cleanup(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.UrlFilter
 
cleanup(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.UrlLengthener
 
clear() - Method in class bixo.utils.DiskQueue
 
clearRules() - Method in class bixo.robots.SimpleRobotRules
 
clone(Tuple, FlowProcess) - Static method in class bixo.config.BixoPlatform
 
collect(Tuple) - Method in interface bixo.fetcher.IFetchMgr
 
collect(Tuple) - Method in class bixo.operations.FetchBuffer
 
combine(Fields, Fields) - Static method in class bixo.utils.FieldUtils
 
compare(byte[], int, int, byte[], int, int) - Method in class bixo.datum.ContentBytes.Comparator
Compare the buffers in serialized form.
compareTo(GroupedUrlDatum) - Method in class bixo.datum.GroupedUrlDatum
 
compareTo(Object) - Method in class bixo.datum.UrlAndMetadata
 
compareTo(AbortedFetchException) - Method in exception bixo.exceptions.AbortedFetchException
 
compareTo(HttpFetchException) - Method in exception bixo.exceptions.HttpFetchException
 
compareTo(IOFetchException) - Method in exception bixo.exceptions.IOFetchException
 
compareTo(RedirectFetchException) - Method in exception bixo.exceptions.RedirectFetchException
 
compareTo(UrlFetchException) - Method in exception bixo.exceptions.UrlFetchException
 
compareTo(Object) - Method in class bixo.hadoop.DiskBytesWritable
 
compareToBase(BaseFetchException) - Method in exception bixo.exceptions.BaseFetchException
 
ConfigUtils - Class in bixo.utils
 
ConfigUtils() - Constructor for class bixo.utils.ConfigUtils
 
CONTENT_DISPOSITION - Static variable in interface bixo.fetcher.HttpHeaderNames
 
CONTENT_ENCODING - Static variable in interface bixo.fetcher.HttpHeaderNames
 
CONTENT_FN - Static variable in class bixo.datum.ContentDatum
 
CONTENT_FN - Static variable in class bixo.datum.FetchedDatum
 
CONTENT_LANGUAGE - Static variable in interface bixo.fetcher.HttpHeaderNames
 
CONTENT_LENGTH - Static variable in interface bixo.fetcher.HttpHeaderNames
 
CONTENT_LOCATION - Static variable in interface bixo.fetcher.HttpHeaderNames
 
CONTENT_MD5 - Static variable in interface bixo.fetcher.HttpHeaderNames
 
CONTENT_PIPE_NAME - Static variable in class bixo.pipes.FetchPipe
 
CONTENT_TYPE - Static variable in interface bixo.fetcher.HttpHeaderNames
 
CONTENT_TYPE_FN - Static variable in class bixo.datum.ContentDatum
 
CONTENT_TYPE_FN - Static variable in class bixo.datum.FetchedDatum
 
ContentBytes - Class in bixo.datum
 
ContentBytes() - Constructor for class bixo.datum.ContentBytes
 
ContentBytes(byte[]) - Constructor for class bixo.datum.ContentBytes
 
ContentBytes.Comparator - Class in bixo.datum
A Comparator optimized for BytesWritable.
ContentBytes.Comparator() - Constructor for class bixo.datum.ContentBytes.Comparator
 
ContentDatum - Class in bixo.datum
 
ContentDatum(Tuple) - Constructor for class bixo.datum.ContentDatum
 
ContentDatum(TupleEntry) - Constructor for class bixo.datum.ContentDatum
 
ContentDatum(String, String, HttpHeaders, ContentBytes, String) - Constructor for class bixo.datum.ContentDatum
 
ContentDatum(String, Payload) - Constructor for class bixo.datum.ContentDatum
Create place-holder ContentDatum from the data used to attempt the fetch.
ContentDatum(ScoredUrlDatum) - Constructor for class bixo.datum.ContentDatum
Create place-holder ContentDatum from the data used to attempt the fetch.
copySharedDirToLocal(FlowProcess, String) - Method in class bixo.config.BixoPlatform
 
CrawlDirUtils - Class in bixo.utils
 
CrawlDirUtils() - Constructor for class bixo.utils.CrawlDirUtils
 
createFetcher(BaseFetcher) - Static method in class bixo.robots.RobotUtils
 
createFetcher(UserAgent, int) - Static method in class bixo.robots.RobotUtils
 

D

decodeUrl(String) - Method in class bixo.urls.SimpleUrlNormalizer
 
DEFAULT_ACCEPT_LANGUAGE - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_BROWSER_VERSION - Static variable in class bixo.config.UserAgent
 
DEFAULT_CRAWL_DELAY - Static variable in class bixo.config.BaseFetchJobPolicy
 
DEFAULT_CRAWL_DELAY - Static variable in class bixo.config.FetcherPolicy
Deprecated.
DEFAULT_CRAWL_END_TIME - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_LINK_ATTRIBUTE_TYPES - Static variable in class bixo.parser.BaseLinkExtractor
 
DEFAULT_LINK_TAGS - Static variable in class bixo.parser.BaseLinkExtractor
 
DEFAULT_MAX_CONNECTIONS_PER_HOST - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_MAX_CONTENT_SIZE - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_MAX_PARSE_DURATION - Static variable in class bixo.config.ParserPolicy
 
DEFAULT_MAX_REDIRECTS - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_MAX_REQUESTS_PER_CONNECTION - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_MIN_PAGE_FETCH_INTERVAL - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_MIN_RESPONSE_RATE - Static variable in class bixo.config.FetcherPolicy
 
DEFAULT_REFILL_RATIO - Static variable in class bixo.utils.DiskQueue
 
DEFAULT_SCORE - Static variable in class bixo.operations.FixedScoreGenerator
 
DefaultFetchJobPolicy - Class in bixo.config
 
DefaultFetchJobPolicy() - Constructor for class bixo.config.DefaultFetchJobPolicy
 
DefaultFetchJobPolicy(FetcherPolicy) - Constructor for class bixo.config.DefaultFetchJobPolicy
 
DefaultFetchJobPolicy(int, int, long) - Constructor for class bixo.config.DefaultFetchJobPolicy
 
DEFERRED_GROUPING_KEY - Static variable in class bixo.utils.GroupingKey
 
DiskBytesWritable - Class in bixo.hadoop
 
DiskBytesWritable(byte[]) - Constructor for class bixo.hadoop.DiskBytesWritable
 
DiskQueue<E extends java.io.Serializable> - Class in bixo.utils
A queue that writes extra elements to disk, and reads them in as needed.
DiskQueue(int) - Constructor for class bixo.utils.DiskQueue
Construct a disk-backed queue that keeps at most a fixed number of elements in memory.
DiskQueue(int, Comparator<? super E>) - Constructor for class bixo.utils.DiskQueue
 
DmozLinks - Class in bixo.utils
 
DmozLinks(File) - Constructor for class bixo.utils.DmozLinks
 
DomainInfo - Class in bixo.utils
 
DomainInfo(String) - Constructor for class bixo.utils.DomainInfo
 
DomainNames - Class in bixo.utils
Utilities to extract the PLD (paid-level domain, as per the IRLbot paper) from a hostname and perform similar hostname analysis.
DomainNames() - Constructor for class bixo.utils.DomainNames
 
DOMParser - Class in bixo.parser
 
DOMParser(Fields) - Constructor for class bixo.parser.DOMParser
 
DOMParser(Fields, boolean) - Constructor for class bixo.parser.DOMParser
 
DummyX509TrustManager - Class in bixo.fetcher
 
DummyX509TrustManager(KeyStore) - Constructor for class bixo.fetcher.DummyX509TrustManager
Constructor for DummyX509TrustManager.
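
The DiskQueue entries in this section describe a queue that writes extra elements to disk and reads them back as needed. The following is a minimal Java sketch of that general idea, not the bixo.utils.DiskQueue implementation. The class name DiskQueueSketch, the temp-file spill strategy and the refill-when-empty rule are all assumptions made for illustration; the real class may differ, for example in how it uses DEFAULT_REFILL_RATIO.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;

// Illustrative FIFO queue: the head lives in memory, overflow goes to a temp file.
final class DiskQueueSketch<E extends Serializable> implements Closeable {
    private final int maxInMemory;
    private final ArrayDeque<E> memory = new ArrayDeque<>();
    private final File spillFile;
    private ObjectOutputStream spillOut; // opened on first spill
    private ObjectInputStream spillIn;   // opened on first refill
    private int spilled;                 // elements on disk not yet read back

    DiskQueueSketch(int maxInMemory) {
        if (maxInMemory < 1) {
            throw new IllegalArgumentException("maxInMemory must be >= 1");
        }
        this.maxInMemory = maxInMemory;
        this.spillFile = createSpillFile();
    }

    private static File createSpillFile() {
        try {
            File file = File.createTempFile("disk-queue-sketch", ".bin");
            file.deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void add(E element) {
        // Once anything is on disk, new elements must follow it to keep FIFO order.
        if (spilled == 0 && memory.size() < maxInMemory) {
            memory.addLast(element);
            return;
        }
        try {
            if (spillOut == null) {
                spillOut = new ObjectOutputStream(new FileOutputStream(spillFile));
            }
            spillOut.writeUnshared(element);
            spillOut.flush(); // make the bytes visible to the reader stream
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the oldest element, or null when the queue is empty.
    E poll() {
        if (memory.isEmpty() && spilled > 0) {
            refill();
        }
        return memory.pollFirst();
    }

    int size() {
        return memory.size() + spilled;
    }

    @SuppressWarnings("unchecked")
    private void refill() {
        try {
            if (spillIn == null) {
                spillIn = new ObjectInputStream(new FileInputStream(spillFile));
            }
            while (spilled > 0 && memory.size() < maxInMemory) {
                memory.addLast((E) spillIn.readUnshared());
                spilled--;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void close() {
        try {
            if (spillOut != null) spillOut.close();
            if (spillIn != null) spillIn.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            spillFile.delete();
        }
    }
}
```

The sketch never truncates its spill file, so it trades disk space for simplicity; a production queue would also need to consider thread safety.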

E

emptyQueue(Queue<GroupedUrlDatum>, String, TupleEntryCollector, FlowProcess) - Static method in class bixo.operations.ProcessRobotsTask
Clear out the queue by outputting all entries with .
EncodingUtils - Class in bixo.utils
 
EncodingUtils() - Constructor for class bixo.utils.EncodingUtils
 
EncodingUtils.ExpandedResult - Class in bixo.utils
 
EncodingUtils.ExpandedResult(byte[], boolean) - Constructor for class bixo.utils.EncodingUtils.ExpandedResult
 
endDocument() - Method in class bixo.parser.BoilerpipeContentExtractor
 
endDocument() - Method in class bixo.parser.HtmlContentExtractor
 
endElement(String, String, String) - Method in class bixo.parser.BaseContentExtractor
 
endElement(String, String, String) - Method in class bixo.parser.BaseLinkExtractor
 
endElement(String, String, String) - Method in class bixo.parser.BoilerpipeContentExtractor
 
endElement(String, String, String) - Method in class bixo.parser.HtmlContentExtractor
 
endElement(String, String, String) - Method in class bixo.parser.NullLinkExtractor
 
endElement(String, String, String) - Method in class bixo.parser.SimpleLinkExtractor
 
endFetchSet() - Method in class bixo.config.BaseFetchJobPolicy
 
endFetchSet() - Method in class bixo.config.DefaultFetchJobPolicy
 
endPrefixMapping(String) - Method in class bixo.parser.BoilerpipeContentExtractor
 
endPrefixMapping(String) - Method in class bixo.parser.HtmlContentExtractor
 
equals(Object) - Method in class bixo.config.FetcherPolicy
 
equals(Object) - Method in class bixo.config.ParserPolicy
 
equals(Object) - Method in class bixo.datum.ContentBytes
Are the two byte sequences equal?
equals(Object) - Method in class bixo.datum.Outlink
 
EXCEPTION_FN - Static variable in class bixo.datum.StatusDatum
 
execute(Runnable) - Method in class bixo.utils.ThreadedExecutor
Execute using the thread pool.
extractLoopNumber(BasePath) - Static method in class bixo.utils.CrawlDirUtils
Given a "crawl dir" style input path, extract the loop number from the path.

F

failedFetch(int) - Method in class bixo.robots.BaseRobotsParser
The fetch of robots.txt failed, so return rules appropriate given the HTTP status code.
failedFetch(int) - Method in class bixo.robots.SimpleRobotRulesParser
 
FAKE_CONTENT_LOCATION - Static variable in class bixo.fetcher.LoggingFetcher
 
FakeUserFetcherPolicy - Class in bixo.config
 
FakeUserFetcherPolicy() - Constructor for class bixo.config.FakeUserFetcherPolicy
 
FakeUserFetcherPolicy(long) - Constructor for class bixo.config.FakeUserFetcherPolicy
 
fetch(String) - Method in class bixo.fetcher.SimpleHttpFetcher
 
fetch(HttpRequestBase, String, Payload) - Method in class bixo.fetcher.SimpleHttpFetcher
 
FETCH_TIME_FN - Static variable in class bixo.datum.FetchedDatum
 
FetchBuffer - Class in bixo.operations
 
FetchBuffer(BaseFetcher) - Constructor for class bixo.operations.FetchBuffer
 
FetchCounters - Enum in bixo.hadoop
 
FETCHED_URL_FN - Static variable in class bixo.datum.ContentDatum
 
FETCHED_URL_FN - Static variable in class bixo.datum.FetchedDatum
 
FetchedDatum - Class in bixo.datum
 
FetchedDatum(Tuple) - Constructor for class bixo.datum.FetchedDatum
 
FetchedDatum(TupleEntry) - Constructor for class bixo.datum.FetchedDatum
 
FetchedDatum(String, String, long, HttpHeaders, ContentBytes, String, int) - Constructor for class bixo.datum.FetchedDatum
 
FetchedDatum(String, Payload) - Constructor for class bixo.datum.FetchedDatum
Create place-holder FetchedDatum from the data used to attempt the fetch.
FetchedDatum(ScoredUrlDatum) - Constructor for class bixo.datum.FetchedDatum
Create place-holder FetchedDatum from the data used to attempt the fetch.
FetchedResult - Class in bixo.fetcher
 
FetchedResult(String, String, long, HttpHeaders, byte[], String, int, Payload, String, int, String) - Constructor for class bixo.fetcher.FetchedResult
 
FetcherPolicy - Class in bixo.config
Definition of policy for fetches.
FetcherPolicy() - Constructor for class bixo.config.FetcherPolicy
 
FetcherPolicy(int, int, long, long, int) - Constructor for class bixo.config.FetcherPolicy
 
FetcherPolicy.FetcherMode - Enum in bixo.config
 
FetcherPolicy.RedirectMode - Enum in bixo.config
 
FetchPipe - Class in bixo.pipes
 
FetchPipe(Pipe, BaseScoreGenerator, BaseFetcher, int) - Constructor for class bixo.pipes.FetchPipe
Generate an assembly that will fetch all of the UrlDatum tuples coming out of urlProvider.
FetchPipe(Pipe, BaseScoreGenerator, BaseFetcher, BaseFetcher, BaseRobotsParser, BaseFetchJobPolicy, int) - Constructor for class bixo.pipes.FetchPipe
 
FetchRequest - Class in bixo.fetcher
 
FetchRequest(int, long) - Constructor for class bixo.fetcher.FetchRequest
 
FetchSetDatum - Class in bixo.datum
A FetchSetDatum represents a group of URLs that will be fetched using one persistent connection to the target server.
FetchSetDatum() - Constructor for class bixo.datum.FetchSetDatum
 
FetchSetDatum(Tuple) - Constructor for class bixo.datum.FetchSetDatum
 
FetchSetDatum(TupleEntry) - Constructor for class bixo.datum.FetchSetDatum
 
FetchSetDatum(List<ScoredUrlDatum>, long, long, int, String) - Constructor for class bixo.datum.FetchSetDatum
 
FetchTask - Class in bixo.fetcher
Runnable instance for fetching a set of URLs from the same server, using keep-alive.
FetchTask(IFetchMgr, BaseFetcher, List<ScoredUrlDatum>, String) - Constructor for class bixo.fetcher.FetchTask
 
FIELDS - Static variable in class bixo.datum.ContentDatum
 
FIELDS - Static variable in class bixo.datum.FetchedDatum
 
FIELDS - Static variable in class bixo.datum.FetchSetDatum
 
FIELDS - Static variable in class bixo.datum.GroupedUrlDatum
 
FIELDS - Static variable in class bixo.datum.ParsedDatum
 
FIELDS - Static variable in class bixo.datum.ScoredUrlDatum
 
FIELDS - Static variable in class bixo.datum.StatusDatum
 
FIELDS - Static variable in class bixo.datum.UrlDatum
 
FieldUtils - Class in bixo.utils
 
FieldUtils() - Constructor for class bixo.utils.FieldUtils
 
FilterAndScoreByUrlAndRobots - Class in bixo.operations
Filter out URLs either by domain (not popular enough) or because they're blocked by robots.txt.
FilterAndScoreByUrlAndRobots(UserAgent, int, BaseRobotsParser, BaseScoreGenerator) - Constructor for class bixo.operations.FilterAndScoreByUrlAndRobots
 
FilterAndScoreByUrlAndRobots(BaseFetcher, BaseRobotsParser, BaseScoreGenerator) - Constructor for class bixo.operations.FilterAndScoreByUrlAndRobots
 
finalize() - Method in class bixo.utils.DiskQueue
 
findAllSubdirs(BasePlatform, BasePath, String) - Static method in class bixo.utils.CrawlDirUtils
Return an array of paths to all of the subdirs in crawl dirs found inside of , where the subdir name ==
findLatestLoopDir(BasePlatform, BasePath) - Static method in class bixo.utils.CrawlDirUtils
 
findNextLoopDir(BasePlatform, BasePath, int) - Static method in class bixo.utils.CrawlDirUtils
Given a loopNumber, returns the name of the next loop directory.
finished(String) - Method in interface bixo.fetcher.IFetchMgr
 
finished(String) - Method in class bixo.operations.FetchBuffer
 
FixedScoreGenerator - Class in bixo.operations
 
FixedScoreGenerator() - Constructor for class bixo.operations.FixedScoreGenerator
 
FixedScoreGenerator(double) - Constructor for class bixo.operations.FixedScoreGenerator
 
flush(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.FetchBuffer
 
flush(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.FilterAndScoreByUrlAndRobots
 
flush(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.UrlLengthener
 
fromHexString(String) - Static method in class bixo.utils.StringUtils
Convert a String containing consecutive (no inside whitespace) hexadecimal digits into a corresponding byte array.
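
As an illustration of the conversion described for fromHexString(String), here is a small standalone Java sketch. It is not the bixo.utils.StringUtils source; the class name HexSketch and the choice to reject odd-length or non-hex input with IllegalArgumentException are assumptions.

```java
// Illustrative only: converts "0aff" style strings (no inner whitespace) to bytes.
final class HexSketch {
    private HexSketch() {
    }

    static byte[] fromHexString(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] result = new byte[hex.length() / 2];
        for (int i = 0; i < result.length; i++) {
            // Character.digit returns -1 for characters that are not base-16 digits.
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + (2 * i));
            }
            result[i] = (byte) ((high << 4) | low);
        }
        return result;
    }
}
```

Each pair of digits becomes one byte, so an input of 2n characters yields an array of n bytes.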

G

generateScore(String) - Method in class bixo.operations.BaseScoreGenerator
 
generateScore(String, String, GroupedUrlDatum) - Method in class bixo.operations.BaseScoreGenerator
 
generateScore(String, String, String) - Method in class bixo.operations.BaseScoreGenerator
Return score for URL, based on domain & URL path
generateScore(String, String, String) - Method in class bixo.operations.FixedScoreGenerator
 
get(ScoredUrlDatum) - Method in class bixo.fetcher.BaseFetcher
 
get(ScoredUrlDatum) - Method in class bixo.fetcher.LoggingFetcher
 
get(ScoredUrlDatum) - Method in class bixo.fetcher.SimpleHttpFetcher
 
get() - Method in class bixo.hadoop.DiskBytesWritable
Get the data from the BytesWritable.
getAbortReason() - Method in exception bixo.exceptions.AbortedFetchException
 
getAcceptedIssuers() - Method in class bixo.fetcher.DummyX509TrustManager
 
getAcceptEncoding() - Method in class bixo.fetcher.SimpleHttpFetcher
Return the current value used for the ACCEPT-ENCODING request parameter.
getAcceptLanguage() - Method in class bixo.config.FetcherPolicy
 
getActiveCount() - Method in class bixo.utils.ThreadedExecutor
Return number of active threads
getAgentName() - Method in class bixo.config.UserAgent
 
getAll(String) - Method in class bixo.datum.HttpHeaders
 
getAnchor() - Method in class bixo.datum.Outlink
 
getBaseUrl() - Method in class bixo.datum.ContentDatum
Return the original URL - use the UrlDatum support for this.
getBaseUrl() - Method in class bixo.fetcher.FetchedResult
 
getBooleanProperty(String) - Method in class bixo.config.BixoPlatform
 
getBytes() - Method in class bixo.datum.ContentBytes
 
getCapacity() - Method in class bixo.hadoop.DiskBytesWritable
Get the capacity, which is the maximum size that could be handled without resizing the backing storage.
getCause() - Method in exception bixo.exceptions.BaseFetchException
 
getCharset(FetchedDatum) - Method in class bixo.parser.BaseParser
Extract encoding from content-type. If a charset is returned, then it's a valid/normalized charset name that's supported on this platform.
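
The contract described for getCharset(FetchedDatum), returning only a valid, normalized charset name that the platform supports, can be sketched with the standard java.nio.charset API. This is a hedged illustration rather than Bixo's parser code; the class and method names below are hypothetical, and the header parsing is deliberately simple (it does not handle whitespace around the equals sign).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Illustrative only: pulls the charset parameter out of a Content-Type value.
final class ContentTypeCharsetSketch {
    private ContentTypeCharsetSketch() {
    }

    // Returns the canonical charset name, or null if absent, invalid or unsupported.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the MIME type; parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (!param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            String name = param.substring("charset=".length()).trim();
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1); // drop surrounding quotes
            }
            try {
                // forName validates the name; name() yields the canonical form.
                return Charset.forName(name).name();
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                return null;
            }
        }
        return null;
    }
}
```

Returning null for unknown names lets a caller fall back to charset detection instead of failing on a bad header.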
getCharsetFromContentType(String) - Static method in class bixo.utils.HttpUtils
 
getConf() - Method in class bixo.hadoop.HadoopConfigured
 
getConnectionTimeout() - Method in class bixo.fetcher.SimpleHttpFetcher
 
getContent() - Method in class bixo.fetcher.FetchedResult
 
getContent() - Method in class bixo.parser.BaseContentExtractor
 
getContent() - Method in class bixo.parser.BoilerpipeContentExtractor
getContent returns the boilerpipe extracted text.
getContent() - Method in class bixo.parser.HtmlContentExtractor
 
getContent() - Method in class bixo.parser.SimpleContentExtractor
 
getContentBytes() - Method in class bixo.datum.ContentDatum
 
getContentBytes() - Method in class bixo.datum.FetchedDatum
 
getContentLength() - Method in class bixo.datum.ContentDatum
 
getContentLength() - Method in class bixo.datum.FetchedDatum
 
getContentLocation(FetchedDatum) - Method in class bixo.parser.BaseParser
Figure out the right base URL to use, for when we need to resolve relative URLs.
getContentTailPipe() - Method in class bixo.pipes.FetchPipe
 
getContentType() - Method in class bixo.datum.ContentDatum
 
getContentType() - Method in class bixo.datum.FetchedDatum
 
getContentType() - Method in class bixo.fetcher.FetchedResult
 
getCrawlDelay() - Method in class bixo.config.FakeUserFetcherPolicy
 
getCrawlDelay() - Method in class bixo.config.FetcherPolicy
Deprecated.
getCrawlDelay() - Method in class bixo.robots.BaseRobotRules
 
getCrawlDelayFromKey(String) - Static method in class bixo.utils.GroupingKey
 
getCrawlEndTime() - Method in class bixo.config.FetcherPolicy
 
getDefaultCrawlDelay() - Method in class bixo.config.BaseFetchJobPolicy
 
getDefaultCrawlDelay() - Method in class bixo.config.FakeUserFetcherPolicy
 
getDefaultCrawlDelay() - Method in class bixo.config.FetcherPolicy
Deprecated.
getDefaultLogDir() - Method in class bixo.config.BixoPlatform
 
getDefaultMaxContentSize() - Method in class bixo.fetcher.BaseFetcher
 
getDomain() - Method in class bixo.utils.DomainInfo
 
getDomainFromKey(String) - Static method in class bixo.utils.GroupingKey
 
getException() - Method in class bixo.datum.StatusDatum
 
getExpanded() - Method in class bixo.utils.EncodingUtils.ExpandedResult
 
getFetchDelay() - Method in class bixo.config.BaseFetchJobPolicy.FetchSetInfo
 
getFetchDelay() - Method in class bixo.datum.FetchSetDatum
 
getFetchedUrl() - Method in class bixo.datum.ContentDatum
 
getFetchedUrl() - Method in class bixo.datum.FetchedDatum
 
getFetchedUrl() - Method in class bixo.fetcher.FetchedResult
 
getFetcherMode() - Method in class bixo.config.FetcherPolicy
 
getFetcherPolicy() - Method in class bixo.fetcher.BaseFetcher
 
getFetchRequest(long, long, int) - Method in class bixo.config.AdaptiveFetcherPolicy
 
getFetchRequest(long, long, int) - Method in class bixo.config.FakeUserFetcherPolicy
 
getFetchTime() - Method in class bixo.datum.FetchedDatum
 
getFetchTime() - Method in class bixo.datum.FetchSetDatum
 
getFetchTime() - Method in class bixo.fetcher.FetchedResult
 
getFileSystem() - Method in class bixo.hadoop.HadoopConfigured
 
getFileSystem(URI) - Method in class bixo.hadoop.HadoopConfigured
 
getFileSystem(String) - Method in class bixo.hadoop.HadoopConfigured
If the path is a valid URI, look up the file system based on the URI; otherwise, use the configured file system.
getFirst(String) - Method in class bixo.datum.HttpHeaders
 
getGroupingField() - Static method in class bixo.datum.FetchSetDatum
 
getGroupingField() - Static method in class bixo.datum.GroupedUrlDatum
 
getGroupingKey() - Method in class bixo.datum.FetchSetDatum
 
getGroupingKey(UrlDatum) - Method in class bixo.operations.BaseGroupGenerator
Return key used to group URL into one queue
getGroupingRef() - Method in class bixo.datum.FetchSetDatum
 
getGroupKey() - Method in class bixo.datum.GroupedUrlDatum
 
getHeaders() - Method in class bixo.datum.ContentDatum
 
getHeaders() - Method in class bixo.datum.FetchedDatum
 
getHeaders() - Method in class bixo.datum.StatusDatum
 
getHeaders() - Method in class bixo.fetcher.FetchedResult
 
getHostAddress() - Method in class bixo.datum.ContentDatum
 
getHostAddress() - Method in class bixo.datum.FetchedDatum
 
getHostAddress() - Method in class bixo.datum.ParsedDatum
 
getHostAddress() - Method in class bixo.datum.StatusDatum
 
getHostAddress() - Method in class bixo.fetcher.FetchedResult
 
getHostAddress() - Method in class bixo.utils.DomainInfo
 
getHttpHeaders() - Method in exception bixo.exceptions.HttpFetchException
 
getHttpStatus() - Method in exception bixo.exceptions.HttpFetchException
 
getHttpVersion() - Method in class bixo.fetcher.SimpleHttpFetcher
 
getIntProperty(String) - Method in class bixo.config.BixoPlatform
 
getLanguage() - Method in class bixo.datum.ParsedDatum
 
getLanguage(FetchedDatum, String) - Method in class bixo.parser.BaseParser
Extract language from (first) explicit header
getLength() - Method in class bixo.datum.ContentBytes
 
getLinkAttributeTypes() - Method in class bixo.config.ParserPolicy
 
getLinkAttributeTypes() - Method in class bixo.parser.BaseLinkExtractor
 
getLinks() - Method in class bixo.parser.BaseLinkExtractor
 
getLinks() - Method in class bixo.parser.NullLinkExtractor
 
getLinks() - Method in class bixo.parser.SimpleLinkExtractor
 
getLinkTags() - Method in class bixo.config.ParserPolicy
 
getLinkTags() - Method in class bixo.parser.BaseLinkExtractor
 
getLocalizedMessage() - Method in exception bixo.exceptions.BaseFetchException
 
getLogDir() - Method in class bixo.config.BixoPlatform
 
getMaxConnectionsPerHost() - Method in class bixo.config.FetcherPolicy
 
getMaxContentSize() - Method in class bixo.config.FetcherPolicy
Deprecated.
getMaxContentSize(String) - Method in class bixo.fetcher.BaseFetcher
 
getMaxFetchTime() - Static method in class bixo.robots.RobotUtils
 
getMaxParseDuration() - Method in class bixo.config.ParserPolicy
 
getMaxRedirects() - Method in class bixo.config.FetcherPolicy
 
getMaxRequestsPerConnection() - Method in class bixo.config.AdaptiveFetcherPolicy
 
getMaxRequestsPerConnection() - Method in class bixo.config.FetcherPolicy
 
getMaxRetryCount() - Method in class bixo.fetcher.SimpleHttpFetcher
 
getMaxThreads() - Method in class bixo.fetcher.BaseFetcher
 
getMaxUrls() - Method in class bixo.config.FetcherPolicy
Calculate the maximum number of URLs that could be fetched in the remaining time.
getMaxUrlsPerServer(ScoredUrlDatum) - Method in class bixo.config.DefaultFetchJobPolicy
Return max URLs per fetch job for the server indicated by the URL in .
getMaxUrlsPerSet(ScoredUrlDatum) - Method in class bixo.config.DefaultFetchJobPolicy
Return max URLs per fetch set for the server indicated by the URL in .
getMessage() - Method in exception bixo.exceptions.BaseFetchException
 
getMetadata() - Method in class bixo.datum.UrlAndMetadata
 
getMimeTypeFromContentType(String) - Static method in class bixo.utils.HttpUtils
 
getMinPageFetchInterval() - Method in class bixo.config.FetcherPolicy
 
getMinResponseRate() - Method in class bixo.config.FetcherPolicy
Return the minimum response rate.
getNames() - Method in class bixo.datum.HttpHeaders
 
getNewBaseUrl() - Method in class bixo.datum.FetchedDatum
 
getNewBaseUrl() - Method in class bixo.fetcher.FetchedResult
 
getNextRequestTime() - Method in class bixo.fetcher.FetchRequest
 
getNumRedirects() - Method in class bixo.datum.FetchedDatum
 
getNumRedirects() - Method in class bixo.fetcher.FetchedResult
 
getNumReduceTasks() - Method in class bixo.config.BixoPlatform
 
getNumUrls() - Method in class bixo.fetcher.FetchRequest
 
getNumWarnings() - Method in class bixo.robots.SimpleRobotRulesParser
 
getOutlinks() - Method in class bixo.datum.ParsedDatum
 
getParsedMeta() - Method in class bixo.datum.ParsedDatum
 
getParsedText() - Method in class bixo.datum.ParsedDatum
 
getParsedTextField() - Static method in class bixo.datum.ParsedDatum
 
getParserPolicy() - Method in class bixo.parser.BaseParser
 
getPayload() - Method in class bixo.fetcher.FetchedResult
 
getPlatformType() - Method in class bixo.config.BixoPlatform
 
getPLD(String) - Static method in class bixo.utils.DomainNames
Extract the PLD (paid-level domain) from the hostname.
getPLD(URL) - Static method in class bixo.utils.DomainNames
Extract the PLD (paid-level domain) from the URL.
getProcess() - Method in interface bixo.fetcher.IFetchMgr
 
getProcess() - Method in class bixo.operations.FetchBuffer
 
getProperty(String) - Method in class bixo.config.BixoPlatform
 
getProtocolAndDomain() - Method in class bixo.utils.DomainInfo
 
getRandomLinks(int) - Method in class bixo.utils.DmozLinks
 
getReason() - Method in exception bixo.exceptions.RedirectFetchException
 
getRedirectedUrl() - Method in exception bixo.exceptions.RedirectFetchException
 
getRedirectMode() - Method in class bixo.config.FetcherPolicy
 
getRelAttributes() - Method in class bixo.datum.Outlink
 
getRequestTimeout() - Method in class bixo.config.FetcherPolicy
 
getResponseRate() - Method in class bixo.datum.FetchedDatum
 
getResponseRate() - Method in class bixo.fetcher.FetchedResult
 
getRobotRules(BaseFetcher, BaseRobotsParser, URL) - Static method in class bixo.robots.RobotUtils
Externally visible, static method for use in tools and for testing.
getScore() - Method in class bixo.datum.ScoredUrlDatum
 
getSitemaps() - Method in class bixo.robots.BaseRobotRules
 
getSize() - Method in class bixo.hadoop.DiskBytesWritable
Get the current size of the buffer.
getSocketTimeout() - Method in class bixo.fetcher.SimpleHttpFetcher
 
getSortingField() - Static method in class bixo.datum.FetchSetDatum
 
getSortingField() - Static method in class bixo.datum.ScoredUrlDatum
 
getSortKey() - Method in class bixo.config.BaseFetchJobPolicy.FetchSetInfo
 
getStatus() - Method in class bixo.datum.ScoredUrlDatum
 
getStatus() - Method in class bixo.datum.StatusDatum
 
getStatusTailPipe() - Method in class bixo.pipes.FetchPipe
 
getStatusTime() - Method in class bixo.datum.StatusDatum
 
getSuperDomain(String) - Static method in class bixo.utils.DomainNames
Extract the domain immediately containing this subdomain.
getTailPipe() - Method in class bixo.pipes.ParsePipe
 
getTempDir() - Method in class bixo.config.BixoPlatform
 
getTikaParser() - Method in class bixo.parser.SimpleParser
 
getTitle() - Method in class bixo.datum.ParsedDatum
 
getToUrl() - Method in class bixo.datum.Outlink
 
getUrl() - Method in class bixo.datum.FetchedDatum
Return the original base URL.
getUrl() - Method in class bixo.datum.ParsedDatum
 
getUrl() - Method in class bixo.datum.StatusDatum
 
getUrl() - Method in class bixo.datum.UrlAndMetadata
 
getUrl() - Method in class bixo.datum.UrlDatum
 
getUrl() - Method in exception bixo.exceptions.BaseFetchException
 
getUrls() - Method in class bixo.config.BaseFetchJobPolicy.FetchSetInfo
 
getUrls() - Method in class bixo.datum.FetchSetDatum
 
getUserAgent() - Method in class bixo.fetcher.BaseFetcher
 
getUserAgentString() - Method in class bixo.config.UserAgent
 
getValidMimeTypes() - Method in class bixo.config.FetcherPolicy
 
GroupedUrlDatum - Class in bixo.datum
 
GroupedUrlDatum() - Constructor for class bixo.datum.GroupedUrlDatum
 
GroupedUrlDatum(Fields) - Constructor for class bixo.datum.GroupedUrlDatum
 
GroupedUrlDatum(Fields, Tuple) - Constructor for class bixo.datum.GroupedUrlDatum
 
GroupedUrlDatum(TupleEntry) - Constructor for class bixo.datum.GroupedUrlDatum
 
GroupedUrlDatum(String, String) - Constructor for class bixo.datum.GroupedUrlDatum
 
GroupedUrlDatum(Fields, String, String) - Constructor for class bixo.datum.GroupedUrlDatum
 
GroupedUrlDatum(UrlDatum, String) - Constructor for class bixo.datum.GroupedUrlDatum
 
GroupFunction - Class in bixo.operations
 
GroupFunction(BaseGroupGenerator) - Constructor for class bixo.operations.GroupFunction
 
GroupingKey - Class in bixo.utils
 
GroupingKey() - Constructor for class bixo.utils.GroupingKey
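
The getPLD entries in this section describe extracting a paid-level domain from a hostname. As a rough illustration of the idea only, the sketch below keeps the last two labels of a hostname, or the last three when the hostname ends in a known two-label public suffix. The class name PldSketch, the method paidLevelDomain, and the tiny suffix list are all hypothetical; this is not the bixo.utils.DomainNames implementation, which may use different rules.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only; not the bixo.utils.DomainNames code. */
final class PldSketch {
    // Hypothetical, deliberately tiny list of two-label public suffixes.
    // A real implementation would need a far more complete list.
    private static final Set<String> TWO_LABEL_SUFFIXES =
            Set.of("co.uk", "com.au", "co.jp");

    /** Returns the assumed paid-level domain for a plain hostname. */
    static String paidLevelDomain(String hostname) {
        String host = hostname.toLowerCase(Locale.ROOT);
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host; // already "example.com" or a single label
        }
        String lastTwo = labels[labels.length - 2] + "." + labels[labels.length - 1];
        // Keep one label in front of the public suffix.
        int keep = TWO_LABEL_SUFFIXES.contains(lastTwo) ? 3 : 2;
        if (labels.length <= keep) {
            return host;
        }
        return String.join(".",
                Arrays.copyOfRange(labels, labels.length - keep, labels.length));
    }
}
```

Under these assumptions, PldSketch.paidLevelDomain("news.bbc.co.uk") yields "bbc.co.uk". The sketch does not special-case numeric IP addresses, which would need a separate check before splitting on dots.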
 

H

HadoopConfigured - Class in bixo.hadoop
Helps with Hadoop configuration-related lookups.
HadoopConfigured() - Constructor for class bixo.hadoop.HadoopConfigured
 
handleException(ParsedDatum, Exception, TupleEntryCollector) - Method in class bixo.parser.DOMParser
An exception occurred while parsing or processing the _input ParsedDatum.
hashCode() - Method in class bixo.config.FetcherPolicy
 
hashCode() - Method in class bixo.config.ParserPolicy
 
hashCode() - Method in class bixo.datum.ContentBytes
 
hashCode() - Method in class bixo.datum.Outlink
 
hasNoArchiveMetaTags(String) - Static method in class bixo.utils.HtmlUtils
 
hasNoFollowMetaTags(String) - Static method in class bixo.utils.HtmlUtils
 
hasOnlyNonEnglishMetaTags(String) - Static method in class bixo.utils.HtmlUtils
 
HEADERS_FN - Static variable in class bixo.datum.StatusDatum
 
HOST_ADDRESS_FN - Static variable in class bixo.datum.ContentDatum
 
HOST_ADDRESS_FN - Static variable in class bixo.datum.FetchedDatum
 
HOST_ADDRESS_FN - Static variable in class bixo.datum.ParsedDatum
 
HOST_ADDRESS_FN - Static variable in class bixo.datum.StatusDatum
 
HtmlContentExtractor - Class in bixo.parser
 
HtmlContentExtractor() - Constructor for class bixo.parser.HtmlContentExtractor
 
HtmlContentExtractor(String) - Constructor for class bixo.parser.HtmlContentExtractor
 
HtmlUtils - Class in bixo.utils
 
HtmlUtils() - Constructor for class bixo.utils.HtmlUtils
 
HTTP_HEADERS_FN - Static variable in class bixo.datum.ContentDatum
 
HTTP_HEADERS_FN - Static variable in class bixo.datum.FetchedDatum
 
HttpFetchException - Exception in bixo.exceptions
 
HttpFetchException() - Constructor for exception bixo.exceptions.HttpFetchException
 
HttpFetchException(String, String, int, HttpHeaders) - Constructor for exception bixo.exceptions.HttpFetchException
 
HttpHeaderNames - Interface in bixo.fetcher
A collection of HTTP header names.
HttpHeaders - Class in bixo.datum
 
HttpHeaders() - Constructor for class bixo.datum.HttpHeaders
 
HttpHeaders(Tuple) - Constructor for class bixo.datum.HttpHeaders
 
HttpUtils - Class in bixo.utils
 
HttpUtils() - Constructor for class bixo.utils.HttpUtils
 

I

IFetchMgr - Interface in bixo.fetcher
 
ignorableWhitespace(char[], int, int) - Method in class bixo.parser.BaseContentExtractor
 
ignorableWhitespace(char[], int, int) - Method in class bixo.parser.BoilerpipeContentExtractor
 
ignorableWhitespace(char[], int, int) - Method in class bixo.parser.HtmlContentExtractor
 
ImportCounters - Enum in bixo.hadoop
 
init() - Method in class bixo.parser.BoilerpipeContentExtractor
 
init() - Method in class bixo.parser.HtmlContentExtractor
 
init() - Method in class bixo.parser.SimpleParser
 
INSTANCE - Static variable in class bixo.parser.NullLinkExtractor
 
INVALID_URL_GROUPING_KEY - Static variable in class bixo.utils.GroupingKey
 
IOFetchException - Exception in bixo.exceptions
 
IOFetchException() - Constructor for exception bixo.exceptions.IOFetchException
 
IOFetchException(String, IOException) - Constructor for exception bixo.exceptions.IOFetchException
 
IoUtils - Class in bixo.utils
 
IoUtils() - Constructor for class bixo.utils.IoUtils
 
isAllowAll() - Method in class bixo.robots.BaseRobotRules
 
isAllowAll() - Method in class bixo.robots.SimpleRobotRules
Is our ruleset set up to allow all access?
isAllowed(String) - Method in class bixo.robots.BaseRobotRules
 
isAllowed(String) - Method in class bixo.robots.SimpleRobotRules
 
isAllowNone() - Method in class bixo.robots.BaseRobotRules
 
isAllowNone() - Method in class bixo.robots.SimpleRobotRules
Is our ruleset set up to disallow all access?
isClientTrusted(X509Certificate[]) - Method in class bixo.fetcher.DummyX509TrustManager
 
isDeferVisits() - Method in class bixo.robots.BaseRobotRules
 
isExtractLanguage() - Method in class bixo.parser.SimpleParser
 
isGoodDomain(String, String) - Method in class bixo.operations.BaseScoreGenerator
Return whether the domain should be crawled.
isIPAddress(String) - Static method in class bixo.utils.DomainNames
Check whether this paid-level domain is just a naked IP address.
isLastList() - Method in class bixo.datum.FetchSetDatum
 
isLocal() - Method in class bixo.config.BixoPlatform
 
isNoFollow() - Method in class bixo.datum.Outlink
 
isRemove(FlowProcess, FilterCall<NullContext>) - Method in class bixo.operations.UrlFilter
 
isRemove(UrlDatum) - Method in class bixo.urls.BaseUrlFilter
Return true if we should filter out (remove) the datum.
isRemove(UrlDatum) - Method in class bixo.urls.SimpleUrlFilter
 
isSafe() - Method in class bixo.operations.FetchBuffer
 
isSafe() - Method in class bixo.operations.FilterAndScoreByUrlAndRobots
 
isSafe() - Method in class bixo.parser.DOMParser
 
isServerTrusted(X509Certificate[]) - Method in class bixo.fetcher.DummyX509TrustManager
 
isSkipped() - Method in class bixo.datum.FetchSetDatum
 
isSkipping() - Method in class bixo.config.BaseFetchJobPolicy.FetchSetInfo
 
isSpecialKey(String) - Static method in class bixo.utils.GroupingKey
 
isTerminateFetch() - Method in class bixo.config.FetcherPolicy
 
isTextSchemeCompressable() - Method in class bixo.config.BixoPlatform
 
isTruncated() - Method in class bixo.utils.EncodingUtils.ExpandedResult
 
isUrlWithinDomain(String, String) - Static method in class bixo.utils.DomainNames
Check whether the domain of the URL is the given domain or a subdomain of the given domain.
isValid(String) - Method in class bixo.urls.BaseUrlValidator
Return true if the URL is valid.
isValid(String) - Method in class bixo.urls.SimpleUrlValidator
 
isValidHostAddress() - Method in class bixo.utils.DomainInfo
 
iterator() - Method in class bixo.utils.DiskQueue
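
The isUrlWithinDomain entry in this section checks whether a URL's domain is a given domain or one of its subdomains. A minimal sketch of that kind of check, assuming a case-insensitive comparison on the host parsed by java.net.URI, might look like the following. DomainCheckSketch and isWithinDomain are hypothetical names, and the actual bixo.utils.DomainNames logic may differ.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Illustrative sketch only; not the bixo.utils.DomainNames code. */
final class DomainCheckSketch {
    /**
     * Returns true when the URL's host equals the domain or ends with
     * "." + domain. Unparseable URLs and URLs without a host return false.
     */
    static boolean isWithinDomain(String url, String domain) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false;
        }
        if (host == null) {
            return false;
        }
        String h = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        // The leading dot prevents "notexample.com" matching "example.com".
        return h.equals(d) || h.endsWith("." + d);
    }
}
```

Matching on "." + domain rather than on the bare suffix is the important design choice here: a plain endsWith(domain) would wrongly accept unrelated hosts that merely share trailing characters.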
 

L

LANGUAGE_FN - Static variable in class bixo.datum.ParsedDatum
 
LAST_MODIFIED - Static variable in interface bixo.fetcher.HttpHeaderNames
 
LoadUrlsFunction - Class in bixo.operations
 
LoadUrlsFunction(int) - Constructor for class bixo.operations.LoadUrlsFunction
 
LoadUrlsFunction() - Constructor for class bixo.operations.LoadUrlsFunction
 
loadUrlShorteners() - Static method in class bixo.operations.UrlLengthener
 
LOCATION - Static variable in interface bixo.fetcher.HttpHeaderNames
 
LoggingFetcher - Class in bixo.fetcher
 
LoggingFetcher(int) - Constructor for class bixo.fetcher.LoggingFetcher
 

M

main(String[]) - Static method in class bixo.utils.DmozLinks
 
makeBinaryScheme(Fields) - Method in class bixo.config.BixoPlatform
 
makeFetcher(int, UserAgent) - Static method in class bixo.operations.UrlLengthener
Return a SimpleHttpFetcher that's appropriate for lengthening URLs.
MakeFetchSetsBuffer - Class in bixo.operations
We get ScoredUrlDatums, grouped by server IP address.
MakeFetchSetsBuffer(BaseFetchJobPolicy, int) - Constructor for class bixo.operations.MakeFetchSetsBuffer
 
makeFlowConnector() - Method in class bixo.config.BixoPlatform
 
makeFlowProcess() - Method in class bixo.config.BixoPlatform
 
makeGroupingKey(String, long) - Static method in class bixo.utils.GroupingKey
 
makeLoopDir(BasePlatform, BasePath, int) - Static method in class bixo.utils.CrawlDirUtils
 
makePartitionTap(Tap, Partition) - Method in class bixo.config.BixoPlatform
 
makePartitionTap(Tap, Partition, SinkMode) - Method in class bixo.config.BixoPlatform
 
makePath(String) - Method in class bixo.config.BixoPlatform
 
makePath(BasePath, String) - Method in class bixo.config.BixoPlatform
 
makeProtocolAndDomain(String) - Static method in class bixo.utils.UrlUtils
 
makeSinkMap(Tap, Tap) - Static method in class bixo.pipes.FetchPipe
Utility routine that helps create the Cascading map needed when there are multiple tails (as with this subassembly) and you need to build the Flow.
makeTap(Scheme, BasePath) - Method in class bixo.config.BixoPlatform
 
makeTap(Scheme, BasePath, SinkMode) - Method in class bixo.config.BixoPlatform
 
makeTemplateTap(Tap, String, Fields) - Method in class bixo.config.BixoPlatform
 
makeTestDomain(int) - Static method in class bixo.utils.DomainInfo
 
makeTextScheme() - Method in class bixo.config.BixoPlatform
 
makeTextScheme(boolean) - Method in class bixo.config.BixoPlatform
 
makeUrl(URL, String) - Static method in class bixo.utils.UrlUtils
 
makeUrlStatusFromKey(String) - Static method in class bixo.utils.GroupingKey
 
mapToUrlStatus() - Method in exception bixo.exceptions.AbortedFetchException
 
mapToUrlStatus() - Method in exception bixo.exceptions.BaseFetchException
 
mapToUrlStatus() - Method in exception bixo.exceptions.HttpFetchException
 
mapToUrlStatus() - Method in exception bixo.exceptions.IOFetchException
 
mapToUrlStatus() - Method in exception bixo.exceptions.RedirectFetchException
 
mapToUrlStatus() - Method in exception bixo.exceptions.UrlFetchException
 
MAX_POLL_TIME - Static variable in class bixo.utils.ThreadedExecutor
 

N

NEW_BASE_URL_FN - Static variable in class bixo.datum.FetchedDatum
 
nextFetchSet(ScoredUrlDatum) - Method in class bixo.config.BaseFetchJobPolicy
 
nextFetchSet(ScoredUrlDatum) - Method in class bixo.config.DefaultFetchJobPolicy
 
nextSortKey(Random, long, long) - Static method in class bixo.config.DefaultFetchJobPolicy
Time to move the request time forward.
NO_CRAWL_END_TIME - Static variable in class bixo.config.FetcherPolicy
 
NO_MAX_PARSE_DURATION - Static variable in class bixo.config.ParserPolicy
 
NO_MIN_RESPONSE_RATE - Static variable in class bixo.config.FetcherPolicy
 
NO_REDIRECTS - Static variable in class bixo.config.FetcherPolicy
 
normalize(String) - Method in class bixo.urls.BaseUrlNormalizer
Convert into a normalized format, where unimportant differences between two URLs have been removed.
normalize(String) - Method in class bixo.urls.SimpleUrlNormalizer
 
normalizeHostname(String) - Method in class bixo.urls.SimpleUrlNormalizer
 
normalizePath(String) - Method in class bixo.urls.SimpleUrlNormalizer
 
normalizeQuery(String) - Method in class bixo.urls.SimpleUrlNormalizer
 
NormalizeUrlFunction - Class in bixo.operations
 
NormalizeUrlFunction(BaseUrlNormalizer) - Constructor for class bixo.operations.NormalizeUrlFunction
 
nowWithUnderLine() - Static method in class bixo.utils.TimeStampUtils
 
NullLinkExtractor - Class in bixo.parser
 
NullLinkExtractor() - Constructor for class bixo.parser.NullLinkExtractor
 
NUM_REDIRECTS_FN - Static variable in class bixo.datum.FetchedDatum
 

O

offer(E) - Method in class bixo.utils.DiskQueue
 
operate(FlowProcess, BufferCall<NullContext>) - Method in class bixo.operations.FetchBuffer
 
operate(FlowProcess, BufferCall<NullContext>) - Method in class bixo.operations.FilterAndScoreByUrlAndRobots
 
operate(FlowProcess, FunctionCall<NullContext>) - Method in class bixo.operations.GroupFunction
 
operate(FlowProcess, FunctionCall<NullContext>) - Method in class bixo.operations.LoadUrlsFunction
 
operate(FlowProcess, BufferCall<NullContext>) - Method in class bixo.operations.MakeFetchSetsBuffer
 
operate(FlowProcess, FunctionCall<NullContext>) - Method in class bixo.operations.NormalizeUrlFunction
 
operate(FlowProcess, FunctionCall<NullContext>) - Method in class bixo.operations.UrlLengthener
 
operate(FlowProcess, FunctionCall<NullContext>) - Method in class bixo.parser.DOMParser
 
Outlink - Class in bixo.datum
 
Outlink() - Constructor for class bixo.datum.Outlink
 
Outlink(String, String, String) - Constructor for class bixo.datum.Outlink
 
Outlink(String, String) - Constructor for class bixo.datum.Outlink
 
OUTLINKS_FN - Static variable in class bixo.datum.ParsedDatum
 

P

parse(FetchedDatum) - Method in class bixo.parser.BaseParser
 
parse(FetchedDatum) - Method in class bixo.parser.SimpleParser
 
PARSE_PIPE_NAME - Static variable in class bixo.pipes.ParsePipe
 
parseContent(String, byte[], String, String) - Method in class bixo.robots.BaseRobotsParser
Parse the robots.txt file in the supplied content, and return rules appropriate for processing paths by the given robot.
parseContent(String, byte[], String, String) - Method in class bixo.robots.SimpleRobotRulesParser
 
PARSED_META_FN - Static variable in class bixo.datum.ParsedDatum
 
PARSED_TEXT_FN - Static variable in class bixo.datum.ParsedDatum
 
ParsedDatum - Class in bixo.datum
 
ParsedDatum() - Constructor for class bixo.datum.ParsedDatum
No-argument constructor for use with FutureTask.
ParsedDatum(TupleEntry) - Constructor for class bixo.datum.ParsedDatum
 
ParsedDatum(String, String, String, String, String, Outlink[], Map<String, String>) - Constructor for class bixo.datum.ParsedDatum
 
ParsePipe - Class in bixo.pipes
 
ParsePipe(Pipe) - Constructor for class bixo.pipes.ParsePipe
 
ParsePipe(Pipe, BaseParser) - Constructor for class bixo.pipes.ParsePipe
 
ParserCounters - Enum in bixo.parser
 
ParserPolicy - Class in bixo.config
Definition of policy for parsing.
ParserPolicy() - Constructor for class bixo.config.ParserPolicy
 
ParserPolicy(int) - Constructor for class bixo.config.ParserPolicy
 
ParserPolicy(int, Set<String>, Set<String>) - Constructor for class bixo.config.ParserPolicy
 
peek() - Method in class bixo.utils.DiskQueue
 
poll() - Method in class bixo.utils.DiskQueue
 
prepare(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.FetchBuffer
 
prepare(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.FilterAndScoreByUrlAndRobots
 
prepare(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.UrlFilter
 
prepare(FlowProcess, OperationCall<NullContext>) - Method in class bixo.operations.UrlLengthener
 
prepare(FlowProcess, OperationCall<NullContext>) - Method in class bixo.parser.DOMParser
 
process(ParsedDatum, Document, TupleEntryCollector, FlowProcess) - Method in class bixo.parser.DOMParser
The _input ParsedDatum was successfully converted into a Dom4J Document.
processDeflateEncoded(byte[]) - Static method in class bixo.utils.EncodingUtils
 
processDeflateEncoded(byte[], int) - Static method in class bixo.utils.EncodingUtils
 
processGzipEncoded(byte[]) - Static method in class bixo.utils.EncodingUtils
 
processGzipEncoded(byte[], int) - Static method in class bixo.utils.EncodingUtils
 
processingInstruction(String, String) - Method in class bixo.parser.BoilerpipeContentExtractor
 
processingInstruction(String, String) - Method in class bixo.parser.HtmlContentExtractor
 
ProcessRobotsTask - Class in bixo.operations
 
ProcessRobotsTask(String, BaseScoreGenerator, Queue<GroupedUrlDatum>, BaseFetcher, BaseRobotsParser, TupleEntryCollector, LoggingFlowProcess) - Constructor for class bixo.operations.ProcessRobotsTask
 

R

readBaseFields(DataInput) - Method in exception bixo.exceptions.BaseFetchException
 
readFields(DataInput) - Method in class bixo.datum.ContentBytes
 
readFields(DataInput) - Method in class bixo.datum.HttpHeaders
 
readFields(DataInput) - Method in class bixo.datum.Outlink
 
readFields(DataInput) - Method in exception bixo.exceptions.AbortedFetchException
 
readFields(DataInput) - Method in exception bixo.exceptions.HttpFetchException
 
readFields(DataInput) - Method in exception bixo.exceptions.IOFetchException
 
readFields(DataInput) - Method in exception bixo.exceptions.RedirectFetchException
 
readFields(DataInput) - Method in exception bixo.exceptions.UrlFetchException
 
readFields(DataInput) - Method in class bixo.hadoop.DiskBytesWritable
 
readInputLine() - Static method in class bixo.utils.IoUtils
Read one line of input from the console.
RedirectFetchException - Exception in bixo.exceptions
 
RedirectFetchException() - Constructor for exception bixo.exceptions.RedirectFetchException
 
RedirectFetchException(String, String, RedirectFetchException.RedirectExceptionReason) - Constructor for exception bixo.exceptions.RedirectFetchException
 
RedirectFetchException.RedirectExceptionReason - Enum in bixo.exceptions
 
remove() - Method in class bixo.utils.DiskQueue
 
rename(BasePath, BasePath) - Method in class bixo.config.BixoPlatform
 
reset() - Method in class bixo.parser.BaseContentExtractor
 
reset() - Method in class bixo.parser.BaseLinkExtractor
 
reset() - Method in class bixo.parser.BoilerpipeContentExtractor
 
reset() - Method in class bixo.parser.HtmlContentExtractor
 
reset() - Method in class bixo.parser.SimpleContentExtractor
 
reset() - Method in class bixo.parser.SimpleLinkExtractor
 
resetNumReduceTasks() - Method in class bixo.config.BixoPlatform
 
ResolveRedirectsTask - Class in bixo.operations
 
ResolveRedirectsTask(String, BaseFetcher, TupleEntryCollector, FlowProcess) - Constructor for class bixo.operations.ResolveRedirectsTask
 
RESPONSE_RATE_FN - Static variable in class bixo.datum.FetchedDatum
 
RobotUtils - Class in bixo.robots
 
RobotUtils() - Constructor for class bixo.robots.RobotUtils
 
run() - Method in class bixo.fetcher.FetchTask
 
run() - Method in class bixo.operations.ProcessRobotsTask
 
run() - Method in class bixo.operations.ResolveRedirectsTask
 

S

safeClose(InputStream) - Static method in class bixo.utils.IoUtils
 
safeClose(OutputStream) - Static method in class bixo.utils.IoUtils
 
safeGetHost(String) - Static method in class bixo.utils.DomainNames
No-exception utility routine to return the hostname for a URL.
ScoredUrlDatum - Class in bixo.datum
 
ScoredUrlDatum() - Constructor for class bixo.datum.ScoredUrlDatum
 
ScoredUrlDatum(Tuple) - Constructor for class bixo.datum.ScoredUrlDatum
 
ScoredUrlDatum(TupleEntry) - Constructor for class bixo.datum.ScoredUrlDatum
 
ScoredUrlDatum(String) - Constructor for class bixo.datum.ScoredUrlDatum
 
ScoredUrlDatum(String, String, UrlStatus) - Constructor for class bixo.datum.ScoredUrlDatum
 
ScoredUrlDatum(String, String, UrlStatus, double) - Constructor for class bixo.datum.ScoredUrlDatum
 
seMinPageFetchInterval(long) - Method in class bixo.config.FetcherPolicy
Set the minimum time (in milliseconds) between page fetch requests when fetching a FetchSet's worth of URLs over a single connection.
set(DiskBytesWritable) - Method in class bixo.hadoop.DiskBytesWritable
Set the BytesWritable to the contents of the given newData.
set(byte[], int, int) - Method in class bixo.hadoop.DiskBytesWritable
Set the value to a copy of the given byte range.
setAcceptEncoding(String) - Method in class bixo.fetcher.SimpleHttpFetcher
 
setAcceptLanguage(String) - Method in class bixo.config.FetcherPolicy
 
setBaseUrl(String) - Method in class bixo.datum.ContentDatum
 
setCapacity(int) - Method in class bixo.hadoop.DiskBytesWritable
Change the capacity of the backing storage.
setConnectionTimeout(int) - Method in class bixo.fetcher.SimpleHttpFetcher
 
setContent(ContentBytes) - Method in class bixo.datum.ContentDatum
 
setContent(ContentBytes) - Method in class bixo.datum.FetchedDatum
 
setContentType(String) - Method in class bixo.datum.ContentDatum
 
setContentType(String) - Method in class bixo.datum.FetchedDatum
 
setCrawlDelay(long) - Method in class bixo.config.FetcherPolicy
Deprecated.
setCrawlDelay(long) - Method in class bixo.robots.BaseRobotRules
 
setCrawlEndTime(long) - Method in class bixo.config.FetcherPolicy
 
setDefaultCrawlDelay(long) - Method in class bixo.config.BaseFetchJobPolicy
 
setDefaultMaxContentSize(int) - Method in class bixo.fetcher.BaseFetcher
 
setDeferVisits(boolean) - Method in class bixo.robots.BaseRobotRules
 
setDocumentLocator(Locator) - Method in class bixo.parser.BoilerpipeContentExtractor
 
setDocumentLocator(Locator) - Method in class bixo.parser.HtmlContentExtractor
 
setException(BaseFetchException) - Method in class bixo.datum.StatusDatum
 
setExpanded(byte[]) - Method in class bixo.utils.EncodingUtils.ExpandedResult
 
setExtractLanguage(boolean) - Method in class bixo.parser.SimpleParser
 
setFetchDelay(long) - Method in class bixo.datum.FetchSetDatum
 
setFetchedUrl(String) - Method in class bixo.datum.ContentDatum
 
setFetchedUrl(String) - Method in class bixo.datum.FetchedDatum
 
setFetcherMode(FetcherPolicy.FetcherMode) - Method in class bixo.config.FetcherPolicy
 
setFetchTime(long) - Method in class bixo.datum.FetchedDatum
 
setFetchTime(long) - Method in class bixo.datum.FetchSetDatum
 
setFlowPriority(BasePlatform.FlowPriority) - Method in class bixo.config.BixoPlatform
 
setGroupingKey(int) - Method in class bixo.datum.FetchSetDatum
 
setGroupingRef(String) - Method in class bixo.datum.FetchSetDatum
 
setGroupKey(String) - Method in class bixo.datum.GroupedUrlDatum
 
setHeaders(HttpHeaders) - Method in class bixo.datum.ContentDatum
 
setHeaders(HttpHeaders) - Method in class bixo.datum.FetchedDatum
 
setHeaders(HttpHeaders) - Method in class bixo.datum.StatusDatum
 
setHostAddress(String) - Method in class bixo.datum.ContentDatum
 
setHostAddress(String) - Method in class bixo.datum.FetchedDatum
 
setHostAddress(String) - Method in class bixo.datum.ParsedDatum
 
setHostAddress(String) - Method in class bixo.datum.StatusDatum
 
setHttpVersion(HttpVersion) - Method in class bixo.fetcher.SimpleHttpFetcher
 
setJobPollingInterval(long) - Method in class bixo.config.BixoPlatform
 
setLanguage(String) - Method in class bixo.datum.ParsedDatum
 
setLastList(boolean) - Method in class bixo.datum.FetchSetDatum
 
setLinkAttributeTypes(Set<String>) - Method in class bixo.config.ParserPolicy
 
setLinkAttributeTypes(Set<String>) - Method in class bixo.parser.BaseLinkExtractor
 
setLinkTags(Set<String>) - Method in class bixo.config.ParserPolicy
 
setLinkTags(Set<String>) - Method in class bixo.parser.BaseLinkExtractor
 
setLogDir(File) - Method in class bixo.config.BixoPlatform
 
setLogLevel(Level, String...) - Method in class bixo.config.BixoPlatform
 
setMaxConnectionsPerHost(int) - Method in class bixo.config.FetcherPolicy
 
setMaxContentSize(int) - Method in class bixo.config.FetcherPolicy
Deprecated.
setMaxContentSize(String, int) - Method in class bixo.fetcher.BaseFetcher
 
setMaxParseDuration(int) - Method in class bixo.config.ParserPolicy
 
setMaxRedirects(int) - Method in class bixo.config.FetcherPolicy
 
setMaxRequestsPerConnection(int) - Method in class bixo.config.FetcherPolicy
 
setMaxRetryCount(int) - Method in class bixo.fetcher.SimpleHttpFetcher
 
setMetadata(Map<String, Comparable>) - Method in class bixo.datum.UrlAndMetadata
 
setMetaDataMap(Payload) - Method in class bixo.fetcher.FetchedResult
 
setMinResponseRate(int) - Method in class bixo.config.FetcherPolicy
 
setNewBaseUrl(String) - Method in class bixo.datum.FetchedDatum
 
setNumRedirects(int) - Method in class bixo.datum.FetchedDatum
 
setNumReduceTasks(int) - Method in class bixo.config.BixoPlatform
 
setOutlinks(Outlink[]) - Method in class bixo.datum.ParsedDatum
 
setParsedMeta(Map<String, String>) - Method in class bixo.datum.ParsedDatum
 
setParsedText(String) - Method in class bixo.datum.ParsedDatum
 
setProperty(String, String) - Method in class bixo.config.BixoPlatform
 
setProperty(String, int) - Method in class bixo.config.BixoPlatform
 
setProperty(String, boolean) - Method in class bixo.config.BixoPlatform
 
setRedirectMode(FetcherPolicy.RedirectMode) - Method in class bixo.config.FetcherPolicy
 
setRequestTimeout(long) - Method in class bixo.config.FetcherPolicy
 
setResponseRate(int) - Method in class bixo.datum.FetchedDatum
 
setScore(double) - Method in class bixo.datum.ScoredUrlDatum
 
setSize(int) - Method in class bixo.hadoop.DiskBytesWritable
Change the size of the buffer.
setSkipped(boolean) - Method in class bixo.datum.FetchSetDatum
 
setSocketTimeout(int) - Method in class bixo.fetcher.SimpleHttpFetcher
 
setStatus(UrlStatus) - Method in class bixo.datum.ScoredUrlDatum
 
setStatus(UrlStatus) - Method in class bixo.datum.StatusDatum
 
setStatusTime(long) - Method in class bixo.datum.StatusDatum
 
setTitle(String) - Method in class bixo.datum.ParsedDatum
 
setTruncated(boolean) - Method in class bixo.utils.EncodingUtils.ExpandedResult
 
setUrl(String) - Method in class bixo.datum.FetchedDatum
 
setUrl(String) - Method in class bixo.datum.ParsedDatum
 
setUrl(String) - Method in class bixo.datum.StatusDatum
 
setUrl(String) - Method in class bixo.datum.UrlAndMetadata
 
setUrl(String) - Method in class bixo.datum.UrlDatum
 
setUrls(List<ScoredUrlDatum>) - Method in class bixo.datum.FetchSetDatum
 
setValidMimeTypes(Set<String>) - Method in class bixo.config.FetcherPolicy
 
shareLocalDir(String) - Method in class bixo.config.BixoPlatform
 
SimpleContentExtractor - Class in bixo.parser
 
SimpleContentExtractor() - Constructor for class bixo.parser.SimpleContentExtractor
 
SimpleHttpFetcher - Class in bixo.fetcher
 
SimpleHttpFetcher(UserAgent) - Constructor for class bixo.fetcher.SimpleHttpFetcher
 
SimpleHttpFetcher(int, UserAgent) - Constructor for class bixo.fetcher.SimpleHttpFetcher
 
SimpleHttpFetcher(int, FetcherPolicy, UserAgent) - Constructor for class bixo.fetcher.SimpleHttpFetcher
 
SimpleLinkExtractor - Class in bixo.parser
 
SimpleLinkExtractor() - Constructor for class bixo.parser.SimpleLinkExtractor
 
SimpleParser - Class in bixo.parser
 
SimpleParser() - Constructor for class bixo.parser.SimpleParser
 
SimpleParser(ParserPolicy) - Constructor for class bixo.parser.SimpleParser
 
SimpleParser(BaseContentExtractor, BaseLinkExtractor, ParserPolicy) - Constructor for class bixo.parser.SimpleParser
 
SimpleParser(ParserPolicy, boolean) - Constructor for class bixo.parser.SimpleParser
 
SimpleParser(BaseContentExtractor, BaseLinkExtractor, ParserPolicy, boolean) - Constructor for class bixo.parser.SimpleParser
 
SimpleParser(BaseContentExtractor, BaseLinkExtractor, ParserPolicy, ParseContext) - Constructor for class bixo.parser.SimpleParser
 
SimpleRobotRules - Class in bixo.robots
Result from parsing a single robots.txt file: a set of rules and a crawl delay.
SimpleRobotRules() - Constructor for class bixo.robots.SimpleRobotRules
 
SimpleRobotRules(SimpleRobotRules.RobotRulesMode) - Constructor for class bixo.robots.SimpleRobotRules
 
SimpleRobotRules.RobotRule - Class in bixo.robots
Single rule that maps from a path prefix to an allow flag.
SimpleRobotRules.RobotRule(String, boolean) - Constructor for class bixo.robots.SimpleRobotRules.RobotRule
 
SimpleRobotRules.RobotRule(Pattern, boolean) - Constructor for class bixo.robots.SimpleRobotRules.RobotRule
 
SimpleRobotRules.RobotRulesMode - Enum in bixo.robots
 
SimpleRobotRulesParser - Class in bixo.robots
 
SimpleRobotRulesParser() - Constructor for class bixo.robots.SimpleRobotRulesParser
 
SimpleUrlFilter - Class in bixo.urls
Simple UrlFilter that uses a URL validator.
SimpleUrlFilter() - Constructor for class bixo.urls.SimpleUrlFilter
 
SimpleUrlFilter(BaseUrlValidator) - Constructor for class bixo.urls.SimpleUrlFilter
 
SimpleUrlNormalizer - Class in bixo.urls
 
SimpleUrlNormalizer() - Constructor for class bixo.urls.SimpleUrlNormalizer
 
SimpleUrlNormalizer(boolean) - Constructor for class bixo.urls.SimpleUrlNormalizer
 
SimpleUrlNormalizer(boolean, boolean) - Constructor for class bixo.urls.SimpleUrlNormalizer
 
SimpleUrlValidator - Class in bixo.urls
 
SimpleUrlValidator() - Constructor for class bixo.urls.SimpleUrlValidator
 
size() - Method in class bixo.utils.DiskQueue
 
SKIP_SCORE - Static variable in class bixo.operations.BaseScoreGenerator
 
SKIPPED_GROUPING_KEY - Static variable in class bixo.utils.GroupingKey
 
skippedEntity(String) - Method in class bixo.parser.BoilerpipeContentExtractor
 
skippedEntity(String) - Method in class bixo.parser.HtmlContentExtractor
 
splitOnChar(String, char) - Static method in class bixo.utils.StringUtils
 
startDocument() - Method in class bixo.parser.BoilerpipeContentExtractor
 
startDocument() - Method in class bixo.parser.HtmlContentExtractor
 
startElement(String, String, String, Attributes) - Method in class bixo.parser.BaseContentExtractor
 
startElement(String, String, String, Attributes) - Method in class bixo.parser.BaseLinkExtractor
 
startElement(String, String, String, Attributes) - Method in class bixo.parser.BoilerpipeContentExtractor
 
startElement(String, String, String, Attributes) - Method in class bixo.parser.HtmlContentExtractor
 
startElement(String, String, String, Attributes) - Method in class bixo.parser.NullLinkExtractor
 
startElement(String, String, String, Attributes) - Method in class bixo.parser.SimpleLinkExtractor
 
startFetchSet(String, long) - Method in class bixo.config.BaseFetchJobPolicy
 
startFetchSet(String, long) - Method in class bixo.config.DefaultFetchJobPolicy
 
startPrefixMapping(String, String) - Method in class bixo.parser.BoilerpipeContentExtractor
 
startPrefixMapping(String, String) - Method in class bixo.parser.HtmlContentExtractor
 
STATUS_FN - Static variable in class bixo.datum.StatusDatum
 
STATUS_PIPE_NAME - Static variable in class bixo.pipes.FetchPipe
 
STATUS_TIME_FN - Static variable in class bixo.datum.StatusDatum
 
StatusDatum - Class in bixo.datum
 
StatusDatum() - Constructor for class bixo.datum.StatusDatum
 
StatusDatum(TupleEntry) - Constructor for class bixo.datum.StatusDatum
 
StatusDatum(String, HttpHeaders, String, Payload) - Constructor for class bixo.datum.StatusDatum
Constructor for creating StatusDatum for a URL that was fetched successfully.
StatusDatum(String, BaseFetchException, Payload) - Constructor for class bixo.datum.StatusDatum
 
StatusDatum(String, UrlStatus, Payload) - Constructor for class bixo.datum.StatusDatum
 
StatusDatum(String, UrlStatus, HttpHeaders, BaseFetchException, long, String, Payload) - Constructor for class bixo.datum.StatusDatum
 
StringUtils - Class in bixo.utils
 
StringUtils() - Constructor for class bixo.utils.StringUtils
 

T

terminate(long) - Method in class bixo.utils.ThreadedExecutor
Terminate the thread pool.
ThreadedExecutor - Class in bixo.utils
A wrapper for ThreadPoolExecutor that implements a specific behavior we need in Bixo.
ThreadedExecutor(int, long) - Constructor for class bixo.utils.ThreadedExecutor
 
TimeStampUtils - Class in bixo.utils
 
TimeStampUtils() - Constructor for class bixo.utils.TimeStampUtils
 
TITLE_FN - Static variable in class bixo.datum.ParsedDatum
 
toHexString(byte[]) - Static method in class bixo.utils.StringUtils
Convenience call for StringUtils.toHexString(byte[], String, int), with sep = null and lineLen = Integer.MAX_VALUE.
toHexString(byte[], String, int) - Static method in class bixo.utils.StringUtils
Get a text representation of a byte[] as a hexadecimal String, where each successive pair of hexadecimal digits corresponds to the next byte in the array.
toString() - Method in class bixo.config.FetcherPolicy
 
toString() - Method in class bixo.config.ParserPolicy
 
toString() - Method in class bixo.datum.ContentBytes
Generate the stream of bytes as hex pairs separated by ' '.
toString() - Method in class bixo.datum.ContentDatum
 
toString() - Method in class bixo.datum.FetchedDatum
 
toString() - Method in class bixo.datum.HttpHeaders
 
toString() - Method in class bixo.datum.Outlink
 
toString() - Method in exception bixo.exceptions.BaseFetchException
 
toTuple() - Method in class bixo.datum.HttpHeaders
 

U

UNKNOWN_HOST_GROUPING_KEY - Static variable in class bixo.utils.GroupingKey
 
UNSET_CRAWL_DELAY - Static variable in class bixo.config.BaseFetchJobPolicy
 
URL_FN - Static variable in class bixo.datum.FetchedDatum
 
URL_FN - Static variable in class bixo.datum.ParsedDatum
 
URL_FN - Static variable in class bixo.datum.StatusDatum
 
URL_FN - Static variable in class bixo.datum.UrlDatum
 
URL_FN - Static variable in class bixo.operations.UrlLengthener
 
UrlAndMetadata - Class in bixo.datum
 
UrlAndMetadata(String, Map<String, Comparable>) - Constructor for class bixo.datum.UrlAndMetadata
 
UrlDatum - Class in bixo.datum
 
UrlDatum() - Constructor for class bixo.datum.UrlDatum
 
UrlDatum(UrlDatum) - Constructor for class bixo.datum.UrlDatum
 
UrlDatum(Fields) - Constructor for class bixo.datum.UrlDatum
 
UrlDatum(Fields, Tuple) - Constructor for class bixo.datum.UrlDatum
 
UrlDatum(TupleEntry) - Constructor for class bixo.datum.UrlDatum
 
UrlDatum(Fields, String) - Constructor for class bixo.datum.UrlDatum
 
UrlDatum(String) - Constructor for class bixo.datum.UrlDatum
 
UrlFetchException - Exception in bixo.exceptions
 
UrlFetchException() - Constructor for exception bixo.exceptions.UrlFetchException
 
UrlFetchException(String, String) - Constructor for exception bixo.exceptions.UrlFetchException
 
UrlFilter - Class in bixo.operations
 
UrlFilter(BaseUrlFilter) - Constructor for class bixo.operations.UrlFilter
 
UrlLengthener - Class in bixo.operations
 
UrlLengthener(BaseFetcher) - Constructor for class bixo.operations.UrlLengthener
 
UrlLengthener(BaseFetcher, Fields) - Constructor for class bixo.operations.UrlLengthener
 
UrlStatus - Enum in bixo.datum
 
UrlUtils - Class in bixo.utils
 
UrlUtils() - Constructor for class bixo.utils.UrlUtils
 
UserAgent - Class in bixo.config
 
UserAgent(String, String, String) - Constructor for class bixo.config.UserAgent
 
UserAgent(String, String, String, String) - Constructor for class bixo.config.UserAgent
 
UserAgent(String, String, String, String, String) - Constructor for class bixo.config.UserAgent
 

V

valueOf(String) - Static method in enum bixo.config.BixoPlatform.Platform
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.config.FetcherPolicy.FetcherMode
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.config.FetcherPolicy.RedirectMode
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.datum.UrlStatus
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.exceptions.AbortedFetchReason
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.exceptions.RedirectFetchException.RedirectExceptionReason
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.hadoop.FetchCounters
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.hadoop.ImportCounters
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.parser.ParserCounters
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum bixo.robots.SimpleRobotRules.RobotRulesMode
Returns the enum constant of this type with the specified name.
values() - Static method in enum bixo.config.BixoPlatform.Platform
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.config.FetcherPolicy.FetcherMode
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.config.FetcherPolicy.RedirectMode
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.datum.UrlStatus
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.exceptions.AbortedFetchReason
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.exceptions.RedirectFetchException.RedirectExceptionReason
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.hadoop.FetchCounters
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.hadoop.ImportCounters
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.parser.ParserCounters
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum bixo.robots.SimpleRobotRules.RobotRulesMode
Returns an array containing the constants of this enum type, in the order they are declared.

W

write(DataOutput) - Method in class bixo.datum.ContentBytes
 
write(DataOutput) - Method in class bixo.datum.HttpHeaders
 
write(DataOutput) - Method in class bixo.datum.Outlink
 
write(DataOutput) - Method in exception bixo.exceptions.AbortedFetchException
 
write(DataOutput) - Method in exception bixo.exceptions.HttpFetchException
 
write(DataOutput) - Method in exception bixo.exceptions.IOFetchException
 
write(DataOutput) - Method in exception bixo.exceptions.RedirectFetchException
 
write(DataOutput) - Method in exception bixo.exceptions.UrlFetchException
 
write(DataOutput) - Method in class bixo.hadoop.DiskBytesWritable
 
writeBaseFields(DataOutput) - Method in exception bixo.exceptions.BaseFetchException
 

_

_contentExtractor - Variable in class bixo.parser.SimpleParser
 
_crawlDelay - Variable in class bixo.config.FetcherPolicy
 
_curAnchor - Variable in class bixo.parser.BaseLinkExtractor
 
_curRelAttributes - Variable in class bixo.parser.BaseLinkExtractor
 
_curUrl - Variable in class bixo.parser.BaseLinkExtractor
 
_fetcherPolicy - Variable in class bixo.fetcher.BaseFetcher
 
_inAnchorTag - Variable in class bixo.parser.BaseLinkExtractor
 
_inBody - Variable in class bixo.parser.BaseContentExtractor
 
_inHead - Variable in class bixo.parser.BaseContentExtractor
 
_inTitle - Variable in class bixo.parser.BaseContentExtractor
 
_linkAttributeTypes - Variable in class bixo.parser.BaseLinkExtractor
 
_linkExtractor - Variable in class bixo.parser.SimpleParser
 
_linkTags - Variable in class bixo.parser.BaseLinkExtractor
 
_maxContentSizes - Variable in class bixo.fetcher.BaseFetcher
 
_maxThreads - Variable in class bixo.fetcher.BaseFetcher
 
_parseContext - Variable in class bixo.parser.SimpleParser
 
_userAgent - Variable in class bixo.fetcher.BaseFetcher
 

Copyright © 2012 Bixo Labs