public class SimpleRobotRulesParser extends BaseRobotsParser
Constructor and Description |
---|
SimpleRobotRulesParser() |
Modifier and Type | Method and Description |
---|---|
BaseRobotRules | failedFetch(int httpStatusCode) The fetch of robots.txt failed, so return rules appropriate given the HTTP status code. |
int | getNumWarnings() |
BaseRobotRules | parseContent(java.lang.String url, byte[] content, java.lang.String contentType, java.lang.String robotName) Parse the robots.txt file in content. |
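The summary above implies a simple branch on the caller's side: a successful fetch goes to parseContent, and anything else goes to failedFetch, which expects a non-2xx status code. The following is a minimal sketch of that decision only. It does not call the parser itself; the class name RobotsFetchDispatch, the Action enum, and chooseAction are hypothetical names introduced for illustration.

```java
public class RobotsFetchDispatch {

    /** Hypothetical marker for which parser method a caller would invoke. */
    public enum Action { PARSE_CONTENT, FAILED_FETCH }

    /**
     * Decide which method applies for a given HTTP status code.
     * 2xx means the robots.txt body is available for parseContent;
     * every other code is a failure status suitable for failedFetch.
     */
    public static Action chooseAction(int httpStatusCode) {
        boolean success = httpStatusCode >= 200 && httpStatusCode < 300;
        return success ? Action.PARSE_CONTENT : Action.FAILED_FETCH;
    }

    public static void main(String[] args) {
        // Sample status codes chosen for illustration only.
        for (int status : new int[] {200, 204, 404, 503}) {
            System.out.println(status + " -> " + chooseAction(status));
        }
    }
}
```

In real use, the PARSE_CONTENT branch would pass the fetched bytes to parseContent, and the FAILED_FETCH branch would pass the status code to failedFetch.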
public BaseRobotRules failedFetch(int httpStatusCode)
failedFetch in class BaseRobotsParser
Parameters:
httpStatusCode - a failure status code (NOT 2xx)

public BaseRobotRules parseContent(java.lang.String url, byte[] content, java.lang.String contentType, java.lang.String robotName)
parseContent in class BaseRobotsParser
Parameters:
url - URL that content was fetched from (for reporting purposes)
content - raw bytes from the site's robots.txt file
contentType - HTTP response header (mime-type)
robotName - name of crawler, to be used when processing file contents (just the name portion, w/o version or other details)

public int getNumWarnings()
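The robotName parameter of parseContent is documented as just the name portion, without version or other details. A caller that only has a full User-Agent style string could trim it first. The sketch below shows one possible way to do that. The helper bareRobotName and the sample agent strings are hypothetical and not part of this class, and the rule for where the name ends (first slash, whitespace, parenthesis, or semicolon) is an assumption.

```java
public class RobotNameExample {

    /**
     * Hypothetical helper: keep only the leading name token of a
     * User-Agent style string, dropping version and comment parts.
     * Assumes the name ends at the first '/', whitespace, '(' or ';'.
     */
    public static String bareRobotName(String userAgent) {
        String trimmed = userAgent.trim();
        int end = 0;
        while (end < trimmed.length()) {
            char c = trimmed.charAt(end);
            if (c == '/' || c == '(' || c == ';' || Character.isWhitespace(c)) {
                break;
            }
            end++;
        }
        return trimmed.substring(0, end);
    }

    public static void main(String[] args) {
        // Made-up agent string, for illustration only.
        String robotName = bareRobotName("MyCrawler/1.2 (+http://example.com/bot)");
        System.out.println(robotName); // prints "MyCrawler"
    }
}
```

The resulting string is what would be passed as the robotName argument.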
Copyright © 2012 Bixo Labs