Crawling password protected websites
Introduction
Some websites are protected by an authentication scheme that requires a username and password to access the site. For Funnelback to crawl a password-protected site successfully, it must be given a valid username and password to use.
The authentication schemes that Funnelback currently supports are:
- HTTP Basic Authentication
- Windows Integrated Authentication (NTLM)
Giving Funnelback a username and password
Funnelback supports multiple HTTP Basic username/password pairs per collection. If you have a single account to configure, you can set the values using parameters in the collection's collection.cfg file. To allow Funnelback access to the protected website:
For basic HTTP authentication:
- Set the http_user parameter to a valid HTTP Basic username.
- Set the http_passwd parameter to the HTTP Basic username's password.
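As a minimal sketch, the two settings above might look like this in collection.cfg, assuming the file's usual key=value syntax. The username and password shown are placeholders, not real credentials:

```properties
# HTTP Basic credentials for the crawler (placeholder values)
http_user=crawl-account
http_passwd=example-password
```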
For NTLM/Windows Integrated authentication:
- Set the crawler.ntlm.domain parameter to a valid NTLM domain.
- Set the crawler.ntlm.username parameter to a valid username in the NTLM domain.
- Set the crawler.ntlm.password parameter to the NTLM username's password.
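The NTLM settings can be sketched the same way, again assuming key=value syntax in collection.cfg. The domain, username and password are hypothetical values chosen only for illustration:

```properties
# NTLM / Windows Integrated Authentication (placeholder values)
crawler.ntlm.domain=EXAMPLEDOMAIN
crawler.ntlm.username=crawl-account
crawler.ntlm.password=example-password
```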
For FTP sites:
- Set the ftp_user parameter to a valid FTP username.
- Set the ftp_passwd parameter to the FTP username's password.
Note: ftp must be added to the crawler.protocols parameter in order to crawl an FTP site.
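A possible collection.cfg fragment for an FTP site is sketched below. The credentials are placeholders, and the crawler.protocols value assumes that the parameter takes a comma-separated list and that http and https are already enabled; check the existing value in your collection before changing it:

```properties
# FTP credentials for the crawler (placeholder values)
ftp_user=crawl-account
ftp_passwd=example-password

# Assumed comma-separated list; ftp added alongside the existing protocols
crawler.protocols=http,https,ftp
```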
Specifying multiple HTTP Basic usernames and passwords
If you need to specify multiple HTTP Basic accounts for different web servers, you can configure them using site profiles.