www.crawlwall.com    the firewall for webpages  

CrawlWall™ Bot Blocker FAQ

Can I block spiders with robots.txt?

Yes, you can deny a spider access to your site using robots.txt. However, this only blocks spiders that actually honor robots.txt, which is a small portion of what's crawling the web. Tracking and maintaining lists of spiders, then researching which ones honor robots.txt, which don't, and what user agent name each one uses, is a tedious and time-consuming task.

CrawlWall simplifies managing your robots.txt file by showing you what's known to be crawling and letting you quickly point and click to choose which spiders are allowed and which are denied.
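To illustrate why robots.txt only stops cooperative spiders, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the crawler names ExampleBot and OtherBot are made up for illustration. The point is that this check runs inside the crawler, not on your server, so a spider that never performs it is not stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Return what a well-behaved crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that honors robots.txt asks this question before each request.
# A crawler that ignores robots.txt simply never asks it.
print(polite_crawler_may_fetch("ExampleBot", "https://example.com/page"))
print(polite_crawler_may_fetch("OtherBot", "https://example.com/page"))
```

The deny rule also only matches the name the spider chooses to announce, which is why the user agent name has to be researched for each one.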

Doesn't .htaccess allow me to filter out spiders that don't honor robots.txt?

Yes, but that is an opt-out strategy: you have to know the name of a spider before you can block it. If you want to spend all your time looking at log files and chasing the new robot of the day, you can filter them manually. Unfortunately, some spiders now use random gibberish names, so they will never be in your filter list. Spiders also change their names periodically, so what you block today may be crawling again tomorrow.

CrawlWall eliminates all these problems by using an opt-in strategy. All crawlers are blocked by default, and you opt in only the spiders you want on your website, like Google or Yahoo.
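The difference between the two strategies can be sketched in a few lines of Python. This is not CrawlWall's implementation, only an illustration: the list contents and function names are assumptions, and the sketch covers only the name a crawler announces (human browser traffic is a separate question).

```python
# Hypothetical lists, for illustration only.
BLOCKED_AGENTS = {"badbot", "scraperx"}   # opt-out: names you already know are bad
ALLOWED_AGENTS = {"googlebot", "slurp"}   # opt-in: names you have chosen to admit


def opt_out_allows(user_agent: str) -> bool:
    """Opt-out: everything is allowed unless its name is on the block list."""
    ua = user_agent.lower()
    return not any(name in ua for name in BLOCKED_AGENTS)


def opt_in_allows(user_agent: str) -> bool:
    """Opt-in: everything is blocked unless its name is on the allow list."""
    ua = user_agent.lower()
    return any(name in ua for name in ALLOWED_AGENTS)


# A spider with a random gibberish name slips past the opt-out filter,
# but is stopped by the opt-in filter without anyone having to list it.
gibberish = "xk3j9q"
print(opt_out_allows(gibberish), opt_in_allows(gibberish))
```

Under opt-out, a renamed spider is a new entry you have to discover and add. Under opt-in, a renamed spider is still simply not on the list.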

Can't bad spiders just fake being Googlebot to gain access?

That was possible with traditional filtering techniques, but CrawlWall knows which IP addresses the major spiders use and filters out anything claiming to be Google, Yahoo, etc. that comes from other IP addresses.

Additionally, companies like Google and Yahoo provide other proxy-related services that access your website. CrawlWall can differentiate between the actual search engine spiders and these other services, which protects against bad bots trying to crawl your site through a language translator, web accelerator, or other technologies.
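The idea of checking a claimed spider name against the addresses it is known to use can be sketched as follows. This is an illustration under stated assumptions, not CrawlWall's code: the function name is hypothetical, and the address ranges are placeholders taken from the blocks reserved for documentation, not the real ranges of any search engine.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks.
# These are NOT the real address ranges of any search engine.
KNOWN_CRAWLER_NETWORKS = {
    "googlebot": [ipaddress.ip_network("192.0.2.0/24")],
    "slurp": [ipaddress.ip_network("198.51.100.0/24")],
}


def classify_claim(user_agent: str, remote_ip: str) -> str:
    """Compare the crawler name a request claims with the address it came from."""
    ua = user_agent.lower()
    addr = ipaddress.ip_address(remote_ip)
    for name, networks in KNOWN_CRAWLER_NETWORKS.items():
        if name in ua:
            # The request claims to be this crawler; the address must agree.
            if any(addr in net for net in networks):
                return "genuine"
            return "impostor"
    return "unclaimed"  # does not claim to be a known crawler


print(classify_claim("Googlebot/2.1", "192.0.2.15"))
print(classify_claim("Googlebot/2.1", "203.0.113.9"))
```

A request that announces a trusted name from an unexpected address is treated as an impostor, so copying the name alone gains nothing.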

How does CrawlWall avoid blocking human visitors?

When CrawlWall suspects that automated activity is being cloaked as a human using a browser, it presents a challenge to the visitor. Humans can easily respond to the challenge and continue using the website, while a stealth spider becomes trapped and quarantined from further access.
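The challenge flow described above can be pictured as a small state machine. The sketch below is an assumption about the general shape only: the class name, the visitor identifier, and the looks_automated flag (standing in for whatever detection is actually used) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class VisitorGate:
    """Hypothetical gate: suspected visitors are challenged, failures quarantined."""
    verified: Set[str] = field(default_factory=set)
    quarantined: Set[str] = field(default_factory=set)

    def handle(self, visitor_id: str, looks_automated: bool,
               challenge_passed: Optional[bool] = None) -> str:
        if visitor_id in self.quarantined:
            return "blocked"                 # trapped earlier, stays out
        if visitor_id in self.verified or not looks_automated:
            return "served"                  # normal visitors are not interrupted
        if challenge_passed is None:
            return "challenge"               # suspicious: ask for a response
        if challenge_passed:
            self.verified.add(visitor_id)    # a human answered, carry on
            return "served"
        self.quarantined.add(visitor_id)     # no valid answer: quarantine
        return "blocked"


gate = VisitorGate()
print(gate.handle("visitor-a", looks_automated=True))
print(gate.handle("visitor-a", looks_automated=True, challenge_passed=False))
print(gate.handle("visitor-a", looks_automated=False))
```

Only visitors that already look suspicious ever see a challenge, which is how ordinary human traffic avoids being blocked.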

How does CrawlWall stop bad spiders from bypassing your security measures?

CrawlWall uses an array of techniques that profile behavior to distinguish a human from a spider. Additionally, the set of challenges changes constantly, so spiders can't easily be programmed around them or use "blow-through" techniques to defeat CrawlWall.

How difficult is CrawlWall to install?

CrawlWall is fairly simple to install: it takes about 5 minutes in any web hosting account that supports PHP. It's designed to work seamlessly even with servers managed by control panels like Plesk, cPanel and others.

Can CrawlWall protect an entire server?

Not at this time. Our initial emphasis is on letting individual webmasters protect themselves in a hosted environment. Full server protection is planned for the future.

What's the CrawlWall Networking Option?

The true power of CrawlWall is realized with the networking option, which allows all CrawlWall sites to communicate new threats to the entire network of sites.

Will my website go down if the CrawlWall Network goes offline?

No! 

That's the best part: CrawlWall will continue to operate normally, with no degradation in service, and protect your site even if the CrawlWall Network is unavailable. The only difference when your server can't connect to the CrawlWall Network is that live notifications of active threats or newly identified spiders won't arrive until your server can reach the network again. CrawlWall will still protect your server as usual, but it will have to go through its regular threat assessment methods before stopping a crawler instead of blocking it on the first page request.
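The fallback behavior can be sketched as a simple decision function. Everything named here is hypothetical: network_threats stands in for whatever the CrawlWall Network shares (None meaning the network is unreachable), and local_assessment stands in for the regular threat assessment methods.

```python
from typing import Callable, Optional, Set


def decide(ip: str,
           network_threats: Optional[Set[str]],
           local_assessment: Callable[[str], bool]) -> str:
    """Hypothetical decision: use shared threat data when available, else local checks."""
    # With the network reachable, a known threat is stopped on its first request.
    if network_threats is not None and ip in network_threats:
        return "block-immediately"
    # Otherwise (network down, or address not yet reported) the site falls
    # back on its own assessment, which still protects it but acts later.
    return "block-after-assessment" if local_assessment(ip) else "allow"


# Network reachable and the address is already reported.
print(decide("203.0.113.7", {"203.0.113.7"}, lambda ip: False))
# Network unreachable: the local assessment still catches the crawler.
print(decide("203.0.113.7", None, lambda ip: True))
```

The site never depends on the network to answer a request, so an outage changes how quickly a threat is recognized, not whether the site stays up.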



All trademarks, trade names, service marks and logos referenced herein belong to their respective companies.