Protecting Your Websites From Search Engines
If you accept documents from clients as Word or PDF files and store them in a directory on your web server, those files are reachable by URL simply because they reside under the web root, whether or not the pages that link to them are password protected. And thanks to the power of Google, you can discover which Word and PDF documents a website exposes with a simple search such as:

filetype:doc site:example.com
Experiment with this format in a Google search; the results may surprise you. They surprised me once when I searched for PDF documents on a bank's website and found files designated for client eyes only. I wasn't a client, yet I was able to view their documents.
The website itself was protected, and the area for accessing the documents was protected by username and password. What wasn't protected was the directory in which the institution stored the documents. Adding insult to injury, the site had also skipped a common web practice: placing a robots.txt file in the root directory of the website.
There are several levels of security that simply must be in place. I'm certainly not a security expert, but I know you have to protect your website at the network level and at the application level. Your network operations team needs to do its part by locking down directories, installing patches, and configuring and monitoring firewalls.
And developers need to do their part by securing the applications that handle sensitive content. One very simple step is to include a robots.txt file in the root directory of the web server. I'm not speaking of the root directory of the application; I'm talking about the web server itself.
The robots.txt file must reside in the root of the domain and must be named "robots.txt". A robots.txt file located in a subdirectory isn't valid, because bots only check for this file in the root of the domain. For instance, "http://www.example.com/robots.txt" is a valid location, but "http://www.example.com/mysite/robots.txt" is not.
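As an illustration, a minimal robots.txt placed at the domain root might look like this (the /documents/ directory name is hypothetical; substitute the paths you actually want hidden):

```
User-agent: *
Disallow: /documents/
```

The `User-agent: *` line addresses all crawlers, and each `Disallow` line names a path prefix that compliant bots will skip.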
There are many variations to a robots.txt file, and you have a lot of flexibility in declaring which directories or files you want indexed by search engines and which you do not. You do not use a robots.txt file to ensure that pages are indexed by search engines; you use it to define which files and directories you do not want indexed. Keep in mind that robots.txt only instructs well-behaved crawlers and is not a substitute for real access controls, but it is one simple and very important way to keep someone from stumbling across the "For Your Eyes Only" documents on your company website through a search engine.
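Before deploying your rules, it is worth sanity-checking them. Python's standard library ships a robots.txt parser, which this sketch uses against a hypothetical rule set hiding a /documents/ directory:

```python
from urllib import robotparser

# Hypothetical rules for illustration: hide everything under /documents/.
rules = """
User-agent: *
Disallow: /documents/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A compliant crawler must skip anything under /documents/ ...
print(rp.can_fetch("*", "http://www.example.com/documents/statement.pdf"))  # False
# ... but may still index the rest of the site.
print(rp.can_fetch("*", "http://www.example.com/index.html"))  # True
```

The same `RobotFileParser` can also fetch a live file with `set_url(...)` followed by `read()`, which is handy for checking what you actually published.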
Ben Cortese is a developer and business analyst for the financial industry and enjoys developing websites through MerchantWeb Marketing.
Copyright 2007.