Using web fetching to create novel websites
Collecting information from a website and presenting it to your visitors in a different, more useful/interesting format can be a valuable service. It can help attract new visitors and get existing visitors coming back for more.
The best way to collect data from a website is to use its web service if it provides one. Google, Yahoo, MSN, Amazon, eBay, Technorati and numerous others all offer web services. These are well-documented interfaces through which you can easily collect data. The problems come when you want to collect data from a site that doesn't provide a web service.
If a site doesn't offer a web service or some other well-structured version of its data, then the only option remaining is web scraping. This can be messy, often requires extensive testing and debugging, and will break whenever the target website changes its design. With all these problems, it should only ever be a last resort.
The reason it is so difficult is that you are collecting the data you need from the HTML used to display the website to each visitor. That HTML is written for browsers rather than for programs, and each site structures it differently, so each web scraper has to be written for one particular site.
If the site we are interested in offers neither a web service nor an RSS feed, and we can't get the data we need from anywhere else, then how do we go about building a web scraper?
The process can be broken down into three parts: get the HTML of the page, extract the data we need, and then do something with that data.
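The three parts can be sketched as a small Python skeleton. This is only an outline under my own assumptions: the function name `scrape`, its parameters and the stand-in stages are invented for illustration, and the URL is a placeholder.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def scrape(
    url: str,
    fetch: Callable[[str], str],
    extract: Callable[[str], Iterable[T]],
    handle: Callable[[T], None],
) -> int:
    """Run the three stages in order and return how many items were handled."""
    page = fetch(url)              # 1. get the HTML of the page
    count = 0
    for item in extract(page):     # 2. extract the data we need
        handle(item)               # 3. do something with that data
        count += 1
    return count


# Stand-in stages so the skeleton runs without touching the network.
collected = []
n = scrape(
    "http://example.com/prices",           # placeholder URL
    fetch=lambda url: "apple=1;pear=2",    # pretend this is the page body
    extract=lambda page: page.split(";"),
    handle=collected.append,
)
```

Keeping the stages separate means that when the target site changes its design, usually only the extract stage needs rewriting.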
Depending on how the site you are fetching information from is set up, fetching the HTML could be very easy or exceedingly complicated. At its simplest, all you need to do is open the page's URL much as you would open a file located on your local server. If you don't need to log in to view the data you want, it may well be this easy. For your sake I hope it is this simple.
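For the simple case, a minimal sketch using Python's standard `urllib.request` module might look like the following. The function name `fetch_html`, the User-Agent string and the ten second timeout are my own choices, not anything a particular site requires.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text."""
    # The User-Agent value is a placeholder; some sites reject requests without one.
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Use the charset the response declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes instead of failing on a badly encoded page.
    return raw.decode(charset, errors="replace")
```

Calling `fetch_html("http://example.com/")` would return that page's HTML as one string, ready for the extraction step.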
If you need to log in to the site to access the data, things can get complicated, really complicated. I'm currently trying to collect and develop scripts to fetch the contact lists from webmail services such as Hotmail. Here the obstacles you must face include cookies and variables in the URL. These problems can be overcome, but you'll need plenty of time.
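The cookie part of the problem can be sketched with Python's standard `http.cookiejar` module. Treat this as a starting point only: the form field names `username` and `password` are hypothetical, real login forms often add hidden fields or tokens, and the helper names are mine.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_login_form(username: str, password: str) -> bytes:
    """Encode credentials as a POST body. The field names are hypothetical."""
    fields = {"username": username, "password": password}
    return urllib.parse.urlencode(fields).encode("ascii")


def log_in(opener, login_url: str, username: str, password: str) -> str:
    """POST the form; any session cookie the site sets is kept in the opener's jar."""
    body = encode_login_form(username, password)
    with opener.open(login_url, data=body, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

After a successful `log_in`, fetching further pages through the same `opener` sends the stored cookies back automatically, which is what keeps you logged in from one request to the next.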
Once you have the HTML for the page, you need to clean it up so that you have just the data you need with no extra text. The best way to do this is with regular expressions. By defining patterns for the data you want, you can cut out the chaff and just keep what you need. Unless you are extremely comfortable with regular expressions, you may find it easier to use several simpler patterns one after another. Your script may take a little longer to run, but it will be far easier to develop.
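Here is a sketch of the "several simpler patterns" approach in Python. The sample page, its `id` and `class` names and the function `extract_prices` are all made up for illustration; a real site needs patterns written against its own markup.

```python
import html
import re

SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a></div>
<table id="prices">
<tr><td class="name">Widget</td><td class="price">$4.50</td></tr>
<tr><td class="name">Gadget &amp; Co</td><td class="price">$12.00</td></tr>
</table>
<div id="footer">Contact us</div>
</body></html>
"""

# Pattern 1: isolate the one table we care about.
TABLE_RE = re.compile(r'<table id="prices">(.*?)</table>', re.DOTALL)
# Pattern 2: split that table into rows.
ROW_RE = re.compile(r"<tr>(.*?)</tr>", re.DOTALL)
# Pattern 3: pull the two cells out of a single row.
CELLS_RE = re.compile(r'<td class="name">(.*?)</td><td class="price">\$([\d.]+)</td>')


def extract_prices(page: str) -> list:
    """Return (name, price) pairs found in the prices table."""
    table = TABLE_RE.search(page)
    if table is None:
        # The layout has probably changed; fail loudly rather than return nothing.
        raise ValueError("price table not found")
    results = []
    for row in ROW_RE.findall(table.group(1)):
        cells = CELLS_RE.search(row)
        if cells:
            # Turn entities such as &amp; back into plain characters.
            results.append((html.unescape(cells.group(1)), float(cells.group(2))))
    return results


prices = extract_prices(SAMPLE_PAGE)
```

Each pattern does one small job, so when the scraper stops working you can check the stages one at a time to see which one no longer matches.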
Finally you want to do something with the data you gather. This may be as simple as storing it in a text file or displaying it immediately to a visitor to your site. You may also decide to do some more complex tasks with the data. The important thing is that once you have the data you have the choice of what to do with it.
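As one example of the simple end of that range, this Python sketch writes the collected data out as CSV text. The `(name, price)` row layout and the function name `save_rows` are assumptions of mine, chosen only to have something concrete to store.

```python
import csv
import io


def save_rows(rows, stream) -> None:
    """Write (name, price) pairs to an open text stream as CSV."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["name", "price"])   # header line
    writer.writerows(rows)


# Writing to an in-memory buffer here; an open text file works the same way.
buffer = io.StringIO()
save_rows([("Widget", 4.5), ("Gadget & Co", 12.0)], buffer)
output = buffer.getvalue()
```

To keep the data on disk, pass a file opened with `open("prices.csv", "w", newline="", encoding="utf-8")` instead of the buffer (the file name is just an example).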
Web scraping isn't easy, but the benefits can be considerable. If a web service is available, though, save yourself some time and use that instead.
TorrentialWebDev focuses on developing the tools to build and promote Web Applications.