Tuesday, 23 April 2013

What is Web Scraping?

Scraping comes from “Screen Scraping”, a term used for a set of products that turn old “Green Screen” mainframe applications into web services by “wrapping” the screen protocol. Screen Scrapers connect to the fields of a 32×80 character terminal, read text and numbers, and fill in forms, which in turn wraps the application in a programmatic interface or web service. Examples of such products are IBM Rational HATS and Attachmate EXTRA.

Web Scraping is conceptually identical to Screen Scraping in that it “wraps” a human interface into a programmatic interface, but instead of “wrapping” a character-based mainframe protocol, it “wraps” a Web site or Web application and turns it into an API.

It sounds similar, but technically and in its use cases it is quite different.

Web Scraping does not cover every approach to wrapping Web applications into APIs. The term is limited to traditional methods that use scripting languages like Perl or Python to extract data from static HTML with regular expressions. This method of extracting data from web sites has been used for years, but it has been running into two growing challenges: it is fragile toward changes in the underlying web application and, more importantly, it simply does not work with today’s dynamic AJAX-powered web sites.
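To make the two challenges concrete, here is a minimal Python sketch of the traditional regular-expression approach. The HTML snippets, the class names and the `extract_subjects` helper are all made up for illustration; they are not taken from any real site or product.

```python
import re

# Hypothetical pattern: assumes each message subject sits in <td class="subject">...</td>
SUBJECT_RE = re.compile(r'<td class="subject">(.*?)</td>')


def extract_subjects(html):
    """Return the text of every table cell that matches SUBJECT_RE."""
    return SUBJECT_RE.findall(html)


# Static HTML: the data is present in the markup the server sends.
static_page = (
    '<tr><td class="subject">Invoice</td></tr>'
    '<tr><td class="subject">Lunch?</td></tr>'
)

# The same data after a small redesign: the tag and class name have changed.
redesigned_page = (
    '<li><span class="msg-subject">Invoice</span></li>'
    '<li><span class="msg-subject">Lunch?</span></li>'
)

# AJAX-style page: the server sends an empty shell and a script fills it in later.
ajax_page = '<div id="inbox"></div><script src="app.js"></script>'

print(extract_subjects(static_page))      # ['Invoice', 'Lunch?']
print(extract_subjects(redesigned_page))  # []
print(extract_subjects(ajax_page))        # []
```

The first call works because the pattern happens to match the markup. The second returns nothing after a purely cosmetic change, which is the fragility problem. The third returns nothing because the data was never in the HTML the script downloaded, which is the AJAX problem.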

If you are a Perl programmer, I encourage you to build a simple “web scraper”. Go to Gmail.com and create a Perl script that can log in and read the content of your inbox. You will quickly find out that it is nearly impossible.

Let me introduce the Kapow Web Data Server. It takes over where fragile “Web Scraping” scripts fail, delivering a point-and-click interface to turn a website like gmail.com into a sharable REST or SOAP service, in the cloud or on premises, virtually in minutes. Web data access has never been easier or more resilient.

Web Scraping represents a business concept with growing value in today’s networked world. However, Web Data Serving has taken over to deliver a far more productive and robust alternative to traditional Web Scraping technologies.

I will be continuing with more posts on this topic, and as always, I’d love to hear your comments.

Source: http://kapowsoftware.com/blog/index.php/what-is-web-scraping

Note:

Delta Ray is an experienced web scraping consultant and writes articles on Yellow Pages Data Scraping, Screen Scraping Services, LinkedIn Email Scraping, Amazon Product Scraping, Website Harvesting, IMDb Data Scraping and Yelp Review Scraping.
