Saturday, 29 June 2013

What is Data Mining? Why Data Mining is Important?

Searching, collecting, filtering and analyzing data is what defines data mining. A large amount of information can be retrieved from a wide range of forms, such as data relationships, patterns or significant statistical correlations. Today, the advent of computers, large databases and the internet makes it easier to collect millions, billions and even trillions of pieces of data that can be systematically analyzed to look for relationships and to seek solutions to difficult problems.

Governments, private companies, large organizations and businesses of all kinds collect large volumes of information for research and business development. All of this collected data can be stored for future use, and such information is most valuable whenever it is required. Searching for and finding the required information on the internet or in other resources takes a great deal of time.

Here is an overview of what data mining services include:

* Market research, product research, surveys and analysis
* Collecting information about investors, funds and investments
* Mining forums, blogs and other resources for customer views and opinions
* Scanning large volumes of data
* Information extraction
* Pre-processing of data from the data warehouse
* Metadata extraction
* Web data mining services
* Online data mining research
* Researching information from online newspapers and news sources
* Presenting data collected from online sources in Excel sheets
* Competitor analysis
* Mining data from books
* Information interpretation
* Updating collected data

After applying the data mining process, you can easily extract information from filtered data and refine it further. The process is mainly divided into three stages: pre-processing, mining and validation. In short, data mining is a process of converting data into authentic information.
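The three stages mentioned above can be sketched as a simple pipeline. This is only an illustrative sketch; the record format, the cleaning rules and the toy "pattern" below are hypothetical, not any particular vendor's workflow.

```python
# A minimal sketch of the three-stage pipeline described above:
# pre-processing -> mining -> validation. The record format is made up.

def preprocess(records):
    """Pre-processing: drop incomplete records and normalize text fields."""
    return [
        {**r, "name": r["name"].strip().lower()}
        for r in records
        if r.get("name") and r.get("revenue") is not None
    ]

def mine(records):
    """Mining: extract a trivial 'pattern' - average revenue per name prefix."""
    totals = {}
    for r in records:
        key = r["name"][0]  # group by first letter, as a toy pattern
        totals.setdefault(key, []).append(r["revenue"])
    return {k: sum(v) / len(v) for k, v in totals.items()}

def validate(patterns):
    """Validation: keep only results that pass a basic sanity check."""
    return {k: v for k, v in patterns.items() if v >= 0}

raw = [
    {"name": " Acme ", "revenue": 120},
    {"name": "apex", "revenue": 80},
    {"name": "", "revenue": 50},  # dropped in pre-processing
]
result = validate(mine(preprocess(raw)))
print(result)  # {'a': 100.0}
```

Each stage feeds the next, which is why errors left in the data at the pre-processing stage show up as wrong "patterns" later on.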

The key point is that finding important information within raw data takes a lot of time. If you want to grow your business rapidly, you must make quick and accurate decisions to grab timely opportunities.

Outsourcing Web Research is one of the best data mining outsourcing organizations, with more than 17 years of experience in the market research industry. To learn more about our company, please contact us.


Source: http://ezinearticles.com/?What-is-Data-Mining?-Why-Data-Mining-is-Important?&id=3613677

Thursday, 27 June 2013

Data Extraction Services - A Helpful Hand For Large Organization

Data extraction is the way to extract and structure data from unstructured and semi-structured electronic documents, as found on the web and in various data warehouses. It is extremely useful for large organizations that deal with considerable amounts of data daily, which must be transformed into significant information and stored for later use.

Your company may have tons of data, yet find it difficult to control that data and convert it into useful information. Without the right information at the right time, and working from half-accurate information, decision makers within a company waste time making wrong strategic decisions. In the highly competitive world of business, essential statistics such as customer information, competitors' operational figures and inter-member sales figures play a big role in strategic decision making. Accurate, timely data can help you make the strategic business decisions that shape your business's goals.

Outsourcing companies provide services custom made to the client's requirements. A few of the areas where data extraction can be used: generating better sales leads, extracting and harvesting product pricing data, capturing financial data, acquiring real estate data, conducting market research, surveys and analysis, conducting product research and analysis, and duplicating an online database.

The different types of Data Extraction Services:

    Database Extraction:
    Data reorganized from multiple databases, such as statistics about competitors' products, pricing, latest offers, and customer opinions and reviews, can be extracted and stored as per the company's requirements.
    Web Data Extraction:
    Web data extraction, also known as web scraping, usually refers to the practice of extracting or reading text data from a targeted website.
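To make the web data extraction idea above concrete, here is a small sketch using only Python's standard library. The HTML snippet stands in for a fetched page; a real job would first download it (for example with `urllib.request`), and the `class="price"` markup is an assumption for illustration.

```python
# Sketch: extracting text data from a targeted page using only the
# standard library. The inlined HTML stands in for a downloaded page.
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    """Collect the text of every <span class="price"> element."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

page = ('<ul><li><span class="price">$19.99</span></li>'
        '<li><span class="price">$4.50</span></li></ul>')
extractor = PriceExtractor()
extractor.feed(page)
print(extractor.prices)  # ['$19.99', '$4.50']
```

Real sites vary their markup, which is why production scrapers are usually written per target site rather than generically.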

Businesses have now realized the huge benefits they can get by outsourcing these services, which makes outsourcing a profitable option. Since all projects are customized to suit the exact needs of the customer, huge savings in terms of time, money and infrastructure are among the many advantages that outsourcing brings.

Advantages of Outsourcing Data Extraction Services:

    Improved technology scalability
    Skilled and qualified technical staff who are proficient in English
    Advanced infrastructure resources
    Quick turnaround time
    Cost-effective prices
    Secure Network systems to ensure data safety
    Increased market coverage

By outsourcing, you can definitely increase your competitive advantage. Outsourcing these services helps businesses manage their data effectively, which in turn enables them to experience an increase in profits.


Source: http://ezinearticles.com/?Data-Extraction-Services---A-Helpful-Hand-For-Large-Organization&id=2477589

Tuesday, 25 June 2013

Data Loss Symptoms, Causes, and Implications of Downtime

A number of failures can cause data files to disappear or become corrupted. Symptoms of data loss appear immediately, causing that panicky sinking feeling in the stomach when previously accessible data is suddenly out of reach. Today more data is being stored in smaller and smaller spaces, with hard drives of 2011 having more than 500 times the capacity of those in 2001. This makes a greater, costlier impact when hardware and software malfunction. Hardware malfunctions alone account for nearly 40% of all data loss.

If the hard drive of a computer isn't spinning or won't work at all, if you hear a scraping or rattling sound, or if an error message lets you know a device is not recognized then the hardware is failing and your data is at risk. You may see file or folder names that are scrambled or disappear. A hard disk may be silent for a long time after you request data by opening a folder or file.

Hard drive damage can be caused by power surges, dust in the computer, crashes, and controller failure. Other problems are caused by human error: accidentally deleted files, damage from dropping a device, or spilled liquids. Do-it-yourself repairs by inexperienced people can further destroy the drive and its cargo of important data. An estimated 32% of data loss is caused by human error.

Although virus protection has become increasingly sophisticated, 7% of all data loss is caused by computer viruses. The computer may display strange and unpredictable behavior that gets more and more pronounced, the screen may go blank, or a taunting message may appear announcing the arrival of the malevolent virus within your hard drive. Once infected, the files will need to be processed by a data retrieval company if they are of substantial value.

Backups should be performed routinely, of course, but they don't usually contain all the up-to-date data; the files may already be corrupted, or the hardware and storage media may not be working. Companies rely heavily on their computer systems for accounting, inventory, payroll, and many other time-sensitive activities. Backing up data is critically important but not foolproof, especially if a great amount of data is created daily and some of it is lost.

When the computer systems go down, the operation of a company is bogged down. The potential loss caused by this downtime will motivate business owners to have the best data recovery company they can find to take the case and save the day with advanced technology; established data retrieval services will also have the highest ethics when it comes to handling your confidential information.

Companies that have suffered extensive data loss caused by problems in hard drives, servers, hard disks, tapes, and media devices can find consolation in the fact that there's a good chance the data can be retrieved in a short period of time. This allows operations to return to normal after only several days, reducing the loss of productivity. The daily downtime losses for large companies can run into the millions. The data recovery industry is there to bring their computer operations back to life in as short a time as possible.



Source: http://ezinearticles.com/?Data-Loss-Symptoms,-Causes,-and-Implications-of-Downtime&id=6277522

Saturday, 22 June 2013

How Data Mining Can Help in Customer Relationship Management Or CRM?

Customer relationship management (CRM) is the critical activity of improving customer interactions while at the same time making those interactions more amicable through individualization. Data mining utilizes various data analysis and modeling methods to detect specific patterns and relationships in data. This helps in understanding what a customer wants and forecasting what they will do.

Using data mining, you can find the right prospects and offer them the right products. This results in improved revenue, because you can respond to each customer in the best way using fewer resources.

Basic process of CRM data mining includes:
1. Define business objective
2. Construct marketing database
3. Analyze data
4. Visualize a model
5. Explore model
6. Set up model & start monitoring

Let me explain the above steps in detail.

Define the business objective:
Every CRM process has one or more business objectives for which you need to construct a suitable model. The model varies depending on your specific goal; the more precisely you define the problem, the more successful your CRM project will be.

Construct a marketing database:
This step involves creating a constructive marketing database, since your operational data often doesn't contain the information in the form you want it. The first step in building your database is to clean it up so that you can construct clean models with accurate data.

The data you need may be scattered across different databases such as the client database, operational database and sales databases. This means you have to integrate the data into a single marketing database. Inaccurately reconciled data is a major source of quality issues.
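The integration step above can be sketched in a few lines. The three source "databases" here are in-memory dictionaries and all the field names are invented for illustration; a real project would be pulling from actual database systems and resolving conflicting field values, not just merging them.

```python
# Illustrative sketch: reconciling customer records scattered across a
# client database, an operational database and a sales database into a
# single marketing record per customer ID. All fields are hypothetical.
client_db = {101: {"name": "Jane Roe", "email": "jane@example.com"}}
ops_db = {101: {"last_login": "2013-06-20"}}
sales_db = {101: {"lifetime_value": 540.0}}

marketing_db = {}
for source in (client_db, ops_db, sales_db):
    for cust_id, fields in source.items():
        # Merge every source's fields under one customer key
        marketing_db.setdefault(cust_id, {}).update(fields)

print(marketing_db[101])
# {'name': 'Jane Roe', 'email': 'jane@example.com',
#  'last_login': '2013-06-20', 'lifetime_value': 540.0}
```

Note that in this naive merge the last source wins on any field-name clash, which is exactly the kind of silent reconciliation error the text warns about.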

Analyze the data:
Prior to building a correct predictive model, you must analyze your data. Collect a variety of numerical summaries (such as averages, standard deviations and so forth). You may want to generate a cross-section of multi-dimensional data such as pivot tables.

Graphing and visualization tools are a vital aid in data analysis. Data visualization most often provides better insight that leads to innovative ideas and success.
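The numerical summaries mentioned above (averages, standard deviations and so on) can be computed directly with Python's standard library; the monthly spend figures below are made-up sample data.

```python
# Computing basic numerical summaries of a customer metric with the
# standard library. The data here is invented for illustration.
import statistics

monthly_spend = [120.0, 95.0, 130.0, 110.0, 145.0]

print(statistics.mean(monthly_spend))   # 120.0
print(statistics.stdev(monthly_spend))  # sample standard deviation
print(min(monthly_spend), max(monthly_spend))
```

Summaries like these are the quick sanity checks you run before trusting a predictive model built on the same data.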


Source: http://ezinearticles.com/?How-Data-Mining-Can-Help-in-Customer-Relationship-Management-Or-CRM?&id=4572272

Thursday, 20 June 2013

Data Mining Process - Why Outsource Data Mining Service?

Overview of Data Mining and Process:
Data mining is a unique technique for investigating information to extract specific data patterns and determine outcomes against existing requirements. It is widely used in client research, services analysis, market research and so on. It relies on mathematical algorithms and analytical skills to derive the desired results from huge database collections.

Data mining is mostly used by financial analysts and by business and professional organizations, and there are many growing areas of business that gain maximum advantage from data extraction by using data warehouses in their small to large operations.

Most of the functionality used in the information-collecting process is defined as follows:

* Retrieving Data

* Analyzing Data

* Extracting Data

* Transforming Data

* Loading Data

* Managing Databases
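The retrieve/analyze/extract/transform/load steps listed above can be sketched as a tiny ETL pipeline over in-memory data. The CSV content and the "tier" rule are invented; a real pipeline would read from and write to actual databases.

```python
# The functionality listed above, reduced to a toy extract-transform-load
# pipeline. The source data and the derived "tier" field are made up.
import csv
import io

raw_csv = "name,sales\nAcme,1200\nZen Co,900\n"

# Extract: parse rows out of the source
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: cast types and derive a field
for row in rows:
    row["sales"] = int(row["sales"])
    row["tier"] = "high" if row["sales"] >= 1000 else "low"

# Load: here, just into a dict keyed by name (a stand-in for a warehouse)
warehouse = {row["name"]: row for row in rows}
print(warehouse["Acme"]["tier"])  # high
```

Keeping the three phases separate, as above, is what lets each of them be tested and scaled independently.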

Small, medium and large businesses alike collect huge amounts of data for the analysis and research needed to develop the business. Storing such large amounts makes the information readily available whenever it is required.

Why Outsource Data Online Mining Service?

Outsourcing advantages of data mining services:
o Save almost 60% in operating costs
o High quality analysis processes ensuring accuracy levels of almost 99.98%
o A guaranteed risk-free outsourcing experience, ensured by strict information security policies and practices
o Get your project done within a quick turnaround time
o Evaluate the skills and expertise on offer by taking advantage of a free trial program
o Get the gathered information presented in a simple and easy-to-access format

Thus, data mining is a very important part of web research services and a most useful process. By outsourcing data extraction and mining services, you can concentrate on your core business and grow as fast as you desire.

Outsourcing Web Research is a trusted and well-known Internet market research organization with years of experience in the BPO (business process outsourcing) field.

If you want more information about data mining services and related web research services, contact us.


Source: http://ezinearticles.com/?Data-Mining-Process---Why-Outsource-Data-Mining-Service?&id=3789102

Wednesday, 19 June 2013

Data Entry Services Are The Core of Any Business

Data entry is at the core of any business, and though it may appear easy to manage and handle, it involves many processes that need to be dealt with systematically. Huge changes have taken place in the field of data entry, and because of this, handling the work has become much easier than before. So if you want to make use of the best data entry services to maintain the data and other information about your company, you must be ready to spend money on it. This is in no way an attempt to say that data entry services are costly, but only that good services will not come that cheap either. You just need to decide whether you will hire professionals to do this work in house or hire the services of an outside firm. The business is yours, and you are the best person to decide what suits it.

Doing the data entry of any business in house can be both advantageous and disadvantageous. The main advantage is that you can keep an eye on the work being done to maintain proper records of all aspects of your company. However, this can prove a bit costly, as you will have to hire a data entry operator. The employee will be on the rolls and thus entitled to all benefits, such as allowances and other bonuses. So another option is to have a third party handle the work for you. This is a better option, as you can hire the services depending on the type of work you need done.

Since this is one of the core components of your business, you must ensure it is handled properly. Data entry services are not the only aspect that business owners are seeking out these days. With the huge surge in the field of information technology, data conversion is equally important. The need to convert data that has been entered is gaining momentum day by day. Conversion makes the data more accessible, so it can be used easily and without too many hassles to draw customers toward buying your goods. Traditional methods have been done away with, and professionals who work in data entry services these days are highly skilled and in tune with the latest methods.

Data entry services done for a company by a third party have been found to be very suitable. In fact, studies have indicated that outsourcing data entry services is on the rise due to the high rate of success enjoyed by business owners. The main advantage of having data entry services done by a third party is that it works out very cheap and the work done is of the topmost quality. So if data entry services of the best quality are provided, there is absolutely no reason why someone would not undertake the process to improve and brighten their business prospects.


Source: http://ezinearticles.com/?Data-Entry-Services-Are-The-Core-of-Any-Business&id=556117

Monday, 17 June 2013

PDF Scraping: Making Modern File Formats More Accessible

Data scraping is the process of automatically sorting through information contained on the internet inside HTML, PDF or other documents and collecting relevant information into databases and spreadsheets for later retrieval. On most websites, the text is easily and accessibly written in the source code, but an increasing number of businesses are using Adobe PDF format (Portable Document Format: a format which can be viewed by the free Adobe Acrobat software on almost any operating system; see below for a link). The advantage of PDF format is that the document looks exactly the same no matter which computer you view it from, making it ideal for business forms, specification sheets, etc.; the disadvantage is that the text is converted into an image from which you often cannot easily copy and paste. PDF scraping is the process of data scraping information contained in PDF files. To scrape a PDF document, you must employ a more diverse set of tools.

There are two main types of PDF files: those built from a text file and those built from an image (likely scanned in). Adobe's own software is capable of PDF scraping from text-based PDF files but special tools are needed for PDF scraping text from image-based PDF files. The primary tool for PDF scraping is the OCR program. OCR, or Optical Character Recognition, programs scan a document for small pictures that they can separate into letters. These pictures are then compared to actual letters and if matches are found, the letters are copied into a file. OCR programs can perform PDF scraping of image-based PDF files quite accurately but they are not perfect.
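Before choosing between text-extraction and OCR, it helps to know which of the two kinds of PDF you have. Text-based PDFs normally declare font resources, while purely scanned ones only embed images. The byte-level check below is a rough heuristic of my own for illustration (compressed or unusual PDFs can fool it), not a substitute for a real PDF library or an OCR tool.

```python
# Crude heuristic for classifying a PDF as text-based or image-based:
# text-based PDFs usually contain a /Font resource declaration.
# This is an illustrative assumption, not a robust parser.
def looks_text_based(pdf_bytes: bytes) -> bool:
    return b"/Font" in pdf_bytes

# Synthetic stand-ins for real PDF file contents:
scanned = b"%PDF-1.4 ... /XObject /Image ..."
typed = b"%PDF-1.4 ... /Font /F1 ... (Hello) Tj ..."

print(looks_text_based(scanned))  # False -> route to OCR
print(looks_text_based(typed))    # True  -> plain text extraction
```

A check like this is only a routing step: files flagged as image-based still need an OCR program of the kind described above.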

Once the OCR program or Adobe program has finished PDF scraping a document, you can search through the data to find the parts you are most interested in. This information can then be stored into your favorite database or spreadsheet program. Some PDF scraping programs can sort the data into databases and/or spreadsheets automatically making your job that much easier.

Quite often you will not find a PDF scraping program that will obtain exactly the data you want without customization. Surprisingly a search on Google only turned up one business, (the amusingly named ScrapeGoat.com http://www.ScrapeGoat.com) that will create a customized PDF scraping utility for your project. A handful of off the shelf utilities claim to be customizable, but seem to require a bit of programming knowledge and time commitment to use effectively. Obtaining the data yourself with one of these tools may be possible but will likely prove quite tedious and time consuming. It may be advisable to contract a company that specializes in PDF scraping to do it for you quickly and professionally.

Let's explore some real world examples of the uses of PDF scraping technology. A group at Cornell University wanted to improve a database of technical documents in PDF format by taking the old PDF file where the links and references were just images of text and changing the links and references into working clickable links thus making the database easy to navigate and cross-reference. They employed a PDF scraping utility to deconstruct the PDF files and figure out where the links were. They then could create a simple script to re-create the PDF files with working links replacing the old text image.

A computer hardware vendor wanted to display specifications data for his hardware on his website. He hired a company to perform PDF scraping of the hardware documentation on the manufacturers' website and save the PDF scraped data into a database he could use to update his webpage automatically.

PDF Scraping is just collecting information that is available on the public internet. PDF Scraping does not violate copyright laws.

PDF Scraping is a great new technology that can significantly reduce your workload if it involves retrieving information from PDF files. Applications exist that can help you with smaller, easier PDF Scraping projects but companies exist that will create custom applications for larger or more intricate PDF Scraping jobs.


Source: http://ezinearticles.com/?PDF-Scraping:-Making-Modern-File-Formats-More-Accessible&id=193321

Friday, 14 June 2013

Basics of Online Web Research, Web Mining & Data Extraction Services

The evolution of the World Wide Web and search engines has put an abundant and ever-growing pile of data and information at our fingertips. The web has now become a popular and important resource for information research and analysis.

Today, web research services are becoming more and more complex, involving factors such as business intelligence and web interaction to deliver the desired results.

Web researchers can retrieve web data using search engines (keyword queries) or by browsing specific web resources. However, these methods are not effective: keyword search returns a large chunk of irrelevant data, and since each webpage contains several outbound links, it is difficult to extract data by browsing, too.

Web mining is classified into web content mining, web usage mining and web structure mining. Content mining focuses on the search and retrieval of information from the web. Usage mining extracts and analyzes user behavior. Structure mining deals with the structure of hyperlinks.

Web mining services can be divided into three subtasks:

Information Retrieval (IR): The purpose of this subtask is to automatically find all relevant information and filter out the irrelevant. It uses various search engines such as Google, Yahoo and MSN, along with other resources, to find the required information.

Generalization: The goal of this subtask is to explore users' interest using data extraction methods such as clustering and association rules. Since web data are dynamic and inaccurate, it is difficult to apply traditional data mining techniques directly on the raw data.

Data Validation (DV): This subtask tries to uncover knowledge from the data provided by the former tasks. Researchers can test various models, simulate them and finally validate the given web information for consistency.
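The association-rule idea mentioned under Generalization can be made concrete with a toy example: counting which pairs of items co-occur across user sessions. The session data below is invented; real systems use proper algorithms (such as Apriori) with support and confidence thresholds rather than raw counts.

```python
# Toy version of association-rule mining: count item pairs that
# co-occur within the same session. Session data is made up.
from collections import Counter
from itertools import combinations

sessions = [
    {"news", "sports", "weather"},
    {"news", "sports"},
    {"news", "finance"},
]

pair_counts = Counter()
for s in sessions:
    for pair in combinations(sorted(s), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('news', 'sports'), 2)]
```

The most frequent pair is the seed of a rule like "users who read news also read sports", which is exactly the kind of user-interest pattern the Generalization subtask aims to surface.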

Should you have any queries regarding Web research or Data mining applications, please feel free to contact us. We would be pleased to answer each of your queries in detail. Find more information at http://www.outsourcingwebresearch.com



Source: http://ezinearticles.com/?Basics-of-Online-Web-Research,-Web-Mining-and-Data-Extraction-Services&id=4511101

Wednesday, 12 June 2013

Changes Document Of Data Scraping Services

Data scraping can be done either manually or by using software. The need to extract ever more data has led mining companies to make increasing use of web crawling. Another important task of these companies is to process and analyze the data they collect. One of their important strengths is the experts they employ: their role is not limited to mining data, but extends to identifying different relationships with customers and being able to build models.

Some of the most common methods used in scraping include web crawling, text grepping, DOM parsing and regular-expression matching. After processing, the meaning of the analyzed HTML pages can be captured through annotations. There are many different ways of scraping data, but all of them work toward the same goal: the main purpose of using a web scraping service is to retrieve and compile data from databases and websites so that a business can remain relevant in its market.

The central question concerns the relevance of web scraping. Is the process relevant to the business world? The answer is yes, and the fact that it is used by large companies around the world speaks for itself.

Scraping data from websites using a software program has proved to be an effective way of extracting data from the web. We offer the best web software to extract data, backed by experience and knowledge in web data extraction, image and screen scraping, email extraction services, data mining and web harvesting.

You can use data scraping services for any information available on the network: a name, a word, or anything else available on the web. For example, a software and marketing company could use data about restaurants in our city in California to market its product to those restaurants. Many companies use data scraping in this way, tailored to their particular needs.

Web Data Extraction

Websites are created using text-based markup languages (HTML and XHTML) and often contain a wealth of useful data as text. However, the majority of web pages are designed for human end users, not for ease of automated use. Because of this, toolkits that scrape web content were created. A web scraper is an API to extract data from a website. We offer quality and affordable web applications for data mining.

Data collection

In general, data transfer between programs is performed automatically using structures suited to computer processing. Such formats and protocols are strictly structured, well documented, easily parsed, and keep ambiguity to a minimum. Very often, these transmissions are not human-readable.

Email Extractor

An email extractor is a tool that helps to automatically extract email IDs from any reliable source. It is fundamentally a service that collects email contacts from different websites, HTML files, text files or any other format, without duplicate IDs.
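A minimal email extractor of the kind described above can be written with a regular expression plus duplicate removal. The pattern below is deliberately simple and will not cover every address form the email standards allow; the sample text is invented.

```python
# Minimal email extractor: pull addresses out of arbitrary text and
# drop duplicates while keeping the order of first appearance.
# The regex is a simplification, not a full RFC-compliant pattern.
import re

def extract_emails(text):
    found = re.findall(r"\b[\w.+-]+@[\w-]+\.[A-Za-z]{2,}\b", text)
    seen, unique = set(), []
    for addr in found:
        if addr.lower() not in seen:   # treat addresses case-insensitively
            seen.add(addr.lower())
            unique.append(addr)
    return unique

sample = "Contact sales@example.com or Sales@example.com; CC bob@test.org."
print(extract_emails(sample))  # ['sales@example.com', 'bob@test.org']
```

Deduplicating case-insensitively, as here, is what keeps `Sales@` and `sales@` from producing two contacts for the same mailbox.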



Source: http://data.ezinemark.com/changes-document-of-data-scraping-services-7d385510416d.html

Monday, 10 June 2013

Screen Scraping

There is a huge difference between screen scraping and data mining. Basically, screen scraping allows you to obtain information, while data mining allows you to analyze the information you obtain. Before the advent of the internet, screen scraping literally meant scraping off or extracting information from text so it could be analyzed. Today, screen scraping is mostly used to scrape information off the web. Specially designed programs and applications crawl through websites to pull out the data needed by the person doing the scraping. This is usually done when someone wants to build websites for price and product comparison, archive web pages, or acquire texts so they can be easily evaluated and filtered.

When you perform screen scraping, you are able to scrape data more directly. This is also one of the fastest ways to obtain data, since the process is fully automated. Different types of screen scraping services offer different ways of obtaining information, which is useful especially when the website to be scraped has several barriers designed to block this type of automated activity. Some screen scraping services offer text grepping and regular expression matching: information can be extracted from the web using a UNIX grep command or related techniques for expression matching. Other services offer web scraping applications that can be used to customize and tailor-fit web-based scraping solutions.
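The text-grepping approach mentioned above boils down to a few lines: scan the raw page text and keep only the lines matching a pattern, much like the UNIX grep command. The page text and the price pattern below are invented for illustration.

```python
# Grep-style screen scraping: keep only lines matching a price pattern.
# The page text here is a made-up stand-in for fetched page content.
import re

page_text = """Widget A ... $19.99
Out of stock
Widget B ... $4.50"""

pattern = re.compile(r"\$\d+\.\d{2}")
matches = [line for line in page_text.splitlines() if pattern.search(line)]
print(matches)  # ['Widget A ... $19.99', 'Widget B ... $4.50']
```

This is the crudest form of scraping: it breaks as soon as the page layout changes, which is why DOM-based approaches are preferred for anything long-lived.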

These applications can try to automatically distinguish the data structure of a particular page, or offer a recording interface that removes the need to write screen scraping code by hand. Other scraping functions can be utilized to extract and convert web content, along with database interfaces that accumulate the scraped information in local databanks.

On the other hand, data mining is basically the process of automatically searching large caches of information and data for patterns. This means that you already have the information, and what you need to do is analyze the contents to find the useful things you need. This is very different from screen scraping, where you still need to look for the data before you can analyze it.

Data mining also involves a lot of complicated algorithms, often based on various statistical methods. This process has nothing to do with how you obtain the data; all it cares about is analyzing what is available for evaluation. Screen scraping is often mistaken for data mining, when in fact these are two different things. Today, there are online services that offer screen scraping. Depending on what you need, you can have it custom tailored to meet your specific needs and perform precisely the tasks you want. Finding reliable screen scraping services is not difficult: you can simply search online and find the right company with the right solution for your needs.


Source: http://www.fetch.com/screen-scraping-article/

Thursday, 6 June 2013

Data Mining vs Screen-Scraping

Data mining isn't screen-scraping. I know that some people in the room may disagree with that statement, but they're actually two almost completely different concepts.

In a nutshell, you might state it this way: screen-scraping allows you to get information, where data mining allows you to analyze information. That's a pretty big simplification, so I'll elaborate a bit.

The term "screen-scraping" comes from the old mainframe terminal days where people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Fast-forwarding to the web world of today, screen-scraping now most commonly refers to extracting information from web sites. That is, computer programs can "crawl" or "spider" through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.
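The crawl-and-pull-out-data idea above can be sketched in a few lines of standard-library Python: collect every link on a page so a spider knows where to go next. The HTML is inlined here as a stand-in; a real crawler would fetch pages over HTTP and loop over the discovered links.

```python
# Sketch of the "spider" step: collect every link from a page so a
# crawler knows which pages to visit next. The HTML is a stand-in
# for a page fetched over HTTP.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<p><a href="/page2">Next</a> <a href="/about">About</a></p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/page2', '/about']
```

A comparison-shopping engine is essentially this loop plus per-site extraction rules for the product data on each visited page.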

Data mining, on the other hand, is defined by Wikipedia as the "practice of automatically searching large stores of data for patterns." In other words, you already have the data, and you're now analyzing it to learn useful things about it. Data mining often involves lots of complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what's already there.

The difficulty is that people who don't know the term "screen-scraping" will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks; for example, we created pages entitled Text Data Mining, Automated Data Collection, Web Site Data Extraction, and even Web Site Ripper (I suppose "scraping" is sort of like "ripping"). So it presents a bit of a problem: we don't necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually use.

Todd Wilson is the owner of screen-scraper.com (http://www.screen-scraper.com/), a company which specializes in data extraction from web pages. While not scraping screens Todd is hard at work finishing up a doctoral degree in Instructional Psychology and Technology.


Source: http://ezinearticles.com/?Data-Mining-vs-Screen-Scraping&id=146813

Tuesday, 4 June 2013

Adventures in Screen Scraping with YQL

When coding for work, everything of course has to be done the Right Way®. This isn’t always super exciting, so it is sometimes liberating to cut loose and work on a side project that mashes together a whole bunch of technologies without worrying too much about stability, reliability, scalability, or even if it will continue to run tomorrow. These R&D projects will never have even a single line of code directly pushed into even a development repository, but more often than not I find that I take concepts learned and tested during these coding sessions and apply them in some later project. Even if the entire project is thrown away in relatively short order, some concept of value survives for the long haul.

Plus, it’s just fun.

Recently my wife and I got the very exciting (and scary!) news that we were pregnant with our first child. The little guy or girl’s arrival is still over 5 months away, but already we’re wrestling with tons of difficult questions, and a particularly overwhelming one is “How are we going to decide where to send our child for day care?”

We live in the great state of Minnesota, where the Department of Human Services maintains a searchable Licensing Info Lookup website for all sorts of things, including (but not limited to) family child care. Anyone with a child care license can be found here, along with address, phone number, whether they can accept newborn infants and how many, etc.

Just one problem. We live on the border of two big suburbs, so a search across both cities returns over 150 results combined, with no map.

This is where my inner geek starts to get excited. I’ve got a copy of Visual Studio. I can fix this problem. Let’s do it.

Screen Scraping with YQL

Of course a web page can’t directly access data from a different domain (the browser’s same-origin policy forbids it), but using a proxy capable of JSONP, it’s possible to grab data from any website, format it into JSON, and then wrap it in a callback so that you can inject it as a script tag into your document’s HEAD, where it will execute your callback function, passing the data in. If you’re not familiar with this, check out Wikipedia’s JSONP page.
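The mechanics are worth a quick sketch. The helper below is purely illustrative (the name requestJsonp is mine, not from the post or any library): the page defines a global callback, injects a script tag whose URL names that callback, and the server replies with executable JavaScript that invokes it with the data.

```javascript
// A page can't XHR across domains, but it can load any <script>.
// JSONP exploits that: inject a script tag whose URL names a global
// callback; the server responds with JavaScript that calls it.
function requestJsonp(url, callbackName) {
  var script = document.createElement('script');
  script.src = url + (url.indexOf('?') === -1 ? '?' : '&') +
               'callback=' + encodeURIComponent(callbackName);
  document.getElementsByTagName('head')[0].appendChild(script);
}

// The server's response is a script, not raw JSON, e.g.:
//   cbfunc({"query": {"results": { /* ... */ }}});
// Because it arrives via a <script> tag, the same-origin policy
// doesn't block it; the browser simply executes it.
```

The only contract is the callback's name, which is why JSONP endpoints accept it as a query parameter.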

It turns out that YQL (Yahoo! Query Language) can serve as the perfect proxy in this situation. In a real-life project, I would be hesitant to rely on YQL as Yahoo could begin to reject an application for high traffic, or just pull the plug on YQL altogether. But for a low traffic site (hit by only myself and my wife) it’s a perfect match.

First you need to analyze the page you wish to scrape. On each of the two search result pages (one for each city) I wished to scrape, the HTML content for each search result looked something like this, where I’ve replaced any real content with {Placeholders}.

<table  border="0" summary="" class="LicTable1">
    <tr>
        <td width="70%" class="LicTitle1"><a href="Details.aspx?l={ResultID}" class="resultsNote">{ResultName}</a></td>
        <td  width="30%" class="LicStatus1">Active</td>
    </tr>
</table>

<table  border="0" summary="" class="LicTable">
    <tr>
        <td width="30%" class="LicContentL">{StreetAddress}<br>{CityStateZip}<br>
        {PhoneNumber}<br>
        {CountyName}</td>
        <td width="70%"  class="LicContentR">License number: {LicenseNumber}
        <br /><br /><br />Type of service: Family Child Care</td>
    </tr>
    <!-- SECTION : Optional Row -->
</table>
                            
<img height="10" src="Images/blank.gif" alt="" />

Really, Minnesota? Table-driven design? Ever heard of semantic markup? But I digress. As much as I detest this markup (I would instantly reject it if I saw it in a code review from my own developers) it has enough detail that I can work with this.

The YQL expression goes like this:
select * from html
where url = "{Url}"
    and xpath='//table[@class="LicTable1" or @class="LicTable"]'

YQL fetches the original HTML, applies the XPath expression to locate any table node with the class name “LicTable1” or “LicTable”, and then returns those results. To try it for yourself, head over to the Minnesota Department of Human Services Licensing Lookup, perform any search you like, drop the search results URL into the format above, and then drop that into the YQL Console. If it doesn’t work anymore, it’s probably because MN-DHS changed the markup on the website. That’s what you get when you screen scrape: nobody is under any obligation to adhere to any sort of contract in their HTML markup.
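Outside the console, the same query can be issued against YQL's public REST endpoint with the whole query URL-encoded. A rough sketch (buildYqlUrl is my own illustrative helper; the endpoint is YQL's public one as it existed at the time, and searchUrl is whatever results URL the DHS site gives you):

```javascript
// Build the REST form of the YQL query. The whole query string,
// including the embedded URL and XPath, must be URL-encoded.
function buildYqlUrl(searchUrl, callbackName) {
  var yql = 'select * from html where url = "' + searchUrl + '"' +
            " and xpath='//table[@class=\"LicTable1\" or @class=\"LicTable\"]'";
  return 'https://query.yahooapis.com/v1/public/yql' +
         '?q=' + encodeURIComponent(yql) +
         '&format=json' +
         '&callback=' + encodeURIComponent(callbackName);
}
```

Dropping the resulting URL into a script tag (or pasting the query into the console) produces the JSON-wrapped results discussed next.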

You’ll find that YQL can return its results in XML or in JSON; in the JSON form, the HTML markup is converted into JSON objects. If you implement the callback function (the YQL Console uses “cbfunc” by default; I’m using $.parseData), you can parse through the HTML structure shown above like this:
   
var data = { List: [] };

$.parseData = function (d) {
    var cur = null;
    // YQL returns one JSON object per matched <table>.
    $(d.query.results.table).each(function (i, tbl) {
        if (tbl.class == "LicTable1") {
            // Header table: grab the provider's name and detail link.
            cur = {
                name: tbl.tr.td[0].a.content,
                href: "http://licensinglookup.dhs.state.mn.us/" + tbl.tr.td[0].a.href,
                enabled: true
            };
        }
        else if (tbl.class == "LicTable" && cur != null) {
            // Detail table: the address block arrives as newline-separated text.
            var lines = tbl.tr.td[0].p.content.split('\n');
            cur.address1 = $.trim(lines[0]);
            cur.address2 = $.trim(lines[1]);
            cur.phone = $.trim(lines[2]);
            data.List.push(cur);
            cur = null;
        }
    });
};
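To get a feel for the traversal without hitting YQL at all, here is a jQuery-free sketch of the same logic run against a hand-mocked response. The mock's nesting (query.results.table, tr.td[0].a, and so on) mirrors the JSON YQL produced for the markup above; the exact shape of a real response is, of course, at YQL's mercy.

```javascript
// Plain-JS version of the callback, returning the parsed list
// instead of pushing into a shared data model.
function parseResults(d) {
  var list = [];
  var cur = null;
  d.query.results.table.forEach(function (tbl) {
    if (tbl.class === 'LicTable1') {
      // Header table: name and detail link.
      cur = {
        name: tbl.tr.td[0].a.content,
        href: 'http://licensinglookup.dhs.state.mn.us/' + tbl.tr.td[0].a.href
      };
    } else if (tbl.class === 'LicTable' && cur !== null) {
      // Detail table: address lines arrive newline-separated.
      var lines = tbl.tr.td[0].p.content.split('\n');
      cur.address1 = lines[0].trim();
      cur.address2 = lines[1].trim();
      cur.phone = lines[2].trim();
      list.push(cur);
      cur = null;
    }
  });
  return list;
}

// A hand-mocked response with the same nesting as YQL's JSON output.
var mock = {
  query: { results: { table: [
    { class: 'LicTable1',
      tr: { td: [ { a: { content: 'Example Day Care',
                         href: 'Details.aspx?l=123' } } ] } },
    { class: 'LicTable',
      tr: { td: [ { p: { content: '123 Main St\nAnytown, MN 55555\n555-1234' } } ] } }
  ] } }
};
```

Feeding the mock to parseResults yields one record with the name, detail link, two address lines, and phone number split out.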

Now I’ve got all my data added to a JSON data model on the client side. With this in hand, it became pretty straightforward to:

    Transfer the data to a server-side data model with an ASP.NET Script Service.
    Persist the data to a flat file with XML Serialization.
    Geocode the addresses to latitude/longitude pairs with the Google Maps API.
    Display a pin for each location on a map, along with a detail pane, where clicking on the pin or the left-side summary would highlight the other.
    Add a textarea to the left-side summary so that we can take down notes when we call each daycare location, then save that data server-side as well.
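As an illustration of the geocoding step (not the post's actual ASP.NET code, which isn't shown): one way to turn each scraped address into a latitude/longitude pair is Google's Geocoding web service, which takes the address and an API key as query parameters. The helper name is mine, and a real key would be required.

```javascript
// Build a request URL for Google's Geocoding web service, which
// returns JSON containing geometry.location (lat/lng) for an address.
function geocodeUrl(address, apiKey) {
  return 'https://maps.googleapis.com/maps/api/geocode/json' +
         '?address=' + encodeURIComponent(address) +
         '&key=' + encodeURIComponent(apiKey);
}
```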

The possibilities for extension are endless, but with this level of sophistication, my wife and I were able to pick out several home day cares that are located conveniently close to the route of our commute, and start with that list when making our calls.

Of course, as it turns out, we are grossly ahead of schedule, and most calls resulted in being told we were calling way too early.

But as a software developer, I’ll never object to being called ahead of schedule.


Source: http://www.make-awesome.com/2011/07/adventures-in-screen-scraping-with-yql/