Friday, 31 May 2013

Screen Scraping a Web Service

If you don't have access to a SOA or even a RESTful web service, but see data online that you need to access, consider using Pervasive's Extract Schema Designer and Map Designer to parse the HTML source and load well-formed records into your database. I call this "screen scraping" because the technique is similar to gathering data off of a mainframe terminal session. See, for example, an emacs help manual screen, where the third line from the top will always give the section heading number and text.

Take, for example, a web page displaying a table of 50 U.S. states and state codes. The underlying HTML contains a table with elements for the state code and the state name: States HTML.

If you save this file, you can build a schema for it using Extract Schema Designer. Load the saved HTML file into Extract Schema Designer and mark off the lines and fields of interest. Extract Schema Designer will produce a .cxl file that serves as the source connector for a map. After setting up the source in Map Designer, set up the target (an RDBMS table), then map the fields. In Part 2 of the tutorial, I use the Map by Position feature to quickly map the source fields to target fields, since the names ('state_code' versus 'state_cd') don't completely match.
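Pervasive's tools do this declaratively through the .cxl schema, but the same extraction can be sketched in plain Python with the standard library's HTML parser. The sample markup and the 'state_cd' target column below are illustrative only:

```python
from html.parser import HTMLParser

# A minimal sketch of the same extraction in plain Python, assuming the
# page holds a simple <table> of <tr><td>code</td><td>name</td></tr> rows.
# Extract Schema Designer does this declaratively via a .cxl file.
class StateTableParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.row = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.row:
            # Map source fields to target column names here,
            # as Map Designer would (e.g. state_code -> state_cd).
            self.records.append({"state_cd": self.row[0], "state_name": self.row[1]})
            self.row = []

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.row.append(data.strip())

html = ("<table><tr><td>AL</td><td>Alabama</td></tr>"
        "<tr><td>AK</td><td>Alaska</td></tr></table>")
parser = StateTableParser()
parser.feed(html)
print(parser.records)
```

Each parsed row becomes a well-formed record ready to load into the target table.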

It's more robust to use a well-defined SOA or RESTful service, and if the lines of HTML are compressed, it may not be worth the effort to screen scrape. However, with many sites using well-formed XHTML, you may get the data you need quickly. Make sure that you are legally allowed to load the data into your database first.


Source: http://bekwam.blogspot.in/2010/12/screen-scraping-web-service.html

Wednesday, 29 May 2013

Web Services Update "Screen Scraping"

Web services are presenting Java and .NET developers with new, easy-to-learn opportunities to access and integrate with valuable legacy assets, according to a growing number of enterprise apps professionals. See why non-programmatic approaches, including even screen scraping, could boost career opps for Java and .NET enterprise devs.

by Vance McCarthy

Tags: Mainframe, Integration, Web Services, Guruge, Legacy, Applications, Screen Scraping

Web services could give new life to old-fashioned "screen scraping" styles of client-to-mainframe integration.

Anura Guruge, an independent web services and legacy apps consultant from Gilford, N.H., contends that after massive spending in the mainframe software sector in the years leading up to Y2K, there has been "apathy" in such spending, largely due to budget constraints.

But, he added, a new wave of web services-based approaches can be used to extend mainframe assets throughout the enterprise, which is sparking new interest among IT pros who can deliver "mainframe modernization."

The current climate, in fact, is presenting a unique opportunity for traditional web, Java and .NET developers and architects to add value to mainframe assets, without the need to learn COBOL, CORBA or other mainframe-centric technologies.

The Mainframe Integration Opps for Java, .NET
Guruge suggests that a "non-programmatic" technique like screen scraping can be an advantage, not a handicap, to Java and .NET devs -- one they can wrap their arms -- and applications -- around. The trick for devs and architects who want to tap into valuable legacy data, applications and business rules, Guruge said, is to understand just how to construct a web service that can call to a legacy (proprietary) destination.

"The point is that you are developing new applications, but you are not recreating transactions that have been around for 20-plus years," said Guruge. To succeed at this style of mainframe Web services, integration developers must do four (4) key things:

* Identify the target mainframe transactions
* Set up a SOAP connection
* Call and retrieve the mainframe-side data, and
* Merge the results with other elements of a composite application
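As a rough illustration only -- the transaction name, parameters and namespace handling below are hypothetical, and the actual HTTP call to a host integration server (step 3) is omitted -- the four steps might look like this in Python:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_request(transaction, params):
    # Steps 1-2: identify the target mainframe transaction and wrap it
    # in a SOAP envelope addressed to the host integration server.
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, transaction)
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(env, encoding="unicode")

def merge_results(mainframe_data, local_data):
    # Step 4: merge the retrieved mainframe fields with other elements
    # of the composite application.
    composite = dict(local_data)
    composite.update(mainframe_data)
    return composite

# Hypothetical transaction and field names, for illustration only.
request = build_soap_request("GetAccountBalance", {"AccountId": "12345"})
# Step 3 (POSTing the request and parsing the response) is omitted;
# assume it yields parsed fields like the dict below.
result = merge_results({"balance": "1044.50"}, {"customer": "ACME Corp"})
print(result)
```

The mainframe response ends up as just another dictionary of fields in the composite application, which is the sense in which the legacy transaction becomes "just another object" in the web service.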

"Screen Scraping" Grows Up
'Screen scraping' has become a somewhat derogatory term for many enterprise IT professionals, mainly because it's been associated with more than a few integration debacles over the past decade or so. Moreover, many programmers looked down on the idea of interacting with mainframes using calls that effectively emulated a terminal.

But Guruge insists that today's web services environment presents a whole new range of capabilities for using such calls, and much richer opportunities for non-legacy devs, including Java and .NET developers, to use them to good advantage.

But, rather than call the approach 'screen scraping,' Guruge prefers to describe the approach as a 'runtime execution model.' The key is that the approach is non-programmatic.

A "Runtime Execution Model" Integration Approach
With REM, the dev isolates the applications he wants to invoke on the mainframe side, and then calls these applications into a "composite of applications" running in the environment the dev knows best -- .NET or Java, for instance.

The integration is accomplished by the call to the mainframe app, which in effect becomes just another object in the dev's web service.

The approach is enabled by use of a host integration server such as those offered by IBM, NetManage, WRQ, and others. When needed, the integration server steps through a mainframe access process - 'scraping' a succession of screens - and returns results. The legacy application continues to run unchanged on its native platform.

The runtime execution model (which has also been described as a "host integrator" model) was well established before web services and XML came about. The fact that a call to the mainframe app is now a web services call may give the approach better respectability, especially in an era when IT managers are looking for more standards-based ways to integrate with the mainframe rather than replace it.

The types of legacy 'applications' Guruge recommends for "run time execution" are not those with clean separation of data and business logic. These can be handled by data integrators and app servers.

Instead, the candidates are apps full of CICS-style transactions that intermingle data and logic. This code has gone through many iterations over a long life and is consequently difficult to identify. "Y2K proved conclusively that finding source code is not as easy as you think," he said.

Guruge said a key to REM's quick deployment arises from the introduction of what he calls "Host Integration Studio tools" that help developers capture the necessary mainframe log-in and authentication access procedures, as well as the general screen navigation [screen scraping] to access needed transactions.

The tools, which may increasingly hook into framework standards such as Eclipse, also help you describe the system I/O in terms of WSDL. Capturing I/O screens works much as it does in Microsoft Office application development, where macros are recorded and run. WSDL descriptions, too, are built with help from wizards.

As there are quite a number of things running in this execution model, what about performance overhead? The mainframe is not greatly affected, according to Guruge. But the web services engines cannot be underpowered. "You may not want low-end Pentiums running your host integration," Guruge said dryly. "Platforms in the middle must deliver [according to] mission-critical criteria," he said.

"This is a mature paradigm with advantages," he continued. "What was driven in the past from the terminal is now run by an exterior application."

One common mistake made by IT professionals doing legacy integration is not looking at the wide picture, Guruge said. "People get blindsided [by] not looking at everything that is out there. In fact, they do not have to restrict themselves to a single scheme," he said. "Start with something small, and manageable, but representative."

Bad integration design can derail any project, but better designs may be more widely obtainable with defined, standard Web services implementations, Guruge suggested. "With things like CORBA - which is still a very powerful technology - there was complexity. With Web services you are getting standards infrastructure - XML, SOAP, WSDL. And you simplify the invocation."

"Web services are simpler, at least in terms of a usage model. I think it is an easier to understand and realize model," he said. Screen scraping should not be derided, said Guruge. "It works, it's a proven technique, it tends to be simple, and it also allows you to combine transactions from multiple applications," he added.

"In terms of drawbacks, well, here we are talking about continuing with legacy applications. With this technique legacy applications never go away," said Guruge. "It's been said before: Big Iron platforms are not dead. "For the big volumes and workload the mainframes still tend to be the ideal platform."


Source: http://www.idevnews.com/stories/3327/Web-Services-Update-Screen-Scraping

Monday, 27 May 2013

Are You Screen Scraping or Data Mining?

Many of us seem to use these terms interchangeably, so let's be clear about what makes each of these approaches different from the other.

Basically, screen scraping is a process where you use a computer program or software to extract information from a website.  This is different than crawling, searching or mining a site because you are not indexing everything on the page – a screen scraper simply extracts precise information selected by the user.  Screen scraping is a useful application when you want to do real-time, price and product comparisons, archive web pages, or acquire data sets that you want to evaluate or filter.

When you perform screen scraping, you are able to scrape data more directly and, you can automate the process if you are using the right solution. Different types of screen scraping services and solutions offer different ways of obtaining information. Some look directly at the html code of the webpage to grab the data while others use more advanced, visual abstraction techniques that can often avoid “breakage” errors when the web source experiences a programming or code change.
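As an illustration of extracting only the precise information selected by the user, here is a minimal Python sketch with invented sample markup. Matching on the visible price pattern rather than on exact tags gives the scraper some resilience to minor code changes on the source site:

```python
import re

# Invented sample markup standing in for a retailer's product listing.
page = """
<div class="product"><span class="name">Widget</span>
<span class="price-v2">$19.99</span></div>
<div class="product"><span class="name">Gadget</span>
<span class="price-v2">$24.50</span></div>
"""

# Extract only the price figures -- nothing else on the page is indexed.
prices = [float(m) for m in re.findall(r"\$(\d+\.\d{2})", page)]
print(prices)
print(sum(prices) / len(prices))  # e.g. for a price comparison
```

A crawler would index everything on the page; the scraper pulls out just the two figures the user cares about.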

On the other hand, data mining is basically the process of automatically searching large amounts of information and data for patterns. This means that you already have the information, and what you really need to do is analyze the contents to find the useful things you need. This is very different from screen scraping, which requires you to find and collect the data before you can analyze it.

Data mining also involves a lot of complicated algorithms often based on various statistical methods. This process has nothing to do with how you obtain the data. All it cares about is analyzing what is available for evaluation.
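To make the contrast concrete, here is a toy Python example of mining: the data (some invented shopping baskets) is already in hand, and the only work is finding a pattern in it, namely which pair of items most often occurs together. Real data mining uses far more sophisticated statistical methods, but the shape of the task is the same:

```python
from collections import Counter
from itertools import combinations

# The data is already collected -- no scraping involved.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

Nothing in this process cares where the baskets came from; that acquisition step is where scraping (or any other collection method) would sit.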

Screen scraping is often mistaken for data mining when, in fact, these are two different things. Today, there are online services that offer screen scraping. Depending on what you need, you can have it custom tailored to meet your specific needs and perform precisely the tasks you want. But screen scraping does not guarantee any kind of analysis of the data.


Source: http://www.connotate.com/company/blog/138-are_you_screen_scraping_or_data_mining

Thursday, 16 May 2013

“Screen-scraped” bank feeds are unreliable and inaccurate

1 May 2013

MYOB warns “screen-scraped” bank feeds are unreliable and inaccurate

Many business owners use cloud accounting solutions and benefit from daily bank-feeds, a feature where bank transactions are automatically imported and matched to the correct accounts in their accounting software. Bank feeds remove both the tedious task of data entry and the challenge of correctly allocating numerous transactions in the bank reconciliation process. However MYOB warns bank feeds services from some software providers may be unreliable and inaccurate.

MYOB General Manager, User Experience and Design, Ben Ross, says the company is committed to providing reliable, accurate data and maintaining rigorous standards of security when managing financial data.

“At MYOB, we understand that reliable access to accurate data is absolutely fundamental for our customers. Automatically importing transaction details into MYOB accounting solutions significantly reduces manual data entry, improves accuracy and saves both time and money,” he says.

Mr Ross explains that it is important for business owners to understand exactly how their accounting software accesses their sensitive banking information, and whether that access is authorised by their bank's online terms and conditions.

“There are several ways that accounting service providers can access aggregated bank transaction data and unfortunately some software providers play fast and loose with data quality and customer security,” he says.

MYOB uses a bank-authorised data collection system provided by BankLink for its LiveAccounts and AccountRight Live products. In this process, BankLink supplies secure bank transaction data via direct feeds from financial institutions without needing to disclose logon details. The data is supplied in a secure, ‘read only’ format. The entire process complies with the stringent Payment Card Industry Data Security Standard for the safe handling of transaction data and meets the requirements of more than 100 financial institutions.

“MYOB chose to work with BankLink for its proven reliability, security and coverage of feeds from financial institutions across Australia. BankLink has a team of data accuracy specialists reviewing bank data feeds using processes they have refined over their 25 years of providing this service. For this reason, BankLink feeds are 99.9999% accurate and in some cases, more reliable than the bank’s own raw feeds,” says Mr Ross.

BankLink applies a series of proprietary, data validation routines to all bank transactions that identify and correct any anomalies in the data. This sophisticated error detection system results in a significant increase in data accuracy. Furthermore, BankLink’s direct contractual relationship with the banks means that they have protocols in place to fix any errors promptly without any interruption to service.

Some cloud accounting providers use a method commonly called “screen-scraping”. This process requires a business owner to disclose their internet banking username and password to a third party ‘screen-scraper’. This third party then automatically logs in to the business’s internet banking account at regular intervals, copies their transactions and supplies them to their accounting services provider.

The screen-scraping process may contravene internet banking terms and conditions.

“Most online banking terms and conditions forbid the disclosure of login and password details to any party, and exclude the bank from liability for any fraud which may then occur on the account – whether or not the fraud is related to the actions of the screen-scraper. We caution users of other software against passing on their online banking credentials through to third parties in return for bank feeds that are insecure and contain inaccuracies,” says Mr Ross.

Along with the potential security risks, screen-scraping can also be unreliable as the third party isn’t working directly with the banks. Not surprisingly, this lack of reliability can lead to frustration for accountants, bookkeepers and business owners.

“The concern for business owners is in the accuracy of their business financials. Even if only two in every hundred transactions are wrong, how do you know which two? That adds in a whole lot of extra work that undermines the original time saving benefit of the bank feed system.”

According to Debbie Vihi, owner of Mobile Bookkeeping Services, the bank feeds associated with some cloud accounting packages that use a third party or 'screen scraping' can be both unreliable and inaccurate. This makes it time consuming to reconcile a client's bank accounts.

“I was reconciling a client’s accounts when I noticed that the software had duplicated transactions. The client often had a lot of similar amounts coming out of their bank account so the double ups were not picked up — they were mainly related to credit cards,” she says.

“What was usually a five minute job turned out to be quite time consuming. I had to take a million steps backwards and manually tick the statements off against the correct transactions,” says Ms Vihi.
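The duplicated transactions Ms Vihi describes can be caught mechanically. As a minimal Python sketch with invented sample data (real reconciliation logic would also consider timing windows and reference numbers), flag any transaction that appears more than once with the same date, amount and description:

```python
from collections import Counter

# Invented sample feed; the second entry is a duplicate of the first,
# the kind of double-up a faulty bank feed can introduce.
transactions = [
    ("2013-05-10", -42.00, "CARD PAYMENT"),
    ("2013-05-10", -42.00, "CARD PAYMENT"),
    ("2013-05-11", -17.50, "FUEL"),
]

counts = Counter(transactions)
duplicates = [txn for txn, n in counts.items() if n > 1]
print(duplicates)
```

Flagging candidates this way still leaves the judgment call (is it a duplicate or two genuinely identical purchases?) to the bookkeeper, but it narrows the search.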

Fortunately for accountants, bookkeepers and business owners who want to enjoy the time and cost benefits of bank feeds, MYOB’s provider BankLink offers both an accurate and more reliable alternative to screen-scraping.

“For anyone using cloud accounting, accurate bank feeds can be a real time-saver; inaccurate bank feeds can be a nightmare. To ensure you are getting accurate, reliable data in a way that doesn’t contravene your bank’s terms and conditions, it’s important to understand how your cloud accounting provider obtains its bank feeds. Users should check that they haven’t inadvertently supplied a third party with their banking login details and ask their provider what industry standards their third party supplier complies with,” says Mr Ross.

Source: http://onlysoftwareblog.com/2013/05/%E2%80%9Cscreen-scraped%E2%80%9D-bank-feeds-are-unreliable-and-inaccurate/

Monday, 6 May 2013

Web Scraping: the Solution to Data Harvesting

The internet is the world's number one information provider, and by far the largest. Web scraping is meant to extract and harvest useful information from it, and can be regarded as a multidisciplinary process that involves statistics, databases, data harvesting and data retrieval.
The rapid expansion of the web has caused an enormous growth of information, which has made extracting useful and relevant information increasingly difficult. Web scraping confronts this problem by harvesting explicit information from a number of websites for knowledge discovery and easy access. It is worth noting that the query interfaces of web databases tend to share the same building blocks. The web therefore offers an unprecedented challenge and opportunity for data harvesting, as can be seen in the following ways:
    Huge amount of information. A vast amount of information is found on the internet, ranging across every subject, and usually it is more than what you actually need. Getting the information that is relevant to you is therefore a real concern: the internet offers the opportunity to gather information, but the harvesting itself is never an easy task. Our web scraping service focuses on the most important information you need, gathering only what is essential and applicable to your niche and targets.
    Wide and diverse coverage of web information. In the web almost all topics you can think of are covered. Think of any topic, you will realize that such topic is covered widely and adequately. This is an opportunity to get the variety of information. Nevertheless it is still a great challenge of getting information on a particular target from the wide and diverse audience. By use of web scraping the process can be tailored to collect data for a particular field.
    All types of data are available on the web. Information is usually stored in many formats. Think of texts, multimedia, spreadsheets, structured tables and so on and so forth. Harvesting such kind of information is a great task that may consume a lot of resources in terms of personnel, time and financial resources. Our web scraping service collects analyses the data and stores it in the relevant format for easy reading, application and storage.
    Most of the data is linked. This both amuses and annoys me. Almost all the information on the web is linked from one website to another with hyperlinks here and there, often for marketing or other SEO purposes. When harvesting information from such sites, which make up the majority of the internet today, you are likely to mismatch information; such a process would be not only expensive but a waste of time. We tailor our web scraping service to stay relevant and collect information only from the target website, not from non-related linked websites. For instance, if you want to get information from articles found on article directories, you may end up collecting information from the wrong websites due to interlinkage.
    Most of the data is redundant. The issue with this is that you can collect information that is the same from large number of web pages. This is costly and unacceptable in the business world. Information that is found on a large number of web sites may be similar. This is because of banner advertisements, copyright notices, navigation panels and many others. It is therefore important to engage in web scraping so as to solve such kind of problem. Our web scraping avoids such kind of data as it is never beneficial to a business.
    Deep web and surface web. Think of a website and the information it contains. A close look will reveal two types of data. Surface data is the data you get by use of a browser; there is also information that is hidden from public users, and it may be more valuable than the surface data. Our web scraping service digs deeper into such information, equipping our customers with relevant and applicable information for their benefit.
    The web is ever dynamic. New information is added and old information removed all the time, making the web a dynamic environment in which content keeps changing. Through our web scraping we are able to monitor such content and provide our clients with both past and latest data.
    It is a virtual society. The internet can be regarded as a virtual society: it is not only about products, services and data, but also about interactions among people, organizations and various automated systems. This usually poses a great challenge when it comes to harvesting such data. Our web scraping ensures that relevant data is kept up to date.
This article has explored why the internet is such a huge resource when it comes to data, and why harvesting that data is a great challenge which, if not well planned, may consume a lot of resources. It has also detailed the most important solution available, web scraping, and why companies should use it to harvest information in a simple and efficient way.

Source: http://www.loginworks.com/blogs/web-scraping-blogs/174-web-scraping-the-solution-to-data-harvesting

Wednesday, 1 May 2013

How to get rid of Screen Scrapers from your Website

While driving on a long trip this weekend, I had a bit of time to think. One topic that came to my mind was screen scraping, with a focus on APIs. It hit me: screen scraping is more of a problem with the content producer than it is with the “unauthorized scraping” application.

Screen scraping is the process of taking information that is rendered on the client, and then transforming the information in another process. Typically, the information that is obtained is later processed for filtering, saving, or making a calculation on the information. Everyone has performed some [legitimate form] of screen scraping. When you print a web page, the content is reformatted to be printed. Many of the unauthorized forms of screen scraping have involved collecting information on current gambling games [poker, etc.], redirecting captchas, and collecting airline fare/availability information.

The scrapee’s [the organization that the scraper is targeting] argument against the process is typically a claim that the tool puts an unusual demand on their service. Typically this demand does not provide them with their usual predictable probability of profit that they are used to. Another argument is that the scraper provides an unfair advantage to other users on the service. In most cases, the scrapee fights against this in legal or technical manners. A third argument is that the content is being misappropriated, or some value is being gained by the scraper and defrauded from the scrapee.

The problem I have with fighting back against scrapers is that it never solves the problem that the scrapers try to fix. Let's take a few examples to go over my point: the KVS tool, TV schedules, and poker bots. The KVS tool uses [frequently updated] plugins to scrape airline sites to get accurate pricing and seat availability details. The tool is really good for people that want to get a fair bit of information on what fares are available and when. It does not provide any information that isn't available to everyone else; it just makes many more queries than most people can do manually. Airlines fight against this because they make a lot of money on uninformed users. Their business model depends on ensuring that their passengers are not buying up cheap seats. When an airline claims that they have a "lowest price guarantee," that typically means that they show the discount tickets for as long as possible, until they're gone.

Another case where web scraping has caused an issue is TV schedules. With the MythTV craze a few years ago, many open source users were using MythTV to record programs via their TV card. It's a great technology; however, the schedule is not provided in the cable TV feed, at least not in unencrypted form. Users had to resort to scraping television sites for publicly available "copyrighted" schedules.

The Poker-bots are a little bit of an ethical issue. This is something that differs from the real world rules of the game. When playing poker outside of the internet, players do not have access to real-time statistic tools. Online poker providers aggressively fight against the bots. It makes sense; bots can perform the calculations a lot faster than humans can.

Service providers try to block scrapers in a few different ways. The end of the Wikipedia article lists more; this is a shortened version. Web sites try to deny/misinform scrapers in a few manners: profiling the web request traffic (clients that have difficulty with cookies and do not load JavaScript/images are big warning signs), blocking the requesting provider, providing "invisible false data" (honeypot-like paths in the content), etc. Application-based services [poker bots] are more focused on looking for processes that may influence the running executable, securing the internal message handling, and sometimes recording the session (as is also typically done in MMORPGs).
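The request-profiling heuristic can be sketched as a simple scoring function. This is an illustrative Python sketch; the weights, thresholds, and honeypot path are invented, not any vendor's actual detection logic:

```python
# Hypothetical honeypot path: a link invisible to human visitors,
# so only a scraper following every href ever requests it.
HONEYPOT_PATHS = {"/hidden-link-for-bots"}

def bot_score(request):
    """Score a request dict by the warning signs: higher means more bot-like."""
    score = 0
    if not request.get("accepts_cookies"):
        score += 1
    if not request.get("loaded_images"):
        score += 1
    if request.get("path") in HONEYPOT_PATHS:
        score += 5  # strongest signal of the three
    return score

human = {"accepts_cookies": True, "loaded_images": True, "path": "/fares"}
scraper = {"accepts_cookies": False, "loaded_images": False,
           "path": "/hidden-link-for-bots"}
print(bot_score(human), bot_score(scraper))
```

A real system would aggregate these scores over time per client and act only above some threshold, since any single signal (a privacy-conscious browser blocking cookies, say) also fires for legitimate users.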

In the three cases, my point is not to argue why the service is justified in attempting to block them, my point is that the service providers are ignoring an untapped secondary market. Those service providers have refused to address the needs of this market – or maybe just haven’t seen the market as viable, and are merely ignoring it.

If people wish to make poker bots, create a service that allows just the bots to compete against each other. The developers of these bots are [generally] interested in the technology, not so much the part about ripping-off non-bot users.

For airlines, do not try to hide your data. Open up API keys for individual users. If an individual user tries to abuse the data to resell it, or to create a Hipmunk/Kayak clone, revoke the key. Even if the individual user's service requests don't fit the profile, there are ways of catching this behavior: mapmakers solved this problem long ago by creating trap streets. Scrapers are typically a last resort, used to do something that the current process makes very difficult to do.
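The per-user API key idea can be sketched as follows. The class, daily limit, and revocation policy here are invented for illustration; a real gateway would persist usage, reset counters daily, and distinguish abuse patterns more carefully than a raw request count:

```python
class ApiGateway:
    """Toy sketch: issue per-user keys, revoke on abusive usage."""

    def __init__(self, daily_limit=1000):
        self.daily_limit = daily_limit
        self.usage = {}        # key -> request count today
        self.revoked = set()

    def request(self, key):
        if key in self.revoked:
            return "denied"
        self.usage[key] = self.usage.get(key, 0) + 1
        if self.usage[key] > self.daily_limit:
            # Usage looks like bulk resale: revoke the key rather than
            # serving deliberately false data to everyone.
            self.revoked.add(key)
            return "revoked"
        return "ok"

gw = ApiGateway(daily_limit=2)
print([gw.request("user-1") for _ in range(4)])
```

The point of the sketch is the policy shape: individual users get sanctioned access, and abuse is handled per key instead of by degrading the data for everybody.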

Warning, more ranting: with airline sites, it's difficult to get a good impression of the cost differences of flying to different markets [like flying from Greensboro rather than Charlotte] or even of changing tickets, so purchasing from an airline is difficult without the aid of this kind of tool. Most customers want to book a single round trip ticket, but some may have a complex itinerary that has them leaving Charlotte, stopping over in Texas, then continuing to San Francisco, and then returning to Texas and flying back to the original origin. That could be accomplished by purchasing separate round trip tickets, but the rules of the tickets allow such combinations to exist on a single itinerary. Why not allow your users to take advantage of these rules [without the aid of a costly customer service representative]?

People who use scrapers do not represent the majority of the service’s customers. In the case of the television schedules example, they do not profit off the information, and the content that they wished to retrieve wasn’t even motivated by profit. Luckily, an organization stepped in and provided this information at a reasonable [$25/yr] cost. The organization is SchedulesDirect.

The silver lining to the battle on scrapers can get interesting. The PokerClients have prompted scraper developers to come up with clever solutions. The “Coding the Wheel” blog has an interesting article about this and how they inject DLLs into running applications, use OCR, and abuse Windows Message Handles [again of another process]. Web scraping introduces interesting topics that deal with machine learning [to create profiles], and identifying usage patterns.

In conclusion, solve the issue that the screen scrapers attempt to solve, and if you have a situation like poker, prevent the behavior you wish to deny.

Source: http://theexceptioncatcher.com/blog/2012/07/how-to-get-rid-of-screen-scrapers-from-your-website/

Note:

Roze Tailer is experienced web scraping consultant and writes articles on screen scraping services, website scraper, Yellow Pages Scraper, amazon data scraping, yellowpages data scraping, product information scraping and yellowpages data scraping.


Screen Scraping

The process of collecting visual data online from web pages is known as screen scraping. Also known as screen scrapping, it is the method of acquiring data displayed on screen by capturing the text manually or via software. To perform scraping automatically, software must be used that can recognize the specific data. This screen scraping software takes the data from HTML web pages and converts the unstructured data into structured records or reports. Software for screen scraping can be used in a number of applications. For instance, a real estate agent may use a screen scraper to gather data on competing websites to form an average price or offer for a given house in an area. A marketer may use screen scraping software to collect customers' email addresses, whereas researchers in general use it as a tool to gather a wide array of data or information.

Through scrappingexpert.com one can easily get access to screen scraping services for scraping of regular price updates, scraping for sales leads, and scraping of regular product or service updates of the competitors.

Extracting data from all categories of information, cities, and countries is easy with the screen scrapers offered by this scraping solutions provider. These scrapers can write the extracted data to Excel or CSV format and can also save it as XML.
They can be used dynamically for both unlimited ad extraction and unique-ad extraction. With automatic updating built in, these are the latest versions of the scraping software, compatible with current OS platforms such as Windows XP, Vista, Windows 7, Linux, and Mac.
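Writing extracted records out to CSV and XML needs only the standard library. The records below are hypothetical (stand-ins for scraped listings), and an `io.StringIO` buffer stands in for a file on disk:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records already produced by a scraper.
records = [
    {"category": "restaurants", "city": "Madrid", "country": "Spain"},
    {"category": "hotels", "city": "Lisbon", "country": "Portugal"},
]

# CSV output: one header row, then one line per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["category", "city", "country"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# XML output: one <record> element per extracted row.
root = ET.Element("records")
for rec in records:
    node = ET.SubElement(root, "record")
    for key, value in rec.items():
        ET.SubElement(node, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(csv_text)
print(xml_text)
```

CSV opens directly in Excel, while the XML form preserves field names explicitly, which makes it easier to feed into downstream tools.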

Source: http://www.offroadwithcobba.com/screen_scraping_screen_scrappi/


RYANAIR: screen scrapers, databases, free-riding and unfair competition in Spain

Here's an instructive piece from our man in Spain, Fidel Porcuna, on a situation in which, even for a business with an ample portfolio of rights, it may be difficult or impossible to guard against free-riding. Fidel writes:

On 9 October 2012 the Spanish Supreme Court ruled on the dispute between Ryanair Ltd and Atrápalo, S.A., a Spanish online travel agency that uses screen scraper software on Ryanair's website. The Court confirmed the lower courts' decisions dismissing Ryanair's claims based on copyright infringement of a database, infringement of a sui generis (standalone) database right, and unfair competition. The proven facts were as follows: Atrápalo regularly enters Ryanair's website as an ordinary user. By means of screen scraper software that reads the search patterns of the Ryanair website, Atrápalo extracts the flight information its own user is requesting through Atrápalo's website and provides it, omitting the fact that the information is scraped from Ryanair's website. Atrápalo collects not only schedule details but also the prices displayed on Ryanair's website, to which it adds a cut (its profit). Ryanair offers a whole range of complementary services to anyone navigating its website in search of a flight. The terms and conditions governing the use of Ryanair's websites include a prohibition on using screen scrapers and on using the websites for a commercial purpose.

Drawing extensively on the CJEU's interpretation of Directive 96/9 on the legal protection of databases (cases C-604/10 Football Dataco Ltd, The Scottish Premier League Ltd, The Scottish Football League, PA Sport UK Ltd v Sportradar GmbH and Sportradar AG; C-545/07 Apis-Hristovich EOOD v Lakorda AD; C-444/02 Fixtures Marketing Ltd v Organismos Prognostikon Agonon Podosfairou; C-338/02 Fixtures Marketing Board Ltd v Svenska Spel AB; C-203/02 British Horseracing Board and Others; etc.), the findings of the Court are as follows:


    Ryanair does not have a database protected under Article 12.2 of the Spanish Copyright Act (implementing Articles 1(2) and 3(1) of Directive 96/9). The Court declares that there is no database proper (no collection of independent data), but rather software that generates the information requested under the parameters introduced by the user (that is, software that provides the best price for the flight the user is looking for, considering a range of variable factors). Even in the hypothetical case that the Court accepted Ryanair's allegation that a database exists, such a database's structure could in no case be said to meet the originality threshold necessary for protection; indeed, the Court says, the selection and arrangement come from software. Ryanair countered that the Regional Court of Hamburg, in Ryanair v Cheaptickets of 26 February 2010, had declared that Ryanair did have a database.
    There is no sui generis right in a database, as Ryanair's substantial investment was not directed at collecting data: it was directed at creating software that generates information under the parameters introduced by the user of Ryanair's website. That is, the investment ultimately relates to the creation of information, not to its collection, verification, or presentation.

Importantly, the Court addresses the alleged violation of contract law and unfair competition as follows:

    The Court concludes that there is no contractual relationship between Atrápalo and Ryanair and therefore no breach of contract. The Court accepts that the supply of, or access to, flight information could be made subject to a contract under Spanish law, but it considers that the use of the Ryanair website (free to anyone who types the URL) does not entail consent to enter into such a contract. Ryanair therefore failed to prove that Atrápalo consented to the terms and conditions governing navigation of its website, despite Atrápalo using the website through a screen scraper expressly forbidden by those terms. As the Court sees it, Atrápalo did something not allowed by a contract to which it never consented, so no breach of that contract could exist. The Court also noted Ryanair's acknowledgement that it does not apply proper (technical) means to prevent travel agencies from using its websites.
    For procedural reasons, the Supreme Court did not decide on the merits of the unfair competition claim, but it had nonetheless been rejected by the lower court (the Court of Appeals). Ryanair argues that Atrápalo is free-riding on its effort to create powerful and reliable software that optimizes flight information according to users' requests, and that by screen scraping, Atrápalo also diverts users away from Ryanair's website, where a range of complementary services (car rentals, hostels, etc.) is offered to anyone looking for a cheap flight, causing Ryanair a loss of profits. The Court of Appeals found no unfair advantage taken of Ryanair's repute (as argued by Ryanair's lawyers); Atrápalo and other travel agencies do not need authorisation from Ryanair to exercise their intermediary role, as there is no legal right that would support such a requirement. Nor is Atrápalo's conduct in bad faith, as it does not affect the normal functioning of the market or alter its competitive structure; indeed, the conduct benefits users and therefore helps to maintain and foster free competition in the current economic order.

Source: https://www.marques.org/class46/default.asp?D_A=20130301
