New Usage of Web Applications––Web Application Convergence Assisted by Wrappers
This article describes the mechanism by which wrappers work and a method that supports wrapper generation. It also introduces some examples of the utility of wrappers. Wrappers, which enable ordinary web applications to be used as components, let us make new applications by converging the components.
1. Web application convergence and wrappers
Many people use various web applications. Some people may use a hotel search web application and a route plan web application to find a route from their home to a hotel that they have found and print out the results. In this way, several web applications and other systems such as printers are typically used at the same time. Useful applications can be made by converging web applications and other systems. This web application convergence requires each application to be a component. Our wrappers enable such web applications to be used as components.
2. Importance of components
In NTT Network Service Systems Laboratories, we conduct research on service platforms to provide ubiquitous services by combining various components. Users receive appropriate ubiquitous services from the platform when components that are suitable for each user's situation are combined dynamically. Providing such ubiquitous services requires various components to meet changes in a user's situation. Those components are combined to make a service more suitable for handling the situation.
Mashups are becoming popular these days. A mashup is used to create Web 2.0 applications. Mashups stimulate user-initiative in the creation of applications because of their simplicity. In using a Mashup, application creators find components such as web services and converge them. In this way, they can create applications easily, but a huge number of components is necessary to enable them to create applications as they like.
3. Making components from web applications
We have conducted research about wrappers that make web applications, especially web applications that perform searches such as hotel search web applications, usable as components. The wrapper interconverts a web application's protocol and a component's protocol, which is SOAP (simple object access protocol), and makes the web application usable as a web service. However, creating a wrapper for each web application is laborious. Therefore, our wrapper interconverts in accordance with a configuration file written for each web application (Wrapper in Fig. 1). For example, a configuration file for a hotel search web application has to be written to wrap the web application and make it usable as a hotel search component. To make a huge number of web applications usable as components, a huge number of configuration files must be written, so a method of reducing this work is required.
3.2 Generation of configuration files by automatic extraction of search results
A configuration file contains descriptions of where to extract from an HTML (hypertext markup language) document given by a web application and how to convert the extracted search results into an XML (extensible markup language) document. Our wrapper-generation tool supports the generation of configuration files by automatic extraction of search results from the HTML document and the writing of the position of the extracted search results in the HTML document into the configuration file (Wrapper tool in Fig. 1).
To extract the search results, the wrapper tool uses a feature of an HTML document generated by a search web application as below. Most search web applications generate HTML documents automatically using a database. Such HTML documents tend to have a repetitive tag pattern. The wrapper tool extracts search results from the HTML document automatically by using the repetitive pattern.
4. Utilization of repetitive depth pattern of tag
Our method of extracting search results finds areas in an HTML document that have a repetitive depth pattern of tags to find the repetitive tag pattern. The depth of tags means the number of nesting levels and is visualized as shown in Fig. 2(a). HTML documents often have the repetitive depth pattern shown in Fig. 2(b) because web applications generate HTML documents mechanically. The wrapper tool extracts the area that has the same repetitive depth pattern as the search results.
This method is similar to a method used by a human being to find search results in an HTML document generated by a web application. When search results need to be found in an HTML document written in a foreign language, an area that has a repetitive pattern, such as a table or list, is found. The person finds the correct search results even if there are many other elements such as menus and banners. In this case, the person will use neither the HTML tags nor the meaning of text in the HTML document but the appearance of the HTML document to find the results of the hotel search. In this way, the wrapper tool extracts search results from various HTML documents generated by web applications by using a method used by human beings in their daily lives.
5. Experimental results
We collected 100 HTML documents written in Japanese generated by 100 web applications providing services, such as an online shop, hotel search service, and rental apartment search service. At the same time, we implemented an experimental program that finds an area that has the depth pattern and extracts a segment around the area as a search result segment.
Then, we tried to extract search result segments from the HTML documents using the experimental program and found that the program extracted the correct result segments from 80% of the HTML documents. This means that our method can process the HTML documents generated by most web applications.
One of the main reasons for extraction failures was that there were other non-search result segments that had repetitive depth patterns. For example, collections of links and menu segments often had the repetitive tag pattern, and they had the repetitive depth pattern. The experimental program could not determine whether the found segment was the correct search result segment.
6. Wrapper tool
We implemented the wrapper tool based on the experimental program. The tool generated the configuration files for the wrapper. Its graphical user interface (GUI) is shown in Fig. 3. Users operated the tool and made the configuration files as below.
(1) The user entered a uniform resource locator (URL) of a search result HTML document generated by the target web application and clicked the next button on the tool's GUI.
(2) The wrapper tool showed a search result such as information about a hotel. The user confirmed that the information was the correct search result; otherwise, the user was shown the next search result candidate. After being shown the correct search result, the user checked the check boxes on the GUI to indicate output values of the web application used for output values of the component, which is a web service, and entered their output parameter name. For example, the user checked the box next to hotel name and entered hotelName as an output parameter name.
(3) The wrapper tool showed elements in the URL of the search result HTML document. These elements are directory names and a file name in the URL path and input values in the URL query string. The user checked boxes to indicate input values of the web application used for input values of the component and entered their input parameter name. For example, the user checked the box next to an area name and entered area as an input parameter name.
(4) The user entered the port type, which is a class name, and the operation, which is a method name.
(5) The user confirmed whether the generated XSLT file, which has conversion rules for conversion from HTML to SOAP, and another configuration file were correct. These two files were downloaded as a zipped file.
In this way, the user obtained the configuration files through an easy operation, and those files make the web application usable as a component by inputting the configuration files into the wrapper.
7. Application of wrapper technology
Are wrappers useful only for ubiquitous services or mashup services? One of the most popular uses is a bank account aggregation service or cross search services. These services show the search results of multiple web applications on one page. Users can compare those search results by looking at the page. To make such services, the service creator must write programs for each web application to extract important segments from HTML documents generated by the web applications.
An example application for a hotel cross search is shown in Fig. 4. In this case, the application shows search results for travel agencies A, B, and C. The wrapper extracts search result segments from the HTML documents generated by each web application, converts them into a compatible format, and sends them to a server for the cross search application. The server generates an HTML document from the received search results and provides it to the user. The service creator creates cross search applications easily and with less programming by letting the wrapper extract important segments and by using the wrapper tool. In addition, when the design of HTML documents generated by a web application changes, the cross search application manager generates a new configuration file easily by using the wrapper tool, and he/she does not have to write a program. This means that the wrapper and the wrapper tool can also reduce the maintenance cost of cross search applications.
We implemented wrappers and a wrapper tool that can make a huge number of web applications easily usable as components. As future work, we plan to develop wrappers to reduce maintenance costs more effectively by following changes in web application design automatically.