Surfing the Internet Gets Deep

The internet has revolutionized society by changing the way people communicate, find information, and enjoy entertainment. But a standard internet search misses at least 90 percent of the information available.

The internet is separated into two unequal pools of information. The surface web contains the pages that popular search engines index. The second pool of information is locked away in the deep web, which consists of countless databases worldwide.

According to Walt Warnick, Director of the DOE Office of Scientific and Technical Information (OSTI), "The deep web is huge."

Common search engines like Google and Yahoo crawl across the pages of the surface web, but they are unable to dig into databases to retrieve information from the deep web.

 

Figure credit: Office of Scientific and Technical Information

A WorldWideScience search fans across the surface of the internet and drills into databases to mine requested information in the deep web.

"Asking a scientist, engineer, or educator to find information in their field using common web browsers is like asking a doctor to diagnose disease without X-rays, MRI, or any other piece of diagnostic equipment" said Warnick.

Information in the deep web can only be retrieved using search engines designed for each particular database. Many of these database search engines do not use relevance ranking, which makes filtering through the results a crapshoot.

"Under the current system, finding information in the deep web is a series of practical impossibilities, placing internet users, especially scientists and science educators, at a severe disadvantage" said Warnick.

To address this global need, OSTI has launched WorldWideScience.org, a science gateway that accelerates the search for data in national and international scientific databases and portals on the internet. The data spans the physical and life sciences as well as medical studies.

Warnick called the development of this technology "a series of sequential miracles" that opens the deep web, where billions of dollars' worth of government-sponsored scientific research results reside, to a wider audience.

WorldWideScience provides a one-stop search engine for global scientific databases. When a query is entered into the search engine, it is transmitted to the gateway server at OSTI in Oak Ridge, Tennessee. Going beyond the capabilities of the common web crawler, which searches the internet horizontally, this new search engine adds the capability of simultaneously searching a select group of databases vertically.
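For readers curious about what searching several databases simultaneously might look like in practice, here is a minimal Python sketch. It is an illustration only, not OSTI's actual implementation: the database names, record titles, word-count scoring, and function names are all hypothetical, and small in-memory lists stand in for remote deep web databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote deep web databases: name -> record titles.
DATABASES = {
    "energy_db": ["Solar cell efficiency study", "Wind turbine design"],
    "health_db": ["Solar radiation and skin health", "MRI imaging advances"],
    "physics_db": ["Neutron scattering methods"],
}


def search_database(name, query):
    """Run the query against one database; return (source, title, score) hits."""
    terms = query.lower().split()
    hits = []
    for title in DATABASES[name]:
        words = title.lower().split()
        # Toy relevance score (an assumption): how often query terms appear.
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((name, title, score))
    return hits


def federated_search(query):
    """Send the query to every database at once, then merge and rank the hits."""
    with ThreadPoolExecutor(max_workers=len(DATABASES)) as pool:
        per_db = pool.map(lambda name: search_database(name, query), DATABASES)
        merged = [hit for hits in per_db for hit in hits]
    # Highest score first; ties ordered by source, then title, for stability.
    return sorted(merged, key=lambda hit: (-hit[2], hit[0], hit[1]))


if __name__ == "__main__":
    for source, title, score in federated_search("solar health"):
        print(source, title, score)
```

The point of the sketch is the shape of the process: one query fans out to many sources in parallel, and the merged results are ranked by relevance so the user does not have to sift through each database separately.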

Since the debut of the search engine prototype in 2007, WorldWideScience has expanded from 10 to 56 participating countries. China is the most recent country to contribute to the search engine, opening a wealth of scientific information to the world. The search engine now scours more than 375 million pages of scientific information contained in deep web databases.

The multilateral WorldWideScience Alliance consists of the participating countries and was established in June 2008 to govern this rapidly growing online gateway to international scientific research information.

 

Figure credit: Office of Scientific and Technical Information

A map of the countries participating in WorldWideScience.

Despite these advances, there is more work to do. Currently, WorldWideScience is limited to searching databases with English titles and abstracts, which restricts the number of databases the search engine can reach. The Alliance is now exploring translation technologies to expand the network of databases accessible to the worldwide community and is making progress toward deploying this capability.

This work is supported by the Department of Energy (DOE) Office of Scientific and Technical Information (OSTI), within the Office of Science. DOE invests in science and solving critical issues impacting people's daily lives and the nation's future. For more information, visit http://www.science.energy.gov.

OSTI advances science and sustains technological creativity by making research and development findings available and useful to DOE researchers and the public.

This article was written by Stacy W. Kish.