Thursday, July 16, 2009

Interesting New Search Engine Technologies - Wolfram Alpha And Bing (Powerset)

Today let's talk about some cool new Search technologies: Bing (Powerset), which is a Natural Language technology, and Wolfram Alpha.

Technology is growing at such a rapid rate. It is hard for most of us to keep up with the pace, let alone adopt the new technology at the speed at which it is developed.

Along with the new mobile phones, computers, MP3 players, PDAs, etc., some brand new Search Engines now join the list. Let me rephrase that: Search technologies (Search Engines).

Historically, Search Engines included Excite, Dogpile, Altavista, Ask Jeeves, Yahoo, and MSN. Now the most popular are Google, Ask Jeeves, and Bing, which was released this year, 2009 (formerly Live Search, Windows Live Search, and MSN Search).

Customarily, a Search Engine is composed of indexed directories of numerous elements such as URLs, graphics, images, multimedia elements, file formats, font/typeface selection, screen real estate, etc.

There are also Meta Search Engines and Invisible/Deep Web Search Engines. Meta Search Engines such as Dogpile or Metacrawler direct your search to other search engines' databases and present the data to the user in a compiled list. The Invisible/Deep Web is the portion of the World Wide Web that is not easily visible to or indexed by traditional search engines, such as non-HTML pages, PDFs, documents, and dynamically generated data.
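To make the Meta Search idea concrete, here is a minimal sketch in Python. The two "engines" are made-up stand-in functions (not real APIs), and interleaving the lists is just one assumed way to compile them; a real Meta Search Engine such as Dogpile may merge results quite differently.

```python
from itertools import zip_longest

def meta_search(query, engines):
    """Send one query to several engines and merge the results into one list."""
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    # Interleave: each engine's 1st hit, then each engine's 2nd hit, and so on
    for tier in zip_longest(*result_lists):
        for url in tier:
            # None pads the shorter lists; skip it and skip duplicates
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical stand-ins for real search engines
def engine_a(query):
    return ["a.example/lasagna", "shared.example/recipe"]

def engine_b(query):
    return ["shared.example/recipe", "b.example/pasta", "b.example/sauce"]

results = meta_search("lasagna recipe", [engine_a, engine_b])
```

The page that both engines return shows up only once in the compiled list.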

Google is the dominant and most popular search engine known to individuals, communities, and corporations around the world. Google uses a technique of matching text to find and aggregate web pages that are relevant to the user's search. Google's technique as a search engine doesn't compute answers based on human knowledge.

For those of you who don't know, the name Google derives from "Googol," which is the number one followed by a hundred zeros (1 x 10^100 in decimal notation).

Google aggregates data by scanning web pages to find instances of the keywords you have entered in the search box, such as "Recipe for making Lasagna." Google then returns to the user a compiled list (ordered by ranking) of pages that contain the keywords "recipe" "for" "making" "Lasagna."
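Here is a toy sketch of keyword matching in Python. The pages and the scoring rule (count how many of the keywords a page contains) are assumptions made up for illustration; Google's real ranking is far more involved than this.

```python
def keyword_search(query, pages):
    """Rank pages by how many of the query keywords they contain."""
    keywords = query.lower().split()
    scored = []
    for url, text in pages.items():
        words = set(text.lower().split())
        score = sum(1 for kw in keywords if kw in words)
        if score:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

# Made-up pages standing in for the web
pages = {
    "cooking.example/lasagna": "recipe for making lasagna at home",
    "cooking.example/soup": "a recipe for tomato soup",
    "cars.example/tires": "choosing winter tires",
}
ranked = keyword_search("Recipe for making Lasagna", pages)
```

The lasagna page matches all four keywords and ranks first, the soup page matches two, and the tires page matches none and is left out.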

Google creates a directory to organize Internet resources into categories. Bots (otherwise known as web crawlers) browse the web to gather data and updated information that search engines use to create directories. These Bots are computer programs that do this in an automated fashion and on a schedule. The information is indexed so that results can be returned faster when a user enters words or search data into the Search box. As part of this data gathering, URLs are also discovered and added to the directory the Search Engine indexes. These directories are either General Directories or Subject Directories that cover a specific topic.
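The reason indexing speeds things up can be sketched with a tiny "inverted index" in Python. The crawled pages here are made up, and the crawler itself is assumed to have already run; this only shows the indexing step, not how any real Search Engine stores its data.

```python
from collections import defaultdict

def build_index(crawled_pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in crawled_pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Made-up output of a crawler run: URL -> page text
crawled_pages = {
    "cooking.example/lasagna": "recipe for making lasagna",
    "cooking.example/soup": "recipe for tomato soup",
}
index = build_index(crawled_pages)

# A search is now a dictionary lookup instead of a rescan of every page
hits = sorted(index.get("recipe", set()))
```

The index is built once, when the Bots bring the data back, so each search afterward only has to look up the word.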

Former Search Engines included Excite, Dogpile, Altavista, etc. Then along came Ask Jeeves, Yahoo, MSN, and now Bing (formerly Live Search, Windows Live Search, and MSN Search). Bing's underlying technology is powered by Powerset, a company Microsoft purchased (more on Powerset later).

Users perform several types of searches such as General Browsing, Keyword, Full Text, and Proximity (Phrase or Near Operator Searching). There is also Boolean searching, which uses the Logic Operators AND, OR, and NOT. In a Boolean search, the operators are used as follows:
  • AND = Subject AND Subject (both terms must appear)
  • OR = Subject OR Subject (either term may appear)
  • NOT = Subject NOT Subject (the first term must appear without the second)
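The Boolean operators map neatly onto set operations. Here is a small sketch in Python, assuming a made-up index that maps each word to the set of pages containing it:

```python
# Made-up index: word -> set of pages that contain the word
index = {
    "lasagna": {"page1", "page2"},
    "recipe": {"page2", "page3"},
    "vegan": {"page3"},
}

# AND: pages that contain both terms (set intersection)
and_hits = index["lasagna"] & index["recipe"]

# OR: pages that contain either term (set union)
or_hits = index["lasagna"] | index["recipe"]

# NOT: pages that contain the first term but not the second (set difference)
not_hits = index["recipe"] - index["vegan"]
```

Here "lasagna AND recipe" gives only page2, "lasagna OR recipe" gives all three pages, and "recipe NOT vegan" gives page2.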
Other types of Search techniques include Truncation, which allows you to search for variations of a word (Example: meth*, which will return all items or documents containing a word that starts with "meth"); Proximity (searches for terms that occur within a specified distance of each other in a document); and Case Sensitive searching. Google is generally case insensitive when doing a search (searching on "Bill" and "bill" gives the same results), but there is a tool for performing a Case Sensitive Search called "Case Sensitive Google Search" (http://case-sensitive-search.appspot.com).
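These three techniques can each be sketched in a few lines of Python. The function names and the "distance counted in words" rule are assumptions for illustration, not how any particular Search Engine implements them.

```python
def truncation_search(prefix, words):
    """Truncation: 'meth*' matches every word that starts with 'meth'."""
    return [w for w in words if w.lower().startswith(prefix.lower())]

def near(text, first, second, max_distance):
    """Proximity: True if the two terms occur within max_distance words."""
    tokens = text.lower().split()
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions)

def contains(text, term, case_sensitive=False):
    """Whole-word match, optionally treating 'Bill' and 'bill' as different."""
    if case_sensitive:
        return term in text.split()
    return term.lower() in text.lower().split()

matches = truncation_search("meth", ["method", "Methane", "metal", "mother"])
close = near("the recipe for making lasagna", "recipe", "lasagna", 3)
loose = contains("pay the bill", "Bill")
strict = contains("pay the bill", "Bill", case_sensitive=True)
```

The truncation search keeps "method" and "Methane", the proximity check succeeds because the two terms are three words apart, and "Bill" only matches "bill" when case is ignored.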

Moving on to newer technologies, the Search Engine options are now the current traditional "Search Box" and the newer "Natural Language" and "Answer Engine" approaches.

There is a new Answer Engine called Wolfram Alpha, developed by Wolfram Research. This technology is very different from Google, which is generally a very large lookup system, similar to a librarian for the Internet. Wolfram Alpha does not compete with Google. Wolfram Alpha is geared to compute factual answers to the questions in a Search query. Think of Wolfram Alpha as a calculator, or a brainiac right at your fingertips: a tool that will compute a variety of answers to questions, calculate formulas, etc. Wolfram's team describes it as a "Computational Knowledge Engine." It is similar to Wikipedia but much more.

Wolfram Alpha computes numerous types of results, such as factual answers to questions about the location of particular GPS coordinates, chemistry or biology questions, averages, or specific details of a mathematical equation. The techniques, heuristics, algorithms, and methods used in traditional Search engines are totally different from those in Wolfram's technology.
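To show the difference between looking something up and computing it, here is a toy sketch in Python. It is in no way how Wolfram Alpha works internally; the function name and the single "average" rule are made up purely to illustrate an engine that calculates an answer instead of returning matching pages.

```python
import re
from statistics import mean

def compute_answer(query):
    """Compute a result for queries like 'average of 3, 5 and 10'."""
    # Pull every number out of the query text
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", query)]
    if "average" in query.lower() and numbers:
        return mean(numbers)
    # This toy engine only understands averages
    return None

result = compute_answer("average of 3, 5 and 10")
```

A keyword engine would go looking for pages containing "average", "3", "5" and "10"; the sketch above calculates the value itself.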

There are billions upon billions of pieces of data accessible on the World Wide Web. Search Engines such as Google, Yahoo, etc. can very effectively search it for specific terms, phrases, keywords, etc. Wolfram Alpha goes a step further and uses algorithms to compute information and responses to questions from data.

Another type of Search Engine, the Natural Language engine, has been developed by Powerset, a company in northern California. Microsoft's Bing Search Engine uses Powerset's technology for its search engine (Microsoft purchased Powerset in 2008).

Powerset applies Natural Language techniques to extract concepts out of text or a phrase. It then builds an index (similar to Google) and separates the results by combining the primary Keyword with related verbs and nouns on a web page or pages. It can also search by date. Powerset is able to dynamically compute results and information in a Search. What this means is that a user can type a question in a search box the way they would verbally ask it and get a result returned in the search. The advantage is that the user receives information directly related to their question. For example, a user might type "What is the poorest city in the United States?" The results would not simply be pages matching the Keywords in the database. Instead, a Natural Language algorithm would return a relevant answer, which is more likely to give the user the information they are seeking.
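Here is a very rough sketch in Python of answering a question from an index of concepts instead of plain keywords. Everything in it is an assumption for illustration: the facts are typed in by hand as (subject, verb, object) entries, the only question shape understood is "What did <subject> <verb>?", and none of it reflects Powerset's actual algorithms.

```python
from collections import defaultdict

# Hand-written facts; a real system would extract these from web page text
facts = [
    ("Microsoft", "purchased", "Powerset"),
    ("Microsoft", "released", "Bing"),
]

# Index the facts by (subject, verb) so a question can be answered directly
by_subject_verb = defaultdict(list)
for subject, verb, obj in facts:
    by_subject_verb[(subject.lower(), verb)].append(obj)

def answer(question):
    """Answer questions shaped like 'What did <subject> <verb>?'."""
    words = question.rstrip("?").lower().split()
    if len(words) == 4 and words[:2] == ["what", "did"]:
        subject, verb = words[2], words[3]
        # Naive past tense so 'purchase' matches the stored 'purchased'
        past = verb + "d" if verb.endswith("e") else verb + "ed"
        return by_subject_verb.get((subject, past), [])
    return []

reply = answer("What did Microsoft purchase?")
```

The question is asked the way a person would say it, and the reply is the answer itself, not a list of pages that happen to contain the words.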

There you have it: a high level overview of today's new Search Technologies. As innovation continues to progress, we will see Google, Wolfram, and Bing (Powerset) advance and mature. You can also be certain that new techniques will continue to be developed.

Sincerely,
iKeep It Funky!
http://blog.ikeepitfunky.com
Twitter - @iKeepItFunky