Google as a Bookmarking Tool

Ryan Jazayeri & Robert Miller

Motivation
Our research aims to make frequently visited pages quicker and easier to reach. We approach this not by altering the search algorithms themselves, but by helping users choose better search queries before any search is performed, and by automatically reordering the results based on the user's browsing history. The project consists of two separate parts: the GoogleURL Generator and Google Reordering.

A GoogleURL for a target webpage is a list of keywords that, when queried in Google, returns that target page as the first search result. For example, "yahoo maps" is a GoogleURL for "http://maps.yahoo.com". Good GoogleURLs are short and memorable, and thus easy to communicate. The GoogleURL Generator, which we are currently developing, is a tool that, given a webpage, generates GoogleURLs for it. A screenshot of the current version is shown in Figure 1.


Figure 1: GoogleURL Generator in action.

Google Reordering incorporates user browsing history to try to make search smarter. We use knowledge of the user's browsing history to reorder the results generated by a Google search query, and bring to the top the results that are most likely to be what the user is searching for.

Approach
A simple way to generate a functioning GoogleURL for a page would be to extract a list of keywords from that page and use the rarest ones. For example, the rarest words might be an obscure last name and a few uncommon words nestled somewhere in the page. This approach is a poor choice for GoogleURL generation, because the resulting keywords would not be memorable.

We instead generate candidate GoogleURLs by attempting to reverse-engineer Google's algorithm. We generate a weight for each keyword on the target page, taking into account its formatting, number of occurrences in the page, and its frequency of occurrence across the web. In addition to these keywords on the target page itself, we also factor in words that appear in backlinks pointing to that target page. We rank each keyword according to these keyword weights.
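The weighting described above can be sketched roughly as follows. This is a minimal illustration, not the weighting actually used by the GoogleURL Generator: the formatting boosts, the backlink boost, the smoothed inverse-document-frequency term, and the names `keyword_weights` and `rank_keywords` are all assumptions introduced for the example.

```python
import math
from collections import Counter

# Hypothetical formatting boosts; the source does not give actual values.
FORMAT_BOOST = {"title": 3.0, "heading": 2.0, "bold": 1.5, "body": 1.0}
# Assumed weight for a word that appears in backlink text.
BACKLINK_BOOST = 2.0


def keyword_weights(page_terms, backlink_terms, web_doc_freq, web_doc_count):
    """Score keywords by formatting, occurrence count, and rarity on the web.

    page_terms: iterable of (word, format_label) pairs from the target page
    backlink_terms: iterable of words found in backlinks to the page
    web_doc_freq: dict mapping word -> number of web documents containing it
    web_doc_count: total number of documents in the reference corpus
    """
    score = Counter()
    # Each occurrence adds its formatting boost, so repeated words score higher.
    for word, fmt in page_terms:
        score[word.lower()] += FORMAT_BOOST.get(fmt, 1.0)
    for word in backlink_terms:
        score[word.lower()] += BACKLINK_BOOST

    weights = {}
    for word, s in score.items():
        df = web_doc_freq.get(word, 0)
        # Smoothed inverse document frequency: common words are scaled down.
        idf = math.log((1 + web_doc_count) / (1 + df)) + 1.0
        weights[word] = s * idf
    return weights


def rank_keywords(weights):
    """Return keywords ordered from highest to lowest weight."""
    return sorted(weights, key=lambda w: (-weights[w], w))
```

With a page whose title is "Yahoo maps", a body containing "the" twice and "maps" once, and one backlink containing "maps", this sketch ranks "maps" first, then "yahoo", then "the".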

In constructing candidate GoogleURLs, we prefer keywords to which we assigned a high ranking. We also examine the proximity between keywords and try to identify any recurring phrases in the document. From this information, we create candidate GoogleURLs of varying length and verify each one by querying Google and checking whether the target page is returned as the first result.
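A minimal sketch of candidate construction and verification is given below. It assumes a ranked keyword list is already available and takes the search backend as a parameter: `search` is a hypothetical callable that maps a query string to an ordered list of result URLs. Keyword proximity and recurring phrases are not modeled here, and the function names are illustrative only.

```python
from itertools import combinations


def candidate_queries(ranked_keywords, max_len=3, pool_size=5):
    """Yield candidate queries, shortest first, from the top-ranked keywords."""
    pool = ranked_keywords[:pool_size]
    for n in range(1, max_len + 1):
        # combinations() keeps pool order, so higher-ranked words come first.
        for combo in combinations(pool, n):
            yield " ".join(combo)


def normalize(url):
    """Very simple URL normalization (assumption): ignore a trailing slash."""
    return url.rstrip("/")


def find_google_urls(target_url, ranked_keywords, search, max_len=3):
    """Return the candidates whose first search result is the target page.

    search: hypothetical callable, query string -> ordered list of result URLs
    """
    target = normalize(target_url)
    found = []
    for query in candidate_queries(ranked_keywords, max_len):
        results = search(query)
        if results and normalize(results[0]) == target:
            found.append(query)
    return found
```

Because candidates are generated shortest first, the first entries of the returned list are the shortest verified GoogleURLs, which matches the stated preference for short keyword lists.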

In Google Reordering, we detect when the user is performing a Google search. We then intercept and manipulate the search results returned by Google. Before presenting those results, we bring to the top the pages that the user is most likely looking for, for example any page among the results that the user has visited many times recently.

We use LAPIS for all our HTML parsing and pattern matching. We perform Google searches via the Google Web APIs.
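The reordering step can be illustrated with a short sketch. It assumes the browsing history has already been reduced to a mapping from URL to the number of recent visits; the `reorder_results` function and the `visit_counts` structure are hypothetical and stand in for whatever history model the actual system uses.

```python
def reorder_results(results, visit_counts):
    """Move visited pages to the top, most visited first.

    results: list of result URLs in the order returned by the search engine
    visit_counts: dict mapping URL -> number of recent visits (assumed input)
    Pages with equal visit counts, including unvisited pages, keep their
    original relative order.
    """
    indexed = list(enumerate(results))
    indexed.sort(key=lambda pair: (-visit_counts.get(pair[1], 0), pair[0]))
    return [url for _, url in indexed]
```

Using the original position as a tie-breaker means the search engine's own ranking is preserved wherever the browsing history has nothing to say.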

References

[1] Ryan Jazayeri. Google as a Bookmarking Tool. MEng thesis, Massachusetts Institute of Technology, June 2004.