Enabling Access to Diverse Content for Web Users

Our new technical report shows that Google and Bing search results overlap heavily, whereas search over Twitter content provides useful, complementary results. This comprehensive report consolidates the results from our KDD-2015 and WWW-2015 papers in one place.

Homogeneity in Web Search Results: Diagnosis and Mitigation
Rakesh Agrawal, Behzad Golshan and Evangelos Papalexakis
Technical Report TR-2015-007

Abstract

Access to diverse perspectives nurtures an informed citizenry. Google and Bing have emerged as the duopoly that largely arbitrates which English-language documents are seen by web searchers. We present an empirical study of the search results produced by Google and Bing that shows a large overlap between them. Thus, citizens may not gain different perspectives by probing both engines with the same query. Fortunately, our study also shows that by mining Twitter data one can obtain search results that are quite distinct from those produced by Google and Bing. Additionally, users found those results to be quite informative.

We also present two novel tools we designed for this study. One uses tensor analysis to derive a low-dimensional, compact representation of search results and to study their behavior over time. The other uses machine learning and quantifies the similarity of results between two search engines by framing it as a prediction problem. Although these tools have different underpinnings, the analytical results obtained using them corroborate each other, which reinforces the confidence one can place in them for finding meaningful insights from big data.
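
The idea of framing similarity as a prediction problem can be illustrated with a small sketch. This is only one possible reading of that idea, not the method used in the report: the numeric feature vectors, the nearest-centroid classifier, the even/odd train/test split, and every function name below are assumptions made for illustration. The classifier tries to guess which engine produced a result. Accuracy near 0.5 suggests the two result sets are hard to tell apart, while accuracy near 1.0 suggests they are clearly distinct.

```python
from statistics import mean


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def source_prediction_accuracy(results_a, results_b):
    """Accuracy of guessing the source engine of held-out results.

    Hypothetical setup: each result is a list of numeric features.
    Even-indexed results are used for training, odd-indexed for testing,
    so each input list needs at least two results.
    """
    train_a, test_a = results_a[0::2], results_a[1::2]
    train_b, test_b = results_b[0::2], results_b[1::2]
    cent_a, cent_b = centroid(train_a), centroid(train_b)

    # Label each held-out result with the engine it really came from.
    held_out = [(v, "A") for v in test_a] + [(v, "B") for v in test_b]

    # Predict the engine whose training centroid is closer (ties go to "A").
    correct = sum(
        ("A" if sq_dist(v, cent_a) <= sq_dist(v, cent_b) else "B") == label
        for v, label in held_out
    )
    return correct / len(held_out)


if __name__ == "__main__":
    engine_a = [[0, 0], [0, 1], [1, 0], [1, 1]]
    engine_b = [[10, 10], [10, 11], [11, 10], [11, 11]]
    print(source_prediction_accuracy(engine_a, engine_b))  # clearly distinct sets
    print(source_prediction_accuracy(engine_a, engine_a))  # identical sets
```

With equally sized, identical inputs the classifier can do no better than chance, which is the sense in which low prediction accuracy indicates high similarity in this sketch.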

BibTeX Entry:

@techreport{AGP15:homogeneity,
title={Homogeneity in Web Search Results: Diagnosis and Mitigation},
author={Rakesh Agrawal and Behzad Golshan and Evangelos Papalexakis},
number={TR-2015-007},
institution={Data Insights Laboratories},
address={San Jose, California},
month={June},
year={2015}
}