Enabling Access to Diverse Content for Web Users

Our new technical report shows that there is a large overlap between the Google and Bing search results, but that search over Twitter content provides useful, complementary results.  This comprehensive report consolidates the results from our KDD-2015 and WWW-2015 papers in one place.

Homogeneity in Web Search Results: Diagnosis and Mitigation
Rakesh Agrawal, Behzad Golshan and Evangelos Papalexakis
Technical Report TR-2015-007

Abstract:

Access to diverse perspectives nurtures an informed citizenry.  Google and Bing have emerged as the duopoly that largely arbitrates which English language documents are seen by web searchers.  We present our empirical study over the search results produced by Google and Bing that shows a large overlap.  Thus, citizens may not gain different perspectives by simultaneously probing them for the same query.  Fortunately, our study also shows that by mining Twitter data one can obtain search results that are quite distinct from those produced by Google and Bing.  Additionally, the users found those results to be quite informative.

We also present two novel tools we designed for this study.  One uses tensor analysis to derive a low-dimensional, compact representation of search results and study their behavior over time.  The other uses machine learning and quantifies the similarity of results between two search engines by framing it as a prediction problem.  Although these tools have different underpinnings, the analytical results obtained using them corroborate each other, which reinforces the confidence one can place in them for finding meaningful insights from big data.

Bibtex Entry:

@techreport{AGP15:homogeneity,
title={Homogeneity in Web Search Results: Diagnosis and Mitigation},
author={Rakesh Agrawal and Behzad Golshan and Evangelos Papalexakis},
number={TR-2015-007},
institution={Data Insights Laboratories},
address={San Jose, California},
month={June},
year={2015}
}

Data-Driven Education

Our upcoming tutorial at KDD-2015 will present the novel data-driven education technologies under development and testing.

Data-Driven Education
Rakesh Agrawal
21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2015

Abstract:

The recent improvements in bandwidth and connectivity have enabled enormous improvement in the speed and quality of communication among participants in the education ecosystem. They make it possible to connect students to great teachers of their choice and to other students who are compatible with their level of preparedness and learning temperament. One can choose what one learns and at what pace and in what order and self-assess the understanding of the material through personalized quizzes and tests. At the same time, it becomes feasible to measure student comprehension in real time and adjust the material presented to students to achieve higher levels of competency. Continuous feedback on learning outcomes and subject matter mastery also allows rapid evolution toward the most effective educational material and pedagogical methods.

Given the above emergent educational landscape, this tutorial aims to introduce the attendees to the novel data-driven education technologies under development and testing. The topics to be covered include:

  • Inferring learning units and dependence between them from current educational material, yielding a knowledge graph that provides the core data structure for organizing and navigating learning experiences
  • Enriching the knowledge graph with rich content in multiple formats mined from the web as well as expert-sourcing
  • Personalized learning plans based on explicit student preferences as well as recommendation based on aggregation of past navigations through the knowledge graphs
  • Overlaying the knowledge graph with social graph of students and teachers to allow dynamic formation of classes and study groups with the goal of maximizing overall learning
  • Open research problems

Bibtex Entry:

@inproceedings{Agr15kdd:Data,
title={Data-Driven Education},
author={Rakesh Agrawal},
booktitle={21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
month={August},
year={2015},
address={Sydney, Australia},
note={Also, Data Insights Laboratories Technical Report TR-2015-006, June 2015}
}

KDD-2015 Paper on Complementarity of Twitter and Google Search

Our paper showing that search over Twitter content can provide useful results that complement web search results from Google and Bing has been accepted for presentation at the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015).  This version supplants our Technical Report TR-2015-002.

Whither Social Networks for Web Search?
Rakesh Agrawal, Behzad Golshan and Evangelos Papalexakis
21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2015

Abstract:

Access to diverse perspectives nurtures an informed citizenry. Google and Bing have emerged as the duopoly that largely arbitrates which English language documents are seen by web searchers. A recent study shows that there is now a large overlap in the top organic search results produced by them. Thus, citizens may no longer be able to gain different perspectives by using different search engines.

We present the results of our empirical study that indicates that by mining Twitter data one can obtain search results that are quite distinct from those produced by Google and Bing. Additionally, our user study found that these results were quite informative. The gauntlet is now on search engines to test whether our findings hold in their infrastructure for different social networks and whether enabling diversity has sufficient business imperative for them.

Bibtex Entry:

@inproceedings{AGP15kdd:whither,
title={Whither Social Networks for Web Search?},
author={Rakesh Agrawal and Behzad Golshan and Evangelos Papalexakis},
booktitle={21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
month={August},
year={2015},
address={Sydney, Australia}
}

Data Engineering in Asia: Unique Technical Challenges and Opportunities

Asia is in the midst of a historic transformation. Asia’s per capita income is projected to rise sixfold and its share of global gross domestic product is expected to increase to 52 percent by 2050. Science and technology has been cited as one of the key pillars for the success of Asia’s development.

The plenary panel, “Data Engineering in Asia: Unique Technical Challenges and Opportunities”, organized by Rakesh Agrawal at the 31st International Conference on Data Engineering (ICDE 2015), pondered how data engineering can uniquely contribute to the emergent Asian century.

Each panelist described a data-centric problem from their personal experience that had a uniquely Asian flavor in the problem definition, or the solution approach, or preferably both. They also presented one open problem each that merits close attention from data researchers.

The panelists were:

Dr. Rakesh Agrawal (Data Insights Laboratories, Chair)
Prof. Longbing Cao (University of Technology, Sydney, Australia)
Dr. Jong-Deok Choi (Samsung, Korea)
Prof. Beng Chin Ooi (National University of Singapore, Singapore)
Prof. Krithi Ramamritham (IIT Bombay, India)
Prof. Sean Wang (University of Vermont)
Prof. Masatoshi Yoshikawa (Kyoto University, Japan)

TR-2015-005 contains the panel statement and the biographies of the panelists.  You may access the presentations made by the panelists from here.

Bibtex Entry:

@inproceedings{Agr15icde:data,
title={Data Engineering in Asia: Unique Technical Challenges and Opportunities},
author={Rakesh Agrawal},
booktitle={31st International Conference on Data Engineering},
month={April},
year={2015},
address={Seoul, Korea},
note={Also, Data Insights Laboratories Technical Report TR-2015-005, April 2015}
}

Big Data: Old Wine in New Bottle?

Astounding advances in storage, communication, and computation have propelled us into a new world, the world of Big Data. The Big Data enthusiasts believe it is a revolution that will transform how we live, work, and think, and it will underpin new waves of innovation and productivity growth.  At the same time, skeptics abound.

The plenary panel, “Big Data: Old Wine in New Bottle?”, organized by Rakesh Agrawal, debated the following questions at the 31st International Conference on Data Engineering (ICDE 2015):

  • What technological foundations of Big Data differentiate it from conventional data technologies?
  • What are the key road blocks that might cut short the promise of Big Data?
  • What are ten Hilbert’s problems in Big Data?

The panelists were:

Dr. Rakesh Agrawal (Data Insights Laboratories, Chair)
Prof. Sang Kyun Cha (Seoul National University)
Dr. Umeshwar Dayal (Hitachi)
Dr. Mukund Deshpande (Persistent Systems)
Prof. Hector Garcia-Molina (Stanford University)
Dr. Waqar Hasan (VISA)
Dr. David Lomet (Microsoft)
Prof. Volker Markl (Berlin Technical University)

TR-2015-004 contains the panel statement and the biographies of the panelists.  You may access the presentations made by the panelists from here.

Bibtex Entry:

@inproceedings{Agr15icde:big,
title={Big Data: Old Wine in New Bottle?},
author={Rakesh Agrawal},
booktitle={31st International Conference on Data Engineering},
month={April},
year={2015},
address={Seoul, Korea},
note={Also, Data Insights Laboratories Technical Report TR-2015-004, April 2015}
}

Data-Driven Synthesis of Study Plans

Our new technical report describes the outcome of our audacious undertaking to design a tool that algorithmically synthesizes a study plan for a course offering from the sole input of concept phrases representing the course content. We welcome your feedback, particularly from teachers and students.

Data-Driven Synthesis of Study Plans
Rakesh Agrawal, Behzad Golshan and Evangelos Papalexakis
Technical Report TR-2015-003

Abstract:

A study plan for an educational course refers to the choice of concepts to be covered and the organization and sequencing of course content. While a good study plan is essential for the success of any course offering, the design of study plans currently remains largely a manual task. We present a novel data-driven method which, given a list of concepts, can automatically propose candidate plans to cover all the concepts. The output of our method identifies both which concepts should be studied together and how students should move from one group of concepts to another. For our experimental validation, we use a dataset that contains a list of concept names from the field of physics. We find that our method is able to produce good plans.

Bibtex Entry:

@techreport{AGP15:data-driven,
title={Data-Driven Synthesis of Study Plans},
author={Rakesh Agrawal and Behzad Golshan and Evangelos Papalexakis},
number={TR-2015-003},
institution={Data Insights Laboratories},
address={San Jose, California},
month={March},
year={2015}
}

WWW-2015 Paper on Overlap between Google and Bing Search Results

Our paper showing large overlap in the Google and Bing search results has been accepted for presentation in the Web Science Track at the 24th International Conference on World Wide Web (WWW 2015).  This version supplants Technical Report TR-2015-001.

A study of distinctiveness in web results of two search engines
Rakesh Agrawal, Behzad Golshan and Evangelos Papalexakis
24th International World Wide Web Conference, Web Science Track, Florence, Italy, May 2015

Abstract:

Google and Bing have emerged as the diarchy that arbitrates what documents are seen by Web searchers, particularly those desiring English language documents. We seek to study how distinctive the top results presented to the users by the two search engines are.  A recent eye-tracking study has shown that web searchers decide whether to look at a document primarily based on the snippet and secondarily on the title of the document on the web search result page, and rarely based on the URL of the document. Given that the snippet and title generated by different search engines for the same document are often syntactically different, we first develop tools appropriate for conducting this study. Our empirical evaluation using these tools shows a surprising agreement in the results produced by the two engines for a wide variety of queries used in our study. Thus, this study raises the open question of whether it is feasible to design a search engine that would produce results distinct from those produced by Google and Bing that the users will find helpful.
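
To illustrate why syntactically different snippets and titles call for something other than exact string comparison, here is a minimal sketch in Python. It is not the tooling developed in the paper: the token-level Jaccard measure, the function names (`tokens`, `jaccard`, `result_overlap`), the 0.5 threshold, and the sample data are all assumptions chosen only for illustration.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def result_overlap(results_a, results_b, threshold=0.5):
    """Fraction of results in results_a with a close match in results_b.

    Each result is a (title, snippet) pair. Two results are treated as the
    same document when the Jaccard similarity of their combined title and
    snippet tokens reaches the threshold (an arbitrary illustrative value,
    not one taken from the paper).
    """
    if not results_a:
        return 0.0
    bags_b = [tokens(title + " " + snippet) for title, snippet in results_b]
    matched = 0
    for title, snippet in results_a:
        bag = tokens(title + " " + snippet)
        if any(jaccard(bag, other) >= threshold for other in bags_b):
            matched += 1
    return matched / len(results_a)


# Made-up example data, not taken from the study.
engine_a = [
    ("Python Tutorial", "Learn Python programming step by step."),
    ("Cooking Pasta", "How to cook pasta al dente."),
]
engine_b = [
    ("Python tutorial", "Learn python programming, step by step!"),
    ("Train Schedules", "Timetables for regional trains."),
]
print(result_overlap(engine_a, engine_b))
```

In this toy data the first result differs between the two lists only in capitalization and punctuation, so an exact string comparison would miss the match while the token-level comparison finds it.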

Bibtex Entry:

@inproceedings{AGP15www:study,
title={A Study of Distinctiveness in Web Results of Two Search Engines},
author={Rakesh Agrawal and Behzad Golshan and Evangelos Papalexakis},
booktitle={24th International World Wide Web Conference, Web Science Track},
month={May},
year={2015},
address={Florence, Italy}
}

Complementarity of Twitter and Google Search

This new technical report is a companion report to our earlier study that found a large overlap between the Web search results provided by Google and Bing.  It shows that search over Twitter content can provide useful results that complement Web search results.

Whither Social Networks for Web Search?
Rakesh Agrawal, Behzad Golshan and Evangelos Papalexakis
Technical Report TR-2015-002

Abstract:

Access to diverse perspectives is essential for inculcating and nurturing an informed citizenry. Google and Bing have emerged as the duopoly that largely arbitrates at least what English language documents are seen by Web searchers. A recent study shows that there is now a large overlap in the top organic search results produced by them. Thus, citizens may no longer be able to garner different perspectives by probing different search engines.

We present the results of our empirical study that indicates that by mining Twitter data one can obtain search results that are quite distinct from those produced by Google and Bing. Additionally, the users found those results to be quite informative in our user study. The gauntlet is now on search engines to test whether our findings hold in their infrastructure for different social networks and whether enabling diversity has sufficient business imperative for them.

Bibtex Entry:

@techreport{AGP15:whither,
title={Whither Social Networks for Web Search?},
author={Rakesh Agrawal and Behzad Golshan and Evangelos Papalexakis},
number={TR-2015-002},
institution={Data Insights Laboratories},
address={San Jose, California},
month={February},
year={2015}
}

Overlap in Google and Bing Search Results

We have released our first technical report, which studies the overlap between the Web search results provided by Google and Bing. See details below. We welcome your comments.

A study of distinctiveness in web results of two search engines
Rakesh Agrawal, Behzad Golshan and Evangelos Papalexakis
Technical Report TR-2015-001

Abstract:

Google and Bing have emerged as the diarchy that arbitrates what documents are seen by Web searchers, particularly those desiring English language documents. We seek to study how distinctive the top results presented to the users by the two search engines are. A recent eye-tracking study has shown that web searchers decide whether to look at a document primarily based on the snippet and secondarily on the title of the document on the web search result page, and rarely based on the URL of the document. Given that the snippet and title generated by different search engines for the same document are often syntactically different, we first develop tools appropriate for conducting this study. Our empirical evaluation using these tools shows a surprising agreement in the results produced by the two engines for a wide variety of queries used in our study. Thus, this study raises the open question of whether it is feasible to design a search engine that would produce results distinct from those produced by Google and Bing that the users will find helpful.

Bibtex Entry:

@techreport{AGP15:study,
title={A Study of Distinctiveness in Web Results of Two Search Engines},
author={Rakesh Agrawal and Behzad Golshan and Evangelos Papalexakis},
number={TR-2015-001},
institution={Data Insights Laboratories},
address={San Jose, California},
month={January},
year={2015}
}