Search engines suggest completions of partially typed queries to help users speed up their searches, for instance by showing the suggested completions in a drop-down box. These query autocompletions enable users to search faster, entering long queries with relatively few keystrokes. Jakobsson (1986) showed that for a library information system, users were able to identify items using as few as 4.3 characters on average. Query autocompletions are now widely used, either in a drop-down box or as instant results (Bast and Weber 2006). While Jakobsson (1986) used the titles of documents as completions of user queries, web search engines today generally use large logs of queries submitted previously by their users. Using previous queries seems a common-sense choice: the best way to predict a query? Use previous queries! Several scientific studies use the AOL query log provided by Pass et al. (2006) to show that query autocompletion algorithms using query logs are effective (Bar-Yossef and Kraus 2011; Shokouhi and Radinsky 2012; Whiting and Jose 2014; Mitra and Craswell 2015; Cai et al. 2016). However, query autocompletion algorithms that are based on query logs are problematic in two important ways: 1) they return offensive and damaging results; 2) they suffer from a destructive feedback loop. Let's discuss these two problems in the following sections.

Offensive queries

Query autocompletion based on actual user queries may return offensive results. In 2010, a French appeals court ordered Google to remove the word arnaque, which translates roughly as "scam", from appearing in Google's autocompletions for CNFDI, the Centre National Privé de Formation à Distance (McGee 2010). Google's defense argued that its tool is based on an algorithm applied to actual search queries: it was users searching for "cnfdi arnaque" that caused the algorithm to select the offensive suggestion. The court, however, ruled that Google is responsible for the suggestions that it generates. Google lost a similar lawsuit in Italy, where queries for an unnamed plaintiff's name were presented with autocomplete suggestions including truffatore ("con man") and truffa ("fraud") (Meyer 2011). In another similar lawsuit, Germany's former first lady Bettina Wulff sued Google for defamation when queries for her name completed with terms like "escort" and "prostitute" (Lardinois 2012). In yet another lawsuit, in Japan, Google was ordered to disable autocomplete results for an unidentified man who could not find a job because a search for his name linked him with crimes he was not involved in (BBC 2012).

Google has since updated its autocompletion results by filtering offensive completions for person names, no matter who the person is (Yehoshua 2016). But controversies over query autocompletions remain. A study by Baker and Potts (2013) highlights that autocompletions produce suggested terms that can be viewed as racist, sexist or homophobic. Baker and Potts analyzed the results of over 2,500 query prefixes and concluded that completion algorithms inadvertently help to perpetuate negative stereotypes of certain identity groups.

There is increasing evidence that autocompletions play an important role in spreading fake news and propaganda. Query suggestions actively direct users to fake content on the web, even when they are not looking for it (Roberts 2016). Examples include bizarre completions for – again – person names, like "Michelle Obama is a man", but also completions like "Did the holocaust happen", which, if selected, returns as its top result a link to the neo-Nazi site stormfront.org (Cadwalladr 2016).

The examples show that query autocompletions can be harmful when they are based on searches by previous users. Harmful completions are suggested when ordinary users seek to expose or confirm rumors and conspiracy theories. Furthermore, there are indications that harmful query suggestions increasingly result from computational propaganda, i.e., organizations using bots to game search engines and social networks (Shorey and Howard 2016).

A destructive feedback loop

Morally unacceptable query completions are not only introduced by the searches of previous users, they are also mutually reinforced by the search engine and its users. When a query autocompletion algorithm suggests morally unacceptable queries, users are likely to select them, even if they are only confused or stunned by the suggestion. But how does the search engine ever learn it was wrong? It might never. As soon as the system decides that some queries should be recommended, they are selected more often by users, which in turn makes those queries end up in the training data that the search engine uses to train its future query autocompletion algorithms. Such a destructive feedback loop is one of the features of a Weapon of Math Destruction, a term coined by O'Neil (2016) to describe harmful statistical models.
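To make the loop concrete, below is a toy simulation in Python of a completion system that ranks queries by their frequency in its own log and writes every selected completion back into that same log. The queries, counts and click probability are invented for illustration; this is a sketch of the mechanism, not of any real search engine's algorithm.

    import random
    from collections import Counter

    # Toy simulation of the feedback loop: completions are ranked by frequency
    # in the log, and every selected completion is written back into that log.
    # All queries, counts and probabilities below are made up for illustration.
    log = Counter({"holocaust memorial": 50,
                   "did the holocaust happen": 3,   # dubious, but just inside the top 2
                   "holocaust museum": 2})

    def suggest(log, k=2):
        """The k most frequent logged queries become the suggestions."""
        return [query for query, _ in log.most_common(k)]

    random.seed(1)
    for _ in range(1000):
        # Most users click one of the suggestions, even if only out of confusion;
        # a minority ignores them and types a query of their own.
        if random.random() < 0.8:
            selected = random.choice(suggest(log))
        else:
            selected = random.choice(list(log))
        log[selected] += 1   # the selection becomes training data for the next round

    print(log.most_common(3))

After a thousand rounds the dubious query has accumulated hundreds of selections, while the sensible alternative it started out barely ahead of is hardly ever chosen: being suggested was all it needed to keep being suggested.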

O'Neil sums up three elements of a Weapon of Math Destruction: Damage, Opacity, and Scale. Indeed, the damage caused by query autocompletion algorithms was discussed extensively in the previous section. Query autocompletion algorithms are opaque because they are based on proprietary logs of previous searches, known only to the search engine. If run by a search engine with a big market share, the query completion algorithm also scales to a large number of users. The query autocompletions of a search engine with a majority market share in a country might substantially alter the opinions of the country's citizens; for instance, a substantial number of people might start to doubt whether the holocaust really happened.

Fixing morally unacceptable results using content-based autocompletions

It is instructive to view a query autocompletion algorithm as a recommender system, that is, the search engine recommends queries based on some input. Recommender systems are usually classified into two categories based on how recommendations are made: 1) Collaborative recommendations, and 2) Content-based recommendations (Adomavicius and Tuzhilin 2005). Collaborative query autocompletions are based on similarities between users: “People that typed this prefix often searched for: …”. Content-based query autocompletions are based on similarities with the content: “Web pages that contain this prefix are often about: …”.
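A minimal sketch in Python helps to see how similar the two flavours are in structure, and how different they are in what they expose to users. The collaborative variant matches the typed prefix against a log of past queries; the content-based variant matches it against phrases taken from the indexed pages. The example query log and phrase list below are hypothetical.

    from collections import Counter

    def collaborative_completions(prefix, query_log, k=5):
        """'People that typed this prefix often searched for: ...'"""
        counts = Counter(q for q in query_log if q.startswith(prefix))
        return [q for q, _ in counts.most_common(k)]

    def content_based_completions(prefix, document_phrases, k=5):
        """'Web pages that contain this prefix are often about: ...'"""
        counts = Counter(p for p in document_phrases if p.startswith(prefix))
        return [p for p, _ in counts.most_common(k)]

    # Hypothetical data: the log contains a query planted by previous users
    # that the pages themselves never mention.
    query_log = ["university of twente", "university of twente",
                 "university of twente scam", "university of twente scam",
                 "university of twente vacancies"]
    document_phrases = ["university of twente", "university of twente",
                        "university of twente vacancies", "university library"]

    print(collaborative_completions("university of twente", query_log))
    # ['university of twente', 'university of twente scam', 'university of twente vacancies']
    print(content_based_completions("university of twente", document_phrases))
    # ['university of twente', 'university of twente vacancies']

The two functions are nearly identical; only the data they are fed differs, and with it the suggestions that users get to see.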

So far, we have focused on collaborative query autocompletion algorithms. What would a content-based query autocompletion algorithm look like? Bhatia et al. (2011) proposed a system that generates autocompletions using all N-grams of order 1, 2 and 3 (that is, single words, word pairs, and word triples) from the documents. They tested their content-based autocompletions on newspaper data and on data from ubuntuforums.org. Instead of building N-gram models from all text, Kraft and Zien (2004) built models for query reformulation solely from anchor text, the clickable text of hyperlinks in web pages. Interestingly, Dang and Croft (2010) argue that anchor text can be an effective substitute for query logs. They studied the use of anchor texts for a range of query reformulation techniques, including query-based stemming and query reformulation, treating the anchor text as if it were a query log.
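The sketch below, loosely following the approach of Bhatia et al. (2011), collects all word 1-, 2- and 3-grams from a document collection and suggests the most frequent ones that match the typed prefix. Tokenization and ranking are simplified compared to the published method, and the example documents are made up.

    import re
    from collections import Counter

    def ngrams(text, max_order=3):
        """Yield all N-grams of order 1 up to max_order from a piece of text."""
        words = re.findall(r"[a-z0-9]+", text.lower())
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                yield " ".join(words[i:i + n])

    def build_index(documents):
        """Count every N-gram of order 1 to 3 in the collection."""
        index = Counter()
        for doc in documents:
            index.update(ngrams(doc))
        return index

    def complete(prefix, index, k=5):
        """Suggest the k most frequent N-grams that start with the typed prefix."""
        prefix = prefix.lower()
        matches = ((g, c) for g, c in index.items() if g.startswith(prefix))
        return [g for g, _ in sorted(matches, key=lambda x: -x[1])[:k]]

    docs = ["The University of Twente offers master programmes in computer science.",
            "Anchor text from hyperlinks describes the University of Twente concisely."]
    index = build_index(docs)
    print(complete("university of", index))
    # ['university of', 'university of twente']

The same index could just as well be built from anchor texts only, in the spirit of Kraft and Zien (2004) and Dang and Croft (2010): simply feed the extractor the anchor texts instead of the full page text.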

At Searsia, we develop open source tools that allow search engine developers to easily build content-based autocompletions, that is, autocompletions without the need to track users. A first demo of content-based autocompletions using anchor texts now runs on U. Twente Search. In the coming weeks, we will evaluate our content-based recommendations for U. Twente Search, and we will investigate whether content-based autocompletions can replace collaborative autocompletions at web scale. The first results look promising: we believe content-based recommendations can provide high-quality query autocompletions, and fix some of the harmful effects of today's query autocompletion algorithms.

References