Search engines suggest completions of partially typed queries to help users speed up their searches, for instance by showing the suggested completions in a drop-down box. These query autocompletions enable users to search faster, entering long queries with relatively few keystrokes. Jakobsson (1986) showed that for a library information system, users were able to identify items using as few as 4.3 characters on average. Query autocompletions are now widely used, either in a drop-down box or as instant results (Bast and Weber 2006). While Jakobsson (1986) used the titles of documents as completions of user queries, web search engines today generally use large logs of queries submitted previously by their users. Using previous queries seems a common-sense choice: the best way to predict a query? Use previous queries! Several scientific studies use the AOL query log provided by Pass et al. (2006) to show that query autocompletion algorithms using query logs are effective (Bar-Yossef and Kraus 2011; Shokouhi and Radinsky 2012; Whiting and Jose 2014; Mitra and Craswell 2015; Cai et al. 2016). However, query autocompletion algorithms that are based on query logs are problematic in two important ways: 1) they return offensive and damaging results; 2) they suffer from a destructive feedback loop. Let's discuss these two problems in the following sections.

Offensive queries

Query autocompletion based on actual user queries may return offensive results. In 2010, a French appeals court ordered Google to remove the word arnaque, which translates roughly as "scam", from appearing in Google's autocompletions for CNFDI, the Centre National Privé de Formation à Distance (McGee 2010). Google's defense argued that its tool is based on an algorithm applied to actual search queries: it was users searching for "cnfdi arnaque" that caused the algorithm to select the offensive suggestion. The court, however, ruled that Google is responsible for the suggestions that it generates. Google lost a similar lawsuit in Italy, where queries for an unnamed plaintiff's name were presented with autocomplete suggestions including truffatore ("con man") and truffa ("fraud") (Meyer 2011). In another similar lawsuit, Germany's former first lady Bettina Wulff sued Google for defamation when queries for her name completed with terms like "escort" and "prostitute" (Lardinois 2012). In yet another lawsuit, in Japan, Google was ordered to disable autocomplete results for an unidentified man who could not find a job because a search for his name linked him with crimes he was not involved in (BBC 2012).

Google has since updated its autocompletion results by filtering offensive completions for person names, no matter who the person is (Yehoshua 2016). But controversies over query autocompletions remain. A study by Baker and Potts (2013) highlights that autocompletions produce suggested terms that can be viewed as racist, sexist or homophobic. Baker and Potts analyzed the results of over 2,500 query prefixes and concluded that completion algorithms inadvertently help to perpetuate negative stereotypes of certain identity groups.

There is increasing evidence that autocompletions play an important role in spreading fake news and propaganda. Query suggestions actively direct users to fake content on the web, even when they are not looking for it (Roberts 2016). Examples include bizarre completions for – again – person names, like "Michelle Obama is a man", but also completions like "Did the holocaust happen", which, if selected, returns as its top result a link to the neo-Nazi site stormfront.org (Cadwalladr 2016).

The examples show that query autocompletions can be harmful when they are based on searches by previous users. Harmful completions are suggested when ordinary users seek to expose or confirm rumors and conspiracy theories. Furthermore, there are indications that harmful query suggestions increasingly result from computational propaganda, i.e., organizations using bots to game search engines and social networks (Shorey and Howard 2016).

A destructive feedback loop

Morally unacceptable query completions are not only introduced by the searches of previous users, they are also mutually reinforced by the search engine and its users. When a query autocompletion algorithm suggests morally unacceptable queries, users are likely to select them, even if they are only confused or stunned by the suggestion. But how does the search engine ever learn it was wrong? It might never. As soon as the system decides that some queries should be recommended, they are selected more often by users, which in turn makes those queries end up in the training data that the search engine uses to train its future query autocompletion algorithms. Such a destructive feedback loop is one of the features of a Weapon of Math Destruction, a term coined by O'Neil (2016) to describe harmful statistical models.
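To make the loop concrete, below is a toy simulation in Python of a completion system that ranks queries by their frequency in its own log and writes every selected completion back into that same log. The queries, counts and click probability are invented for illustration; this is a sketch of the mechanism, not of any real search engine's algorithm.

    import random
    from collections import Counter

    # Toy simulation of the feedback loop: completions are ranked by frequency
    # in the log, and every selected completion is written back into that log.
    # All queries, counts and probabilities below are made up for illustration.
    log = Counter({"holocaust memorial": 50,
                   "did the holocaust happen": 3,   # dubious, but just inside the top 2
                   "holocaust museum": 2})

    def suggest(log, k=2):
        """The k most frequent logged queries become the suggestions."""
        return [query for query, _ in log.most_common(k)]

    random.seed(1)
    for _ in range(1000):
        # Most users click one of the suggestions, even if only out of confusion;
        # a minority ignores them and types a query of their own.
        if random.random() < 0.8:
            selected = random.choice(suggest(log))
        else:
            selected = random.choice(list(log))
        log[selected] += 1   # the selection becomes training data for the next round

    print(log.most_common(3))

After a thousand rounds the dubious query has accumulated hundreds of selections, while the sensible alternative it started out barely ahead of is hardly ever chosen: being suggested was all it needed to keep being suggested.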

O'Neil sums up three elements of a Weapon of Math Destruction: Damage, Opacity, and Scale. Indeed, the damage caused by query autocompletion algorithms was discussed extensively in the previous section. Query autocompletion algorithms are opaque because they are based on proprietary logs of previous searches, known only to the search engine. If run by a search engine with a big market share, the query completion algorithm also scales to a large number of users. The query autocompletions of a search engine with a majority market share in a country might substantially alter the opinions of the country's citizens; for instance, a substantial number of people might start to doubt whether the holocaust really happened.

Fixing morally unacceptable results using content-based autocompletions

It is instructive to view a query autocompletion algorithm as a recommender system, that is, the search engine recommends queries based on some input. Recommender systems are usually classified into two categories based on how recommendations are made: 1) Collaborative recommendations, and 2) Content-based recommendations (Adomavicius and Tuzhilin 2005). Collaborative query autocompletions are based on similarities between users: “People that typed this prefix often searched for: …”. Content-based query autocompletions are based on similarities with the content: “Web pages that contain this prefix are often about: …”.
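A minimal sketch in Python helps to see how similar the two flavours are in structure, and how different they are in what they expose to users. The collaborative variant matches the typed prefix against a log of past queries; the content-based variant matches it against phrases taken from the indexed pages. The example query log and phrase list below are hypothetical.

    from collections import Counter

    def collaborative_completions(prefix, query_log, k=5):
        """'People that typed this prefix often searched for: ...'"""
        counts = Counter(q for q in query_log if q.startswith(prefix))
        return [q for q, _ in counts.most_common(k)]

    def content_based_completions(prefix, document_phrases, k=5):
        """'Web pages that contain this prefix are often about: ...'"""
        counts = Counter(p for p in document_phrases if p.startswith(prefix))
        return [p for p, _ in counts.most_common(k)]

    # Hypothetical data: the log contains a query planted by previous users
    # that the pages themselves never mention.
    query_log = ["university of twente", "university of twente",
                 "university of twente scam", "university of twente scam",
                 "university of twente vacancies"]
    document_phrases = ["university of twente", "university of twente",
                        "university of twente vacancies", "university library"]

    print(collaborative_completions("university of twente", query_log))
    # ['university of twente', 'university of twente scam', 'university of twente vacancies']
    print(content_based_completions("university of twente", document_phrases))
    # ['university of twente', 'university of twente vacancies']

The two functions are nearly identical; only the data they are fed differs, and with it the suggestions that users get to see.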

So far, we have focused on collaborative query autocompletion algorithms. What would a content-based query autocompletion algorithm look like? Bhatia et al. (2011) proposed a system that generates autocompletions using all N-grams of order 1, 2 and 3 (that is, single words, word pairs, and word triples) from the documents. They tested their content-based autocompletions on newspaper data and on data from ubuntuforums.org. Instead of building N-gram models from all text, Kraft and Zien (2004) built models for query reformulation solely from anchor text, the clickable text of hyperlinks in web pages. Interestingly, Dang and Croft (2010) argue that anchor text can be an effective substitute for query logs. They studied the use of anchor texts for a range of query reformulation techniques, including query-based stemming and query reformulation, treating the anchor text as if it were a query log.
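The sketch below, loosely following the approach of Bhatia et al. (2011), collects all word 1-, 2- and 3-grams from a document collection and suggests the most frequent ones that match the typed prefix. Tokenization and ranking are simplified compared to the published method, and the example documents are made up.

    import re
    from collections import Counter

    def ngrams(text, max_order=3):
        """Yield all N-grams of order 1 up to max_order from a piece of text."""
        words = re.findall(r"[a-z0-9]+", text.lower())
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                yield " ".join(words[i:i + n])

    def build_index(documents):
        """Count every N-gram of order 1 to 3 in the collection."""
        index = Counter()
        for doc in documents:
            index.update(ngrams(doc))
        return index

    def complete(prefix, index, k=5):
        """Suggest the k most frequent N-grams that start with the typed prefix."""
        prefix = prefix.lower()
        matches = ((g, c) for g, c in index.items() if g.startswith(prefix))
        return [g for g, _ in sorted(matches, key=lambda x: -x[1])[:k]]

    docs = ["The University of Twente offers master programmes in computer science.",
            "Anchor text from hyperlinks describes the University of Twente concisely."]
    index = build_index(docs)
    print(complete("university of", index))
    # ['university of', 'university of twente']

The same index could just as well be built from anchor texts only, in the spirit of Kraft and Zien (2004) and Dang and Croft (2010): simply feed the extractor the anchor texts instead of the full page text.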

At Searsia, we develop open source tools that allow search engine developers to easily build content-based autocompletions, that is, autocompletions without the need to track users. A first demo of content-based autocompletions using anchor texts now runs on U. Twente Search. In the coming weeks, we will evaluate our content-based recommendations for U. Twente Search, and we will investigate whether content-based autocompletions can replace collaborative autocompletions at web scale. The first results look promising: we believe content-based recommendations can provide high-quality query autocompletions, and fix some of the harmful effects of today's query autocompletion algorithms.

References