Presentation

Can search be reimagined outside of Google? Critical ethnography of the self as method.
- personalised results
- anonymous results
Looking at search via a Google vs Tor opposition; researching it via a hacker-approved Debian vs a uni-provided OSX.
Started gathering data, got an answer from Google ('automated queries') after the 80th page of results.
Set up a list of keywords; mechanical turking, because of the manual annotating. For who else? Each word has a different structure.
Tor uses Google as default, gets ads.
Shift now towards semantic search, taking context into account, & ranking.
The argument is that by searching you are being assigned to groups of users; it's not the personalised, atomised individual search.

Questions from Renée:
I am not searching for what is 'relevant' or 'optimised'.
Are we actually getting personalised results, or is it because we are assigned to groups? Are we really the same as others?
Are we adapting to the algorithms as they adapt to us? I guess both, but maybe your research could be more about the different levels of adaptation?
- tiers of personalisation
Help! Will we be stuck reading the SEO blogs to find out WTF is going on? (And not being able to cite this in my PhD: it comes from the internet, it's also commercial (biased), whereas academic sources are supposedly neutral?)
Can someone (willingly) divulge the recipes of search algorithms?
Machine learning, DeepMind, etc. Where is it going? Why do machines have to copy humans? Why do we copy neural networks, etc.?

Questions and comments:
The Google algorithm is constantly changing. How can you research a subject that is blackboxed and non-static?
*How many unknown authoritative agents can (reverse-engineered) research have?
Timeline of Google search algorithm 'updates': https://moz.com/google-algorithm-change
Q on the processual chain of agents in research: give it to your graphic designer when the data has already been treated/analysed? (How to actually "graphically understand" this data?)
Also, the idea of the Tor / Debian combo as 'neutral': it is extremely fingerprintable and will produce very specific results. What does it mean to contrast these two set-ups with each other? They somehow make cartoon figures of each other. Maybe it is more interesting to think of these set-ups as both personalised, but on different levels/dimensions (also, to re-appropriate the practice of "personalisation")?
The notion of "collaboration" treated here does not really open up for non-human collaboration, and thereby it's hard not to think of the relevance of the 'general intellect' and how Google's project is to capture this in more effective ways (cf. the Fragment on Machines). Contagious agency, nice.
Brian: the keywords you chose ("pre-bubble filtering"), and which words diverge most?
Renée: Google Trends is boring. I chose words that have trended recently for a while, so they built a certain base of results. Still comparing to see whether my mistake/ads bias accounts for the differences I get between Google and Tor.
Isn't this something to do with auto-ethnography (and your background as an artist), inasmuch as these are your terms that you would search for? Again, probably something that your supervisors might have a problem with.
Søren addresses the legitimacy of sources in research communities: a tension with method (and knowledge regimes) more generally, search and research, which you acknowledge of course in your title. You endlessly use your work to reflect back on your institutional situation as someone transitioning from one system to another (artist becoming academic).
Roel: can't discount (DIY) SEO blogs. SEO blogs are the only way to get to know what is going on, but not necessarily SEO blogs by Google.
Role of SEO in pushing for certain results, and counter-filter. SEO and the changes in the search algorithm should be seen as mutually productive, while being in theory antagonistic. Search algorithm optimisations are not for better searching, but to counter the 'skewing' that might happen because of SEO, and also to keep up the myth of neutral search results. There are two competing types of efficiency at work: that of the individual search result placement, and the perceived global functioning of search as neutral and objective.
'I can barely read SEO stuff'
You're assigned a 'character' in a group; there can be multiple identities of myself.
*Cadence Kinsey and Emily Rosamond work on this topic
*Kinsey, C. (2014) 'Matrices of Embodiment: Re-Thinking Binary and the Politics of Digital Representation', in Signs: Journal of Women in Culture and Society, Vol. 39 (pp. 897-925)
*https://www.academia.edu/18541455/Technologies_of_attribution_characterizing_the_citizen-consumer_in_surveillance_performance
An: use of the Big Five personality test: obsolete everywhere else but still current for big-data-ish processes
*http://www.truity.com/test/big-five-personality-test
*https://en.wikipedia.org/wiki/Big_Five_personality_traits
Haunted data
Nicolas: related to the last question. David Lowe's SIFT algorithm for computer vision: https://en.wikipedia.org/wiki/Scale-invariant_feature_transform
Stitching: if you rotate, the image would stay the same.
Why do humans have to implement (?) their way of seeing (?) "the only proof we have that vision works".
Neural networks date to the 1940s. How did they come back to the foreground?
Tradition of mythologies: Roland Barthes! Computer semiotics. "Setting in motion frozen labour"

---
title: Renée Ridgway – From Page Rank to RankBrain
slug: renee-ridgway-title
id: 103
link: https://machineresearch.wordpress.com/2016/09/26/renee-ridgway-title/
guid: https://machineresearch.wordpress.com/2016/09/26/renee-ridgway-title/
status: publish
terms: Uncategorized
---

One might ponder: is searching only about finding things one knows to search for, because one knows about the existence of such things? Take, for instance, the recent referendum on June 23, 2016, when the UK voted to exit the EU.[1] With the 52-48 margin of the results, one could argue that the voice of the people expressed what they really wanted and that a well-informed public went to the polls. Like many users who frequently employ search engines for information regarding businesses, medical advice or their own rankings, people used Google Search to find answers to their questions. However, the search terms ‘What does it mean to leave the EU?’ and ‘What is the EU?’ spiked after the polls were closed. It then became apparent that people were wondering what they actually had just voted for, if they had voted at all. These queries were measured by Google Trends, a so-called ‘public web facility’ of Google, Inc.[2] which is based on Google Search results and reflects how often a keyword, or search term, is entered in the search box around the world.[3]

http://etherbox.local/var/www/txt/renee-ridgway-title/topquestionsontheeuropeanunion_googletrends.png

In an era of ‘big data’, conclusions are often based on correlations, but closer scrutiny of the data is needed for interpretation. Included in the trawl of big data are not only the queries made by users but also the search results.
“As these algorithms nestle into people’s daily lives and mundane information practices, users shape and rearticulate the algorithms they encounter; and algorithms impinge on how people seek information, how they perceive and think about the contours of knowledge, and how they understand themselves in and through public discourse” (Gillespie 183). This reciprocal relationship of human interaction with a machine was already mapped out by Introna and Nissenbaum in their seminal text “Shaping the Web: Why the Politics of Search Engines Matters”. Written at the dawn of the development of ‘gateway platforms’ for the internet, one of their key statements concerns access, for “those with something to say and offer, as well as those wishing to hear and find” (Introna & Nissenbaum 169-85). What has become clear is that corporations gather user data, yet the filtering or ‘curation’ process is not transparent.[4] Whereas early net programmers and users with their ‘bulletin board’ postings, chat rooms or networks in the 1990s envisioned a ‘digital democracy’, in the early 2000s the political discourse was already censored as it emerged. Matthew Hindman’s book The Myth of Digital Democracy (2009) elucidates how political information is filtered through ‘Googlearchy’[5], and how ‘deliberative democracy’ has been prohibited by internet technologies and infrastructure itself, such as “the social, economic, political and even cognitive processes that enable it” (Hindman 130). Corporations have now become complicit in the censoring, blocking the plurality of discourses as they collate users’ data. Where Silicon Valley companies and their ‘liberal’ approach once took a defensive posture toward state interference, nowadays they willingly hand over users’ data to the secret services of various nations, becoming actors in what is presently called ‘surveillance capitalism’ (Zuboff).[6] Platforms such as Google intervene (Gillespie) with their ‘Adwords’ service,[7] serving up ads that influence the user’s experience and detour their path to information. It is this type of ‘curation’ that I will elucidate in the following essay by looking specifically at the search algorithms responsible for such filtering of knowledge and their potential consequences.

http://etherbox.local/var/www/txt/renee-ridgway-title/movetogibraltar_googletrends.png

The rise of Page Rank

The concept of Page Rank has its basis in the Science Citation Index (SCI), a form of academic hierarchy that has now been grafted as a conceptual paradigm onto the way we find information and how that information is prioritised for us, designed by a monopoly, the corporation called Google, a.k.a. Alphabet. It is not surprising, then, that Alphabet’s present CEO, Larry Page, and President, Sergey Brin, were two academics at Stanford who drew upon the SCI, recognising that hyperlinked structures of citations show how an article is valued by other authors. The eponymous Page Rank algorithm was developed in 1998 and is basically a popularity contest based on votes. A link coming from a node with a high rank has more value than a link coming from a node with a low rank. The scheme therefore assigns two scores to each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages.
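The link-based part of this scheme is captured by the original PageRank formula given in Page and Brin’s paper, where T1…Tn are the pages linking to page A, C(Ti) is the number of links going out of Ti, and d is a damping factor (usually set around 0.85):

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

As a minimal sketch of how this recursion can be computed (an illustration only, on a toy link graph of my own invention, not Google’s implementation), the scores can simply be iterated until they settle:

# Minimal sketch: iterative PageRank on a toy link graph (illustrative only).
# Each key links to the pages listed in its value.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    pr = {p: 1.0 for p in pages}          # initial rank for every page
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # rank passed on by every page that links here, divided by
            # that page's number of outgoing links C(Ti)
            incoming = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

print(pagerank(links))   # "C" ends up with the highest rank: the most incoming links

Even in this toy graph the ‘popularity contest’ is visible: the page with no incoming links (D) stays at the baseline 1-d, while the heavily linked-to page accumulates rank from its voters.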
“When Google developed PageRank, factoring in incoming links to a page as evidence of its value, it built in a different logic: a page with many incoming links, from high-quality sites, is seen as ‘ratified’ by other users, and is more likely to be relevant to this user as well” (Gillespie 178). The more important or worthwhile websites are likely to receive more links from other websites, and this cycle repeats itself because popular sites are linked to by other popular sites. “The hyperlink as a key natively digital object is considered to be the fabric of the web and in this role has the capacity to create relations, constitute networks and organize and rank content” (Helmond). These connective hyperlinks are used for navigating the web, and the ‘algorithmization of the hyperlink’ turns a navigational object into an analytical device that ‘automatically submits and retrieves data’ (Helmond).[8]

Secret recipes

Presently, ‘keyword search’ is still the way Google Search organises the internet by crawling and indexing,[9] which determines the importance of a website based on the words it contains, how often other sites link to it, and dozens of other measures. “The process by which an index is established, and the attributes that are tracked, make up a large part of the ‘secret recipes’ of the various search engines” (Halavais 18). With Google Search the emphasis is to keep the attention of the user and to have them click on the higher rankings, effortlessly. However, as Gillespie points out, the exact workings are opaque and vary for diverse users: “the criteria and code of algorithms are generally obscured—but not equally or from everyone” (Gillespie 185). Based on users’ histories, location and search terms, the searcher is ‘personalised’ through a set of criteria.[10] Not only are the creators of the content of web pages kept in check by search engines, but the tracking of different factors, or signals, determines the ranking of an individual page. Mostly through reverse engineering, a whole ‘Search Engine Optimisation’ (SEO) industry has developed around ‘gaming’ the algorithm to figure out its recipe or signals. These “search engine optimizers have identified their own set of signals that seem to affect search engines directly” (Fishkin & Pollard 2007 qtd. by Halavais 83).

http://etherbox.local/var/www/txt/renee-ridgway-title/periodic-table-of-seo-2015.png

Signals

During the past 18 years, Google has constantly tweaked its proprietary algorithm, which contains around 200 ingredients or ‘signals’ in the recipe.[11] “Signals are typically factors that are tied to content, such as the words on a page, the links pointing at a page, whether a page is on a secure server and so on. They can also be tied to a user, such as where a searcher is located or their search and browsing history.”[12] Links, content, keyword density, words in bold, duplicate content, domain registration duration and outbound link quality are some other examples of factors, or ‘clues’.
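Since the actual recipe is secret, the following is a purely hypothetical sketch of how a handful of such ‘signals’, some tied to content and some to links or to the server, might be weighted and combined into a single ranking score. The signal names and weights here are invented for illustration; it shows the general logic of weighted signals, not Google’s actual algorithm:

# Purely hypothetical illustration of combining ranking "signals" into a score.
# The signals and weights below are invented for the example; Google's actual
# ~200 signals and their weighting are not public.
WEIGHTS = {
    "keyword_matches": 1.0,   # content signal: query terms found on the page
    "incoming_links": 2.0,    # link signal: e.g. a PageRank-style value
    "https": 0.5,             # the "secure server" signal mentioned above
}

def score(page, query_terms):
    signals = {
        "keyword_matches": sum(page["text"].lower().count(t) for t in query_terms),
        "incoming_links": page["incoming_links"],
        "https": 1.0 if page["url"].startswith("https://") else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())

pages = [
    {"url": "https://example.org/a", "text": "search engines and ranking", "incoming_links": 3},
    {"url": "http://example.org/b", "text": "ranking ranking ranking", "incoming_links": 1},
]
results = sorted(pages, key=lambda p: score(p, ["ranking"]), reverse=True)
print([p["url"] for p in results])   # the well-linked https page ranks first

Reverse engineering, in the SEO sense, amounts to guessing at such weights from the outside by observing which changes to a page move it up or down the results.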
One of the major changes in 2010 to the core algorithm of Page Rank was the ‘Caffeine’ update, which enabled an improvement in the gathering of information, or indexing, instead of just sorting. Described as a change to the indexing architecture, this new web ecosystem facilitates the searching of content immediately after it is crawled, providing a 50% fresher index. ‘Panda’, an update implemented in 2011, downranks sites that are considered lower quality, enabling higher-quality pages to rise. In April 2012 Google launched the ‘Penguin’ update, which attempts to catch sites that are ‘spamming’, e.g. buying or obtaining links through networks to boost Google rankings. It now devalues spam instead of demoting (adjusting the rank of) the entire site and, as of September 30, 2016, updates in real time as part of the core algorithm.[13] If the algorithm is analogous to an engine that has had its parts replaced, where Penguin and Panda might be the oil filter and gas pump respectively, then the launch of ‘Hummingbird’ in August 2013 was Google’s largest overhaul since 2001. With the introduction of a brand-new engine the emphasis has shifted to the contextual: it is now less about the keyword and more about the intention behind it; the semantic capabilities are what are at stake. Whereas previously certain keywords were the focus, at the moment it is about the other words in the sentence and their meaning. The complexity level of the queries has gone up, resulting in an improvement in the indexing of web documents. Within this field of ‘semantic search’ the ‘relationality linking search queries and web documents’[14] is reflected in the ‘Knowledge Graph’,[15] along with ‘conversational search’ that incorporates voice-activated enquiries.

http://etherbox.local/var/www/txt/renee-ridgway-title/itsmerelypostmodernsemioticsappliedtosearch.png

If Hummingbird is the new Google engine from 2013, the latest replacement part is then ‘RankBrain’. Launched around early 2015, it ostensibly ‘interprets’ what people are searching for, even though they may not have entered the exact keywords. ‘RankBrain’ is rumoured to be the third most important signal, after links and content (words), and infers the use of a keyword by applying synonyms or stemming lists.[16] Users’ queries have also changed and are now not only keywords but also multi-words, phrases and sentences that could be deemed ‘long-tail’ queries. These need to be translated, to a certain extent, from ‘ambiguous to specific’ or ‘uncommon to common’ in order to be processed and analysed.[17] This reciprocal adaptability between users and the interface has been verified by previous research. It is therefore probable that Google assigns these complex queries to groups with similar interests in order to ‘collaboratively filter’ them.[18]
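The ‘stemming’ mentioned above can be illustrated with a deliberately crude suffix-stripping function (real systems use far more sophisticated stemmers, such as Porter’s): words that share a stem are treated as synonyms and the query is expanded accordingly, the ‘conflation’ quoted in note [16]:

# Crude illustration of stemming-based query expansion ("conflation").
# Real engines use proper stemmers (e.g. Porter); this toy version only
# strips a few common English suffixes.
def crude_stem(word):
    for suffix in ("ingly", "edly", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

vocabulary = ["search", "searches", "searched", "searching", "ranking", "ranked"]

def expand_query(term, vocabulary):
    stem = crude_stem(term)
    # treat every vocabulary word with the same stem as a synonym of the query term
    return [w for w in vocabulary if crude_stem(w) == stem]

print(expand_query("searching", vocabulary))
# ['search', 'searches', 'searched', 'searching']

A long-tail phrase can then be matched against documents that never contain its exact wording, which is one simple way the ‘ambiguous to specific’ or ‘uncommon to common’ translation described above can be approximated.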
Bias

“A number of commentators (e.g. Wiggins 2003) have become concerned with the potential for bias in Google’s secret ranking algorithm” (Halavais 77). Ironically, awareness of this bias started with the creators themselves. Upon reading Page and Brin’s seminal text (1998), one arrives at Appendix A, ‘Advertising and Mixed Motives’, and discovers what is almost an afterthought about advertising and search engines. It is incredibly revealing, because they state that search engines that are advertising-driven are “inherently biased towards the advertisers and away from the needs of the consumers” (Page and Brin). They cite The Media Monopoly by Ben Bagdikian,[19] where the historical experience of the media shows that the concentration of ownership leads to imbalances. In turn, Alexander Halavais references both of these citations, pointing out that their critique could instead have referenced the writings of Robert McChesney, who described how radio was not commercialised until RCA (the Radio Corporation of America) came along and the federal government changed its regulation (Halavais 77). “McChesney suggested that in the 1990s the internet seemed to be following the same pattern, and although the nature of the technology might preclude its complete privatization, the dominance of profit-oriented enterprise could make the construction of an effective public sphere impossible” (Halavais 78). To return to radio: it has only the limited bandwidth of its broadcast spectrum. With the internet there is also the limited bandwidth of the user and her ability to filter the overload of information. A certain power is then assigned to the commercial value of a search engine that delivers ‘relevant’ results, “or ‘better’ results than its provider’s competitors, which posits customer satisfaction over some notion of accuracy” (van Couvering 2007 qtd. by Gillespie 182).

http://etherbox.local/var/www/txt/renee-ridgway-title/googlegarage.png

Machine learning

“Algorithms are not always neutral. They’re built by humans, and used by humans, and our biases rub off on the technology. Code can discriminate.”[20] In this short essay I have attempted to debunk some of the mythology surrounding Google’s proprietary ‘Page Rank’ “— as the defining feature of the tool, as the central element that made Google stand out above its then competitors, and as a fundamentally democratic computational logic—even as the algorithm was being redesigned to take into account hundreds of other criteria” (Gillespie 180). I have briefly described some of the signals involved in how this algorithm ‘ranks’, based on hyperlinks and their algorithmization, which have become devices for the collation of data that is in turn sold to third parties. “If broadcasters were providing not just content to audiences but also audiences to advertisers (Smythe 2001), digital providers are not just providing information to users but also users to their algorithms. And algorithms are made and remade in every instance of their use because every click, every query, changes the tool incrementally” (Gillespie 173). Online advertisements structure the workings, directing and ‘affecting’ the consumer, prosumer or user even if they do not click on them, as users are already personalised when using Google Search, notwithstanding whether they are signed into a Google account. As of June 2016 ‘RankBrain’ is being implemented for every Google Search query, and the SEO industry speculates that it is summarising the page’s content. The murmur is that the algorithm is adapting, or ‘learning’, as it were, from people’s mistakes and its surroundings. According to Google the algorithm learns offline, being fed historical batched searches from which it makes predictions. This cycle is constantly repeated and, if the predictions are correct, the latest versions of ‘RankBrain’ go live.[21] Previously there were no computers powerful or fast enough, or the data sets were too small, to carry out this type of testing. Nowadays the computation is distributed over many machines, enabling the pace of the research to quicken. This progress in technology facilitates a constellation, or coming together, of different capabilities from various sources, through models and parameters. Eventually the subject, or learner, in this case the algorithm, is able to predict, through repetition.
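The offline cycle reported above, training on batches of historical searches, testing the resulting predictions and letting a new version go live only if it does better, follows a generic machine-learning deployment pattern. The sketch below is a schematic illustration of that pattern only, with stand-in model, data and evaluation; it is an assumption on my part and not Google’s actual pipeline:

# Schematic illustration of an offline learn/evaluate/deploy cycle, roughly as
# described in SEO reporting about RankBrain. Everything here (model, metric,
# data) is a stand-in; it is NOT Google's actual pipeline.
import random

def train(historical_queries):
    # Pretend training: "learn" a candidate model from batched past searches.
    return {"quality": random.random(), "trained_on": len(historical_queries)}

def offline_evaluation(model):
    # Pretend evaluation: score the candidate's predictions on held-out data.
    return model["quality"]

live_model = {"quality": 0.5, "trained_on": 0}
historical_batches = [[f"query {i}" for i in range(100)] for _ in range(5)]

for batch in historical_batches:                # the cycle is repeated over time
    candidate = train(batch)
    if offline_evaluation(candidate) > offline_evaluation(live_model):
        live_model = candidate                  # only better versions "go live"

print(live_model)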
Where is the human curator in all of this? “There is a case to be made that the working logics of these algorithms not only shape user practices, but also lead users to internalize their norms and priorities” (Gillespie 187). The question, then, is to what extent there is human adaptation to algorithms in this filtering or curation process, how much algorithms affect human learning, and whether not only discrimination but also agency can be contagious.[22]

Works Cited

Feuz, Martin; Fuller, Matthew; Stalder, Felix. “Personal Web Searching in the Age of Semantic Capitalism: Diagnosing the Mechanics of Personalisation”. First Monday, peer-reviewed journal on the internet. Volume 16, Number 2-7, February 2011. Web. http://firstmonday.org/article/view/3344/2766
Fishkin, R. and J. Pollard. “Search Engine Ranking Factors Version 2.” SEOMoz.org. April 2, 2007. Web. http://www.seomoz.org/article/search-ranking-factors
Gesenhues, Amy. “Google’s Hummingbird Takes Flight: SEOs Give Insight On Google’s New Algorithm”. Search Engine Land. 2013. Web. http://searchengineland.com/hummingbird-has-the-industry-flapping-its-wings-in-excitement-reactions-from-seo-experts-on-googles-new-algorithm-173030
Gillespie, Tarleton. “The Relevance of Algorithms”. Media Technologies, ed. Tarleton Gillespie, Pablo Boczkowski, and Kirsten Foot. Cambridge, MA: MIT Press, 2014, pp. 167-193. Print.
Gillespie, Tarleton. “Platforms Intervene”. Social Media + Society, April-June 2015, pp. 1-2. Sage Publishers. Print.
Halavais, Alexander. Search Engine Society. Cambridge: Polity, 2008. Print.
Helmond, Anne. “The Algorithmization of the Hyperlink.” Computational Culture 3(3). 2013. Web.
Hindman, Matthew. The Myth of Digital Democracy. Princeton: Princeton University Press, 2009. Print.
Introna, Lucas D. and Nissenbaum, Helen. “Shaping the Web: Why the Politics of Search Engines Matters”. The Information Society, 2000, 16:169-185. Taylor & Francis. Print.
Page, Lawrence and Brin, Sergey. The Anatomy of a Large-Scale Hypertextual Web Search Engine (1999). Web. http://infolab.stanford.edu/~backrub/google.html
Pariser, Eli. The Filter Bubble. New York: Penguin Books, 2012. Print.
Schwartz, Barry. “Google Penguin doesn’t penalize for bad links – or does it?” Search Engine Land. 2016. Web. http://searchengineland.com/google-penguin-doesnt-penalize-bad-links-259981
Selyukh, Alina. “After Brexit Vote, Britain Asks Google: 'What Is The EU?'” NPR. 2016. Web. http://www.npr.org/sections/alltechconsidered/2016/06/24/480949383/britains-google-searches-for-what-is-the-eu-spike-after-brexit-vote
Sullivan, Danny. “Dear Bing, We Have 10,000 Ranking Signals To Your 1,000. Love, Google.” Search Engine Land. 2010. Web. http://searchengineland.com/bing-10000-ranking-signals-google-55473
Sullivan, Danny. “FAQ: All about the Google RankBrain algorithm.” Search Engine Land. 2016. Web. http://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440
Turk, Victoria. “When Algorithms are Sexist”. Motherboard. 2015. Web. http://motherboard.vice.com/en_uk/read/when-algorithms-are-sexist
Wikipedia: https://en.wikipedia.org/wiki/Google
Wikipedia: https://en.wikipedia.org/wiki/Knowledge_Graph
Wikipedia: https://en.wikipedia.org/wiki/Stemming
Zuboff, Shoshana. “The Secrets of Surveillance Capitalism”. Frankfurter Allgemeine Zeitung. 2016. Web. http://www.faz.net/aktuell/feuilleton/debatten/the-digital-debate/shoshana-zuboff-secrets-of-surveillance-capitalism-14103616.html

[1] https://www.google.com/trends/story/GB_cu_EoBj9FIBAAAj9M_en
[2] Google is now the ‘leading subsidiary’ of the company Alphabet, Inc., as well as the ‘parent for Google’s internet interests’.
https://en.wikipedia.org/wiki/Google
[3] Interestingly enough, on June 23 at 23:54 GMT, after the polls had closed and predictions of the outcome surfaced in the media, Londoners’ searches for ‘move to Gibraltar’ spiked heavily (+680%). http://www.npr.org/sections/alltechconsidered/2016/06/24/480949383/britains-google-searches-for-what-is-the-eu-spike-after-brexit-vote
[4] Eli Pariser has deemed this ‘The Filter Bubble’, which I address in more detail in my PhD.
[5] Those most heavily linked ‘rule’, in other words.
[6] http://www.faz.net/aktuell/feuilleton/debatten/the-digital-debate/shoshana-zuboff-secrets-of-surveillance-capitalism-14103616.html
[7] A complete description of Adwords is beyond the scope of this essay. Adwords is an online advertising system in which bidders compete on keywords, or search terms, and cookies to display ads alongside certain webpages; advertisers pay when users click on the ads. It is Google’s main source of revenue, which is why Google is actually an advertising company, not a search engine.
[8] Ranking algorithms reduce social relations to a specific dimension of commercialisation: the placing of a reference, a hyperlink, which is modern capitalism’s current form of socialisation and networking, and quite possibly the most sought-after currency of the internet.
[9] Since 2013, Google.com is the most visited website in the world, according to Alexa. “Google processes over 40,000 search queries every second which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.” In 1999, it took Google one month to crawl and build an index of about 50 million pages. In 2012, the same task was accomplished in less than one minute. 16% to 20% of the queries that get asked every day have never been asked before. Every query has to travel on average 1,500 miles to a data centre and back to return the answer to the user. A single Google query uses 1,000 computers in 0.2 seconds to retrieve an answer. http://www.internetlivestats.com/google-search-statistics/
[10] No space here to elaborate, but I will explain ‘personalisation’ in Chapter 3 of my thesis, or see here: http://www.aprja.net/?p=2531
[11] Google usually states that it has around 200 major ranking signals, yet there have been discussions of 1,000 or even 10,000 sub-signals. http://searchengineland.com/bing-10000-ranking-signals-google-55473
[12] http://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440
[13] “Some sites want to do this because they’ve purchased links, a violation of Google’s policies, and may suffer a penalty if they can’t get the links removed. Other sites may want to remove links gained from participating in bad link networks or for other reasons.” http://searchengineland.com/google-penguin-doesnt-penalize-bad-links-259981
[14] According to David Amerland, author of Google Semantic Search. http://searchengineland.com/hummingbird-has-the-industry-flapping-its-wings-in-excitement-reactions-from-seo-experts-on-googles-new-algorithm-173030
[15] The Knowledge Graph was launched in 2012 and adds ‘semantic search’ information to search results so that users do not need to query further. However, this has led to a decrease in page views on Wikipedia in different languages. https://en.wikipedia.org/wiki/Knowledge_Graph
[16] In regard to information retrieval, ‘stemming’ is when words are reduced to their ‘stem’ or root form. “Many search engines treat words with the same stem as synonyms as a kind of query expansion, a process called conflation”.
https://en.wikipedia.org/wiki/Stemming
[17] http://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440
[18] http://firstmonday.org/article/view/3344/2766
[19] Bagdikian later published updated and revised editions called The New Media Monopoly, which subsequently became part of the ‘Amazon Noir’ a.k.a. ‘Pirates of the Amazon’ art project: http://www.amazon-noir.com/index0000.html
[20] Victoria Turk. http://motherboard.vice.com/en_uk/read/when-algorithms-are-sexist
[21] http://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440
[22] During the writing of my PhD I use Google Search for my research and have allowed myself to be personalized on my Apple computer, without installing plugins, etc. that would attempt to prevent it.