Data journalism

Data journalism or data-driven journalism (DDJ) is journalism based on the filtering and analysis of large data sets for the purpose of creating or elevating a news story.

Data journalism reflects the increased role of numerical data in the production and distribution of information in the digital era. It involves a blending of journalism with other fields such as data visualization, computer science, and statistics, "an overlapping set of competencies drawn from disparate fields".[1]

Data journalism has been widely used to unite several concepts and link them to journalism. Some see these as levels or stages leading from the simpler to the more complex uses of new technologies in the journalistic process.[2]

Many data-driven stories begin with newly available resources such as open source software, open access publishing and open data, while others are products of public records requests or leaked materials. This approach to journalism builds on older practices, most notably computer-assisted reporting (CAR), a label used mainly in the US for decades. Another label for a partially similar approach is "precision journalism", based on a book by Philip Meyer,[3] published in 1972, in which he advocated the use of techniques from the social sciences in researching stories. Data-driven journalism takes a wider approach. At its core, the process builds on the growing availability of open data that is freely available online and analyzed with open source tools.[4] Data-driven journalism strives to reach new levels of service for the public, helping the general public or specific groups or individuals to understand patterns and make decisions based on the findings. As such, data-driven journalism might help to put journalists into a role relevant for society in a new way.

Telling stories based on the data is the primary goal. The findings from data can be transformed into any form of journalistic writing. Visualizations can be used to create a clear understanding of a complex situation. Furthermore, elements of storytelling can be used to illustrate what the findings actually mean from the perspective of someone who is affected by a development. This connection between data and story can be viewed as a "new arc" that spans the gap between developments that are relevant but poorly understood and a story that is verifiable, trustworthy, relevant and easy to remember.

Definitions

 
The data-driven journalism process.

Veglis and Bratsas defined data journalism as "the process of extracting useful information from data, writing articles based on the information, and embedding visualizations (interacting in some cases) in the articles that help readers understand the significance of the story or allow them to pinpoint data that relate to them".[5]

Antonopoulos and Karyotakis define the practice of data journalism as "a way of enhancing reporting and news writing with the use and examination of statistics in order to provide a deeper insight into a news story and to highlight relevant data. One trend in the digital era of journalism has been to disseminate information to the public via interactive online content through data visualization tools such as tables, graphs, maps, infographics, microsites, and visual worlds. The in-depth examination of such data sets can lead to more concrete results and observations regarding timely topics of interest. In addition, data journalism may reveal hidden issues that seemingly were not a priority in the news coverage".[6]

According to architect and multimedia journalist Mirko Lorenz, data-driven journalism is primarily a workflow that consists of the following elements: digging deep into data by scraping, cleansing and structuring it, filtering by mining for specific information, visualizing and making a story.[7] This process can be extended to provide results that cater to individual interests and the broader public.

Data journalism trainer and writer Paul Bradshaw describes the process of data-driven journalism in a similar manner: data must be found, which may require specialized skills like MySQL or Python, then interrogated, for which understanding of jargon and statistics is necessary, and finally visualized and mashed with the aid of open-source tools.[8]

A more results-driven definition comes from data reporter and web strategist Henk van Ess (2012):[9] "Data-driven journalism enables reporters to tell untold stories, find new angles or complete stories via a workflow of finding, processing and presenting significant amounts of data (in any given form) with or without open tools." Van Ess claims that some data-driven workflows lead to products that "are not in orbit with the laws of good story telling" because the result emphasizes showing the problem rather than explaining it. "A good data driven production has different layers. It allows you to find personalized that are only important for you, by drilling down to relevant but also enables you to zoom out to get the big picture."

In 2013, Van Ess offered a shorter definition[10] that does not involve visualization per se: "Data journalism can be based on any data that has to be processed first with tools before a relevant story is possible. It doesn't include visualization per se."

However, one problem in defining data journalism is that many definitions are not clear enough and focus on describing the computational methods of optimization, analysis, and visualization of information.[11]

Emergence as a concept

The term "data journalism" was coined by political commentator Ben Wattenberg through his work starting in the mid-1960s layering narrative with statistics to support the theory that the United States had entered a golden age.[12][13]

One of the earliest examples of using computers with journalism dates back to a 1952 endeavor by CBS to use a mainframe computer to predict the outcome of the presidential election, but it wasn't until 1967 that using computers for data analysis began to be more widely adopted.[14]

Working for the Detroit Free Press at the time, Philip Meyer used a mainframe to improve reporting on the riots spreading throughout the city. With a new precedent set for data analysis in journalism, Meyer collaborated with Donald Barlett and James Steele to look at patterns in conviction sentencing in Philadelphia during the 1970s. Meyer later wrote a book titled Precision Journalism that advocated the use of these techniques for integrating data analysis into journalism.

Toward the end of the 1980s, significant events began to occur that helped to formally organize the field of computer-assisted reporting. Investigative reporter Bill Dedman of The Atlanta Journal-Constitution won a Pulitzer Prize in 1989 for The Color of Money, his 1988 series of stories using CAR techniques to analyze racial discrimination by banks and other mortgage lenders in middle-income black neighborhoods.[15] The National Institute for Computer Assisted Reporting (NICAR)[16] was formed at the Missouri School of Journalism in collaboration with the Investigative Reporters and Editors (IRE). The first conference dedicated to CAR was organized by NICAR in conjunction with James Brown at Indiana University and held in 1990. The NICAR conference has been held annually since and is now the single largest gathering of data journalists.

Although data journalism has been practiced informally by practitioners of computer-assisted reporting for decades, the first recorded use of the term by a major news organization was by The Guardian, which launched its Datablog in March 2009.[17] Although the paternity of the term is disputed, it has been widely used since WikiLeaks' Afghan War documents leak in July 2010.[18]

The Guardian's coverage of the war logs took advantage of free data visualization tools such as Google Fusion Tables, another common aspect of data journalism. Facts are Sacred[19] by The Guardian's Datablog editor Simon Rogers describes data journalism like this:

"Comment is free," wrote Guardian editor CP Scott in 1921, "but facts are sacred". Ninety years later, publishing those sacred facts has become a new type of journalism in itself: data journalism. And it is rapidly becoming part of the establishment.

Investigative data journalism combines the field of data journalism with investigative reporting. An example of investigative data journalism is the research of large amounts of textual or financial data. Investigative data journalism also can relate to the field of big data analytics for the processing of large data sets.[20]

Since the introduction of the concept, a number of media companies have created "data teams" which develop visualizations for newsrooms. Notable examples include the teams at Reuters,[21] ProPublica,[22] and La Nación (Argentina).[23] In Europe, The Guardian[24] and Berliner Morgenpost[25] have very productive teams, as do public broadcasters.

As projects like the MP expenses scandal (2009) and the 2013 release of the "offshore leaks" demonstrate, data-driven journalism can assume an investigative role, occasionally dealing with "not-so open" (that is, secret) data.

The annual Data Journalism Awards[26] recognize outstanding reporting in the field of data journalism, and numerous Pulitzer Prizes in recent years have been awarded to data-driven storytelling, including the 2018 Pulitzer Prize in International Reporting[27] and the 2017 Pulitzer Prize in Public Service.[28]

Taxonomies

Many scholars have proposed different taxonomies of data journalism projects. Megan Knight suggested a taxonomy based on the level of interpretation and analysis needed to produce a data journalism project. Specifically, the taxonomy included: number pullquote, static map, list and timelines, table, graphs and charts, dynamic map, textual analysis, and infographics.[29]

Simon Rogers proposed five types of data journalism projects: By just the facts, Data-based news stories, Local data telling stories, Analysis and background, and Deep dive investigations.[30] Martha Kang discussed seven types of data stories, namely: Narrate change over time, Start big and drill down, Start small and zoom out, Highlight contrasts, Explore the intersection, Dissect the factors, and Profile the outliers.[31]

Veglis and Bratsas proposed another taxonomy based on the method of presenting the information to the audience. Their taxonomy had a hierarchical structure and included the following types: data journalism articles with just numbers, with tables, and with visualizations (interactive and non-interactive). For stories with interactive visualizations they also proposed three distinct types, namely transmissional, consultational, and conversational.[32]

Data quality

In many investigations the data that can be found may have omissions or be misleading. A critical examination of data quality is therefore an important layer of data-driven journalism. In other cases the data might not be public or might not be in the right format for further analysis, e.g. it is only available as a PDF. Here the process of data-driven journalism can turn into stories about data quality or about institutions refusing to provide the data. As the practice as a whole is at an early stage of development, examinations of data sources, data sets, data quality and data format are an equally important part of this work.

Data-driven journalism and the value of trust

Based on the perspective of looking deeper into facts and drivers of events, a change in media strategies has been suggested: the idea is to move "from attention to trust". The creation of attention, which has been a pillar of media business models, has lost its relevance because reports of new events are often distributed faster via new platforms such as Twitter than through traditional media channels. Trust, on the other hand, can be understood as a scarce resource. While distributing information is much easier and faster via the web, the abundance of offerings makes it costly to verify and check the content of any story, which creates an opportunity. The view that media companies should transform into trusted data hubs was described in an article cross-published in February 2011 on Owni.eu[33] and Nieman Lab.[34]

Process of data-driven journalism

The process to transform raw data into stories is akin to a refinement and transformation. The main goal is to extract information recipients can act upon. The task of a data journalist is to extract what is hidden. This approach can be applied to almost any context, such as finances, health, environment or other areas of public interest.

Inverted pyramid of data journalism

In 2011, Paul Bradshaw introduced a model he called "The Inverted Pyramid of Data Journalism".

Steps of the process

In order to achieve this, the process should be split up into several steps. While the steps leading to results can differ, a basic distinction can be made by looking at six phases:

  1. Find: Searching for data on the web
  2. Clean: Process to filter and transform data, preparation for visualization
  3. Visualize: Displaying the pattern, either as a static or animated visual
  4. Publish: Integrating the visuals, attaching data to stories
  5. Distribute: Enabling access on a variety of devices, such as the web, tablets and mobile
  6. Measure: Tracking usage of data stories over time and across the spectrum of uses.

Description of the steps

Finding data

Data can be obtained directly from governmental databases such as data.gov, data.gov.uk and World Bank Data API[35] but also by placing Freedom of Information requests to government agencies; some requests are made and aggregated on websites like the UK's What Do They Know. While there is a worldwide trend towards opening data, there are national differences as to what extent that information is freely available in usable formats. If the data is in a webpage, scrapers are used to generate a spreadsheet. Examples of scrapers are: WebScraper, Import.io, QuickCode, OutWit Hub and Needlebase (retired in 2012[36]). In other cases OCR software can be used to get data from PDFs.
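The scraping step described above, turning a table on a webpage into spreadsheet rows, can be illustrated with a minimal sketch. It uses only Python's standard library and is not how any of the named scraper products work internally; the class name `TableScraper`, the function `table_to_csv` and the sample HTML are assumptions made up for this illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed


def table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(scraper.rows)
    return buffer.getvalue()


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<table>
  <tr><th>Region</th><th>Requests</th></tr>
  <tr><td>North</td><td>120</td></tr>
  <tr><td>South</td><td> 95 </td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

A real page would first have to be downloaded, and nested or irregular tables would need more careful handling than this sketch provides.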

Data can also be created by the public through crowdsourcing, as shown in March 2012 at the Datajournalism Conference in Hamburg by Henk van Ess.[37]

Cleaning data

Usually data is not in a format that is easy to visualize. For example, there may be too many data points, or the rows and columns may need to be sorted differently. In addition, many datasets turn out on inspection to need cleaning, structuring and transforming. Various tools like OpenRefine (open source), Data Wrangler and Google Spreadsheets[38] allow uploading, extracting or formatting data.
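As a rough illustration of what cleaning involves, the following sketch trims stray whitespace, skips blank rows, converts a numeric column stored as text into integers and sorts the result. It is a simplified stand-in for what tools such as OpenRefine do interactively; the function name `clean_rows` and the sample rows are hypothetical. Rows that cannot be converted are returned separately rather than silently dropped, so they can be checked by hand.

```python
def clean_rows(rows, number_column, sort_descending=True):
    """Return (cleaned, rejected) lists of rows.

    cleaned  -- rows with whitespace trimmed and the chosen column as int,
                sorted by that column
    rejected -- trimmed rows whose chosen column could not be parsed
    """
    cleaned = []
    rejected = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip completely blank rows
        # Remove thousands separators such as "1,200" before converting.
        raw_number = cells[number_column].replace(",", "")
        try:
            cells[number_column] = int(raw_number)
        except ValueError:
            rejected.append(cells)  # e.g. "n/a"; keep for manual review
            continue
        cleaned.append(cells)
    cleaned.sort(key=lambda r: r[number_column], reverse=sort_descending)
    return cleaned, rejected


# Hypothetical raw rows, invented for this example.
RAW_ROWS = [
    [" North ", "1,200"],
    ["", ""],
    ["South", "95"],
    ["East", "n/a"],
    ["West", " 430"],
]

cleaned, rejected = clean_rows(RAW_ROWS, number_column=1)
print(cleaned)
print(rejected)
```

Returning the rejected rows reflects the point made under data quality: omissions and unparseable values can themselves be part of the story.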

Visualizing data

To visualize data in the form of graphs and charts, applications such as Many Eyes or Tableau Public are available. Yahoo! Pipes and Open Heat Map[39] are examples of tools that enable the creation of maps based on data spreadsheets. The number of options and platforms is expanding. Some new offerings provide options to search, display and embed data, an example being Timetric.[40]

To create meaningful and relevant visualizations, journalists use a growing number of tools. There are by now several descriptions of what to look for and how to do it. Notable published articles include:

  • Joel Gunter: "#ijf11: Lessons in data journalism from the New York Times"[41]
  • Steve Myers: "Using Data Visualization as a Reporting Tool Can Reveal Story’s Shape", including a link to a tutorial by Sarah Cohen[42]

As of 2011, the use of HTML5 libraries based on the canvas tag was gaining in popularity. Numerous libraries make it possible to graph data in a growing variety of forms. One example is RGraph.[43] As of 2011 there was also a growing list of JavaScript libraries for visualizing data.[44]
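The libraries mentioned here are JavaScript libraries that draw in the browser. To show the underlying idea of mapping values to shapes without depending on any of them, the sketch below uses Python to generate a simple horizontal bar chart as SVG text, which is a different technique from drawing on the canvas tag. The function name `bar_chart_svg`, its parameters and the sample values are assumptions for illustration, and the sketch assumes all values are positive.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(values, bar_height=20, gap=5, max_width=300):
    """Return SVG markup with one horizontal bar per (label, value) pair.

    Assumes `values` is non-empty and every value is positive.
    """
    largest = max(value for _, value in values)
    parts = []
    for index, (label, value) in enumerate(values):
        y = index * (bar_height + gap)
        # Scale each bar relative to the largest value.
        width = round(max_width * value / largest)
        parts.append(
            f'<rect x="0" y="{y}" width="{width}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{width + 5}" y="{y + bar_height - 5}">'
            f"{escape(label)}: {value}</text>"
        )
    height = len(values) * (bar_height + gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{max_width + 150}" height="{height}">'
        + "".join(parts)
        + "</svg>"
    )


# Hypothetical figures, invented for this example.
svg = bar_chart_svg([("North", 1200), ("South", 600)])
print(svg)
```

The resulting text can be saved to a file with an `.svg` extension and opened in a browser.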

Publishing data story

There are different options for publishing data and visualizations. A basic approach is to attach the data to single stories, similar to embedding web videos. More advanced concepts make it possible to create single dossiers, e.g. to display a number of visualizations, articles and links to the data on one page. Often such specials have to be coded individually, as many content management systems are designed to display single posts based on the date of publication.

Distributing data

Providing access to existing data is another phase, one that is gaining importance. Such sites can be thought of as "marketplaces" (commercial or not) where datasets can be found easily by others. Especially if the insights for an article were gained from open data, journalists should provide a link to the data they used so that others can investigate it (potentially starting another cycle of interrogation, leading to new insights).

Providing access to data and enabling groups to discuss what information could be extracted is the main idea behind Buzzdata,[45] a site using the concepts of social media such as sharing and following to create a community for data investigations.

Other platforms (which can be used both to gather or to distribute data):

  • Help Me Investigate (created by Paul Bradshaw)[46]
  • Timetric[47]
  • ScraperWiki[48]

Measuring the impact of data stories

A final step of the process is to measure how often a dataset or visualization is viewed.

In the context of data-driven journalism, extensive tracking, such as collecting user data or any other information that could be used for marketing or for other purposes beyond the control of the user, should be viewed as problematic. One newer, non-intrusive option to measure usage is a lightweight tracker called PixelPing. The tracker is the result of a project by ProPublica and DocumentCloud.[49] There is a corresponding service to collect the data. The software is open source and can be downloaded via GitHub.[50]
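The kind of non-intrusive measurement described above can be sketched as a simple counter that records only which story or visualization was viewed and on which day, with no user identifiers. This is an illustration of the idea, not PixelPing's actual implementation or API; the function name `count_views`, the record layout and the sample keys are hypothetical.

```python
from collections import Counter


def count_views(hits):
    """Aggregate (day, story_key) pairs into view counts.

    Returns two Counters: total views per story, and views per
    (day, story) pair. No information about the viewer is stored.
    """
    per_story = Counter()
    per_day = Counter()
    for day, key in hits:
        per_story[key] += 1
        per_day[(day, key)] += 1
    return per_story, per_day


# Hypothetical hit records, invented for this example.
HITS = [
    ("2011-08-11", "riots-map"),
    ("2011-08-11", "riots-map"),
    ("2011-08-12", "riots-map"),
    ("2011-08-12", "spending-chart"),
]

per_story, per_day = count_views(HITS)
print(per_story.most_common())
print(per_day[("2011-08-11", "riots-map")])
```

Keeping only aggregate counts is a deliberate design choice here: it answers how often a dataset or visualization is viewed without building profiles of individual readers.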

Examples

There is a growing list of examples of how data-driven journalism can be applied. The Guardian, one of the pioneering media companies in this space (see "Data journalism at the Guardian: what is it and how do we do it?"[51]), has compiled an extensive list of data stories; see "All of our data journalism in one spreadsheet".[52]

Other prominent uses of data-driven journalism are related to the release by whistle-blower organization WikiLeaks of the Afghan War Diary, a compendium of 91,000 secret military reports covering the war in Afghanistan from 2004 to 2010.[53] Three global broadsheets, namely The Guardian, The New York Times and Der Spiegel, dedicated extensive sections[54][55][56] to the documents. The Guardian's reporting included an interactive map pointing out the type, location and casualties caused by 16,000 IED attacks,[57] The New York Times published a selection of reports that permits rolling over underlined text to reveal explanations of military terms,[58] and Der Spiegel provided hybrid visualizations (containing both graphs and maps) on topics like the number of deaths related to insurgent bomb attacks.[59] For the Iraq War logs release, The Guardian used Google Fusion Tables to create an interactive map of every incident where someone died,[60] a technique it used again in the England riots of 2011.[61]

References

  1. ^ Thibodeaux, Troy (6 October 2011), , archived from the original on 9 October 2011, retrieved 11 October 2011
  2. ^ Michelle Minkoff (24 March 2010). . Archived from the original on 10 March 2020. Retrieved 6 October 2011.
  3. ^ . festivaldelgiornalismo.com. Archived from the original on 4 March 2016. Retrieved 31 January 2019.
  4. ^ Lorenz, Mirko (2010) Data driven journalism: What is there to learn? Edited conference documentation, based on presentations of participants, 24 August 2010, Amsterdam, the Netherlands
  5. ^ Veglis, Andreas; Bratsas, Charalampos (1 June 2017). "Reporters in the age of data journalism". Journal of Applied Journalism & Media Studies. 6 (2): 225–244. doi:10.1386/ajms.6.2.225_1. ISSN 2001-0818.
  6. ^ Antonopoulos, Nikos; Karyotakis, Minos-Athanasios (2020). The SAGE International Encyclopedia of Mass Media and Society. Thousand Oaks, CA: SAGE Publications, Inc. p. 440. ISBN 9781483375533.
  7. ^ Lorenz, Mirko. (2010). Data driven journalism: What is there to learn?[permanent dead link] Presented at IJ-7 Innovation Journalism Conference, 7–9 June 2010, Stanford, CA
  8. ^ Bradshaw, Paul (1 October 2010). "How to be a data journalist"[permanent dead link]. The Guardian
  9. ^ van Ess, Henk. (2012). Gory of data driven journalism[permanent dead link]
  10. ^ van Ess, Henk and Van der Kaa, Hille (2012). Handboek Datajournalistiek. Archived 21 October 2013 at the Wayback Machine
  11. ^ Houston, Brant (2019). Data Journalism. In the International Encyclopedia of Journalism Studies. Wiley-Blackwell. pp. 1–9. doi:10.1002/9781118841570.iejs0119. ISBN 9781118841570. S2CID 243233501.
  12. ^ "Prophet of Hope". www.nationalaffairs.com. Retrieved 10 September 2021.
  13. ^ Langer, Emily (29 June 2015). "Ben J. Wattenberg, writer and television commentator, dies at 81". Washington Post. ISSN 0190-8286. Retrieved 10 September 2021.
  14. ^ Houston, Brant (2015). Computer-Assisted Reporting: A Practical Guide, Fourth Edition. New York City: Routledge. p. 9. ISBN 978-0-7656-4219-6.
  15. ^ "The Color of Money".
  16. ^ "About NICAR". National Institute for Computer Assisted Reporting. Investigative Reporters and Editors. Retrieved 9 February 2016.
  17. ^ Rogers, Simon (28 July 2011), "Data journalism at the Guardian: what is it and how do we do it?", The Guardian, London, retrieved 25 October 2012
  18. ^ Kayser-Bril, Nicolas (19 July 2011), Les données pour comprendre le monde (in French), retrieved 6 October 2011
  19. ^ Rogers, Simon (2013). Facts are Sacred: the power of data. Faber and Faber. ISBN 9780571301614. OCLC 815364561.
  20. ^ "Investigative Data Journalism in a Globalized World". Journalism research. 31 October 2019. Retrieved 18 January 2021.
  21. ^ "Special Reports from Reuters journalists around the world". Reuters. Retrieved 31 January 2019.[dead link]
  22. ^ "News Apps". ProPublica. Retrieved 31 January 2019.
  23. ^ "How the Argentinian daily La Nación became a data journalism powerhouse in Latin America". niemanlab.org. Retrieved 31 January 2019.
  24. ^ "Data". The Guardian. Retrieved 31 January 2019.
  25. ^ Berlin, Berliner Morgenpost-. "Portfolio Interaktiv-Team". morgenpost. Retrieved 31 January 2019.
  26. ^ . datajournalismawards.org. Archived from the original on 21 July 2018. Retrieved 31 January 2019.
  27. ^ "The Pulitzer Prizes". Pulitzer.org. Retrieved 31 January 2019.
  28. ^ "The Pulitzer Prizes". Pulitzer.org. Retrieved 31 January 2019.
  29. ^ Knight, Megan (2 January 2015). "Data journalism in the UK: a preliminary analysis of form and content". Journal of Media Practice. 16 (1): 55–72. doi:10.1080/14682753.2015.1015801. ISSN 1468-2753. S2CID 143863693.
  30. ^ "Video course: Doing Journalism with Data: First Steps, Skills and…". DataJournalism.com. Retrieved 30 December 2022.
  31. ^ Wilson, Jason (15 June 2015). "Exploring the 7 Different Types of Data Stories". MediaShift. Retrieved 30 December 2022.
  32. ^ Veglis, Andreas; Bratsas, Charalampos (10 September 2017). "Towards A Taxonomy of Data Journalism". Journal of Media Critiques. 3 (11): 109–121. doi:10.17349/jmc117309.
  33. ^ Media Companies Must Become Trusted Data Hubs » OWNI.eu, News, Augmented. Archived 2011-08-24 at the Wayback Machine. Owni.eu (28 February 2011). Retrieved on 2013-08-16.
  34. ^ Voices: News organizations must become hubs of trusted data in a market seeking (and valuing) trust » Nieman Journalism Lab. Niemanlab.org (9 August 2013). Retrieved on 2013-08-16.
  35. ^ "Developer Information – World Bank Data Help Desk". datahelpdesk.worldbank.org. Retrieved 31 January 2019.
  36. ^ "Renewing old resolutions for the new year". googleblog.blogspot.com. Retrieved 31 January 2019.
  37. ^ Crowdsourcing: how to find a crowd (Presented at ARD/ZDF Academy in. Slideshare.net (17 September 2010). Retrieved on 2013-08-16.
  38. ^ Hirst, Tony (14 October 2008). "Data Scraping Wikipedia with Google Spreadsheets". ouseful.info. Retrieved 31 January 2019.
  39. ^ "OpenHeatMap". openheatmap.com. Retrieved 31 January 2019.
  40. ^ . timetric.com. Archived from the original on 31 January 2019. Retrieved 31 January 2019.
  41. ^ Gunter, Joel (16 April 2011). "#ijf11: Lessons in data journalism from the New York Times". journalism.co.uk. Retrieved 31 January 2019.
  42. ^ "Using Data Visualization as a Reporting Tool Can Reveal Story's Shape". Poynter.org. Retrieved 31 January 2019.
  43. ^ "RGraph is a Free and Open Source JavaScript charts library for the web". rgraph.net. Retrieved 31 January 2019.
  44. ^ JavaScript libraries
  45. ^ . Archived from the original on 12 August 2011. Retrieved 17 August 2011.
  46. ^ "Help Me Investigate - A network helping people investigate questions in the public interest". helpmeinvestigate.com. Retrieved 31 January 2019.
  47. ^ . timetric.com. Archived from the original on 31 January 2019. Retrieved 31 January 2019.
  48. ^ "ScraperWiki". Retrieved 31 January 2019.
  49. ^ Larson, Jeff. (8 September 2010) Pixel Ping: A node.js Stats Tracker. ProPublica. Retrieved on 2013-08-16.
  50. ^ documentcloud/pixel-ping · GitHub. Retrieved on 2013-08-16.
  51. ^ Rogers, Simon (28 July 2011). "Data journalism at the Guardian: what is it and how do we do it?". The Guardian. Retrieved 31 January 2019 – via theguardian.com.
  52. ^ Evans, Lisa (27 January 2011). "All of our data journalism in one spreadsheet". The Guardian. Retrieved 31 January 2019.
  53. ^ Kabul War Diary, 26 July 2010, WikiLeaks
  54. ^ Afghanistan The War Logs, 26 July 2010, The Guardian
  55. ^ The War Logs, 26 July 2010 The New York Times
  56. ^ The Afghanistan Protocol: Explosive Leaks Provide Image of War from Those Fighting It, 26 July 2010, Der Spiegel
  57. ^ Afghanistan war logs: IED attacks on civilians, coalition and Afghan troops, 26 July 2010, The Guardian
  58. ^ Text From a Selection of the Secret Dispatches, 26 July 2010, The New York Times
  59. ^ Deathly Toll: Death as a result of insurgent bomb attacks, 26 July 2010, Der Spiegel
  60. ^ Wikileaks Iraq war logs: every death mapped, 22 October 2010, Guardian Datablog
  61. ^ UK riots: every verified incident - interactive map, 11 August 2011, Guardian Datablog

Further reading

  • Hahn, Oliver; Stalph, Florian (2018). Digital investigative journalism : data, visual analytics and innovative methodologies in international reporting (1 ed.). Cham, Switzerland: Palgrave Macmillan. ISBN 9783319972824. OCLC 1050782792.

External links

  • National Institute for Computer-Assisted Reporting website
  • DataJournalism.com, learn Data Journalism by reading, watching and discussions
  • List of data journalism university courses and programmes from around the world
  • The Data Journalism Handbook: Towards A Critical Data Practice - open access handbook on data journalism around the world
  • awesome-data-journalism - "curated list of publicly available, free/open source and open access resources for learning and doing data journalism"

data, journalism, confused, with, database, journalism, data, driven, journalism, journalism, based, filtering, analysis, large, data, sets, purpose, creating, elevating, news, story, reflects, increased, role, numerical, data, production, distribution, inform. Not to be confused with Database journalism Data journalism or data driven journalism DDJ is journalism based on the filtering and analysis of large data sets for the purpose of creating or elevating a news story Data journalism reflects the increased role of numerical data in the production and distribution of information in the digital era It involves a blending of journalism with other fields such as data visualization computer science and statistics an overlapping set of competencies drawn from disparate fields 1 Data journalism has been widely used to unite several concepts and link them to journalism Some see these as levels or stages leading from the simpler to the more complex uses of new technologies in the journalistic process 2 Many data driven stories begin with newly available resources such as open source software open access publishing and open data while others are products of public records requests or leaked materials This approach to journalism builds on older practices most notably on computer assisted reporting CAR a label used mainly in the US for decades Other labels for partially similar approaches are precision journalism based on a book by Philipp Meyer 3 published in 1972 where he advocated the use of techniques from social sciences in researching stories Data driven journalism has a wider approach At the core the process builds on the growing availability of open data that is freely available online and analyzed with open source tools 4 Data driven journalism strives to reach new levels of service for the public helping the general public or specific groups or individuals to understand patterns and make decisions based on the findings As such data driven journalism might help to 
put journalists into a role relevant for society in a new way Telling stories based on the data is the primary goal The findings from data can be transformed into any form of journalistic writing Visualizations can be used to create a clear understanding of a complex situation Furthermore elements of storytelling can be used to illustrate what the findings actually mean from the perspective of someone who is affected by a development This connection between data and story can be viewed as a new arc trying to span the gap between developments that are relevant but poorly understood to a story that is verifiable trustworthy relevant and easy to remember Contents 1 Definitions 2 Emergence as a concept 3 Taxonomies 4 Data quality 4 1 Data driven journalism and the value of trust 5 Process of data driven journalism 5 1 Inverted pyramid of data journalism 5 2 Steps of the process 5 3 Description of the steps 5 3 1 Finding data 5 3 2 Cleaning data 5 3 3 Visualizing data 5 3 4 Publishing data story 5 3 5 Distributing data 5 3 6 Measuring the impact of data stories 6 Examples 7 See also 8 References 9 Further reading 10 External linksDefinitions edit nbsp The data driven journalism process Veglis and Bratsas defined data journalism as the process of extracting useful information from data writing articles based on the information and embedding visualizations interacting in some cases in the articles that help readers understand the significance of the story or allow them to pinpoint data that relate to them 5 Antonopoulos and Karyotakis define the practice of data journalism as a way of enhancing reporting and news writing with the use and examination of statistics in order to provide a deeper insight into a news story and to highlight relevant data One trend in the digital era of journalism has been to disseminate information to the public via interactive online content through data visualization tools such as tables graphs maps infographics microsites and visual worlds 
The in depth examination of such data sets can lead to more concrete results and observations regarding timely topics of interest In addition data journalism may reveal hidden issues that seemingly were not a priority in the news coverage 6 According to architect and multimedia journalist Mirko Lorenz data driven journalism is primarily a workflow that consists of the following elements digging deep into data by scraping cleansing and structuring it filtering by mining for specific information visualizing and making a story 7 This process can be extended to provide results that cater to individual interests and the broader public Data journalism trainer and writer Paul Bradshaw describes the process of data driven journalism in a similar manner data must be found which may require specialized skills like MySQL or Python then interrogated for which understanding of jargon and statistics is necessary and finally visualized and mashed with the aid of open source tools 8 A more results driven definition comes from data reporter and web strategist Henk van Ess 2012 9 Data driven journalism enables reporters to tell untold stories find new angles or complete stories via a workflow of finding processing and presenting significant amounts of data in any given form with or without open tools Van Ess claims that some of the data driven workflow leads to products that are not in orbit with the laws of good story telling because the result emphases on showing the problem not explaining the problem A good data driven production has different layers It allows you to find personalized that are only important for you by drilling down to relevant but also enables you to zoom out to get the big picture In 2013 Van Ess came with a shorter definition in 10 that doesn t involve visualisation per se Data journalism can be based on any data that has to be processed first with tools before a relevant story is possible It doesn t include visualization per se However one of the problems for 
defining data journalism is that many definitions are not clear enough and focus on describing the computational methods of optimization, analysis, and visualization of information.[11]

Emergence as a concept

The term "data journalism" was coined by political commentator Ben Wattenberg through his work starting in the mid-1960s, layering narrative with statistics to support the theory that the United States had entered a golden age.[12][13]

One of the earliest examples of using computers with journalism dates back to a 1952 endeavor by CBS to use a mainframe computer to predict the outcome of the presidential election, but it wasn't until 1967 that using computers for data analysis began to be more widely adopted.[14]

Working for the Detroit Free Press at the time, Philip Meyer used a mainframe to improve reporting on the riots spreading throughout the city. With a new precedent set for data analysis in journalism, Meyer collaborated with Donald Barlett and James Steele to look at patterns in conviction sentencing in Philadelphia during the 1970s. Meyer later wrote a book titled Precision Journalism that advocated the use of these techniques for combining data analysis with journalism.

Toward the end of the 1980s, significant events began to occur that helped to formally organize the field of computer-assisted reporting. Investigative reporter Bill Dedman of The Atlanta Journal-Constitution won a Pulitzer Prize in 1989 for The Color of Money, his 1988 series of stories using CAR techniques to analyze racial discrimination by banks and other mortgage lenders in middle-income black neighborhoods.[15] The National Institute for Computer Assisted Reporting (NICAR)[16] was formed at the Missouri School of Journalism in collaboration with the Investigative Reporters and Editors (IRE). The first conference dedicated to CAR was organized by NICAR in conjunction with James Brown at Indiana University and held in 1990. The NICAR conference has been held annually since and is now the single
largest gathering of data journalists.

Although data journalism has been used informally by practitioners of computer-assisted reporting for decades, the first recorded use of the term by a major news organization was by The Guardian, which launched its Datablog in March 2009.[17] And although the paternity of the term is disputed, it has been widely used since the WikiLeaks Afghan War documents leak in July 2010.[18]

The Guardian's coverage of the war logs took advantage of free data visualization tools such as Google Fusion Tables, another common aspect of data journalism. Facts are Sacred[19] by The Guardian's Datablog editor Simon Rogers describes data journalism like this:

"Comment is free," wrote Guardian editor CP Scott in 1921, "but facts are sacred." Ninety years later, publishing those sacred facts has become a new type of journalism in itself: data journalism. And it is rapidly becoming part of the establishment.

Investigative data journalism combines the field of data journalism with investigative reporting. An example of investigative data journalism is the research of large amounts of textual or financial data. Investigative data journalism can also relate to the field of big data analytics for the processing of large data sets.[20]

Since the introduction of the concept, a number of media companies have created "data teams", which develop visualizations for newsrooms. Notable examples include the teams at Reuters,[21] ProPublica,[22] and La Nacion (Argentina).[23] In Europe, The Guardian[24] and Berliner Morgenpost[25] have very productive teams, as do public broadcasters.

As projects like the MP expense scandal (2009) and the 2013 release of the "offshore leaks" demonstrate, data-driven journalism can assume an investigative role, on occasion dealing with "not-so-open" (that is, secret) data.

The annual Data Journalism Awards[26] recognize outstanding reporting in the field of data journalism, and numerous Pulitzer Prizes in recent years have been awarded to data-driven storytelling, including the 2018 Pulitzer Prize in International
Reporting[27] and the 2017 Pulitzer Prize in Public Service.[28]

Taxonomies

Many scholars have proposed different taxonomies of data journalism projects. Megan Knight suggested a taxonomy based on the level of interpretation and analysis needed to produce a data journalism project. Specifically, the taxonomy included: number pullquote, static map, list and timelines, table, graphs and charts, dynamic map, textual analysis, and infographics.[29]

Simon Rogers proposed five types of data journalism projects: By just the facts, Data-based news stories, Local data telling stories, Analysis and background, and Deep dive investigations.[30] Martha Kang discussed seven types of data stories, namely: Narrate change over time, Start big and drill down, Start small and zoom out, Highlight contrasts, Explore the intersection, Dissect the factors, and Profile the outliers.[31]

Veglis and Bratsas proposed another taxonomy based on the method of presenting the information to the audience. Their taxonomy had a hierarchical structure and included the following types: data journalism articles with just numbers, with tables, and with visualizations (interactive and non-interactive). In the case of stories with interactive visualizations, they proposed three distinct types, namely transmissional, consultational and conversational.[32]

Data quality

In many investigations the data that can be found might have omissions or be misleading. As one layer of data-driven journalism, a critical examination of the data quality is important. In other cases the data might not be public or might not be in the right format for further analysis, e.g. it is only available in a PDF. Here the process of data-driven journalism can turn into stories about data quality or about refusals by institutions to provide the data. As the practice as a whole is at an early stage of development, examinations of data sources, data sets, data quality and data format are an equally important part of this work.

Data-driven journalism and the value of trust

Based on the perspective of looking deeper into the facts and drivers of events, there is a suggested change in media strategies: in this view the idea is to move "from attention to trust". The creation of attention, which has been a pillar of media business models, has lost its relevance because reports of new events are often distributed faster via new platforms such as Twitter than through traditional media channels. On the other hand, trust can be understood as a scarce resource. While distributing information is much easier and faster via the web, the abundance of offerings makes it costly to verify and check the content of any story, which creates an opportunity. The view that media companies should be transformed into trusted data hubs has been described in an article cross-published in February 2011 on Owni.eu[33] and Nieman Lab.[34]

Process of data-driven journalism

The process of transforming raw data into stories is akin to a refinement and transformation. The main goal is to extract information recipients can act upon. The task of a data journalist is to extract what is hidden. This approach can be applied to almost any context, such as finances, health, environment or other areas of public interest.

Inverted pyramid of data journalism

In 2011, Paul Bradshaw introduced a model he called "The Inverted Pyramid of Data Journalism".

Steps of the process

In order to achieve this, the process should be split up into several steps. While the steps leading to results can differ, a basic distinction can be made by looking at six phases:

Find: Searching for data on the web
Clean: Process to filter and transform data, preparation for visualization
Visualize: Displaying the pattern, either as a static or animated visual
Publish: Integrating the visuals, attaching data to stories
Distribute: Enabling access on a variety of devices, such as the web, tablets and mobile
Measure: Tracking usage of data stories over time and across the spectrum of uses

Description of the steps

Finding data

Data can
be obtained directly from governmental databases such as data.gov, data.gov.uk and the World Bank Data API,[35] but also by placing Freedom of Information requests to government agencies; some requests are made and aggregated on websites like the UK's What Do They Know. While there is a worldwide trend towards opening data, there are national differences as to what extent that information is freely available in usable formats. If the data is in a webpage, scrapers are used to generate a spreadsheet. Examples of scrapers are WebScraper, Import.io, QuickCode, OutWit Hub and Needlebase (retired in 2012[36]). In other cases OCR software can be used to get data from PDFs.

Data can also be created by the public through crowdsourcing, as shown in March 2012 at the Datajournalism Conference in Hamburg by Henk van Ess.[37]

Cleaning data

Usually data is not in a format that is easy to visualize. For example, there may be too many data points, or the rows and columns may need to be sorted differently. Another issue is that, once investigated, many datasets need to be cleaned, structured and transformed. Various tools like OpenRefine (open source), Data Wrangler and Google Spreadsheets[38] allow uploading, extracting or formatting data.

Visualizing data

To visualize data in the form of graphs and charts, applications such as Many Eyes or Tableau Public are available. Yahoo! Pipes and Open Heat Map[39] are examples of tools that enable the creation of maps based on data spreadsheets. The number of options and platforms is expanding. Some new offerings provide options to search, display and embed data, an example being Timetric.[40]

To create meaningful and relevant visualizations, journalists use a growing number of tools. Several guides now describe what to look for and how to do it. Notable published articles include:

Joel Gunter: "#ijf11: Lessons in data journalism from the New York Times"[41]
Steve Myers: "Using Data Visualization as a Reporting Tool Can Reveal Story's Shape", including a link to a tutorial
by Sarah Cohen[42]

As of 2011, the use of HTML 5 libraries using the canvas tag is gaining in popularity. There are numerous libraries that make it possible to graph data in a growing variety of forms. One example is RGraph.[43] As of 2011 there is a growing list of JavaScript libraries for visualizing data.[44]

Publishing data story

There are different options to publish data and visualizations. A basic approach is to attach the data to single stories, similar to embedding web videos. More advanced concepts allow the creation of single dossiers, e.g. to display a number of visualizations, articles and links to the data on one page. Often such specials have to be coded individually, as many content management systems are designed to display single posts based on the date of publication.

Distributing data

Providing access to existing data is another phase, which is gaining importance. Think of the sites as "marketplaces" (commercial or not), where datasets can be found easily by others. Especially if the insights for an article were gained from open data, journalists should provide a link to the data they used for others to investigate (potentially starting another cycle of interrogation, leading to new insights).

Providing access to data and enabling groups to discuss what information could be extracted is the main idea behind Buzzdata,[45] a site using the concepts of social media such as sharing and following to create a community for data investigations.

Other platforms (which can be used both to gather and to distribute data):

Help Me Investigate (created by Paul Bradshaw)[46]
Timetric[47]
ScraperWiki[48]

Measuring the impact of data stories

A final step of the process is to measure how often a dataset or visualization is viewed.

In the context of data-driven journalism, the extent of such tracking, such as collecting user data or any other information that could be used for marketing reasons or other uses beyond the control of the user, should be viewed as problematic. One newer, non-intrusive option to measure usage is a lightweight tracker called PixelPing. The tracker is the result of a project by ProPublica and DocumentCloud.[49] There is a corresponding service to collect the data. The software is open source and can be downloaded via GitHub.[50]

Examples

There is a growing list of examples of how data-driven journalism can be applied. The Guardian, one of the pioneering media companies in this space (see "Data journalism at the Guardian: what is it and how do we do it?"[51]), has compiled an extensive list of data stories; see "All of our data journalism in one spreadsheet".[52]

Other prominent uses of data-driven journalism are related to the release by whistle-blower organization WikiLeaks of the Afghan War Diary, a compendium of 91,000 secret military reports covering the war in Afghanistan from 2004 to 2010.[53] Three global broadsheets, namely The Guardian, The New York Times and Der Spiegel, dedicated extensive sections[54][55][56] to the documents. The Guardian's reporting included an interactive map pointing out the type, location and casualties caused by 16,000 IED attacks,[57] The New York Times published a selection of reports that permits rolling over underlined text to reveal explanations of military terms,[58] while Der Spiegel provided hybrid visualizations (containing both graphs and maps) on topics like the number of deaths related to insurgent bomb attacks.[59] For the Iraq War logs release, The Guardian used Google Fusion Tables to create an interactive map of every incident where someone died,[60] a technique it used again in the England riots of 2011.[61]

See also

Automated journalism
Database journalism
Computational journalism
Open science data
Open source
Open knowledge
Freedom of information legislation
Information visualization

References

[1] Thibodeaux, Troy (6 October 2011). "5 tips for getting started in data journalism". Archived from the original on 9 October 2011. Retrieved 11 October 2011.
[2] Michelle Minkoff (24 March 2010). "Bringing data journalism into curricula". Archived from the original on 10 March 2020. Retrieved 6 October 2011.
[3] "Philipp Meyer". festivaldelgiornalismo.com. Archived from the original on 4 March 2016. Retrieved 31 January 2019.
[4] Lorenz, Mirko (2010). Data driven journalism: What is there to learn? Edited conference documentation, based on presentations of participants, 24 August 2010, Amsterdam, the Netherlands.
[5] Veglis, Andreas; Bratsas, Charalampos (1 June 2017). "Reporters in the age of data journalism". Journal of Applied Journalism & Media Studies. 6 (2): 225-244. doi:10.1386/ajms.6.2.225_1. ISSN 2001-0818.
[6] Antonopoulos, Nikos; Karyotakis, Minos-Athanasios (2020). The SAGE International Encyclopedia of Mass Media and Society. Thousands Oaks, CA: SAGE Publications, Inc. p. 440. ISBN 9781483375533.
[7] Lorenz, Mirko (2010). Data driven journalism: What is there to learn? Presented at IJ-7 Innovation Journalism Conference, 7-9 June 2010, Stanford, CA. (Permanent dead link.)
[8] Bradshaw, Paul (1 October 2010). "How to be a data journalist". The Guardian. (Permanent dead link.)
[9] van Ess, Henk (2012). Gory of data driven journalism. (Permanent dead link.)
[10] van Ess, Henk; Van der Kaa, Hille (2012). Handboek Datajournalistiek. Archived 21 October 2013 at the Wayback Machine.
[11] Houston, Brant (2019). "Data Journalism". In The International Encyclopedia of Journalism Studies. Wiley Blackwell. pp. 1-9. doi:10.1002/9781118841570.iejs0119. ISBN 9781118841570. S2CID 243233501.
[12] "Prophet of Hope". www.nationalaffairs.com. Retrieved 10 September 2021.
[13] Langer, Emily (29 June 2015). "Ben J. Wattenberg, writer and television commentator, dies at 81". Washington Post. ISSN 0190-8286. Retrieved 10 September 2021.
[14] Houston, Brant (2015). Computer-Assisted Reporting: A Practical Guide, Fourth Edition. New York City: Routledge. p. 9. ISBN 978-0-7656-4219-6.
[15] "The Color of Money".
[16] "About NICAR". National Institute for Computer-Assisted Reporting. Investigative Reporters and Editors. Retrieved 9 February 2016.
[17] Rogers, Simon (28 July 2011). "Data journalism at the Guardian: what is it and how do we do it?". The Guardian. London. Retrieved 25 October 2012.
[18] Kayser-Bril, Nicolas (19 July 2011). Les données pour comprendre le monde (in French). Retrieved 6 October 2011.
[19] Rogers, Simon (2013). Facts are Sacred: the power of data. Faber and Faber. ISBN 9780571301614. OCLC 815364561.
[20] "Investigative Data Journalism in a Globalized World". Journalism research. 31 October 2019. Retrieved 18 January 2021.
[21] "Special Reports from Reuters journalists around the world". Reuters. Retrieved 31 January 2019. (Dead link.)
[22] "News Apps". ProPublica. Retrieved 31 January 2019.
[23] "How the Argentinian daily La Nacion became a data journalism powerhouse in Latin America". niemanlab.org. Retrieved 31 January 2019.
[24] "Data". The Guardian. Retrieved 31 January 2019.
[25] Berliner Morgenpost, Berlin. "Portfolio Interaktiv-Team". morgenpost. Retrieved 31 January 2019.
[26] "Data Journalism Awards". datajournalismawards.org. Archived from the original on 21 July 2018. Retrieved 31 January 2019.
[27] "The Pulitzer Prizes". Pulitzer.org. Retrieved 31 January 2019.
[28] "The Pulitzer Prizes". Pulitzer.org. Retrieved 31 January 2019.
[29] Knight, Megan (2 January 2015). "Data journalism in the UK: a preliminary analysis of form and content". Journal of Media Practice. 16 (1): 55-72. doi:10.1080/14682753.2015.1015801. ISSN 1468-2753. S2CID 143863693.
[30] "Video course: Doing Journalism with Data: First Steps, Skills and". DataJournalism.com. Retrieved 30 December 2022.
[31] Wilson, Jason (15 June 2015). "Exploring the 7 Different Types of Data Stories". MediaShift. Retrieved 30 December 2022.
[32] Veglis, Andreas; Bratsas, Charalampos (10 September 2017). "Towards A Taxonomy of Data Journalism". Journal of Media Critiques. 3 (11): 109-121. doi:10.17349/jmc117309.
[33] "Media Companies Must Become Trusted Data Hubs". OWNI.eu, News, Augmented. Archived 2011-08-24 at the Wayback Machine. Owni.eu (28 February 2011). Retrieved on 2013-08-16.
[34] "Voices: News organizations must become hubs of trusted data in a market seeking (and valuing) trust". Nieman Journalism Lab. Niemanlab.org (9 August 2013). Retrieved on 2013-08-16.
[35] "Developer Information". World Bank Data Help Desk. datahelpdesk.worldbank.org. Retrieved 31 January 2019.
[36] "Renewing old resolutions for the new year". googleblog.blogspot.com. Retrieved 31 January 2019.
[37] "Crowdsourcing: how to find a crowd" (presented at ARD/ZDF Academy). Slideshare.net (17 September 2010). Retrieved on 2013-08-16.
[38] Hirst, Tony (14 October 2008). "Data Scraping Wikipedia with Google Spreadsheets". ouseful.info. Retrieved 31 January 2019.
[39] "OpenHeatMap". openheatmap.com. Retrieved 31 January 2019.
[40] "Home - Timetric". timetric.com. Archived from the original on 31 January 2019. Retrieved 31 January 2019.
[41] Gunter, Joel (16 April 2011). "#ijf11: Lessons in data journalism from the New York Times". journalism.co.uk. Retrieved 31 January 2019.
[42] "Using Data Visualization as a Reporting Tool Can Reveal Story's Shape". Poynter.org. Retrieved 31 January 2019.
[43] "RGraph is a Free and Open Source JavaScript charts library for the web". rgraph.net. Retrieved 31 January 2019.
[44] JavaScript libraries.
[45] "BuzzData". BuzzData. Retrieved on 2013-08-16. Archived from the original on 12 August 2011. Retrieved 17 August 2011.
[46] "Help Me Investigate: A network helping people investigate questions in the public interest". helpmeinvestigate.com. Retrieved 31 January 2019.
[47] "Home - Timetric". timetric.com. Archived from the original on 31 January 2019. Retrieved 31 January 2019.
[48] "ScraperWiki". Retrieved 31 January 2019.
[49] Larson, Jeff (8 September 2010). "Pixel Ping: A node.js Stats Tracker". ProPublica. Retrieved on 2013-08-16.
[50] "documentcloud/pixel-ping". GitHub. Retrieved on 2013-08-16.
[51] Rogers, Simon (28 July 2011). "Data journalism at the Guardian: what is it and how do we do it?". The Guardian. Retrieved 31 January 2019, via theguardian.com.
[52] Evans, Lisa (27 January 2011). "All of our data journalism in one spreadsheet". The Guardian. Retrieved 31 January 2019.
[53] "Kabul War Diary". 26 July 2010. WikiLeaks.
[54] "Afghanistan: The War Logs". 26 July 2010. The Guardian.
[55] "The War Logs". 26 July 2010. The New York Times.
[56] "The Afghanistan Protocol: Explosive Leaks Provide Image of War from Those Fighting It". 26 July 2010. Der Spiegel.
[57] "Afghanistan war logs: IED attacks on civilians, coalition and Afghan troops". 26 July 2010. The Guardian.
[58] "Text From a Selection of the Secret Dispatches". 26 July 2010. The New York Times.
[59] "Deathly Toll: Death as a result of insurgent bomb attacks". 26 July 2010. Der Spiegel.
[60] "Wikileaks Iraq war logs: every death mapped". 22 October 2010. Guardian Datablog.
[61] "UK riots: every verified incident - interactive map". 11 August 2011. Guardian Datablog.

Further reading

Hahn, Oliver; Stalph, Florian (2018). Digital investigative journalism: data, visual analytics and innovative methodologies in international reporting (1 ed.). Cham, Switzerland: Palgrave Macmillan. ISBN 9783319972824. OCLC 1050782792.

External links

National Institute for Computer-Assisted Reporting website
DataJournalism.com: learn data journalism by reading, watching and discussions
List of data journalism university courses and programmes from around the world
The Data Journalism Handbook: Towards A Critical Data Practice: open access handbook on data journalism around the world
awesome-data-journalism: curated list of publicly available, free, open source and open access resources for learning and doing data journalism

Retrieved from "https://en.wikipedia.org/w/index.php?title=Data_journalism&oldid=1216800730"
