Open Data in New York City

After the economic recession at the beginning of the last century, the United States government provided various services within cities, such as social services. Unfortunately, because of poor management (Desai et al. 2012; Axinn, Stern 2011; Marx 2003; Reid 1995), these services often failed, and people asked for freedom and independence from them. With the introduction of digital technology, it became possible to manage the flow of information, and government became better able to scale as the economy continued to grow, said Noel Hidalgo, an open data advocate from Code for America. Partly in answer to the socialist movements of the last century, the USA has tended towards more conservative, smaller government as well as more agile and profitable businesses. Following a path focused on lean government and information technology, Americans recognized that the quality of governmental services increases once the technology is in place to manage the very large-scale production of social services through data. As Hidalgo points out, the concept of ‘open government’ is in many ways an extension of this idea. Open government is a very conservative approach to governance, with the central aim of creating a leaner, more agile and more effective government. A leaner government is possible if the information about the production of services is available.

History

To achieve this goal, open data enables citizens to understand what is going on within a system, such as the governmental system, and how this system could be improved. From a governmental perspective it is also important to realize that, on a national scale, one has to deal with a vast amount of information that affects many lives. It is therefore essential to understand the information that is available and being produced by the system, not only for management reasons but also for ethical ones.

Interested citizens in America saw the effectiveness of the private sector and concluded that the methods used there could also be used in the governmental sector. Through the transparency it creates, open government enables citizens to act as the checks and balances on governmental operations. This is the foundation of every democratic system, but in the United States it also ties in with a common mistrust of governmental operations in general.

The information provided through an open governmental approach enables citizens, for example in the case of a non-functional government, to find the cause and work out how to deal with it. While some people would argue for stopping policies that are not working, like the disastrous start of the Health Care Bill in the U.S., one could instead use available data to analyze and fix the problem. The second approach is more evidence-based than the first.

New York has about 10 million residents and grows to about 15 million people during a normal workday, and even more for occasional events like the New York Marathon (NYC OpenData 2013). According to Michael Flowers, the director of the Office of Policy and Strategic Planning, people rightly expect the city of New York to provide services and operations (such as public transportation, security and energy) for everyone, not just for its residents, as these are paid for by tax revenues and therefore by all.

For decades, political activist groups have looked at the performance of the different agencies. Their argument, as Noel Hidalgo says, was that they wanted to know what was going on within the city. From an accountability perspective, they wanted to know how public transportation was operating, how effective the public, social and healthcare infrastructure was, and how the schools were operating. They especially wanted to know how tax money was deployed to finance the streets, police and fire departments, in order to be able to hold them accountable.

Mayor Bloomberg sees himself as a technologist. He started a technology company that makes its profits from media, data and information analysis. According to several people (The New York Times 2013b), he was the one who brought the accountability practice into the municipal system, to the disappointment of the unions, said Flowers. He implemented this approach first in his own office and then successively in many others. If we want to improve governmental services, we need to know what is performing and what is underperforming, as Christopher Corcoran, Michael Flowers’ assistant, says. The foundation of such an approach is an analytically driven government and therefore data that can be analyzed. One can recognize the importance of Mayor Bloomberg’s approach by looking at the city’s organizational chart: the analytical task force founded in 2009 and led by Michael Flowers is directly attached to the mayor’s office.

However, the open data initiative existed long before Bloomberg. The basis of the open data initiative is the city charter, which is the city’s constitution. This charter is rewritten every ten years. In 1989 there were already forward-thinking advocates who saw that the idea of an open government requires open data. They therefore lobbied for the inclusion of a paragraph in the city charter that required the government to create a data catalogue of all data sets available in the municipality. This data catalogue was to be maintained by a public advocacy office.

In 1989 the technology to manage large data volumes was not available. Neither the advocacy office nor an understanding of how this data could be used had yet been developed. As the technology became available and less expensive to deploy, things started to change in the technology industry. In 2009 the city realized that the idea of an open data catalogue was enshrined in the constitution but hadn’t been acted upon. By 2013, according to Noel Hidalgo, all three necessary components were in place and working together: an active community of activists, citizens, and businesses; technology that is affordable for almost everybody; and the political understanding and awareness needed to make open government happen.

Thus, the open data initiative builds on efforts that started decades ago, but it was through Obama’s open government initiative on the federal level that New York was able to take these arguments and apply them at the city level. An interesting side note is that quite a few authors of the federal open government initiative came from New York and had a deep knowledge of how the city functions. One can therefore assume that NYC is one of the cores of the open government movement in the United States.

By his third term as New York’s mayor, after being successful in many other areas on his agenda, Bloomberg was able to say that there was enough of a constituency, enough consumers and people advocating for the concept of open data, and so he initiated a collaboration between the mayor’s office, the city council and the good government groups to pass the so-called Local Law 11, said Hidalgo. This bill framed and enshrined the activities that were already going on. Now, finally, the technology, the community and the awareness were at the same level of maturity.

Approach

One example of how governmental decision-making affects a great number of people is transportation policy, said Hidalgo. It affects everybody regardless of their life circumstances: whether people have children or not, whether they are married or not, whether they are gay or straight, or black or white. The public streets, the sidewalks, etc. are fundamental to urban living, but NYC’s Department of Transportation (DOT) does not have the ability to collect all the data across the city that would be needed to manage transportation policies based on the actual needs of the people. Expertise is not evenly distributed across the districts of the city, but depends on, for example, the financial potential of the district. The open data approach can help to bridge this gap by providing data from other parts of the urban system, building a data-driven information system that informs policy makers and provides a more analytical view of how to approach urban planning.

Michael Flowers and Christopher Corcoran explained that the key to this approach is to use data in an applied way. All data in an urban system has a frame of reference to which it belongs. The frame of reference is not the same for all datasets, but because geospatial data has an ontological structure, it is possible to map more specific geolocations to more general ones. One of the most important geospatial keys is the Building Identification Number (BIN), which belongs to a block. However, as Flowers stated, big data is useless unless it is applied to a real-world problem.
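The mapping from specific to more general geolocations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table, the BIN and block identifiers, and the complaint counts are all made up for the example and do not come from any city dataset.

```python
from collections import defaultdict

# Hypothetical lookup table: each BIN belongs to exactly one block.
BLOCK_OF_BIN = {
    "1000001": "block-A",
    "1000002": "block-A",
    "1000003": "block-B",
}


def aggregate_to_blocks(values_by_bin, block_of_bin):
    """Roll per-building values up to the more general block level."""
    totals = defaultdict(float)
    for bin_id, value in values_by_bin.items():
        totals[block_of_bin[bin_id]] += value
    return dict(totals)


# Hypothetical per-building counts, e.g. complaints recorded by one agency.
complaints_by_bin = {"1000001": 3, "1000002": 5, "1000003": 1}

complaints_by_block = aggregate_to_blocks(complaints_by_bin, BLOCK_OF_BIN)
print(complaints_by_block)
```

Because every dataset keyed by BIN can be rolled up in the same way, data from different agencies becomes comparable at the block level even if the agencies never coordinated their formats.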

A good example is the increase in efficiency of NYC’s tax controllers. The city has several million businesses but only a few tax controllers. If these few controllers engage only in random checks to ensure that businesses are paying the correct amount of tax, their efficiency is not very high. It would be better to send them only to the most likely perpetrators, but the question was how to find them. Flowers’ data task force set out to develop a system for identifying tax evaders. What they came up with is very simple but effective. They developed an indicator of business activity based on the observation that wherever there is a business, there is waste (garbage, wastewater, etc.). This data is available per BIN and can be matched with tax data. If there is a lot of waste but no tax income, this represents an anomaly. Although this alone is not proof of tax fraud, such a person or business is more likely to be attempting tax fraud than one whose tax and waste values match. The indicator can be used to select which businesses the controllers should be sent to check. With this basic system the controllers’ success rate improved from about 10% to almost 90%. One of the central points is that this changes little for the controllers. They do the same work, but are provided with a weighted list instead of a random list. This also makes the approach more successful because, as Flowers claims, the controllers were not forced to change their work habits. It is important to keep in mind that Flowers’ department focuses on the correlation, not the causality, between indicators. The analysis doesn’t need to be perfect; it simply needs to be good enough to be used, which means better than working without any analysis.
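The weighted list described above could look roughly like the following Python sketch. It is not the task force’s actual method: the scoring formula (waste volume divided by tax paid), the BINs and all figures are assumptions chosen purely for illustration.

```python
def rank_for_inspection(waste_by_bin, tax_by_bin):
    """Return BINs ordered from most to least anomalous.

    The score is waste per dollar of tax paid, so a building with a lot
    of waste but little or no tax income ends up at the top of the list.
    """
    scores = {}
    for bin_id, waste in waste_by_bin.items():
        # A BIN without any tax record is treated as having paid no tax.
        tax = tax_by_bin.get(bin_id, 0.0)
        # Adding 1.0 avoids a division by zero when no tax was paid.
        scores[bin_id] = waste / (tax + 1.0)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical yearly waste volume (tons) and tax receipts (dollars) per BIN.
waste_by_bin = {"1000001": 12.0, "1000002": 15.0, "1000003": 0.5}
tax_by_bin = {"1000001": 48000.0, "1000003": 2500.0}

inspection_order = rank_for_inspection(waste_by_bin, tax_by_bin)
print(inspection_order)
```

The controllers would simply work through such a list from the top, which matches the point that their work habits do not have to change. A high score remains a correlation-based hint, not proof of fraud.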

The underlying idea, as Nate Silver, statistician and editor-in-chief of ESPN’s FiveThirtyEight blog, said, is that data analysis follows the Pareto principle of prediction. This means that with 20% of the effort you gain about 80% of the accuracy. Therefore the “20% often begins with having the right data, the right technology, and the right incentives. You need to have some information – more of it rather than less, ideally – and you need to make sure that it is quality-controlled. You need to have some familiarity with the tools of your trade – having top-shelf technology is nice, but it’s more important that you know how to use what you have. You need to care about accuracy – about getting at the objective truth – rather than about making the most pleasing or convenient prediction” (Silver 2012). This is exactly what Flowers’ team did. They started with what they had, which was Excel sheets, and used it very wisely.

The question is how the available data is used. Hidalgo said that the government knows there are digital divides across socio-economic variables within the city, but that it believes technology can be used to bridge these divides. The question is how to implement this in the urban planning process. How should the technology be applied in order to bridge that divide? Open data will lead urban agencies and operators from trial and error to more scientific and empirically driven decision-making. This may reduce the number of wrong decisions made by these agencies.

Hackathons, events in which computer programmers and others involved in software development collaborate on software projects, and similar events are a central element of the success of the open data initiative. According to Nathanael Bassett, a media researcher at The New School who studies open data activists, they are based on the old idea of hack labs: community-operated workspaces where people with common interests in computers, technology, digital art or electronic art can meet, socialize and collaborate. The participants invest their free time in analyzing data to identify possibilities for improvement in the city. They are volunteers like any others, just with a different focus. Surprisingly, the participants are not just young people or males; it is a very mixed subculture that is based on equality. Hackathons are about learning to solve problems, learning APIs (application programming interfaces), networking and having a good time. Unlike Occupy events, which are goal-oriented, a hackathon is more of a social gathering, like a LAN party: a temporary gathering of people with computers, between which they establish a local area network (LAN), primarily for the purpose of playing multiplayer video games. Sometimes, however, the results of such an event even become the foundation of a business. Since hackathons are often about data and analysis, the data that is needed is not always available. In such cases the data is sometimes created by crowdsourcing, the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially from an online community, or it is put together from different sources like the ones that can be found on NYC’s open data portal (NYC OpenData 2013).

Even if urban planning uses an evidence-based approach, the community will always act in its own self-interest. There will always be members of this community who have more political power, who are able to organize and express themselves in a more concrete and vocal way, and who therefore disproportionately present themselves as the voice of the community. Open data can help to level these differences, provided that people have the education to understand the data. We are now starting to use data to counter some of these traditional perspectives, but they will always exist. This makes education a central element for the future.

Players

The open data initiative is not a specific project, but a strategy and procedure. It is therefore very difficult to determine the key players within it. One person who was mentioned in almost every interview is Michael R. Bloomberg, mayor of NYC from 2002 to 2013 and founder of the Bloomberg empire. His tech company deals with media, data and the analysis of information. He brought these accountability practices into NYC’s municipal infrastructure, where they cut across all the different departments in the form of performance-based operations. In order to hold people accountable based on their performance, it is necessary to have an analytically driven government. If the goal is to improve governmental services, it is important to know what is performing and what is underperforming, and to hold people accountable for that.

Hidalgo said that the mayor’s management report, which includes governmental key performance indicators (KPIs), had already been a part of the city’s architecture long before Bloomberg became mayor. He simply extended it to different aspects of the city’s municipal operations. He was able to do this because he doesn’t need to answer to any union or investors. As the 13th richest man in the world, according to Forbes (Forbes 2013b), and an independent politician, he is not accountable to unions or political parties. His political power comes from the fact that he is financially and politically independent. He doesn’t need to placate the political interests of other groups to be re-elected.

Within the NYC government, Gale Arnot Brewer, a city council member for the 6th district, was one of the main figures who rewrote the city charter and integrated the open data initiative into it. Another key actor is Beth Noveck, who served as United States Deputy Chief Technology Officer for open government and led President Obama’s Open Government Initiative. For Local Law 11, Philip Ashlock was one of the people who analyzed all the legislative options for this bill. The open data initiative would probably not exist if public interest groups had not been advocating for it for decades. Two central individuals in this group are John Keny from OpenPlans and Jean Grashnow from NeighborWorks. Another very important and well-connected actor in NYC is Noel Hidalgo, a founding member of the New York City Transparency Working Group (nycTWG), a network of NYC civic groups that advocate for greater transparency in city government. In 2012, nycTWG lobbied for the passing of NYC Local Law 11 of 2012, then America’s premier municipal open data law. Hidalgo also works for Code for America as NYC program manager.

In the applied data analysis sector of the city, the central body is the Office of Policy and Strategic Planning, led by Mike Flowers and often called the Mayor’s Geek Squad. This department acts as a service provider for data analysis problems for all city departments. In the security sector the key person is Raymond Kelly. He introduced CompStat and laid the foundation for the Smart Public Safety approach of the New York Police Department (see Smart Public Safety). In the health sector, one of the most important actors when it comes to data-driven approaches in public health is Dr. Thomas Farley, who was appointed NYC Health Commissioner in May 2009. He is the person behind innovative initiatives such as the comprehensive tobacco control program, the elimination of trans fats in restaurant food, a requirement for chain restaurants to post calorie information on menu boards, and the development of an electronic health record.

Challenges

Overall, Corcoran sees challenges in four main areas: politics, culture, law and technology. Because of NYC’s administrative structure, only the mayor’s office can merge offices and data. Bloomberg’s philosophy is “if you can’t measure it, it doesn’t exist”, but a different mayor may think differently. If the mayor’s office does not support this strategy, evidence-based urban planning becomes very difficult or even impossible. Even if the mayor’s office supports this approach, the agencies involved need to do so as well if it is to be successful. This is because the data is produced, implemented and used at the agency level. One needs to understand the agencies in order to understand what to expect from the data. However, the available data cannot always be used because of legal restrictions. One example is tax data. In Germany, it is not permissible for a city to use tax data for urban planning purposes. This data is purpose-bound, which means that it cannot be used for purposes other than the one for which it was compiled.

The smallest problem is the technology. Sometimes all the available data is accessible, but the amount of data is so large that the city has neither the computing power nor the human resources to compute meaningful results. Nowadays, thanks to Moore’s law, at least the computing part is becoming a minor issue. As Aaron Ogle and Noel Hidalgo pointed out, educated people who can deal with the data and understand it, as well as an active community that makes use of the advantages of open data, are key components of such an initiative.

Not everybody is thrilled about this development, as it results in people being held accountable for their performance. Within the unions and city departments in particular, there are groups fighting against these approaches. One of the reasons, according to the unions, is that this strategy puts too much pressure on employees. Their fear is that these strategies will be used to reduce the number of jobs and increase the workload of those who remain.

The open data initiative can have a positive effect on start-ups that are able to use the available data and create something new with it, but it can also have negative effects on existing companies, especially those that rely on an information advantage. A historical parallel: when the Catholic Church had the power to decide who was taught to read and who was not, and only permitted members of the clergy to acquire this ability, the invention of mass-produced reading material undermined the power of the church. This eventually led to the point where everybody was allowed to learn to read. However, this came at a price: the church lost part of its power (Buttler 2007; Dewar 1998).

It is often said that there is a need for more data and that we need to collect everything possible. This is actually not true. There is already a large amount of data out there to be analyzed. The main problem, according to Steve Koonin, director of the Center for Urban Science and Progress at New York University, is that the information is not yet being used. Another point is that Smart Cities need to focus on people, and policies therefore need to be problem-driven. For Susan Christopherson, professor at Cornell University, this means that the following questions must be asked: What do we need in order to solve the challenges? Do we have the data and the structures needed to solve these issues, or do we need to create new structures and data?

Important factors

As described earlier, the availability of data itself is no guarantee that the data will be used or that its usage will be beneficial for the city. The main success factor is the engagement of educated people who understand the impact that using this data could have on the city. These individuals need to be capable of understanding and analyzing the available data. It is therefore necessary to have an open-minded culture that is enthusiastic about data, and an environment that supports this culture. Events such as hackathons can be a part of it, but they are not the ‘holy grail’ for creating this type of environment. Brad Feld, an early-stage investor and entrepreneur since 1987, said that the most important point is that someone starts (Feld 2012). Investors or municipalities cannot do this, because people need to start by themselves. A city can only support them, for example by providing shared workspace.

Criteria for success

The success criteria depend on the sector that is using the data. One could take the download rates of data files as a criterion for the success of open data usage, but this may not be the best indicator. A better approach is to evaluate whether the usage of data in a certain sector is successful. One can, for example, measure the cost reduction from weather-dependent delivery (something that is quite important in the US) at supply chain companies that use services based on available open weather data, and compare the results with companies that don’t use such services or with the companies’ own prior costs. The success criteria could then be the delivery time, the average cost per delivery, etc., as a result of improved supply chain delivery through data analysis.

However, open data doesn’t necessarily mean digital data. Open data can also be data that is presented to the public so that people can make better decisions. A very good example is the public health sector. The local government inspects restaurants within a city, and the results can be presented to the customers. Studies on this topic suggest that doing so will most likely decrease foodborne illnesses by 20 to 30% (Irwin et al. 1989; Jones et al. 2004; Simon et al. 2005). In this case one could measure the success of open data by counting the ICD-10 (International Statistical Classification of Diseases and Related Health Problems) cases related to foodborne illnesses.

The mayor’s office defined such indicators for NYC in its annual report and the PlaNYC. Mayor Bloomberg also started to manage the city like a company. Dennis Smith described it this way: “Basically they [a manager of a company ed.] can convert the performance of different parts of the business into profit. I think with cities it is a lot harder because there really are sanitation outcomes, there are health outcomes, there are safety outcomes. […] This management report last September 2012, for the first time the mayor’s management report has about ten pages with indicators that are not agency specific. The things that citizens in New York expect this city to do for them and how we do it. I really want to think about what are the things, the performance that they are expecting of the city. And then figure how we are going to measure whether we are getting those results, those outcomes. And then what back-warding a logic model would help all the different things and what the citizens have to do.”

One can summarize the key success criterion for the open data initiative as an evidence-based approach to managing the city.

References

Desai, S., Garabedian, L., & Snyder, K. (2012). Performance-based contracts in New York City. Rockefeller Institute Brief, pp. 1-31.

Axinn, J. J., & Stern, M. J. (2011). Social welfare. Pearson Higher Ed.

Marx, J. D. (2003). Social welfare. Allyn & Bacon.

Reid, P. N. (1995). Social welfare history. R. L. Edwards (Ed.-in-Chief).

NYC OpenData. (2013). Sustainability indicators. https://nycopendata.socrata.com/Environmental-Sustainability/Sustainability-Indicators-2012-/6r4h-c2y6

The New York Times. (2013). Talking Bloomberg – video feature – NYTimes.com. http://www.nytimes.com/interactive/2013/08/18/nyregion/bloomberg-voices.html

Silver, N. (2012). The signal and the noise. New York: The Penguin Press.

Forbes. (2013). Michael Bloomberg – Forbes. http://www.forbes.com/profile/michael-bloomberg/

Buttler, C. (2007). FC74: The invention of the printing press and its effects – The Flow of History. http://www.flowofhistory.com/units/west/11/FC74

Dewar, J. A. (1998). The information age and the printing press. Rand.

Feld, B. (2012). Startup communities. John Wiley & Sons.

Irwin, K., Ballard, J., Grendon, J., & Kobayashi, J. (1989). Results of routine restaurant inspections can predict outbreaks of foodborne illness: the Seattle-King County experience. American Journal of Public Health, 79(5), 586-590.

Jones, T. F., Pavlin, B. I., LaFleur, B. J., Ingram, A. L., & Schaffner, W. (2004). Restaurant inspection scores and foodborne disease. Emerging Infectious Diseases, 10(4), 688.

Simon, P. A., Leslie, P., Run, G., Jin, G. Z., Reporter, R., Aguirre, A., & Fielding, J. E. (2005). Impact of restaurant hygiene grade cards on foodborne-disease hospitalizations in Los Angeles County.
