An interview about Big Data, future forestry and data-gathering companies with Angel Hsu!

by Max Krombholz, UNEP commissioner

Interview: Could you briefly introduce yourself?

Angel Hsu: My name is Angel Hsu and I’m an assistant professor of environmental studies at Yale-NUS College in Singapore, with a joint appointment at the Yale School of Forestry and Environmental Studies in New Haven, Connecticut. I also direct a research group between the two institutions called Data-Driven Yale. In my work I seek to apply data science to pressing environmental issues like climate change, air pollution and water.

Which area of forestry-related big data are you focusing on right now?

Angel Hsu: I will talk about two projects that are related to applying big data to environmental policy questions. The first one is related to the groundswell of non-state and subnational climate actions. When I say non-state, I mean business, civil society and investors, and by subnational I mean regions, states and cities. In securing the Paris Agreement, climate actions from non-state actors such as NGOs were critical in defining a bottom-up process that made the agreement inclusive and engaged all stakeholders from different levels of government and different sectors. The major questions we investigate concern the emergence and growth of cities, states and companies that set their own climate targets and private goals. How is this actually related to national governments’ projects? When you try to add everything up, are we talking about additional impact or additional negotiation? It is very difficult to answer this question because of the data problem. There is not really a lot of consistent data being reported from these various networks. One example: when President Trump announced in June that the United States would withdraw from the Paris Agreement, there were 2000 cities, CEOs and universities in the United States that said they remained committed to the Paris pledge. That’s actually great, but what exactly does it mean?
When the state of California or the city of New York takes on its own mitigation target, how is that related to US national efforts and to other regions or states outside the US? We are applying big data to try to answer these questions, using data mapping and sifting through a huge amount of data to pick the signal out of the noise, and to add up all those efforts so that we can model the overall additional contribution of these actors. That’s one example from the climate change space.

Another example we are working on is a new spatially explicit index for cities, to understand how they are achieving environmental goals while at the same time providing social inclusion and equality. The tool is designed specifically to help cities evaluate their progress on Sustainable Development Goal 11, which sets a target for cities to be sustainable and also inclusive.

We are applying big data in a way to try to answer these questions

For example, every day you see a new ranking. Rankings are very popular, whether of the best cheeseburger or the best country. So there are various indicators and data. The problem with a lot of these indices is that they are highly aggregated and based on very poor data. At the end of the day, a policy maker wants to know how they can improve their score, but it’s very difficult to understand what’s driving those scores. So this new index uses high-resolution geospatial data from satellites and ground-sourced data, like OpenStreetMap, in order to get a much more detailed picture of how cities are able to reduce air pollution across the city. There is a lot of evidence in the literature suggesting that communities in the US are often exposed to the burdens of pollution. We wanted to develop an index using big data to address this question and understand how these issues vary across the urban area. This project will be released in a couple of months. We hope that it will be very helpful for policy makers. These were two examples of how we are using data science to shed light on environmental issues.

You mentioned just before that at the moment it is not easy to collect all the data needed to get a whole picture of the issue. So what would be the best way to collect data? Should the government, civil society or companies collect it?

Angel Hsu: That’s a great question. I mean, it has to be on all fronts. There can’t be a single actor shouldering the great burden of collecting the really urgent data that we need in order to understand what’s going on with environmental issues, and also to know whether or not a policy response will be effective. I don’t think the situation could be more dire, judging by the models of the Intergovernmental Panel on Climate Change of where we are going and by early announcements of what countries have pledged through the Paris Agreement. I mean, we are way off track; we are on track for a world that is 3.5 degrees warmer by 2100.
That’s quite alarming, considering scientists have set a goal to contain global temperature rise within 2 degrees. The Paris Agreement also has an aspirational target of 1.5 degrees Celsius. The situation could not be more urgent. My peers and I lead a project at Yale called the Environmental Performance Index, where we rely primarily on government data and government-reported statistics to assess national policy performance. What we found is that governments are really lagging behind in terms of providing timely and accurate data to be able to understand how different regions are performing on environmental issues.

We have to address all sectors of society. Citizens have to help collect data, either through research or through citizen science, an increasingly popular term that’s being considered in a policy context. I think pressuring local governments or businesses to be transparent about their environmental impact is critical too. I think that companies also have a huge role to play. For example, Coca-Cola has better water quality data than every government in the world, and that’s because they have operations in over 200 countries and water directly affects their bottom line. If they don’t have a good sense of what’s going on with water quality or water availability in the regions where they operate, that becomes a direct issue for them in terms of their profit, in terms of corporate sustainability responsibility and in terms of reputation. They had a huge issue in India a couple of years ago, where they were unsustainably extracting groundwater resources in some Indian cities while citizens in these cities didn’t have access to water. Coca-Cola and other companies have a responsibility, if they are collecting this kind of data, to share it for the common good.

Coca-Cola has better water quality data than every government in the world

You mentioned Coca-Cola observing water quality. So what are the best practices or the state of the art for big data in the forestry sector right now, for a company or a country?

Angel Hsu: In terms of best practice for big data, a lot of advances have been made over the last 10 years in using satellite data to derive clear pictures of what’s going on with the world’s forests. There are several satellites, for example Landsat, that have a 30 m × 30 m resolution. That’s quite good, so researchers from the University of Maryland were able to develop a global, close-to-real-time picture of the state of the world’s forests. All of this data is shared openly through the World Resources Institute’s web application, which is called Global Forest Watch. They developed some algorithms to help detect early deforestation, or to detect forest fires. It’s good that we have finally got to the point where we have solid data to understand what’s going on. In the index I just mentioned, the Environmental Performance Index, we use this too.

To come back to your question, there are a lot of countries that are doing well, Sweden in particular. I have heard a lot from Swedish scientists who work on forests and who are not so happy with some of the indicators in the 2016 EPI, because of the satellites. In the northern latitudes you get slower growth rates and regeneration cycles. After sustainable forest harvesting, the satellites record degraded forest land. Therefore these countries get penalized, unlike countries in the tropics, where you have much faster regeneration cycles. So some of the Swedish scientists complained about some of the indicators. In comparison, Malaysia, which we know has had a huge amount of forest converted to palm oil plantations, is doing pretty well, because satellites can’t really distinguish between primary old-growth forest, sustainably managed forest and palm oil plantations. So I would say we have now come a lot further in terms of providing solid data.
But there are still some challenges in distinguishing between the spectral data of palm oil plantations and old-growth forest, and in how we proceed with the data.

Do you think informatics will play a bigger role in future forestry studies?

Angel Hsu: Definitely, I think so; data and analytics already do. I think as this technology evolves the opportunities will become limitless. We will see even more real-time monitoring in the forestry sector. But there is still a big gap in monitoring forest health and biodiversity, and not just the growth of forest cover.

Last but not least: where will big data be in 5 years?

Angel Hsu: Wow. If I were able to answer this question I would probably be a billionaire, right? Well, I think we are already starting to see some of these trends: artificial intelligence and augmented reality. But we are going to see a lot more development in just generating a huge amount of data. I think generating data is at this point not really the issue. IBM, for example, estimates that 90% of the data that exists today was generated in the last 2 years. Nevertheless, only a fraction of that data really gets analysed. There is a huge gap that we need to tackle within the next 5 years. Either we train computers through artificial intelligence (AI) to be more efficient in finding the signal in the noise, or we train the environmental leaders of tomorrow to be able to handle this huge amount of data.

90 % of the data that exists today was generated in the last 2 years

There is so much data in the world, and we can’t keep pace in analysing it smartly and providing actionable information. In the next 5 years we need to train data analysts to fill the gap. Already, a lot of our daily decisions are based on data, for example the weather app we check before we leave the house. At the very least, we have to train the students of today to recognize the potential and opportunities of data and to gain actual insights from it.

Thank you very much for this interesting and informative interview!