How the principles of multivariate analysis can help analyse complex problems from climate change to healthcare to social justice

Navo
Dec 11, 2022

Whether we are running a business or working to improve people's lives, at the core we are trying to solve a problem. That doesn't mean everyone's role is to find the solution; each of us plays a different part in society. You could be the one who comes up with the solution, or the one who raises awareness of the problem and inspires others to solve it. Whatever the role, clarity and a deeper understanding of the problem help us relate to it and move in the right direction together.

I studied statistics at university for four years and have worked in data analysis roles for over eight years. What I learnt in that time is that whenever I encounter a problem that needs a solution, I treat it as a data analysis problem. Whether the question is how to increase revenue, reduce a carbon footprint, be healthier, or be more confident, I follow the same principle: there is always data available about the problem, so my job is to analyse it, draw conclusions, and take action.

This is easier said than done, though, because drawing conclusions depends on more than the data analysis itself; it depends on your knowledge and understanding of the variables.

One benefit of thinking this way is that no topic feels overwhelming. Whether you are facing a subject you have never been exposed to or learning a new domain, these principles will help.

I have always asked myself: how do I know if I know enough about something? While everything is a journey of continuous learning and growth, I realised that treating any topic as a data analysis problem helps me see what I know and what I don't, and connect what I know to my other knowledge and experiences to generate better insights and solutions.

Data analysis is about using statistical methods to understand and explain variables, and it gets interesting when there is more than one. But let's start with just one variable; what does that mean? Imagine a table with just one column that you are asked to analyse. This is the simplest possible form of data analysis, known as Univariate analysis. For example, say the column holds greenhouse gas (GHG) emissions in metric tonnes per year. You can find the average, the minimum and the maximum of that data, but one variable alone cannot tell a broader story.
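To make the one-column idea concrete, here is a minimal sketch in Python. The emission values are made up purely for illustration (they are not real measurements), and the variable names are my own.

```python
import statistics

# Hypothetical GHG emissions in metric tonnes per year.
# These values are invented for illustration, not real data.
emissions = [120.0, 95.0, 310.0, 50.0, 205.0]

# Univariate analysis: summarise the single column.
summary = {
    "mean": statistics.mean(emissions),
    "min": min(emissions),
    "max": max(emissions),
}
print(summary)
```

The summary describes the column well enough, but it cannot say who emitted what or why, which is the limitation of a single variable.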

Before moving on, I want to introduce the first way of categorising variables: they can be categorical or continuous. A column listing all the countries in the world is a categorical variable. A column holding the GHG emissions mentioned above is a continuous variable, because the values are measured on a continuous scale (here, metric tonnes).

Now say we have two columns, one with the GHG emission data and the other with the corresponding country. This data is far more useful than before because you can tell a broader story: see which countries emit more GHGs, compare them with each other, and visualise the data in graphs and charts. This is called Bivariate analysis. Bivariate data has two variables, each of which can be categorical or continuous.

The chart below shows scaled GHG emissions by country in 2018.

GHG emissions in 2018 (source: https://www.unep.org/)

The exciting thing about Bivariate analysis is that it introduces a second way of categorising variables: as independent or dependent. In a typical Bivariate analysis, one variable is treated as dependent and the other as independent. That means a change in one variable (the independent variable) is expected to change the outcome of the other (the dependent variable). If the two variables had no relationship at all, analysing them together would tell you little more than two separate Univariate analyses.
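As a rough sketch of the independent/dependent idea, the snippet below fits a straight line through two continuous variables using ordinary least squares. The variable `energy_use`, the numbers, and the helper `fit_line` are all hypothetical, chosen only to illustrate the relationship.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, made up for illustration only.
energy_use = [1.0, 2.0, 3.0, 4.0, 5.0]      # independent variable
ghg_emissions = [2.1, 3.9, 6.2, 7.8, 10.0]  # dependent variable

slope, intercept = fit_line(energy_use, ghg_emissions)
print(f"emissions = {intercept:.2f} + {slope:.2f} * energy_use")
```

The slope is the estimated change in the dependent variable for a one-unit change in the independent variable; a spreadsheet trendline computes the same thing.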

Most of the analyses people conduct in business and day-to-day life to understand problems are either Univariate or Bivariate analyses. These analyses are easy to carry out using Excel or Google Sheets.

Going one step further, imagine there are more than two variables, say four. The process is now a bit more complicated. Even so, if there is only one dependent variable (one outcome) and the remaining three are independent variables, it is still straightforward. This is called multiple regression analysis. Now we are getting closer to real-life problems, because in real life things are not binary; there is so much more.
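Here is one possible sketch of a multiple regression with one outcome and three independent variables, written in plain Python by solving the normal equations. The data is invented and deliberately noise-free (it was generated from y = 1 + 2·x1 + 0.5·x2 − x3), and the helper names are my own; it only illustrates the mechanics.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b using Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] minimising squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations of three independent variables (x1, x2, x3).
rows = [(1, 2, 0), (2, 1, 1), (3, 4, 0), (4, 3, 2), (5, 6, 1), (6, 5, 3)]
# Outcome generated from y = 1 + 2*x1 + 0.5*x2 - x3 (no noise, for illustration).
outcome = [4.0, 4.5, 9.0, 8.5, 13.0, 12.5]

coefficients = fit_multiple_regression(rows, outcome)
print([round(c, 3) for c in coefficients])
```

In practice you would more likely reach for a statistics package or a spreadsheet's regression tool; the hand-rolled solver is here only so the example has no dependencies.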

Let's dive into reality: we want to understand how a set of variables (say, twelve) affects an outcome. Take Australia's “Black Summer” climate crisis during the 2019–2020 summer season. We want to understand which factors increase the risk of wildfires so that we can predict and control outcomes in the future. An analysis like this would include, but not be limited to, the following variables:

  • Fire radiative power
  • Change in Vegetation Health Index
  • Wind speed
  • Temperature
  • Soil moisture
  • Change in Live Fuel Moisture Content
  • Change in the fractional cover of photosynthetic vegetation
  • Rainfall in the last year before the fire started
  • Fire Danger Index (FDI)
  • Lightning strikes
  • The population within 10 km of the ignition
  • Aboriginal lands (% Indigenous PA)

These are only some of the possible variables for this analysis. Its complexity comes from the fact that there are a lot of variables to start with and, most importantly, these variables are not independent: there are internal relationships between them. Thus, it is difficult to explain the outcome we care about by looking at each variable in isolation. This type of analysis, with many variables that depend on one another, is known as Multivariate analysis.
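One simple way to see such internal relationships is to compute the pairwise correlation between variables. The sketch below does this for three of the variables listed above, using made-up readings (not data from the Black Summer fires) and hypothetical helper names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical readings, invented for illustration only.
data = {
    "temperature": [30.0, 32.0, 35.0, 38.0, 40.0],
    "soil_moisture": [0.30, 0.27, 0.26, 0.21, 0.20],
    "wind_speed": [20.0, 15.0, 25.0, 18.0, 22.0],
}

names = list(data)
# Correlation of every variable with every other variable.
corr = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

for a in names:
    print(a, [round(corr[a][b], 2) for b in names])
```

In this toy data, temperature and soil moisture move strongly in opposite directions, so treating them as two independent explanations of fire risk would double-count the same underlying story.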

Now, this is where you will sound super smart with terms like MANOVA, correlation matrix, Random Forest, and Naïve Bayes. These are statistical and machine-learning techniques used to analyse many variables together and come to conclusions.

But the idea of this article is not to draw your attention to these statistical models; those are for people who do research to worry about. The idea is to communicate that the majority of the problems we see as complex are complex because they are multivariate problems. If we try to understand them using bivariate thinking, we will not be able to grasp the full depth of the problem and will find it difficult to build connections between variables. If you struggle to find connections, you will struggle to communicate the real story.

Therefore, the motivation of this article is to draw out the principle behind this type of analysis, which is to look at complex problems as multivariate problems. Once you do that, it becomes a journey of continuously exploring the different variables connected to the problem, having the clarity to understand that they are not independent, and spending time exploring their context and building connections.

Next time you encounter a complex problem, list the variables connected to it, then learn about them and continue exploring more variables while understanding their relationships.

Many problems related to social justice involve many interdependent variables. Often, you will come across people trying to rationalise things either by filtering out a variable or by covering up the dependencies between variables.

This article was never intended to teach you multivariate analysis, but to inspire and encourage you to think about complex problems from a different perspective so that we can all make the best and most informed choices.

References

Mengual-Macenlle, N., Marcos, P. J., Golpe, R., & González-Rivas, D. (2015). Multivariate analysis in thoracic research. Journal of Thoracic Disease. doi: 10.3978/j.issn.2072-1439.2015.01.43

Nimkar, H. (2022). Overview of multivariate analysis. What is multivariate analysis and model building process. Retrieved from https://www.mygreatlearning.com/blog/introduction-to-multivariate-analysis/

Olkin, I., & Sampson, A. (2001). Multivariate analysis: Overview. International Encyclopedia of the Social & Behavioral Sciences, 10240–10247. doi: https://doi.org/10.1016/B0-08-043076-7/00472-1

Sulova, A., & Jokar Arsanjani, J. (2020). Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning within Google Earth Engine. Remote Sensing, 13(1), 10. MDPI AG. Retrieved from http://dx.doi.org/10.3390/rs13010010
