1. Data origins

For this project I am going to use the World Happiness Report 2019 data, taken from https://www.kaggle.com/unsdsn/world-happiness?select=2019.csv

The report uses data from the Gallup World Poll to rank 156 countries by how happy their citizens perceive themselves to be.

It includes the following columns:

  1. Overall rank
  2. Country or region (156)
  3. Score
  4. GDP per capita
  5. Social support
  6. Healthy life expectancy
  7. Freedom to make life choices
  8. Generosity
  9. Perceptions of corruption

The six factor columns (GDP per capita through Perceptions of corruption) show the extent to which each factor contributes to the happiness score of each country.

Here are the five happiest countries:

data2019 <- read.csv("2019.csv")       #load the data

bottom <- tail(data2019, 5)            #keep the last 5 rows
top <- head(data2019, 5)               #keep the first 5 rows
print(top)
##   Overall.rank Country.or.region Score GDP.per.capita Social.support
## 1            1           Finland 7.769          1.340          1.587
## 2            2           Denmark 7.600          1.383          1.573
## 3            3            Norway 7.554          1.488          1.582
## 4            4           Iceland 7.494          1.380          1.624
## 5            5       Netherlands 7.488          1.396          1.522
##   Healthy.life.expectancy Freedom.to.make.life.choices Generosity
## 1                   0.986                        0.596      0.153
## 2                   0.996                        0.592      0.252
## 3                   1.028                        0.603      0.271
## 4                   1.026                        0.591      0.354
## 5                   0.999                        0.557      0.322
##   Perceptions.of.corruption
## 1                     0.393
## 2                     0.410
## 3                     0.341
## 4                     0.118
## 5                     0.298

2. Research questions

My goal is to find out how the contribution of various factors to happiness differs between countries across the world. In particular, I am going to look at the contribution of the given factors in the top 5 and bottom 5 countries and compare it to the average across all countries in the data.

3. Data preparation and visualization

For this project I have used the following packages:

dplyr, tidyr, ggplot2, gganimate

First, let’s look at the level of happiness across all 156 countries.

library(ggplot2)

data2019 <- as.data.frame(data2019) 

t <- ggplot(data = data2019, 
            aes(x = reorder(Country.or.region, -Score), y = Score)) +     #to create scatterplot
     geom_point(stat="identity", color = "darkblue", size = 1) +
     ggtitle("World Happiness Report 2019")                               #to add title

t <- t + xlab("Countries")                                                #change x-axis label              
t <- t + ylab("Happiness Score")                                          #change y-axis label
t + theme(axis.text.x = element_text(angle = 90,size = 4, hjust = 1))     #change the size and the angle 

Next, let’s look at how happy citizens of the top 5 and bottom 5 countries perceive themselves to be.

topbottom <- rbind(top, bottom)    #to combine the first 5 and the last 5 countries

library(gganimate)                                             
tbanim <- ggplot(topbottom, 
                aes(x = reorder(Country.or.region, - Score), y = Score)) +
       geom_bar(stat = "identity",
                aes(fill = Score)) +
       transition_states(Score, transition_length = 1, state_length = 3,
       wrap = TRUE) +                                 
       shadow_mark() +                                                    
       enter_grow() +                                 
       ggtitle("The most and least happy countries") +
       theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
       xlab("Countries") +
       ylab("Happiness Score")
print(tbanim)

animate(tbanim)

Now, let’s analyze the factors that contribute to the happiness.

Firstly, let’s see what factors contribute the most to happiness of the top five countries.

library(tidyr)

top <- head(data2019, 5)
top <- top[, -c(1, 3)]      #delete unnecessary columns


gathereddata <- gather(top, Factors, Score, -Country.or.region)

grouped <- ggplot(data = gathereddata, 
                  aes(x = reorder(Factors,- Score), y = Score,  fill = Country.or.region)) +
           geom_bar(stat="identity", position ="dodge", colour = "black", width = 0.6) +
           ggtitle("The 5 happiest countries")

grouped <- grouped + xlab("Factors")

grouped + theme(axis.text.x = element_text(angle = 15, hjust = 1, size = 7))

It is clear that social support and GDP per capita contribute the most to happiness of the top 5 countries.
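The reading of the chart can be cross-checked numerically. The following is a minimal sketch, assuming the factor values for the five countries are exactly those printed in section 1; the names top5 and factor_means are illustrative and not part of the original analysis.

```r
# Top-5 factor values copied from the printed output in section 1
top5 <- data.frame(
  Country.or.region = c("Finland", "Denmark", "Norway", "Iceland", "Netherlands"),
  GDP.per.capita = c(1.340, 1.383, 1.488, 1.380, 1.396),
  Social.support = c(1.587, 1.573, 1.582, 1.624, 1.522),
  Healthy.life.expectancy = c(0.986, 0.996, 1.028, 1.026, 0.999),
  Freedom.to.make.life.choices = c(0.596, 0.592, 0.603, 0.591, 0.557),
  Generosity = c(0.153, 0.252, 0.271, 0.354, 0.322),
  Perceptions.of.corruption = c(0.393, 0.410, 0.341, 0.118, 0.298)
)

# Mean of each factor column (country names excluded), largest first
factor_means <- sort(colMeans(top5[, -1]), decreasing = TRUE)
print(round(factor_means, 3))
```

Sorting the column means puts the factors in the same order as the bars in the chart, so the two largest entries should be the two factors named above.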

Secondly, let’s look at the contribution of factors to happiness of the bottom five countries.

bottom <- tail(data2019, 5)
bottom <- bottom[, -c(1, 3)] 


gathereddataB <- gather(bottom, Factors, Score, -Country.or.region)

groupedB <- ggplot(data = gathereddataB, 
                   aes(x = reorder(Factors, Score), y = Score,  fill = Country.or.region)) +
            geom_bar(stat="identity", position="dodge", colour = "black", width = 0.6) +
            ggtitle("The least happy countries")

groupedB <- groupedB + xlab("Factors")
groupedB + theme(axis.text.x = element_text(angle = 17, hjust = 1, size = 7))

For the bottom 5 countries, social support and healthy life expectancy seem to contribute more than the other factors.

Next, let’s see the differences in contribution of various factors to happiness of the top 5 and bottom 5 countries and compare them to the average score.

average <- data2019[, -c(1, 2, 3)]

library(dplyr)

ave <- average %>% summarise_if(is.numeric, mean)     #to find the mean

gatheredave <- gather(ave, Factors, Score)
gathereddataT <- gather(top, Factors, Score, -Country.or.region)
gathereddataB <- gather(bottom, Factors, Score, -Country.or.region)

avT <- ggplot(gathereddataT, aes(x = Factors, y = Score)) +
  geom_jitter(size = 3, shape = 21,
              aes(fill = "Apoints")) +
  geom_jitter(data=gathereddataB,size = 3, shape = 21, 
              aes(fill = "Bpoints")) +
  geom_jitter(data = gatheredave, size = 3, shape = 21, 
              aes(fill = "Cpoints")) +
  scale_fill_manual(name = "", 
                    labels = c("Apoints" = "Top5",
                              "Bpoints" = "Bottom5",
                              "Cpoints" = "Average"),
                    values =c("Apoints" = "grey46",
                              "Bpoints" = "steelblue2",
                              "Cpoints" = "red")) +
  
  facet_wrap( ~ Factors, scales = "free_x") +
  theme(strip.text.x = element_blank()) +
  ggtitle("Top 5 and bottom 5 countries")

print(avT)

As can be seen, GDP per capita, healthy life expectancy and social support contribute more to happiness of the top 5 countries compared to the average country.
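The same comparison can also be made as a small table of means rather than read off the jittered points. This is only a sketch: compare_groups is a hypothetical helper, not part of the original analysis, and it assumes the rows are sorted by rank and that the first three columns are rank, country and score, as in 2019.csv.

```r
# Hypothetical helper: mean of every factor column for the first n rows,
# the last n rows, and all rows. Assumes rows are sorted by rank.
compare_groups <- function(df, n = 5, drop_cols = 1:3) {
  factors <- df[, -drop_cols, drop = FALSE]   # keep only the factor columns
  rbind(
    Top     = colMeans(head(factors, n)),
    Bottom  = colMeans(tail(factors, n)),
    Average = colMeans(factors)
  )
}

# Run on the report data only if the file is available
if (file.exists("2019.csv")) {
  data2019 <- read.csv("2019.csv")
  print(round(compare_groups(data2019), 3))
}
```

Each row of the result corresponds to one of the three point colours in the plot, which makes it easier to see by how much the top 5 exceed the average for each factor.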

Finally, let’s see the contribution of these factors to happiness across all 156 countries.

all <- data2019[, -c(1, 3)]

gathereddatal <- gather(all, Factors, Score, -Country.or.region)

all <- ggplot(gathereddatal, 
              aes(x = Factors, y = Score)) +
       geom_jitter(size = 0.8, shape = 19, colour = "dodgerblue4") + 
       facet_wrap( ~ Factors, scales = "free_x") + 
       ggtitle("Contribution of factors") +
       theme(strip.text.x = element_blank()) 
  
all <- all + xlab("Factors")
print(all)

Here it is clearly visible that freedom to make life choices, generosity and perceptions of corruption contribute the least to happiness across all the countries.

4. Summary

To conclude, there is a large difference in the level of happiness between the top and bottom countries. However, there are some similarities in the factors that contribute the most to happiness across the world, such as social support and GDP per capita. For further analysis, it might be useful to examine the correlation between the level of happiness and each factor.
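A possible starting point for that correlation analysis is sketched below. It was not run as part of this report; factor_correlations is a hypothetical helper, and the column names are assumed to be those produced by read.csv("2019.csv") in section 1.

```r
# Hypothetical helper: Pearson correlation of each factor column with Score
factor_correlations <- function(df, score_col = "Score",
                                skip = c("Overall.rank", "Country.or.region")) {
  factor_cols <- setdiff(names(df), c(score_col, skip))
  cors <- sapply(df[factor_cols], function(x) cor(x, df[[score_col]]))
  sort(cors, decreasing = TRUE)               # strongest positive first
}

# Run on the report data only if the file is available
if (file.exists("2019.csv")) {
  data2019 <- read.csv("2019.csv")
  print(round(factor_correlations(data2019), 2))
}
```

Note that a correlation describes how a factor varies together with the score across countries, which is a different question from how large the factor's contribution is within a single country.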