Chapter 4 Missing values

The dataset has 152616 rows. The output below shows, for each variable, the number of rows in which that variable is missing. For a full explanation of each variable, please refer to the codebook.

##             gwnob             gwnoa              year            side_b 
##            151812             35321                 0                 0 
##       active_year            side_a      priogrid_gid   conflict_new_id 
##                 0                 0                 0                 0 
##        country_id            region    source_article  type_of_violence 
##                 0                 0                 0                 0 
##         date_prec   source_headline     conflict_name         dyad_name 
##                 0                 0                 0                 0 
##     event_clarity where_coordinates          date_end     side_b_new_id 
##                 0                 0                 0                 0 
##       dyad_new_id              high     source_office  deaths_civilians 
##                 0                 0                 0                 0 
##              best number_of_sources       source_date         longitude 
##                 0                 0                 0                 0 
##        where_prec     side_a_new_id                id   source_original 
##                 0                 0                 0                 0 
##             adm_1           country             adm_2          latitude 
##                 0                 0                 0                 0 
##          geom_wkt        date_start          deaths_a          deaths_b 
##                 0                 0                 0                 0 
##    deaths_unknown               low          geometry 
##                 0                 0                 0
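
A per-variable count like the one above can be produced in base R with `colSums(is.na(...))`. The following is a minimal sketch, not necessarily the code used for this chapter; the data frame `toy` and its values are made up purely for illustration.

```r
# Toy data frame standing in for the real dataset (values are made up).
toy <- data.frame(
  gwnoa = c(2, NA, 645, NA),
  gwnob = c(NA, NA, 630, NA),
  year  = c(1990, 1991, 1992, 1993)
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column.
# Sorting in decreasing order puts the most incomplete variables first.
missing_counts <- sort(colSums(is.na(toy)), decreasing = TRUE)
print(missing_counts)
```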

Only two variables have missing values: gwnob and gwnoa. gwnoa is the Gleditsch and Ward number for side A if that side is a state; if side A is not a state, gwnoa is missing. Similarly, gwnob is the Gleditsch and Ward number for side B, and it is missing if side B is not a state. One reason gwnob has more missing values is that in one-sided violence, side B is recorded as “civilians” in this dataset, which is not a state. Of the 152616 rows in the dataset, 151812 (about 99.5%) are missing gwnob, so gwnob is missing in nearly all rows.
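
To put the two counts on a common scale, they can be converted to percentages of the total row count. The sketch below only redoes that arithmetic from the figures reported in this chapter; the variable names are illustrative.

```r
# Figures taken from the missing-value output in this chapter.
n_rows    <- 152616
n_missing <- c(gwnob = 151812, gwnoa = 35321)

# Percentage of rows in which each variable is missing.
pct_missing <- round(100 * n_missing / n_rows, 2)
print(pct_missing)
```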

For row patterns, we already know that, in each row, gwnoa is missing if side A is not a state and gwnob is missing if side B is not a state. From the graph, we can see that most rows have at least one missing value, which is consistent with the column-pattern finding that gwnob is missing in most rows.
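
The row patterns can also be tabulated directly. The following is a minimal base R sketch on a made-up data frame (`toy` and its values are assumptions, not the real data): it labels each row by which of the two variables is missing and counts the labels.

```r
# Toy data frame standing in for the real dataset (values are made up).
toy <- data.frame(
  gwnoa = c(2, NA, 645, NA, 2),
  gwnob = c(NA, NA, 630, NA, NA)
)

# Label each row by which of the two columns are missing.
pattern <- ifelse(
  is.na(toy$gwnoa),
  ifelse(is.na(toy$gwnob), "both missing", "gwnoa only"),
  ifelse(is.na(toy$gwnob), "gwnob only", "complete")
)

# Number of rows per missingness pattern.
pattern_counts <- table(pattern)
print(pattern_counts)

# Share of rows with at least one missing value.
share_incomplete <- mean(!complete.cases(toy))
print(share_incomplete)
```

On the real data, the same tabulation would show how many rows fall into each pattern, which is the information the graph summarizes visually.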

As mentioned in chapter 2 “Data sources,” we are most interested in certain sets of variables, and gwnoa and gwnob are not considered “very important” in our project. Therefore, we are not much concerned about the missing gwnoa and gwnob values in the dataset. There are no missing values for the variables we are most interested in, which are described in chapter 2.