Chapter 4 Missing values
There are 152616 rows in the dataset. The output below gives the number of missing values for each variable. For a full explanation of each variable, please refer to the codebook.
##             gwnob             gwnoa              year            side_b
##            151812             35321                 0                 0
##       active_year            side_a      priogrid_gid   conflict_new_id
##                 0                 0                 0                 0
##        country_id            region    source_article  type_of_violence
##                 0                 0                 0                 0
##         date_prec   source_headline     conflict_name         dyad_name
##                 0                 0                 0                 0
##     event_clarity where_coordinates          date_end     side_b_new_id
##                 0                 0                 0                 0
##       dyad_new_id              high     source_office  deaths_civilians
##                 0                 0                 0                 0
##              best number_of_sources       source_date         longitude
##                 0                 0                 0                 0
##        where_prec     side_a_new_id                id   source_original
##                 0                 0                 0                 0
##             adm_1           country             adm_2          latitude
##                 0                 0                 0                 0
##          geom_wkt        date_start          deaths_a          deaths_b
##                 0                 0                 0                 0
##    deaths_unknown               low          geometry
##                 0                 0                 0
Only two variables contain missing values: gwnob and gwnoa. gwnoa is the Gleditsch and Ward number for side A if that side is a state; if side A is not a state, gwnoa is missing. Similarly, gwnob is the Gleditsch and Ward number for side B, and it is missing if side B is not a state. gwnob has more missing values because, in one-sided violence, side B is recorded as “civilians” in this dataset, which is not a state. Of the 152616 rows in the dataset, 151812 (about 99.5%) are missing gwnob and 35321 (about 23.1%) are missing gwnoa, so gwnob is missing in almost every row.
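As an illustration of how per-column counts like these can be produced, here is a minimal sketch in Python using only the standard library. It is not the code used to generate the output above. The four toy rows, the actor names, and the numeric codes are invented for the example, and `missing_counts` is a hypothetical helper; only the column names and the two real totals come from this chapter.

```python
# Toy rows standing in for the real dataset (values are invented).
# None marks a missing value, as when a side is not a state.
rows = [
    {"side_a": "Government of X", "side_b": "Civilians", "gwnoa": 100, "gwnob": None},
    {"side_a": "Rebel group Y", "side_b": "Civilians", "gwnoa": None, "gwnob": None},
    {"side_a": "Government of X", "side_b": "Government of Z", "gwnoa": 100, "gwnob": 200},
    {"side_a": "Government of Z", "side_b": "Rebel group Y", "gwnoa": 200, "gwnob": None},
]


def missing_counts(rows):
    """Return {column: number of rows in which that column is None}."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # (value is None) is a bool, which adds as 0 or 1.
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


counts = missing_counts(rows)
shares = {column: n / len(rows) for column, n in counts.items()}

# The same share calculation applied to the totals reported in this chapter.
total_rows = 152616
share_gwnob = 151812 / total_rows
share_gwnoa = 35321 / total_rows

print(counts)
print(round(share_gwnob, 3), round(share_gwnoa, 3))
```

On the toy rows, gwnob is missing in three of four rows and gwnoa in one, which mirrors the direction of the real counts without reproducing them.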
Turning to row patterns, we already know that, in each row, gwnoa is missing if side A is not a state and gwnob is missing if side B is not a state. The graph shows that most rows have at least one missing value, which is consistent with the column-level finding that gwnob is missing in most rows.
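The row-pattern view can be sketched in the same way: group rows by the set of columns that are missing in each row and count how often each combination occurs. This is a hedged illustration in standard-library Python, not the method behind the graph. The toy rows are invented and `missing_pattern` is a hypothetical helper.

```python
from collections import Counter

# Toy rows standing in for the real dataset (values are invented).
rows = [
    {"side_a": "Government of X", "side_b": "Civilians", "gwnoa": 100, "gwnob": None},
    {"side_a": "Rebel group Y", "side_b": "Civilians", "gwnoa": None, "gwnob": None},
    {"side_a": "Government of X", "side_b": "Government of Z", "gwnoa": 100, "gwnob": 200},
    {"side_a": "Government of Z", "side_b": "Rebel group Y", "gwnoa": 200, "gwnob": None},
]


def missing_pattern(row):
    """Return the sorted tuple of column names that are None in this row."""
    return tuple(sorted(column for column, value in row.items() if value is None))


# Tally how many rows share each combination of missing columns.
patterns = Counter(missing_pattern(row) for row in rows)

# Rows with at least one missing value are those with a non-empty pattern.
rows_with_missing = sum(n for pattern, n in patterns.items() if pattern)

for pattern, n in patterns.most_common():
    print(pattern if pattern else "(complete)", n)
```

Only three patterns can have a non-zero count in the real data, because gwnoa and gwnob are the only columns with missing values: gwnob alone, gwnoa alone, and both together.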
As mentioned in Chapter 2, “Data sources,” we are most interested in certain sets of variables, and gwnoa and gwnob are not considered “very important” in our project. Therefore, we are not much concerned about the missing gwnoa and gwnob values in the dataset. There are no missing values in the variables we are most interested in, which are described in Chapter 2.