A Lesson in How NOT to Draw Incidence Maps

I’m sure most of you know about the Ebola outbreak that is occurring in West Africa, particularly at the tripoint of Sierra Leone, Guinea, and Liberia. I was recently alerted to a New York Times article that includes a sort of incidence map showing the areas with confirmed and suspected cases. The article is here, and I have posted the map below in case you cannot access the NYTimes (the labels were removed during linking).

NYTimes Map of the Ebola Outbreak without Labels

The top country is Guinea, the bottom is Liberia, and the middle is Sierra Leone. The three dots along the coast are their respective capitals: Conakry, Monrovia, and Freetown. Deep orange areas have confirmed cases, light orange areas have suspected cases, and pale areas have neither. I want to show why this is a terrible map that conveys no useful information about the current epidemic. I will focus on the Sierra Leone portion of the map, since that is the area I know best.

At first glance, the map suggests that about 90% of Sierra Leone, by area, has been struck by the Ebola virus. That is not a bad guess: according to the map, the figure is in fact just under 89.5%. But the outbreak has clearly not spread that widely in Sierra Leone. Let’s set aside the areas with only suspected cases, which account for approximately 47.3% of the area of Sierra Leone. Now only 41.2% of Sierra Leone, by area, is affected by the outbreak. Better, but still far too high. And even with the suspected areas removed, there is a more fundamental problem with this map.

So what’s actually going on? This map shows only four distinct regions of Sierra Leone, while the country in fact has fourteen districts (twelve in the provinces, plus Western Area Rural and Western Area Urban). The NYTimes combined several districts to create larger regions. (This is why I can give area percentages to the tenth of a percent: I used Wikipedia to obtain precise land area measurements for each district.) I have named the four regions Freetown/Northwest, Central, South, and East; I hope it is self-evident which name corresponds to which region on the map. Listed below are the districts that make up each of the NYTimes’s regions:

Freetown/Northwest: Kambia, Port Loko, Western Area Rural, Western Area Urban

Central: Koinadugu, Bombali, Moyamba, Tonkolili

South: Bonthe, Pujehun

East: Bo, Kono, Kenema, Kailahun
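The district table above, together with the shading on the map, is enough to reproduce the kind of area percentages quoted earlier. Here is a minimal Python sketch. It assumes you supply a dictionary of district land areas (for example, the Wikipedia figures mentioned above). The status given to each region reflects the shading described in this post, with South taken to be the pale region. The names `share_by_status`, `REGION_DISTRICTS`, and `REGION_STATUS` are hypothetical.

```python
from collections import defaultdict

# Districts making up each NYTimes region, as listed above.
REGION_DISTRICTS = {
    "Freetown/Northwest": ["Kambia", "Port Loko", "Western Area Rural", "Western Area Urban"],
    "Central": ["Koinadugu", "Bombali", "Moyamba", "Tonkolili"],
    "South": ["Bonthe", "Pujehun"],
    "East": ["Bo", "Kono", "Kenema", "Kailahun"],
}

# Shading of each region on the map (assumed: South is the pale region).
REGION_STATUS = {
    "Freetown/Northwest": "confirmed",
    "Central": "suspected",
    "South": "none",
    "East": "confirmed",
}


def share_by_status(district_area, region_districts=REGION_DISTRICTS,
                    region_status=REGION_STATUS):
    """Return the percent of total land area shaded with each status.

    district_area maps every district name to its land area (any unit).
    A region's whole area takes the region's status, as on the map.
    """
    total = sum(district_area.values())
    shares = defaultdict(float)
    for region, districts in region_districts.items():
        region_area = sum(district_area[d] for d in districts)
        shares[region_status[region]] += 100.0 * region_area / total
    return dict(shares)
```

Adding the `confirmed` and `suspected` shares gives the "just under 89.5%" reading of the map. The `confirmed` share alone is what remains once the suspected areas are set aside. The exact decimals depend on the area figures you supply.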

Two of the NYTimes regions have confirmed cases: Freetown/Northwest and East. I will get to the East, but let’s first take a closer look at Freetown/Northwest.

The confirmed case in the Freetown/Northwest region occurred in Freetown, where the dot is, at the tip of the peninsula. This one case is made to represent the entire region, when it should only represent the Western Area Urban district. The Western Area Urban is only 13 sq km, 0.018% of the total land area of Sierra Leone, or less than one five-thousandth of the country. Yet according to the NYTimes, that single case represents a bit more than 12.3% of the area of Sierra Leone. If we use districts instead of the NYTimes’s regions to calculate area, we are down to 28.9% of the area of Sierra Leone affected by the epidemic. Greater refinement in the East would surely cut that percentage even further.

This shows the two major problems with this map. First, the boundaries are arbitrary with respect to the Ebola epidemic: a single case in a tiny area of the country can show up as a problem occurring on a wide geographical scale. Second, the map only shows the presence or absence of a confirmed or suspected case. Either it’s there or it isn’t. There could be a single suspected case of Ebola in the Central region accounting for that entire area. This presence/absence encoding throws away how many cases are actually within an area, which is exactly the information needed to understand the severity of the outbreak. Combined on a single map, the two problems make the outbreak seem even worse than it is.
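The first problem is that one case lights up an entire aggregated region. We can make this concrete by computing the shaded share of a country’s area in two ways. One shades only the districts that contain cases. The other shades every region that contains any such district. The sketch below is illustrative: `flagged_share` is a hypothetical helper, and the areas in the usage lines describe a made-up toy country, not Sierra Leone.

```python
def flagged_share(district_area, case_districts, region_districts=None):
    """Percent of total area shaded when districts with cases are flagged.

    With region_districts=None, only the districts in case_districts are
    shaded. Otherwise every region containing at least one such district is
    shaded in full, which is how an aggregated presence/absence map behaves.
    """
    total = sum(district_area.values())
    if region_districts is None:
        shaded = set(case_districts)
    else:
        shaded = set()
        for districts in region_districts.values():
            if any(d in case_districts for d in districts):
                shaded.update(districts)
    return 100.0 * sum(district_area[d] for d in shaded) / total


# Made-up toy country: a small city district inside a large coastal region.
areas = {"City": 10.0, "Coast": 990.0, "Interior": 9000.0}
regions = {"West": ["City", "Coast"], "East": ["Interior"]}

print(flagged_share(areas, {"City"}))           # 0.1  (district level)
print(flagged_share(areas, {"City"}, regions))  # 10.0 (region level)
```

The same single case covers a hundred times more area once it is attributed to its region rather than its district. That is the Freetown effect in miniature.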

So what is this map actually telling us? If we remove the areas with suspected cases and focus on the regions with confirmed cases, we see that only five regions are affected by the Ebola outbreak: the one at Freetown, the one on the coast of Guinea, and three more. Those last three regions, which give the appearance of cases reaching deep into Guinea (all the way to the Mali border) and Liberia (all the way to the coast), are in fact the regions that border the tripoint, where the center of the outbreak is located. So altogether, what the map is really saying is that there have been confirmed cases of Ebola around the tripoint of the three countries, in Freetown, and along parts of the Guinea coast, all information readily available on the internet. Essentially, it told us nothing that one couldn’t have learned by reading an article, and instead trumped up the severity of the problem.

What would have been a better map? A heat map. A heat map is created by dividing the area of interest into small, uniform blocks and using color to represent the number of cases that have appeared within each block. This has the benefit of showing where the cases actually are and how they are truly distributed. The heat map can also be modified to add information, for example by showing cases per population within each block or by adding a time component. Such a map would be far more accurate and informative than the one the NYTimes published.
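A basic version of the heat map described above only needs a location for each case and a block size. The sketch below bins (x, y) points into square cells and counts the cases in each cell. It also includes an optional cases-per-population rate. It assumes point locations and per-cell population figures are available, which the NYTimes map did not provide. `grid_counts` and `grid_rates` are hypothetical names, and the plotting step is left out.

```python
import math
from collections import Counter


def grid_counts(cases, cell_size):
    """Count cases in square grid cells.

    cases: iterable of (x, y) points, e.g. projected coordinates in km.
    cell_size: side length of a cell, in the same units as the points.
    Returns a Counter mapping (column, row) cell indices to case counts.
    """
    counts = Counter()
    for x, y in cases:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


def grid_rates(counts, population, per=10_000):
    """Cases per `per` residents in each cell.

    population maps cell indices to residents; cells with no (or zero)
    population data are left out rather than divided by zero.
    """
    return {
        cell: per * n / population[cell]
        for cell, n in counts.items()
        if population.get(cell, 0) > 0
    }


# Toy example with 1-unit cells and made-up populations.
cases = [(0.2, 0.3), (0.7, 0.1), (1.5, 0.4), (0.9, 0.9)]
counts = grid_counts(cases, cell_size=1.0)
print(counts[(0, 0)], counts[(1, 0)])                    # 3 1
print(grid_rates(counts, {(0, 0): 1500, (1, 0): 200}))  # {(0, 0): 20.0, (1, 0): 50.0}
```

Binning raw longitude and latitude gives cells that are uniform in degrees but not in area. Projecting the points to an equal-area coordinate system first keeps the blocks truly uniform. A time component could be added by including a time bin, such as the week of onset, in the cell key.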

I am not unsympathetic to the challenges of presenting interesting information with maps. In my own research I have struggled with low-quality or missing information that has led to terrible maps, which often give false impressions of what is going on. Certainly, it would be better to use the location where each case was contracted instead of where the deaths occurred, even though the former is much harder to know (even for the health professionals on the ground). But the way the New York Times chose to present the information it had, by combining regions and using the presence or absence of cases (confirmed or suspected), only exaggerates the scale of an already terrible problem, one that needs the best information, not false scares, to be addressed effectively.


