Author archives: Francois Keck

Introducing editheme: Palettes and graphics matching your RStudio editor

I use RStudio every day at work. And every day my eyes thank the developers for implementing theme support in the editor and, more recently, a complete dark skin for the GUI. Dark themes are not only important for your health and the health of the planet. They are also absolutely essential to look cool in the office and make your colleagues believe you’re working on very complicated projects. 😀

However, one little thing could dispel the illusion in a second: an ugly, flashy graphic with a white background in your plot pane.

To solve this serious issue of my daily life, I wrote editheme, a tiny package providing color palettes and functions to create graphics matching the RStudio editor. The following animated screenshot is worth a thousand words.

Read more about the package on its GitHub page.
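If you want to give it a try, it can presumably be installed from GitHub with devtools (the repository path below is an assumption, by analogy with the author’s other packages):

# Install the development version from GitHub (repository path assumed)
devtools::install_github("fkeck/editheme")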

Exploring the CRAN social network

A few months ago, I published a post where I was trying to map the dependency relationships between R packages. Today I want to do something similar with package contributors. My idea is to reconstruct a social graph where each node is a person (presumably a developer), and two people are connected by an edge if they have collaborated on the same package. This way I can explore the CRAN social network!

This post is also a bit special for me. It’s the very first time I’m using dplyr and the tidyverse. I used to write my code in base R, but the amazing work of Thomas Lin Pedersen around tidygraph and ggraph convinced me to take the plunge. And it was a lot of fun!

First, we load the packages and the data. In the first version of this project I retrieved the XML file of each package from CRAN using wget, then read and parsed it with xml2. Since R 3.4.0, things are much, much simpler with the function CRAN_package_db.

library(tidyverse)
library(stringr)
library(igraph)
library(tidygraph)
library(ggraph)
library(magrittr)

pdb <- tools::CRAN_package_db()
aut <- pdb$Author

That’s all we need to get the full list of contributors. But it comes in a very messy form. R provides a structured system to describe persons, which should be used in the Authors@R field of the DESCRIPTION file, but most people use a simple character string instead.
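For reference, here is a minimal sketch of the difference, using the standard utils::person() constructor (the names are made up):

# Structured description of a person, as used in the Authors@R field:
p <- utils::person("Jane", "Doe", email = "jane@example.org", role = c("aut", "cre"))
format(p)
# [1] "Jane Doe <jane@example.org> [aut, cre]"
# ...whereas the plain Author field is often free text such as:
# "Jane Doe [aut, cre], John Smith [ctb], with contributions from A. N. Other"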

The big cleanup

I combined a series of regular expressions and string manipulations to extract every name. It was the first time I used stringr and I have to say, I don’t regret gsub, grepl and friends. I only miss support for some PCRE features…

aut <- aut %>%
  str_replace_all("\\(([^)]+)\\)", "") %>%
  str_replace_all("\\[([^]]+)\\]", "") %>%
  str_replace_all("<([^>]+)>", "") %>%
  str_replace_all("\n", " ") %>%
  str_replace_all("[Cc]ontribution.* from|[Cc]ontribution.* by|[Cc]ontributors", " ") %>%
  str_replace_all("\\(|\\)|\\[|\\]", " ") %>%
  iconv(to = "ASCII//TRANSLIT") %>%
  str_replace_all("'$|^'", "") %>%
  gsub("([A-Z])([A-Z]{1,})", "\\1\\L\\2", ., perl = TRUE) %>%
  gsub("\\b([A-Z]{1}) \\b", "\\1\\. ", .) %>%
  map(str_split, ",|;|&| \\. |--|(?<=[a-z])\\.| [Aa]nd | [Ww]ith | [Bb]y ", simplify = TRUE) %>%
  map(str_replace_all, "[[:space:]]+", " ") %>%
  map(str_replace_all, " $|^ | \\.", "") %>%
  map(function(x) x[str_length(x) != 0]) %>%
  set_names(pdb$Package) %>%
  extract(map_lgl(., function(x) length(x) > 1))

aut_list <- aut %>%
  unlist() %>%
  dplyr::as_data_frame() %>%
  count(value) %>% 
  rename(Name = value, Package = n) 

At this point we have a list of 13668 (unique) names, and this number is increasing almost every day. The R community is huge! Of course there are still some errors. And we can find surprising contributors:

aut_list$Name[4922]
# [1] "Her Majesty the Queen in Right of Canada"

The Social Network

The next step is to produce a two-column matrix describing all the connections of the network (an edge list). An edge list can be turned into a graph object with the igraph package. Finally, we can convert the igraph graph into a tidy graph so we can use the API provided by the tidygraph package, for example to filter nodes/edges or select only the main component.

edge_list <- aut %>%
  map(combn, m = 2) %>%
  do.call("cbind", .) %>%
  t() %>%
  dplyr::as_data_frame() %>%
  arrange(V1, V2) %>%
  count(V1, V2)

g <- edge_list %>%
  select(V1, V2) %>%
  as.matrix() %>%
  graph.edgelist(directed = FALSE) %>% 
  as_tbl_graph() %>%
  activate("edges") %>%
  mutate(Weight = edge_list$n) %>%
  activate("nodes") %>%
  rename(Name = name) %>% 
  mutate(Component = group_components()) %>%
  filter(Component == names(table(Component))[which.max(table(Component))])

The author of tidygraph, Thomas, also developed the ggraph package, an implementation of the grammar of graphics for relational data. Using ggraph, it is very easy to visualize our graph.

ggraph(g, layout = 'lgl') +
  geom_edge_fan(alpha = 0.1) +
  theme_graph()

[Figure: the CRAN social network]

This is it! This hairy ink stain is the main connected component of the CRAN social network, representing collaborations among 6758 developers!

We can look for the « most important » node in the graph according to Google’s PageRank algorithm, although it is not obvious what a high PageRank would mean here…

g %>%
  mutate(Page_rank = centrality_pagerank()) %>% 
  top_n(10, Page_rank) %>% 
  as_tibble() %>% 
  ggplot() +
    geom_col(aes(forcats::fct_reorder(Name, Page_rank), Page_rank)) +
    coord_flip() + theme_minimal() +
    labs(title = "Top 10 contributors based on Page Rank", x = "")

[Figure: barplot of the top 10 contributors by PageRank]

Focus on the core network

The complete graph makes a nice painting to hang on the wall of your living room, but it is hard to say anything about it. So, we will reduce the data and focus on the contributors involved in more than 4 packages.

g <- g %>% 
  left_join(aut_list) %>% 
  filter(Package > 4) %>% 
  mutate(Component = group_components()) %>%
  filter(Component == names(table(Component))[which.max(table(Component))])

ggraph(g, layout = 'lgl') +
  geom_edge_fan(alpha = 0.1) +
  theme_graph()

[Figure: core of the CRAN network (contributors involved in more than 4 packages)]

Nice! It seems like there are two distinct clusters in the middle. Let’s see if we can identify them using a community detection algorithm.

g <- mutate(g, Community = group_edge_betweenness(),
            Degree = centrality_degree())

filter(g, Community == names(sort(table(Community), decr = TRUE))[1]) %>%
  select(Name, Package) %>%
  arrange(desc(Package)) %>%
  top_n(10, Package) %>%
  as_tibble() %>% 
  knitr::kable(format = "html", caption = "Cluster 1")
filter(g, Community == names(sort(table(Community), decr = TRUE))[2]) %>%
  select(Name, Package) %>%
  arrange(desc(Package)) %>%
  top_n(10, Package) %>%
  as_tibble() %>% 
  knitr::kable(format = "html", caption = "Cluster 2")
Cluster 1
Name                 Package
Kurt Hornik          56
Martin Maechler      46
Achim Zeileis        44
Dirk Eddelbuettel    41
Romain Francois      27
Ben Bolker           25
Brian Ripley         25
Torsten Hothorn      24
Roger Bivand         23
Douglas Bates        23

Cluster 2
Name                 Package
Hadley Wickham       95
Rstudio              92
Scott Chamberlain    44
Inc                  36
Yihui Xie            34
Jeroen Ooms          33
Jj Allaire           29
R. Core Team         27
Bob Rudis            26
Gabor Csardi         22
Oliver Keyes         22
Winston Chang        22

If you know R and its community a bit, you will certainly recognize some names. The challenge is to interpret this classification. It is not a good thing to put people in boxes, but it is mandatory to make nice and colorful visualizations! So I gave it a try. To me, the first group includes people from the early days, who contributed to major packages that constitute the heart of R. The second group is more related to the second generation, associated with RStudio products, which has been particularly prolific in recent years. Of course the two groups are strongly connected.

How can we label these two groups? After long consideration, I chose to go with « The Ancients » and « The Moderns », without any value judgment! This is a reference to an old debate in French literature. This dubious comparison would certainly be more appropriate for the ongoing base vs. tidyverse debate, but… not today!

Let’s see how we can visualize these two groups.

g <- g %>% 
  mutate(Community = case_when(Community == names(sort(table(Community),
                                                       decr = TRUE))[1] ~ "The Ancients",
                               Community == names(sort(table(Community),
                                                       decr = TRUE))[2] ~ "The Moderns",
                               Community %in% names(sort(table(Community),
                                                       decr = TRUE))[-1:-2] ~ "Unclassified")) %>% 
  mutate(Community = factor(Community))

g <- g %>%
  filter(Degree > 5) %>% 
  mutate(Degree = centrality_degree())

ggraph(g, layout = 'lgl') +
  geom_edge_fan(alpha = 0.1) +
  geom_node_point(aes(color = Community, size = Package)) +
  theme_graph() +
  scale_color_manual(breaks = c("The Ancients", "The Moderns"),
                     values=c("#F8766D", "#00BFC4", "#969696"))

[Figure: CRAN core network colored by community (The Ancients vs. The Moderns)]

ggraph really does a great job! I really like the result.

BTW, the R community is not really divided…

g %>% activate("edges") %>% 
  mutate(Community_from = .N()$Community[from],
         Community_to = .N()$Community[to]) %>% 
  filter((Community_from == "The Ancients" & Community_to == "The Moderns")|
           Community_from == "The Moderns" & Community_to == "The Ancients") %>% 
  as_tibble() %>% 
  nrow()

… there are 254 edges connecting the Ancients and the Moderns! But I think this is a nice illustration of how social graphs keep the imprint of history and past events. The next step would be to look at the graph dynamics over time. Maybe an idea for a future post…

Changes in number of authors in ecology journals over time

In a post published on Dynamic Ecology last week, Meghan Duffy presented a graph showing the number of authors of articles published in Ecology for the period 1956-2016. [SPOILER: it increases!]. She also noted that someone probably did something similar for the whole field of ecology but couldn’t find it. Neither could I.

I was thinking that it would take a lifetime to collect these data manually (even if I love to spend my time in the dust of the lab library). But luckily I remembered that I kept a nice dataset of publication records for a good number of journals in ecology. A few years ago, my friend Ivan Mateus and I scraped these data to test some scientometric hypotheses. It took us several months to collect them, so I’m glad I have the opportunity to use them again.

I was in a rush (at the airport on my way to the DNAqua-net kick-off conference), but the R code written 4 years ago to compile the data worked perfectly! And I could quickly come up with a nice graph. I made some technical choices here: 1/ I deleted some outliers (some rare articles feature more than 100 authors!). 2/ I didn’t use a boxplot as Meghan did, because I had data for every year but also because the distribution becomes very skewed with time. 3/ I used a color scale to represent the number of papers, so points act like bins. I could have used hexagonal bins to get a continuous surface, but I liked it with points. It just requires manual tuning of the dot size.
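For the record, here is a minimal sketch (not the actual code) of how such a figure could be built with ggplot2, assuming a hypothetical data frame pubs with one row per article and columns year and n_authors:

library(tidyverse)
# `pubs` is a hypothetical data frame: one row per article, columns `year` and `n_authors`
pubs_binned <- pubs %>%
  filter(n_authors <= 100) %>%   # drop the rare articles with more than 100 authors
  count(year, n_authors)         # points will act like bins
ggplot(pubs_binned, aes(year, n_authors, colour = n)) +
  geom_point(size = 1.5) +       # dot size tuned manually
  scale_colour_gradient(low = "steelblue", high = "red",
                        trans = "log10", name = "Articles") +
  labs(x = "Year", y = "Number of authors per article")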

The graph shows the number of authors per article for 285951 articles published in 113 ecology journals. It covers the period 1932-2012, but I suspect that data for many journals are missing before 1975, which could explain the sudden change in dot color.

In conclusion, I think this graph does a good job of showing both the increase in the number of papers published every year in our field and the increase in the number of authors per paper. I was expecting this trend but I was really surprised by its strength. And it looks like it’s not going to stop anytime soon.

Movies and Series subtitles in R with subtools

Every time I download subtitles for a movie or a series episode, I cannot help thinking about all this text that could be analyzed with text mining methods. As my PhD came to an end in April and I started a postdoc in September, I could use some time of my looong summer break to work on a new R package, subtools, which aims to provide a toolbox to read, manipulate and write subtitle files in R.

In this post I will present briefly the functions of subtools. For more details you can check the documentation.

Installation

The package is available on GitHub.
You can install it using the devtools package:

devtools::install_github("fkeck/subtools")

Read subtitles

A subtitles file can be read directly into R with the function read.subtitles. Currently, four subtitle formats are supported: SubRip, (Advanced) SubStation Alpha, MicroDVD and SubViewer. The parsers are probably not optimal, but they seem to do the job as expected, at least with valid files. The package also provides some wrappers to import whole directories of series subtitles (see read.subtitles.season, read.subtitles.serie and read.subtitles.multiseries).

Manipulate subtitles

The subtools package stores imported subtitles as simple S3 objects of class Subtitles. They are lists with two main components: subtitles (IDs, timecodes and texts) and optional meta-data. Multiple Subtitles objects can be stored as a list of class MultiSubtitles. The subtools package provides functions to easily manipulate Subtitles and MultiSubtitles objects.

Basically, you can:

  • combine subtitles objects with combineSubs
  • extract parts of subtitles with [.Subtitles
  • clean subtitles content with cleanTags, cleanCaptions and cleanPatterns
  • reorganize subtitles as sentences with sentencify

Extract/convert/write subtitles

Although you can conduct statistical analyses on subtitles objects, the subtools package is not designed for text mining. The following functions allow you to convert subtitles to other classes/formats in order to analyze them (a short sketch follows the list). You can:

  • extract text content as a simple character string with rawText
  • convert subtitles and meta-data to a virtual corpus with tmCorpus if you want to work with the standard text mining framework tm.
  • convert subtitles and meta-data to a data.frame with subDataFrame. If you want to use tidy data principles with tidytext and dplyr, you should probably start here.
  • finally, it’s also possible to write subtitles objects to a file with write.subtitles. Though, it is unclear to me if there is any sense in doing that 😀
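To fix ideas, here is a minimal sketch chaining some of the functions above; the file path is a placeholder and the use of default arguments is an assumption:

library(subtools)
s <- read.subtitles("path/to/episode.srt")   # read a single subtitles file
s <- cleanTags(s)                            # strip formatting tags
s <- sentencify(s)                           # reorganize the text as sentences
txt <- rawText(s)                            # plain character string for text mining
df <- subDataFrame(s)                        # or a data.frame for tidytext/dplyr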

Application: Game of Thrones subtitles wordcloud

To illustrate how subtools can be used to get started with a subtitles analysis project, I propose to create a wordcloud showing the most frequent words in the popular TV series Game of Thrones. We will use subtools to import the subtitles, the tm and SnowballC packages to pre-process the data and finally wordcloud to generate the cloud. I will not provide the data I use here, because subtitle files are in a grey zone concerning licensing. But no worries, it’s pretty easy to find subtitles on the Internet. 😉

library(subtools)
library(tm)
library(SnowballC)
library(wordcloud)

Because the subtitles are correctly organized in directories, we can import them with a single command using the function read.subtitles.serie. The nice thing is that this function will try to automatically extract basic meta-data (like series title, season number and episode number) from the directory and file names.

a <- read.subtitles.serie(dir = "/subs/Game of Thrones/")

a is a MultiSubtitles object with 60 Subtitles elements (episodes). We can convert it directly to a tm corpus using tmCorpus. Note that meta-data are preserved.

c <- tmCorpus(a)

And then, we can prepare the data:

c <- tm_map(c, content_transformer(tolower))
c <- tm_map(c, removePunctuation)
c <- tm_map(c, removeNumbers)
c <- tm_map(c, removeWords, stopwords("english"))
c <- tm_map(c, stripWhitespace)

Compute a term-document matrix and aggregate counts by season:

TDM <- TermDocumentMatrix(c)
TDM <- as.matrix(TDM)
vec.season <- rep(1:6, each = 10)
TDM.season <- t(apply(TDM, 1, function(x) tapply(x, vec.season, sum)))
colnames(TDM.season) <- paste("S", 1:6)

And finally plot the cloud!

set.seed(100)
comparison.cloud(TDM.season, title.size = 1, max.words = 100, random.order = T)
Subtitles wordcloud of the six seasons of Game of Thrones.

A few words about this plot. Like every wordcloud, I think it’s a very simple and limited descriptive way to represent the information. However, I like it. People who have watched the TV show will look at it and say « Oh, of course! ». In one hundred words, the cloud does not reveal the plot of GoT, but for each season I can see one or two critical events popping out (wedding, kill/joffrey, queen/shame, hodor/hold/door).
What I find funny here (perhaps interesting?) is that these very important and emotional moments are supported in the dialogue by the repetition of one or two keywords. I don’t know if this is exclusive to GoT, or if it’s a trick of my mind, or something else. And I will not make any hypothesis; I’m not a linguist. But this is the first idea that came to my mind and I wanted to write it down.

Now, there are plenty of text-mining ideas, hypotheses and methods that can be tested with movie and series subtitles. So have fun 🙂

Visualizing the R packages galaxy

The idea of this post is to create a kind of map of the R ecosystem showing dependency relationships between packages. It is actually quite simple to quickly generate such a graph object, since all the data we need can be obtained with one call to the function available.packages(). There are a few additional steps to clean and arrange the data, but with the following code everyone can generate a graph of dependencies in less than 5 seconds.

First, we load the igraph package and we retrieve the data from our current repository.

library(igraph)
dat <- available.packages()

For each package, we produce a character string with all its dependencies (Imports and Depends fields) separated by commas.

dat.imports <- paste(dat[, "Imports"], dat[, "Depends"], sep = ", ")
dat.imports <- as.list(dat.imports)
dat.imports <- lapply(dat.imports, function(x) gsub("\\(([^\\)]+)\\)", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub("\n", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub(" ", "", x))

Next, we split the strings and use the stack function to get the complete list of edges of the graph.

dat.imports <- sapply(dat.imports, function(x) strsplit(x, split = ","))
dat.imports <- lapply(dat.imports, function(x) x[!x %in% c("NA", "R")])
names(dat.imports) <- rownames(dat)
dat.imports <- stack(dat.imports)
dat.imports <- as.matrix(dat.imports)
dat.imports <- dat.imports[-which(dat.imports[, 1]==""), ]

Finally, we create the graph from the list of edges. Here, I select the largest connected component because there are many isolated vertices which would make the graph harder to represent.

g <- graph.edgelist(dat.imports)
g <- decompose.graph(g)
g <- g[[which(sapply(g, vcount) == max(sapply(g, vcount)))]]

Now that we have the graph, we could compute many graph-theory statistics, but here I just want to visualize the data. Plotting such a large graph with igraph is possible but slow and tedious. I prefer to export the graph and open it in Gephi, a free and open-source software for network visualization.
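The export itself is a single call to igraph’s GraphML writer (the file name below is arbitrary):

write.graph(g, file = "r-packages-galaxy.graphml", format = "graphml")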
The figure below was created with Gephi. Nodes are sized according to their number of incoming edges. This is pretty useless and uninformative, but I still like it. It looks like a sky map, and it is great to feel that we, users and developers, are part of this huge R galaxy.

[Figure: R packages network — click on the picture to see the HD version]
rleafmap: R Markdown in interactive popups

This is the second « big » feature coming with branch 0.2 of rleafmap (now on CRAN!). With this new version you can write popup content in R Markdown, which will be processed when you generate the map. This can be useful to format popups using Markdown syntax (and if you need more control, remember that popups can also be formatted with HTML tags). Even more interesting, R Markdown makes it possible to include the output of R code chunks. Thus, results, simulations and plots can easily be embedded within popup content.

To activate R Markdown for a layer, you just have to pass the R Markdown code to the popup argument and set popup.rmd to TRUE.
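For instance, a minimal sketch, assuming my_layer is any Spatial* object (hypothetical name; the Markdown content is arbitrary):

rmd_popup <- "#### Region\nSome **Markdown** text and an inline result: `r round(pi, 2)`"
my_map_layer <- spLayer(my_layer, popup = rmd_popup, popup.rmd = TRUE)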

The chunkerize function

You can write your R code chunks manually, but you can also use the function chunkerize, which tries to make your life simpler. This function has two purposes:

  • It turns a function and its arguments into an R code chunk.
  • It can decompose the elements of a list and use them as values for a given argument.

The latter may be useful when you have different features on your map and you want to execute the same function for each of them, but with different data as input. This can be done by tagging the list containing the data with a star character (*). See the following example:

Data you need for the example: download and unzip
First we load the packages we need:

library(rleafmap)
library(maptools)

Then we load the data: a shapefile of regions and a dataframe giving the evolution of the population for each region.

reg <- readShapePoly("regions-20140306-100m")
reg <- reg[c(-11,-12,-16,-19,-20),]
pop <- read.csv("population.txt", sep = "\t", row.names = 1)
pop <- pop[rev(rownames(pop)), sort(names(pop))]

We prepare some colors for the map and a legend:

gcol <- rev(heat.colors(5))
gcut <- cut(as.numeric(pop["2012", ]),
            breaks = c(0, 2000000, 3000000, 4000000, 8000000, 12000000))
reg.col <- gcol[as.numeric(gcut)]
reg.leg <- layerLegend(style = "polygons",
                       title = "Population",
                       labels = levels(gcut),
                       fill.col = gcol)

We prepare the data for chunkerize: pop is already a list (since it is a data frame), but for clarity we turn it into a simple list.

Year <- as.numeric(rownames(pop))
L <- as.list(pop)
L2 <- as.list(names(L))

Now comes the trick. We create a chunk based on the plot function. We provide 5 arguments (names and values). Each of the arg.values is recycled, except L and L2, which are tagged with a star: in that case, each element of these lists is used in turn as a value.

popup <- chunkerize(FUN = plot,
                    arg.names = c("x", "y", "type", "ylab", "main"),
                    arg.values = c("Year", "*L", "'b'", "'Population'", "*L2"))

Now we just have to create the layers and compile the map:

cdbpos.bm <- basemap("cartodb.positron.nolab")
reg.map <- spLayer(reg,
                   fill.col = reg.col,
                   legend = reg.leg,
                   popup = popup,
                   popup.rmd = TRUE)
writeMap(cdbpos.bm, reg.map)

Et voilà ! Note that there is a problem with RStudio and Firefox: popups sometimes do not appear where they should on the first click. It works fine in Chromium.

Rummaging through dusty books: Maucha diagrams in R

Do you know the Maucha diagram? If you are not a Hungarian limnologist, probably not! This diagram was proposed by Rezso Maucha in 1932 as a way to visualize the relative ionic composition of water samples. However, as far as I know, the diagram had little success in the community. I had never heard about it until my coworker Kalman (who is also Hungarian) asked me if I knew how to plot it in R.

First, I have to admit I was a bit skeptical… But finally we decided it could be an interesting and fun programming exercise.

We found instructions to draw the diagram in Broch and Yake (1969) [1], but we soon wanted to find Maucha’s original paper [2]. This paper is apparently not available online, and the only hard copy we could locate was at the University of Grenoble (a 2-hour drive). Nonetheless, we had a look in the lab library and… miracle! We found it, between two old dusty books, where it had probably been waiting for decades!

The famous book of Rezso Maucha!

Meticulously following Maucha’s instructions, we were able to write a function to draw the diagram. Then we added some options: colors, labels and the possibility to draw multiple diagrams from a matrix. Finally, we put the code in a package (hosted on GitHub), along with the dataset included in the original publication.

To install the package, install devtools from your CRAN repo and run:

devtools::install_github("fkeck/oviz")

Then you can load the dataset used by Maucha [2] to introduce his diagram:

data(ionwaters)

And then you can use the function maucha, which will plot one diagram for each row of the matrix.

maucha(ionwaters)

[Figure: Maucha diagrams plotted with the maucha function]

Here we are. And if you are interested in the ionic composition of waters, stay tuned: we are planning to add some more tools, like Stiff and Piper diagrams.

[1] Broch, E. S., & Yake, W. (1969). A modification of Maucha’s ionic diagram to include ionic concentrations. Limnology and Oceanography, 14(6), 933-935.
[2] Maucha, R. (1932). Hydrochemische Methoden in der Limnologie. Binnengewässer, 12. 173 p.

Fully customizable legends for rleafmap

This is a functionality I have wanted to add for some time… and finally it’s here! I just pushed to GitHub a new version of rleafmap which brings the possibility to attach legends to data layers. You simply need to create a legend object with the function layerLegend and then pass this object via the legend argument when you create your data layer. Thus, a map can contain different legends, each of them independent. This is cool because it means that when you hide a data layer with the layer control, its legend also disappears.

You can create legends with five different styles to better suit your data: points, lines, polygons, icons and gradient (see the graphic).

[Figure: the five legend styles available in rleafmap]

Legends for John Snow’s cholera map

As an example, I made a new version of the cholera map. Here is the code:

devtools::install_github("fkeck/rleafmap")
devtools::install_github("Hackout2/epimap")
library(rleafmap)
library(epimap)

data(cholera)
n.chol <- ifelse(cholera$deaths$Count > 4, 6, cholera$deaths$Count) + 2

# Basemap layer
cdbpos.bm <- basemap("cartodb.positron.nolab")

# Legends
death.leg <- layerLegend(style = "points", title = "Deaths",
                         labels = c("1", "2", "3", "4", "> 4"), size = c(3, 4, 5, 6, 8),
                         fill.col = "red", fill.alpha = 0.9)
pumps.leg <- layerLegend(style = "icons", title = NA, labels = "Water pump",
                         png = "/home/francois/water.png", png.width = 31, png.height = 31)

# Data layers
death.points <- spLayer(cholera$deaths, legend = death.leg,
                        size = n.chol, fill.col =  "red", fill.alpha = 0.9)
pumps.points <- spLayer(cholera$pumps, legend = pumps.leg,
                        png = "/home/francois/water.png",
                        png.width=31, png.height=31)

my.ui <- ui(layers = "topright")

writeMap(cdbpos.bm, pumps.points, death.points, interface = my.ui,
         setView = c(51.5135, -0.137), setZoom = 17)

And here is the result. Enjoy, and check out the legend interactivity when you play with the layer selector 😉

Just a little souvenir

Seeing Andrea again with the others, in early July at the SEFS, reminded me of a memorable evening at the château! Andrea had brought us a small collection of beers from back home in Germany, and we had organized a tasting and scoring session…

[Figure: beer tasting and scoring results]

Made with R, ade4 and… I admit it, Inkscape for the post-production.

Contours and Networks with epimap and rleafmap

In February, I participated in a hackathon organized by Thibaut Jombart at Imperial College London to work on visualization tools for outbreak data. It was a great time spent with great people! Thanks again, Thibaut, for organizing. I took part in the development of epimap, an R package for statistical mapping. The aim of epimap is to provide tools to quickly and efficiently visualize spatial data. There is a set of functions designed to do just that, and you can check out the GitHub page for a demo.

This package also provides two functions to coerce complex objects to Spatial classes so that they can be easily included in a map.

  • The contour2sp function takes a SpatialGrid object and returns contour lines as a SpatialLinesDataFrame.
  • The graph2sp function takes a graph (from igraph package) with geolocated vertices and returns a list of Spatial objects (points and lines).

Following this post by Arthur Charpentier (who plays nicely with rleafmap!), I decided to include John Snow’s cholera dataset in epimap so it can easily be used for tests.

In this post I want to show how epimap and rleafmap can be combined to create fully customizable interactive maps with complex objects. The cholera dataset gives the locations of cholera deaths and of water pumps in London. The map will show the locations of cholera deaths with points, the local density of deaths with colored contour lines and the locations of water pumps with icons. Moreover, the pumps will be represented within a network where two pumps are connected if they are close enough.

library(rleafmap)
library(epimap)
library(igraph)   # for graph.adjacency() and V() below

data(cholera)

# Create a network of pumps
pump.adj <- as.matrix(dist(sp::coordinates(cholera$pumps)))
pump.graph <- graph.adjacency(pump.adj < 0.003, diag = FALSE)
V(pump.graph)$lat <- coordinates(cholera$pumps)[, 2]
V(pump.graph)$lon <- coordinates(cholera$pumps)[, 1]

# Convert death density SpatialGrid to contour SpatialLines
death.cont <- contour2sp(cholera$deaths.den, nlevels = 10)

# Basemap layer
cdbdark.bm <- basemap("cartodb.darkmatter.nolab")

# Data layers
death.points <- spLayer(cholera$deaths,
                        size = 1,
                        fill.col =  "white",
                        fill.alpha = 0.5,
                        stroke = FALSE)
death.contour <- spLayer(death.cont,
                         stroke.col = heat.colors(12)[cut(death.cont$level, 12)],
                         stroke.lwd = 1.5,
                         stroke.alpha = 1)
pumps.points <- spLayer(graph2sp(pump.graph)[[1]],
                        png = "/home/francois/water.png",
                        png.width=31 ,
                        png.height=31)
pumps.links <- spLayer(graph2sp(pump.graph)[[2]],
                       stroke.lwd = 3,
                       stroke.col = "white")

my.ui <- ui(layers = "topright")

writeMap(cdbdark.bm, death.points, death.contour,
         pumps.links, pumps.points, interface = my.ui)

And here is the map we get:
