Visualizing the R packages galaxy

The idea of this post is to create a kind of map of the R ecosystem showing the dependency relationships between packages. Generating such a graph object is actually quite simple, since all the data we need can be obtained with a single call to the function available.packages(). A few additional steps are needed to clean and arrange the data, but with the following code anyone can generate a dependency graph in less than 5 seconds.

First, we load the igraph package and retrieve the package data from the current repository.

library(igraph)
dat <- available.packages()

For each package, we produce a character string with all its dependencies (Imports and Depends fields) separated with commas.

dat.imports <- paste(dat[, "Imports"], dat[, "Depends"], sep = ", ")
dat.imports <- as.list(dat.imports)
dat.imports <- lapply(dat.imports, function(x) gsub("\\(([^\\)]+)\\)", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub("\n", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub(" ", "", x))
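To see what these substitutions do, here is a minimal sketch on a made-up pair of Imports and Depends fields (the strings are invented for illustration, not taken from a real package). It also shows that a package with no Imports field contributes NA, which paste() turns into the literal string "NA"; this is why "NA" has to be filtered out in the next step.

```r
# Invented Imports/Depends fields, formatted like available.packages() output
imports <- "methods,\n    stats (>= 3.0.0)"
depends <- "R (>= 3.1.0), utils"

x <- paste(imports, depends, sep = ", ")
x <- gsub("\\(([^\\)]+)\\)", "", x)  # drop version constraints such as "(>= 3.0.0)"
x <- gsub("\n", "", x)               # drop line breaks
x <- gsub(" ", "", x)                # drop spaces
x                                    # "methods,stats,R,utils"

# A package without an Imports field: NA becomes the string "NA"
no_imports <- paste(NA, "utils", sep = ", ")
no_imports                           # "NA, utils"
```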

Next, we split the strings and use the stack function to get the complete list of edges of the graph.

dat.imports <- sapply(dat.imports, function(x) strsplit(x, split = ","))
dat.imports <- lapply(dat.imports, function(x) x[!x %in% c("NA", "R")])
names(dat.imports) <- rownames(dat)
dat.imports <- stack(dat.imports)
dat.imports <- as.matrix(dat.imports)
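# Drop empty dependency names (logical indexing also works when there are none,
# whereas -which() would drop every row in that case)
dat.imports <- dat.imports[dat.imports[, 1] != "", ]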
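The splitting and stacking steps can be checked on a tiny hand-made list; the package names pkgA and pkgB below are hypothetical. stack() puts each dependency in the first column (values) and the package that declares it in the second (ind), so each row reads as an edge from a dependency to the package that uses it, which is the from/to order graph.edgelist() expects.

```r
# Cleaned dependency strings for two hypothetical packages
deps <- list(pkgA = "methods,stats,R", pkgB = "NA,utils")

# Split on commas, then drop the "NA" placeholder and the "R" dependency
deps <- lapply(deps, function(x) strsplit(x, split = ",")[[1]])
deps <- lapply(deps, function(x) x[!x %in% c("NA", "R")])

# One row per edge: dependency (values) -> package (ind)
edges <- as.matrix(stack(deps))

# Remove empty names with logical indexing; drop = FALSE keeps a matrix
edges <- edges[edges[, 1] != "", , drop = FALSE]
edges
```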

Finally, we create the graph from the edge list. Here, I keep only the largest connected component because the many isolated vertices would make the graph harder to represent.

g <- graph.edgelist(dat.imports)
g <- decompose.graph(g)
g <- g[[which.max(sapply(g, vcount))]]  # largest component (first one if tied)

Now that we have the graph, we could compute many graph-theory statistics, but here I just want to visualize the data. Plotting a graph this large with igraph is possible but slow and tedious. I prefer to export the graph and open it in Gephi, free and open-source software for network visualization.
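One possible route to Gephi, sketched here as an assumption rather than the workflow used for the figure, is a plain CSV edge list: Gephi's spreadsheet import recognizes columns named Source and Target. The toy two-edge matrix and the temporary file below are for illustration only; with the real data you would pass the edge matrix built above (igraph's write_graph() with format = "graphml" is another option).

```r
# Toy edge list (dependency -> package), invented for illustration
edges <- cbind(c("methods", "stats"), c("pkgA", "pkgA"))

# Gephi's spreadsheet import looks for Source and Target columns
gephi.edges <- data.frame(Source = edges[, 1], Target = edges[, 2])

f <- tempfile(fileext = ".csv")
write.csv(gephi.edges, f, row.names = FALSE)
```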
The figure below was created with Gephi. Nodes are sized according to their number of incoming edges. This is pretty useless and uninformative, but I still like it. It looks like a sky map, and it is great to feel that we, users and developers, are part of this huge R galaxy.
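With edges oriented from a dependency to the package that uses it (as stack() produces them above), a node's incoming edges count its own dependencies, while its outgoing edges count the packages that depend on it. Both can be tallied from an edge matrix with base R's table(); the toy edge list below is invented. Note that nodes without incoming edges simply do not appear in in_deg, whereas igraph's degree() would report 0 for them.

```r
# Toy edge list: column 1 = dependency, column 2 = package using it
edges <- cbind(c("methods", "stats", "methods", "pkgA"),
               c("pkgA",    "pkgA",  "pkgB",    "pkgB"))

in_deg  <- table(edges[, 2])  # incoming edges: number of dependencies
out_deg <- table(edges[, 1])  # outgoing edges: number of dependent packages
in_deg
out_deg
```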

R packages network

rleafmap: R Markdown in interactive popups

This is the second "big" feature coming with version 0.2 of rleafmap (now on CRAN!). With this new version you can write popup content in R Markdown, which is processed when you generate the map. This can be useful to format popups with Markdown syntax (if you need more control, remember that popups can also be formatted with HTML tags). More interesting still, R Markdown lets you include the outputs of R code chunks, so results, simulations and plots can easily be embedded in popups.

To activate R Markdown for a layer you just have to pass the R Markdown code to the popup argument and set popup.rmd to TRUE.

The chunkerize function

You can write your R code chunks manually but you can also use the function chunkerize which tries to make your life simpler. This function has two purposes:

  • It turns a function and its arguments into an R code chunk.
  • It can decompose the elements of a list and use them as values for a given argument.

The latter may be useful when you have different features on your map and you want to execute the same function for each of them but with different data as input. This can be done by tagging the list containing the data with a star character (*). See the following example:

Data you need for the example: download and unzip
First, we load the packages we need:

library(rleafmap)
library(maptools)

Then we load the data: a shapefile of regions and a dataframe giving the evolution of the population for each region.

reg <- readShapePoly("regions-20140306-100m")
reg <- reg1
pop <- read.csv("population.txt", sep = "\t", row.names = 1)
pop <- pop[rev(rownames(pop)), sort(names(pop))]

We prepare some colors for the map and a legend:

gcol <- rev(heat.colors(5))
gcut <- cut(as.numeric(pop["2012", ]),
            breaks = c(0, 2000000, 3000000, 4000000, 8000000, 12000000))
reg.col <- gcol[as.numeric(gcut)]
reg.leg <- layerLegend(style = "polygons",
                       title = "Population",
                       labels = levels(gcut),
                       fill.col = gcol)

We prepare the data for chunkerize: pop is already a list, since it is a data frame, but for clarity we turn it into a plain list.

Year <- as.numeric(rownames(pop))
L <- as.list(pop)
L2 <- as.list(names(L))

Now comes the trick. We create a chunk based on the plot function and provide 5 arguments (names and values). Each value in arg.values is recycled, except *L and *L2, which are tagged with a star: for these, each element of the corresponding list is used in turn as the value.

popup <- chunkerize(FUN = plot,
                    arg.names = c("x", "y", "type", "ylab", "main"),
                    arg.values = c("Year", "*L", "'b'", "'Population'", "*L2"))
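To make the recycling rule concrete, here is a small sketch of the idea. expand_chunks() is a hypothetical helper written for this illustration, not rleafmap's chunkerize(), and we make no claim that chunkerize builds exactly these strings: star-tagged values such as "*L" become L[[i]] for the i-th popup, while all other values are repeated unchanged.

```r
# Hypothetical helper, NOT rleafmap's chunkerize(): illustrates how
# star-tagged values could be expanded into one code chunk per element
expand_chunks <- function(fun, arg.names, arg.values, n) {
  fence <- strrep("`", 3)
  vapply(seq_len(n), function(i) {
    # "*L" becomes "L[[i]]"; other values are recycled unchanged
    starred <- startsWith(arg.values, "*")
    vals <- ifelse(starred,
                   sprintf("%s[[%d]]", substring(arg.values, 2), i),
                   arg.values)
    expr <- sprintf("%s(%s)", fun,
                    paste(arg.names, vals, sep = " = ", collapse = ", "))
    paste0(fence, "{r}\n", expr, "\n", fence)
  }, character(1))
}

chunks <- expand_chunks("plot",
                        arg.names = c("x", "y", "type", "ylab", "main"),
                        arg.values = c("Year", "*L", "'b'", "'Population'", "*L2"),
                        n = 2)
cat(chunks[1])
```

In the map above, n would correspond to the number of regions, that is length(L).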

Now we just have to create the layers and compile the map:

cdbpos.bm <- basemap("cartodb.positron.nolab")
reg.map <- spLayer(reg,
                   fill.col = reg.col,
                   legend = reg.leg,
                   popup = popup,
                   popup.rmd = TRUE)
writeMap(cdbpos.bm, reg.map)

Et voilà! Note that there is a problem with RStudio and Firefox: popups sometimes do not appear where they should on the first click. It works fine in Chromium.

Rummaging through dusty books: Maucha diagrams in R

Do you know the Maucha diagram? If you are not a Hungarian limnologist, probably not! This diagram was proposed by Rezso Maucha in 1932 as a way to visualize the relative ionic composition of water samples. However, as far as I know, the diagram had little success in the community. I had never heard of it until my coworker Kalman (who is also Hungarian) asked me if I knew how to plot it in R.

At first, I have to admit, I was a bit skeptical… But in the end we decided it could be an interesting and fun programming exercise.

We found instructions to draw the diagram in Broch and Yake (1969) [1], but we soon became interested in finding Maucha's original publication [2]. It is apparently not available online, and the only hard copy we could locate was at the University of Grenoble (a two-hour drive away). Nevertheless, we had a look in the lab's library and… miracle! We found it between two old dusty books, where it had probably been waiting for decades!

The famous book of Rezso Maucha!

Following Maucha's instructions meticulously, we were able to write a function to draw the diagram. Then we added some extra options: colors, labels and the ability to draw multiple diagrams from a matrix. Finally, we put the code in a package (hosted on GitHub) along with the dataset included in the original publication.

To install the package, install devtools from CRAN and run:

devtools::install_github("fkeck/oviz")

Then you can load the package and the dataset used by Maucha [2] to introduce his diagram:

library(oviz)
data(ionwaters)

Then use the function maucha, which plots one diagram for each row of the matrix.

maucha(ionwaters)

maucha_demo

There we are. If you are interested in the ionic composition of waters, stay tuned: we are planning to add other plots such as the Stiff diagram and the Piper diagram.

[1] Broch, E. S., & Yake, W. (1969). A modification of Maucha’s ionic diagram to include ionic concentrations. Limnology and Oceanography, 14(6), 933-935.
[2] Maucha, R. (1932). Hydrochemische Methoden in der Limnologie. Binnengewässer, 12. 173p.

Fully customizable legends for rleafmap

This is a feature I had wanted to add for some time… and finally it's here! I just pushed to GitHub a new version of rleafmap that makes it possible to attach legends to data layers. You simply create a legend object with the function layerLegend and pass it through the legend argument when you create your data layer. A map can thus contain several legends, each independent of the others. This is handy because when you hide a data layer with the layer control, its legend disappears too.

You can create legends with five different styles to better suit your data: points, lines, polygons, icons and gradient (see the graphic).

legends_rleafmap

Legends for John Snow’s cholera map

As an example, here is the code for a new version of the cholera map:

devtools::install_github("fkeck/rleafmap")
devtools::install_github("Hackout2/epimap")
library(rleafmap)
library(epimap)

data(cholera)
n.chol <- ifelse(cholera$deaths$Count > 4, 6, cholera$deaths$Count) + 2

# Basemap layer
cdbpos.bm <- basemap("cartodb.positron.nolab")

# Legends
death.leg <- layerLegend(style = "points", title = "Deaths",
                         labels = c("1", "2", "3", "4", "> 4"), size = c(3, 4, 5, 6, 8),
                         fill.col = "red", fill.alpha = 0.9)
pumps.leg <- layerLegend(style = "icons", title = NA, labels = "Water pump",
                         png = "/home/francois/water.png", png.width = 31, png.height = 31)

# Data layers
death.points <- spLayer(cholera$deaths, legend = death.leg,
                        size = n.chol, fill.col =  "red", fill.alpha = 0.9)
pumps.points <- spLayer(cholera$pumps, legend = pumps.leg,
                        png = "/home/francois/water.png",
                        png.width=31, png.height=31)

my.ui <- ui(layers = "topright")

writeMap(cdbpos.bm, pumps.points, death.points, interface = my.ui,
         setView = c(51.5135, -0.137), setZoom = 17)
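The n.chol line is what keeps the point sizes in step with the "Deaths" legend: counts above 4 are mapped to 6 and every value is shifted by 2, so the sizes land on the legend's c(3, 4, 5, 6, 8). Here is a quick check on illustrative counts from 1 to 6 (not taken from the dataset).

```r
# Illustrative death counts per location (1 to 6)
counts <- 1:6

# Same transformation as n.chol: counts above 4 become 6, then add 2
sizes <- ifelse(counts > 4, 6, counts) + 2
sizes  # 3 4 5 6 8 8
```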

And here is the result. Enjoy, and check out how the legends behave when you play with the layer selector 😉

Just a little memory

Seeing Andrea and the others again in early July at SEFS reminded me of a memorable evening at the château! Andrea had brought us a small collection of beers from Germany, Andrea's home country, and we had organized a tasting and rating session…

beer_exp

Made with R, ade4 and… I admit, Inkscape for the post-production.

Contours and Networks with epimap and rleafmap

In February, I took part in a hackathon organized by Thibaut Jombart at Imperial College London to work on visualization tools for outbreak data. It was a great time spent with great people! Thanks again, Thibaut, for organizing it. I contributed to the development of epimap, an R package for statistical mapping. The aim of epimap is to provide tools to visualize spatial data quickly and efficiently. It offers a set of functions designed for this purpose; check out the GitHub page for a demo.

This package also provides two functions to coerce complex objects to Spatial classes so that they can be easily included in a map.

  • The contour2sp function takes a SpatialGrid object and returns contour lines as a SpatialLinesDataFrame.
  • The graph2sp function takes a graph (from igraph package) with geolocated vertices and returns a list of Spatial objects (points and lines).
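We have not checked how contour2sp is implemented internally. As a base R illustration of the same idea, grDevices::contourLines() extracts contour lines from a gridded surface and returns a list in which each line carries its level and its x/y coordinates, roughly the ingredients a SpatialLinesDataFrame with a level column needs. The surface below is a made-up smooth bump, not the cholera density.

```r
# Toy "density" surface on a regular grid: a smooth 2-D bump
x <- seq(-1, 1, length.out = 50)
y <- seq(-1, 1, length.out = 50)
z <- outer(x, y, function(a, b) exp(-(a^2 + b^2) / 0.2))

# Each element of cl is one line: list(level = , x = , y = )
cl <- grDevices::contourLines(x, y, z, nlevels = 5)
levs <- unique(vapply(cl, function(l) l$level, numeric(1)))
levs
```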

Following this post by Arthur Charpentier (who nicely plays with rleafmap!), I decided to include John Snow's cholera dataset in epimap so that it can easily be used for tests.

In this post I want to show how epimap and rleafmap can be combined to create fully customizable interactive maps with complex objects. The cholera dataset gives the locations of cholera deaths and of water pumps in London. The map will show cholera deaths as points, the local density of deaths as colored contour lines and the water pumps as icons. Moreover, the pumps will be represented as a network in which two pumps are connected if they are close enough.

library(rleafmap)
library(epimap)
library(igraph)  # graph.adjacency() and V() come from igraph

data(cholera)

# Create a network of pumps: two pumps are linked if closer than 0.003
pump.adj <- as.matrix(dist(sp::coordinates(cholera$pumps)))
pump.graph <- graph.adjacency(pump.adj < 0.003, diag = FALSE)
V(pump.graph)$lat <- sp::coordinates(cholera$pumps)[, 2]
V(pump.graph)$lon <- sp::coordinates(cholera$pumps)[, 1]

# Convert death density SpatialGrid to contour SpatialLines
death.cont <- contour2sp(cholera$deaths.den, nlevels = 10)

# Basemap layer
cdbdark.bm <- basemap("cartodb.darkmatter.nolab")

# Data layers
death.points <- spLayer(cholera$deaths,
                        size = 1,
                        fill.col =  "white",
                        fill.alpha = 0.5,
                        stroke = FALSE)
death.contour <- spLayer(death.cont,
                         stroke.col = heat.colors(12)[cut(death.cont$level, 12)],
                         stroke.lwd = 1.5,
                         stroke.alpha = 1)
pumps.points <- spLayer(graph2sp(pump.graph)[[1]],
                        png = "/home/francois/water.png",
                        png.width=31,
                        png.height=31)
pumps.links <- spLayer(graph2sp(pump.graph)[[2]],
                       stroke.lwd = 3,
                       stroke.col = "white")

my.ui <- ui(layers = "topright")

writeMap(cdbdark.bm, death.points, death.contour,
         pumps.links, pumps.points, interface = my.ui)
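The pump network boils down to thresholding a distance matrix. Here is a base R sketch of that step with invented coordinates (not the real pump locations); the threshold is in the same units as the coordinates. Keeping only the upper triangle lists each pair once and excludes self-loops, and each remaining pair gives the start and end points of one link, which is the kind of segment the pumps.links layer draws.

```r
# Hypothetical pump coordinates (longitude, latitude), NOT the real data
pumps <- cbind(lon = c(-0.1360, -0.1370, -0.1360, -0.1300),
               lat = c(51.5130, 51.5130, 51.5150, 51.5130))
d <- as.matrix(dist(pumps))

# Link two pumps when closer than the threshold; upper.tri() keeps each
# pair once and excludes the diagonal (self-loops)
adj <- d < 0.003 & upper.tri(d)
pairs <- which(adj, arr.ind = TRUE)

# One segment per link: start and end coordinates
segs <- cbind(pumps[pairs[, "row"], , drop = FALSE],
              pumps[pairs[, "col"], , drop = FALSE])
colnames(segs) <- c("lon0", "lat0", "lon1", "lat1")
segs
```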

And here is the map we get:

Introducing rleafmap. An R package for interactive maps with Leaflet.

Obviously, I am late…

I released rleafmap about a year ago and I am only writing this blog post today. In the meantime, I presented the package to the French R user community at the 3rd Rencontres R in Montpellier and got some good feedback. Now I would like to communicate better about the project. My plan is to post development news on this blog and to present new features with examples. The documentation and tutorials will be published on the project website (http://www.francoiskeck.fr/rleafmap/) if I can find the time.

Purpose and philosophy

rleafmap is an R package that can be used to generate interactive maps from your data. If you manipulate spatial data in R, at some point you will probably want to visualize them, and the most common way to visualize spatial data is with maps. Like other packages (googleVis, rMaps…), rleafmap is designed to produce interactive maps that give the end user a richer experience. This is made possible by Leaflet, the amazing open-source JavaScript library created by Vladimir Agafonkin.

There are two important things to be aware of to get off to a good start with rleafmap.

  • First, the package exclusively uses input data inheriting from the Spatial classes of the sp package. These classes are central and allow you to work with many other packages. If you don't know them, a good place to start is the vignette of the sp package on CRAN. If you prefer a book, have a look at Bivand et al. 2013 [1].
  • The second point is about how the package works. Data layers are stored in independent R objects, each with its own symbology, and the map is a compilation of these objects. The idea is to stick to the philosophy of traditional GIS software.

For a more complete view of the package, I strongly recommend that you have a look at the website.

[1] Bivand R.S., Pebesma E.J. & Gómez-Rubio V. (2013) Applied Spatial Data Analysis with R, 2nd edn. Springer, New York.

Scheduled notifications on Ubuntu

At work we are now required to clock in four times a day through a web interface. I won't go into everything I think of the time clock here. The problem now is to live with this system and not forget to clock in when you have a thousand more interesting things on your mind. Hence the idea of using Ubuntu's notification system in a cron job to send yourself little reminder messages.

Start by installing the libnotify-bin package.
sudo apt-get install libnotify-bin

To add a scheduled task, edit the crontab file.
crontab -e

Then add as many tasks as you want (usage example here).
In our case, to send ourselves a little message every day at 8:30, we can add:
30 8 * * * DISPLAY=:0 notify-send -u critical -i /usr/share/icons/gnome/256x256/emotes/face-wink.png "Hi, did you remember to clock in?"

Et voilà!
Sélection_038

RZH, an Excel VBA macro for wetland delineation

This post is an opportunity to present, and release into the wild, a small VBA program (ahem…) that facilitates data entry and synthesis for the administrative delineation of wetlands based on botanical criteria. Let me say right away that it is aimed at a rather limited audience, but it takes all kinds.

Zieg3rman on Flickr (CC BY-NC-SA)

The program itself is relatively simple; I nevertheless recommend that users carefully check that everything works as they expect, because I have not run 36,000 tests…

RZH is distributed under the free GPL license; all the source code is available through Excel's VBA editor.

>>> Download RZH

Below I reproduce the help text; it is rather brief, I admit, but it gives a first overview of the program.

1. ABOUT RZH

RZH is an application built into Microsoft Excel, designed to facilitate data entry and synthesis for the delineation of wetlands based on botanical criteria.

RZH is based on the Arrêté (ministerial order) of 24 June 2008 specifying the criteria for defining and delineating wetlands pursuant to articles L. 214-7-1 and R. 211-108 of the French Environmental Code. However, RZH is not recognized by the legislation and is provided without any warranty (see License).

2. RECOMMENDATIONS FOR THIS VERSION

RZH is under development and bugs may occur. To guard against them, it is advisable to use only the RZH wizards to enter and delete data.

In general, it is recommended not to modify the structure of the workbook manually and not to annotate it if you do not understand how the application works internally.

Once the tables are generated, you can copy them into another workbook where you can modify them freely.

3. USING RZH

RZH works in four steps:
- Creating a new project
- Building a species database
- Entering field data
- Synthesis

3.1 INITIALIZING A NEW PROJECT

A new project is created with the "Initialiser un nouveau projet" (Initialize a new project) button available in the "Start" tab.
This action erases any previous project present in this document.
The wizard lets you enter the number of transects for the project and, for each transect, the desired number of plots.
At the end of this step, the results tabs (one per transect) are generated.

3.2 SPECIES DATABASE

Before entering the results, you should complete the species database. This database must contain all the species recorded in the project. It may contain more.
Click the "BD Espèces" (species database) tab to view the database. Use the buttons to modify the database.
It can be useful to transfer the database from one project to another. A simple copy and paste is enough for that.
The database is not erased when a project is initialized.

3.3 ENTERING FIELD DATA

Field data can be entered and deleted on each transect tab using the dedicated buttons and wizards.

3.4 DATA SYNTHESIS

The "Synthèse" (Synthesis) tab generates a summary table of the results and characterizes the vegetation of each plot (hygrophilous or not).
The synthesis updates automatically when changes are made to the results.
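The help does not detail how each plot is characterized. As a rough illustration only, here is an R sketch of the vegetation test as we understand it from the annex of the Arrêté: the dominant species of a stratum are those whose cumulative cover first reaches 50% of the total cover, plus any species covering at least 20% on its own (the "50/20" rule), and the vegetation is considered hygrophilous when at least half of the dominant species are wetland indicator species. This is not RZH's VBA code; the function names, the single-stratum simplification and the example cover values are assumptions made for the illustration.

```r
# Hypothetical helpers (not RZH's VBA code); single vegetation stratum.
# cover: named numeric vector of percent cover, one entry per species
dominant_species <- function(cover) {
  rel <- 100 * cover / sum(cover)   # cover relative to the stratum total
  ord <- sort(rel, decreasing = TRUE)
  # smallest set of top species whose cumulative cover reaches 50%
  n50 <- which(cumsum(ord) >= 50)[1]
  dom <- names(ord)[seq_len(n50)]
  # plus any species covering at least 20% on its own
  union(dom, names(ord)[ord >= 20])
}

# Hygrophilous if at least half of the dominant species are wetland indicators
is_hygrophilous <- function(cover, wetland_species) {
  dom <- dominant_species(cover)
  mean(dom %in% wetland_species) >= 0.5
}

plot.cover <- c(A = 40, B = 30, C = 25, D = 5)  # invented example
dominant_species(plot.cover)                    # "A" "B" "C"
is_hygrophilous(plot.cover, wetland_species = c("A", "C"))
```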

Converting an R table to JavaScript

For a small project I am working on, I needed to pass data from an R data frame to JavaScript. JavaScript can load data in JSON format, which R can export with the rjson package. But for this particular project, for the sake of simplicity and because I am a JavaScript beginner, I wanted to import my data in the traditional JavaScript array format.

A JS array works like this:

var montableau = [1, 2, 3, 4];

which is roughly equivalent to a vector in R:

montableau <- c(1, 2, 3, 4)

You can also do pseudo-2D:

var montableau = [[1, 2], [3, 4]];

which would give a matrix in R:

montableau <- matrix(c(1, 2, 3, 4), ncol=2, byrow=TRUE)

Anyway, here is a small R function to convert a data frame (or something resembling one) from R to JavaScript:

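toJSarray <- function(df){

  # Coerce matrices, vectors, etc. to a data frame
  if (!is.data.frame(df)){
    df <- as.data.frame(df)
  }
  
  df.nbli <- nrow(df)
  df.nbco <- ncol(df)
  temp <- vector()
  a.1 <- vector()
  
  # Build one JS array literal "[a, b, ...]" per row
  for (i in seq_len(df.nbli)){
      for (j in seq_len(df.nbco)){
          if (is.numeric(df[, j])){
            # Numbers are written as-is
            a.1[j] <- df[i, j]
          } else {
            # Everything else is written as a double-quoted string
            a.1[j] <- paste("\"", df[i, j], "\"", sep = "")
          }
      }
      a.2 <- paste(a.1, collapse = ", ")
      temp[i] <- paste("[", a.2, "]", sep = "")
  }
  
  # A single row stays a flat array; several rows are nested in an outer array
  temp.2 <- paste(temp, collapse = ", ")
  if (df.nbli == 1){
    jsarr <- temp.2
  } else {
    jsarr <- paste("[", temp.2, "]", sep = "")
  }
  invisible(jsarr)
}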

After that, we can quickly write a JS file containing our data:

write(paste("var montableau = ", toJSarray(mondf), sep=""), "monfichier.js")
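For comparison, the same conversion can be written without explicit loops by formatting whole columns at once. toJSarray2 is a hypothetical variant sketched here, not part of the original post; it aims at the same output format (numbers as-is, other values double-quoted, a single row left flat) and, like the original, it does not escape quotes inside strings or translate NA to null.

```r
# Hypothetical vectorized variant of toJSarray (same output format)
toJSarray2 <- function(df) {
  df <- as.data.frame(df)
  # Format whole columns: numbers as-is, anything else double-quoted
  cols <- lapply(df, function(col) {
    if (is.numeric(col)) as.character(col) else paste0("\"", col, "\"")
  })
  # Glue the formatted columns row by row into "[a, b, ...]"
  rows <- paste0("[", do.call(paste, c(unname(cols), sep = ", ")), "]")
  # A single row stays flat; several rows are wrapped in an outer array
  if (length(rows) == 1) rows else paste0("[", paste(rows, collapse = ", "), "]")
}

js <- toJSarray2(matrix(c(1, 2, 3, 4), ncol = 2, byrow = TRUE))
cat(js)  # [[1, 2], [3, 4]]
```

It can be used the same way as above, for example write(paste0("var montableau = ", toJSarray2(mondf)), "monfichier.js").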
