Graphing Jack Kirby’s career using R

God bless Ray Owens, who compiled a web page itemising every page of Jack Kirby art by title, publication date etc. I decided to make a graph of the King’s entire career in comics, so I fired up R, rolled up my metaphorical sleeves and did the following:

First, I need to have the source code of the web pages loaded into R. The bibliography is split over six pages, so I use the following command to download all six and concatenate them into a single text string.

# Build the six page URLs, read each page, and join every line into one string.
# c()-style appending (via unlist) keeps the pages end to end; handing six
# vectors straight to paste() would pair their lines up element-wise instead.
base <- "http://www.marvelmasterworks.com/resources/kirby_chronology"
urls <- paste0(base, c("", 1:5), ".html")
jkall <- paste(unlist(lapply(urls, readLines)), collapse="")

This gives me a single string (a one-element character vector) containing all the HTML. The source looks something like this:

<font color=#CC0000><u><b>Mar 1938</b></u></font color> (2)<br>Wags # 64 
: J B Powers (UK) - <b>The Count of Monte Cristo</b> (1)<br>Wags # 65 : J
 B Powers (UK) - <b>The Count of Monte Cristo</b> (1)<P><font color=#CC0000>
<u><b>Apr 1938</b></u></font color> (4)<br>Wags # 66 : J B Powers (UK)...

And so on. The parts I’m interested in are the monthly totals, which are always preceded by that ‘CC0000’ colour code. So I dig out that code plus the seventy characters that follow each instance of it (I’ll need the functions from the stringr package to do this easily).

library(stringr)
raw <- str_extract_all(jkall,"CC0000.{70}")

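str_extract_all returns a list with one element per input string, so raw is a one-element list. Its first element, raw[[1]], is a 567-element character vector, which looks like this: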

[1] "CC0000><u><b>Mar 1938</b></u></font color> (2)<br>Wags # 64 : J B Powers (UK" 
[2] "CC0000><u><b>Apr 1938</b></u></font color> (4)<br>Wags # 66 : J B Powers (UK"
...
[567] "CC0000><u><b>Nov 1995</b></u></font color> (2)<br>Dark Horse Presents # 103 "
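
Here's a small sketch of why the result comes back wrapped in a list. It uses base R's gregexpr and regmatches as a stand-in for str_extract_all (my substitution, so it runs without stringr), a shortened 40-character window instead of 70, and a cut-down HTML string built from the sample above. The names html and matches are just for illustration.

```r
# A cut-down version of the page source with two monthly headers
html <- paste0(
  "<font color=#CC0000><u><b>Mar 1938</b></u></font color> (2)<br>Wags # 64",
  "<P><font color=#CC0000><u><b>Apr 1938</b></u></font color> (4)<br>Wags # 66"
)

# gregexpr finds every match in each input string; regmatches pulls them out.
# One input string goes in, so a list with one element comes back.
matches <- regmatches(html, gregexpr("CC0000.{40}", html))

length(matches)        # number of input strings
length(matches[[1]])   # number of matches found in the first string
matches[[1]]
```

The double square brackets are what get you from the list down to the character vector of matches, which is why raw[[1]] keeps turning up in the next step.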

I now need to pull out the monthly total (the figure in parentheses) and the year for each element, so it’s time to break out the regular expressions.

year <- str_extract(raw[[1]],"19[0-9]{2}")
pages <- str_extract(str_extract(raw[[1]],"\\([0-9]+\\.?\\+?[0-9]?\\)"), 
"[0-9]+\\.?[0-9]?")

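To see what those patterns actually pick out, here's a sketch that runs them over three sample strings. The first two are copied from the output above; the third is a made-up string in the same format, there only to exercise the decimal case. first_match is a hypothetical helper written with base R's regexpr, standing in for str_extract so the example runs without stringr.

```r
sample_raw <- c(
  "CC0000><u><b>Mar 1938</b></u></font color> (2)<br>Wags # 64 : J B Powers (UK",
  "CC0000><u><b>Nov 1995</b></u></font color> (2)<br>Dark Horse Presents # 103 ",
  "CC0000><u><b>Oct 1995</b></u></font color> (0.5)<br>Made-up title # 1 "  # invented
)

# Hypothetical helper: first match of `pattern` in each string, NA where none
first_match <- function(x, pattern) {
  m <- regexpr(pattern, x)
  hit <- !is.na(m) & m > 0
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)
  out
}

sample_year  <- first_match(sample_raw, "19[0-9]{2}")
# Step 1: grab the first parenthesised number, brackets included
in_parens    <- first_match(sample_raw, "\\([0-9]+\\.?\\+?[0-9]?\\)")
# Step 2: strip the brackets by extracting just the number
sample_pages <- as.numeric(first_match(in_parens, "[0-9]+\\.?[0-9]?"))

data.frame(pages = sample_pages, year = sample_year)
```

Doing it in two steps means the unclosed "(UK" in the first string is never mistaken for a page count, because the first pattern insists on digits between a pair of brackets.
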
And I’ll want to store the year and page numbers in a data frame.

jkdf <- data.frame(list(pages=as.numeric(pages),year=as.numeric(year)))

Okay. So what I have now looks like this:

    pages year
1     2.0 1938
2     4.0 1938
3     2.0 1938
4     8.0 1938
5     8.0 1938
...
562   3.0 1994
563   2.0 1994
564   2.0 1994
565   2.0 1994
566   0.5 1995
567   2.0 1995

What I want is to sum all the pages by year, ready for plotting on my graph. I create a summary data frame using the aggregate function and rename its columns for simplicity.

yrtotals <- aggregate(jkdf$pages,list(jkdf$year),sum)
names(yrtotals) <- c("year","pages")
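
As a quick check of what aggregate does here, this sketch runs the same two lines on a seven-row sample (the first five and last two rows shown above, so the totals are for the sample only, not the real yearly figures). The names sample_df, sample_totals and alt_totals are just for illustration.

```r
# Seven rows taken from the printed data frame above
sample_df <- data.frame(
  pages = c(2, 4, 2, 8, 8, 0.5, 2),
  year  = c(1938, 1938, 1938, 1938, 1938, 1995, 1995)
)

# Same call as above: sum pages within each year
sample_totals <- aggregate(sample_df$pages, list(sample_df$year), sum)
names(sample_totals)   # default column names before renaming
names(sample_totals) <- c("year", "pages")
sample_totals

# The formula interface gives the same totals and keeps the column names
alt_totals <- aggregate(pages ~ year, data = sample_df, FUN = sum)
alt_totals
```

The formula version saves the renaming step, which is the only real difference between the two.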

Now to plot the graph, using ggplot2.

library(ggplot2)
ggplot(yrtotals,aes(x=year,y=pages)) + geom_line()

Et voilà:


Doesn’t look terrible, but I want to clean it up a bit. First the axes and plot title.

last_plot() + theme_bw()
# opts() and theme_text() have been removed from ggplot2;
# ggtitle(), theme() and element_text() are the current equivalents.
last_plot() + scale_y_continuous("Published comic art pages per year", expand=c(0,0),
 limits=c(0,1300), breaks=seq(from=0,to=1300,by=100)) + scale_x_continuous(
"Source: marvelmasterworks.com",breaks=seq(from=1940,to=1995,by=5)) + 
ggtitle("Jack Kirby's career in comics") + 
theme(axis.title.x = element_text(hjust=1, vjust=0, size=8), 
plot.title = element_text(hjust=0.5, vjust=1, size=16))


I think this is nearly done, but I want to add a couple of annotations. First, that sudden dip to fewer than 100 pages published in 1945 came about because Kirby was fighting in World War II. Second, that sustained burst of productivity in the 1960s is what we now know as the Marvel Age. I’m going to add some shaded rectangles to the graph to highlight those periods.

jkhighlight <- data.frame(start=c(1943,1961), end=c(1945,1970), 
period=c("Military service","The Marvel Age"))
# inherit.aes=FALSE stops this layer looking for year and pages in jkhighlight
last_plot() + geom_rect(aes(xmin=start, xmax=end, fill=period),
 ymin=0, ymax=1300, data=jkhighlight, alpha=0.3, inherit.aes=FALSE) + 
scale_fill_manual("", values=c("blue","red"))

Okay, done.
