Phillip Mendonça-Vieira


A timelapse of the BBC and assorted thoughts

The cron job I set up to capture the nytimes' frontpage was also set up to collect the BBC's:



Catch the Chilean miners at 0:30. The Arab Spring begins in earnest around 3:00 with Tunisia (and eventually picks up Egypt). The Japanese tsunami hits around 4:30.

The video goes from September to May; I cropped it to match the music. The song is the first movement of Philip Glass' Violin Concerto No. 1. For some reason the top red banner didn't render properly. I have no idea why youtube didn't render a 720p version.

Differences in reportage

There are several interesting things going on here.

The thing that stands out the most in comparison to the nytimes is how the BBC's editors behave more placidly in their content curation. Where the nytimes crams its homepage with as much information as possible, the BBC picks the most important story of the day and runs with it.

I suspect this difference comes down to a dramatic divergence in the sheer volume of content the two organizations feel the need to showcase. Where the BBC is happy to file short, factual pieces, the nytimes house style seems to force them into multi-page arrangements. Where the nytimes quantifies, the BBC strives for brevity.

This can be seen most clearly when you compare their coverage of the Chilean miners:



One pleasant upshot of this, however, is that the nytimes' homepage is more dramatic. If I want to know the most important story of the day, the BBC will do me right. If I want to follow an event with bated breath I might be better served by the nytimes.

See how they covered the Egyptian revolution:



There is another great thing about the above video: you can see around 1:56 how the news cycle briefly moved on once the pace of new developments slowed down. I feel like this has nagging implications for our own perceptions of foreign events. You can see how the two sites share a large portion of photography and how quickly a subject gets dropped once it stops being news.

Woe betide protestors in Bahrain and Syria for being late to the party and living in uninteresting countries. Heaven help you once your civil war stops featuring dramatic reversals, like in Libya.

(Note: this is more a limitation of our collective attention span. We just have a limited capacity for empathy. I didn't even mention the Ivory Coast.)

Anyhow, I also produced a video comparing their coverage of the Japanese tsunami:



So, which news organization should I follow?

Frankly, the nytimes features a lot of noisy, America-centric news pieces. Just by watching the timelapses I felt vastly more informed on "what is happening out there" based on what the BBC decided to feature.

That said, it's important to keep in mind that the nytimes offers a "global edition" which was not captured by my timelapse, and that the BBC you get on TV in the UK has a more domestic bent than their global news website.

The BBC offers very broad yet shallow coverage. They're great for quick summaries, but all too often they seem almost allergic to drawing conclusions. I'm infuriated by their policy of putting everything in quotes; it's as if they're trying to absolve themselves of any responsibility for interpretation.

The nytimes on the other hand produces awesome multimedia illustrations for the stories they cover. Their biggest problem in my humble opinion might be a reticence to embrace the medium; all of this quality work gets hidden in a side bar and becomes lost to the tides of time. They also have excellent photography editors.

Could you now espouse some bullshit on the nature of media in these tough digital times?

Certainly!

Once an iPad equivalent drops below $100 it will stop making economic sense to pulp hundreds of tonnes of trees and ship them out on a daily basis. If you don't operate on a mandated subsidy it's going to be difficult to justify ongoing costs in news coverage. Especially given that there's some mounting evidence that news simply isn't profitable.

The upshot of this is that in the meantime quality news coverage is going to contract and we'll probably suffer somewhat on an intellectual, moral, and democratic level, etc, etc.

I'm not too concerned; our democracies emerged without the modern news distribution apparatus and we're suddenly starting to see corps of semi-professional, semi-volunteers picking up the slack on a local level. The next ten years are going to be crummy on almost any vector you want to pick, but at least we'll be experiencing the last and most complete expansion in distribution and access to information.

This is incredibly exciting. This will never happen again. Everything we do now will be taken for granted and ignored by future generations. We'll figure things out eventually.

You're a web developer, though. Your opinion is meaningless. You are wasting all of my time. You can barely write as it is.

I, uh, hrm.

Whatever. By some miracle I've kept reading this far. Tell me about things you're actually qualified to talk about.

Okay!

First things first, I was wrong when it comes to automated archives. The nytimes does keep an archive. So does the Boston Globe. An enterprising fellow keeps a detailed archive of the BBC News. I'm a lot less pessimistic about preservation.

Andy Rutledge came out and redesigned the nytimes frontpage. It's pretty, he has a lot of good points and I agree with him when it comes to information saturation. Some people think he's missing out on a few important details. In the meantime they've all put forth interesting ideas that are better articulated than anything I can contribute.

This is probably the best critique of the critique I've found to date. I'm just going to bring out what I think is the money quote:

Why not take a page out of blog design and have a running tally of your most recent major headlines? This way I can visit a news site any time of the day and see what I missed previously. Can’t you safely assume that a majority of the readers aren’t going to scan the whole front page for something that interests them, especially if you are trying your best to draw their attention with major headlines?

I can't prove it but I've been saying this for years now. I think that's probably the best move forward, but I suspect that's going to require a paradigm shift in how we think of content management systems.

A final note

This is all I have left to say about timelapses. I'm astonished you've read this far. Give me a shout if you have any questions or comments.

#

HOWTO: Make a screenshot timelapse video.

After I posted that nytimes-lapse video earlier this week a lot of people wrote in asking me about how it was accomplished.

The accident

No one just "ends up" with 13,802 screenshots. I fully planned to make something not entirely unlike that video, but it was supposed to last two weeks, not nine months.

About a year ago I was doing my best trying to not fail at finishing my undergraduate thesis. I was also being constantly harassed by my advisor to create a demo that would justify all the office space I was occupying.

The problem is that I had what was essentially an undemoable project. It took weeks of user interaction for meaningful results and its very premise hinged on being kind of invisible.

So, I decided that I could try to make a point by illustrating just how much stuff there is on the internet and to this end I set up a cron job and... promptly forgot about it. It went on until I received an email from my hosting company telling me that my server had exhausted all of its disk space.

What does this cron job look like?

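The crontab entry:

*/30 * * * * /home/phillmv/screenshots/do_it.sh

And do_it.sh itself:

#!/bin/bash
# Render the front page, cropped to 768px tall, into a timestamped jpg.
/usr/local/bin/wkhtmltoimage --crop-h 768 http://nytimes.com "/home/phillmv/screenshots/nytimes-$(date +%Y%m%d-%H%M).jpg"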

wkhtmltopdf, the project that ships wkhtmltoimage, is a phenomenal utility.

How long did it take you to make?

It took about two or three evenings and most of a weekend to put together, which I colloquially refer to as "about four days". The vast majority of the time was spent downloading, uploading, carefully editing out the full page ads, rendering, reordering files, and slowing down the interesting bits.

Okay, I have a folder full of jpegs. What now?

Install ffmpeg. It's a tool for manipulating video and audio data. Make sure you enable x264 and LAME (libmp3lame) support. Ubuntu users feeling frisky should check this guide out, tho compiling it from source is probably unnecessary. OSX users can install it through homebrew.

Go into your screenshot folder, rename every file so that its filename is a number that increases sequentially starting from 0, and type the following in:

ffmpeg -r 30 -i %d.jpg -vcodec libx264 -bf 0 -crf 12 -threads 2 -an -r 30 filenamehere.mp4

If you want to add an mp3 to it, after the above command completes, type in:

ffmpeg -i filenamehere.mp4 -i song.mp3 -map 0:0 -map 1:0 -r 30 -acodec copy -vcodec copy -threads 2 filenamehere_with_sound.mkv

Say what?

Don't get me started. ffmpeg is a crazy complex tool. It was written by a guy who is just frighteningly smart. It's hard to use, and most of the flags are configuration options for the codec being used.

No one really knows how to use it. I've spent a lot of time getting it to work. People trade magic incantations for it over forum posts. If you care about quality, you should be using a preset.

The key things to know:

-i defines an input file.
%d.jpg is a pattern that matches sequentially numbered files (0.jpg, 1.jpg, 2.jpg and so on).
The first -r sets the input frame rate; the second -r sets the output frame rate.

The defaults work well enough (for screenshots). It's hard for me to spend any more time thinking about this.

I'm going to take your word on that

You should. They're excellent words. Prime verbiage right here.

Wait, how the hell am I supposed to rename 12,000 files?

I wrote a script. It's easy, when you're a programmer.
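For the curious, a renaming pass can be sketched in a few lines of bash. This is not the script I actually used, just a minimal illustration. It assumes the screenshots are named like nytimes-20100915-1430.jpg, so that alphabetical order is also chronological order, and it copies rather than renames so the originals survive. The function name number_frames and the frames folder are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: copy every jpg in SRC into DEST as 0.jpg, 1.jpg, ...
# Relies on the shell's sorted glob expansion, which is chronological
# for zero-padded timestamp names like nytimes-YYYYMMDD-HHMM.jpg.
number_frames() {
  local src="$1" dest="$2" i=0 f
  mkdir -p "$dest"
  for f in "$src"/*.jpg; do
    [ -e "$f" ] || continue       # nothing matched: skip the literal pattern
    cp "$f" "$dest/$i.jpg"
    i=$((i + 1))
  done
  echo "$i"                       # print how many frames were written
}
```

Running number_frames . frames from inside the screenshot folder leaves the numbered copies in frames/, which is where you would then point the ffmpeg command.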

So... how did you edit this?

Like I said, I wrote a script. It made looking up specific images and repeating certain frame sections easy. It was okay. It was tolerable.

I timed the number of frames to match up with the length of the song.
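Here is a rough sketch of what "repeating certain frame sections" can look like, again in bash and again not my actual script. It assumes the frames are already numbered 0.jpg, 1.jpg and so on; the function names, the folder names and the frame ranges are all hypothetical. Repeating a frame three times makes that stretch of the video play at a third of the speed.

```shell
#!/bin/bash
# frames_for_song SECONDS FPS: how many frames fill a song of that length.
frames_for_song() {
  echo $(( $1 * $2 ))
}

# repeat_frames SRC DEST START END TIMES
# Copies 0.jpg, 1.jpg, ... from SRC into DEST, renumbering as it goes.
# Frames numbered START..END (inclusive) are written TIMES times each,
# every other frame is written once. Prints the total frames written.
repeat_frames() {
  local src="$1" dest="$2" start="$3" end="$4" times="$5"
  local src_n=0 out_n=0 copies k
  mkdir -p "$dest"
  while [ -e "$src/$src_n.jpg" ]; do
    copies=1
    if [ "$src_n" -ge "$start" ] && [ "$src_n" -le "$end" ]; then
      copies="$times"
    fi
    for ((k = 0; k < copies; k++)); do
      cp "$src/$src_n.jpg" "$dest/$out_n.jpg"
      out_n=$((out_n + 1))
    done
    src_n=$((src_n + 1))
  done
  echo "$out_n"
}
```

As an illustration with made-up numbers, a 400 second track at 30 frames per second needs frames_for_song 400 30, or 12000 frames. You can then compare that against the count repeat_frames prints and adjust the slowed-down ranges until the two line up.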

Should I use Youtube or Vimeo?

My heart says Vimeo, but they reject weirdly sized videos like mine. Youtube accepts a wide variety of formats, but I kept having issues where the first 5 seconds were corrupted and full of grey frames.

I eventually discovered that youtube has a problem with the MP4 container and that storing it in Matroska (.mkv) instead made all my pain go away. Read the following excellent guide to video encoding by Mark Pilgrim if you don't intuitively grasp the meaning of the previous sentence.

Unfortunately, it seems like I haven't ironed out the kinks. Caveat emptor.

That was a really cool video!

Thanks! I'm conflicted over how I should feel about this. On the one hand all of the attention was exhilarating and intoxicating, yet on the other I could feel the raging rampant narcissism I engaged in slowly poisoning my soul.

#

The nytimes they are a-changin'

Due to an errant cron task that ran twice an hour from September 2010 to July 2011, I accidentally collected about 12,000 screenshots of the front page of the nytimes.com (unfortunately, you can only watch the whole 7 minutes if you stick to 480p).



Working on this video was fascinating because the past year was filled with dramatic events (from the Chilean miners [0:39] to the Arab Spring [3:38] and the Japanese Tsunami [4:54]) that I got to watch unfold time and time again. Watch out for them in the video; I took special care to slow down certain time periods.

Conveniently, this might be important because:

Traditionally, the purpose of a newspaper's front page was to entice the reader into delving further into the publication. As a consequence, front pages are roughly equivalent to whatever the editors thought were the most relevant news items of the day.

That said, moments like this one are no longer possible:

Having worked with and developed on a number of content management systems I can tell you that as a rule of thumb no one is storing their frontpage layout data. It's all gone, and once newspapers shutter their physical distribution operations I get this feeling that we're no longer going to have a comprehensive archive of how our news-sources of note looked on a daily basis. Archive.org comes close, but there are too many gaps for my liking.

This, in my humble opinion, is a tragedy because in many ways our frontpages are summaries of our perspectives and our preconceptions. They store what we thought was important, in a way that is easy and quick to parse and extremely valuable for any future generations wishing to study our time period.

Thanks for reading! If you have an idea of something fun to do with the data, holla at me.

Interested in how this was put together? I wrote a howto.

Thought this video was cool? I made another one using the BBC.

#

People try to put us down / just 'cause we get around

phill: so
phill: I was at a party on… saturday

andrew: at your house/

phill: no no
phill: we're all sitting around

andrew: what?

phill: I'm telling you a story sheesh

andrew: ohhh
andrew: ok

phill: i'm setting up the cinema display for the movie

andrew: go go go

phill: and I forget the exact comment but it was something like
phill: "oh well, but isn't Anna the only one here with a job?" (her bf, Luke, is unemployed)
phill: << uncomfortable silence >>
phill: "Well tons of people have jobs. Uh… doesn't Mina have a job?"
phill: (there are like ten people sitting in this room)
phill: and I thought about it
phill: I think every person in that room was underemployed
phill: (Mina is a freelance editor and so has large bouts of idleness as it commonly happens)

andrew: how do they make rent
andrew: that's our generation for you

phill: and I thought to myself, what a typical generation Y conversation

andrew: there's an inkling of a feeling in me that somehow feels there's something good there though
andrew: our generation isn't just fodder for the corporate machines

phill: we don't settle as easily

andrew: or something
andrew: ya

phill: Luke is borrowing money from his mom to make ends meet
phill: but when telling me about the places he's waiting to hear back from, he was happy that they all seemed cool

andrew: and i don't think our gen is just pissing it all away and wasting time

phill: well
phill: I certainly am
phill: I suspect so are most

#

Thoughts on building a feed reader

2010 may be aptly described as the 'year of the thesis'.

It outlived an apartment, three semesters, two jobs, a relationship, a full conference planning and execution cycle, and spanned countless weekend afternoons filled with dread and quiet procrastination over lattes and croissants.

In a nutshell, Readless is a custom feed reader that uses online supervised text classification through a Bayesian classifier to classify rss/atom feed entries and filter them based on that classification.


The academic year of 2009-2010 appeared on the horizon and I found myself hatching a plan. I was tired of taking courses that were relatively unchallenging and the slim pickings offered in the course calendar weren't encouraging. In what turned out to be a momentous decision, I convinced the professor of my 'Web Applications' course to let me expand the project my awesome friend John Sully and I developed into an undergraduate thesis. My schedule thus looked incredibly enticing, and the idea of rolling my own project was kind of intoxicating.

That decision turned out to be both great and terrible. Although I quickly put together a proof of concept, I soon found myself perilously behind schedule. I eventually poked at the problem long enough to find something vaguely interesting to say, which in turn yielded this video:

Allow me to share some of the lessons I have drawn from this period.

Work habits

Building your own feed reader

Now what?

For the first time in my life, I find myself with absolutely zero long-term commitments and it is both terrifying and fascinating. I'm done school, my short-term freelance gig has come to an end and I'm going to be forced to move in April.

I'm at a bit of a crossroads in life. I want to become as good at my work as I know I am capable of being. I want to keep meeting interesting people and working on interesting things. I'm going to try to enjoy my upcoming (hopefully brief) period of unemployment, as I haven't taken time off in about two years now.

I don't know, but hopefully I'm going to enjoy it.

#

New Yorker, I love you, but you're bringing me down.

I was going through my backlog of New Yorker magazines when I came across something really neat that I would like to share with a friend.

This should not be difficult.


Wait. Where can I find the previous issues?

Well. Maybe I can access them through the 'cover gallery'.


2009? Ugh. That was five months ago.


Oh, fuck off.

To be perfectly fair, if you notice carefully they warn you about it — but why have such a natural browsing interface if you're not going to direct people to more of your content?

Okay. So I know they have an index page for the current issue of the magazine. Maybe it's listed there?

Finally!

Unsurprisingly, the article is paywalled in their 'digital edition'. I am flabbergasted they call it that, considering I'm already on their 'digital website'.

Whatever. I'm a subscriber. Fine. You win. Let's do this.

Needless to say, this interface is incredibly abysmal. Clicking on the page makes it jump to a single zoom level through which you are then supposed to pan and read.

Honestly, I've tried to use it before but it's just so incredibly annoying. This might be usable if you have a 24 inch screen? But on my 13 inch macbook it's more frustrating than anything else.

If memory holds, this interface came out back in 2007. At least they didn't use flash, but it was still short sighted of them to go with a 1:1 recreation of the magazine instead of converting it to a sectioned-off part of their (easier to read) website.

Mind you, I have a feeling I know exactly what happened here. The year is 2005, and someone decided to offer New Yorker connoisseurs the opportunity to own the entire archive in an offline format.

This was a brain damaged decision even back then, but to be fair Microsoft Encarta had yet to completely fail, Wikipedia was just about to enter the cultural mainstream and you could still categorically dismiss techie web-utopians with a straight face. So they managed to scan all of their back catalogue and bam!, now you too can own eight totally useless dvds that are completely impractical to use. I think this set used to cost something like $150 too.

I'm almost willing to bet money that when it became painfully obvious that online archives are now expected subscription perks they simply repurposed all of the infrastructure they had built for the DVD version and here we are today. It's especially painful when you consider that you can perform a full-text search of the archive on the site, but no, you're stuck in this weird netherworld, fuck you.

At any rate, obviously I can't just email the article to my friend. I can't select any of this text. I know, I'll print the article and send her the pdf. In a better world they would've made the interface metadata-aware and made your life easy, but maybe you can imagine the pain of printing a twenty-nine page article:

This yields a 24 meg pdf whose text you can't select.

What have we learned with this huge waste of my time?

  1. I'm a complete idiot. I should've just googled it.
  2. A paying customer enthusiastically wanting to promote your product got shut down repeatedly by your interface.

This is a jaded argument. I can probably link you to a dozen Clay Shirky essays that will say this more eloquently and more succinctly. I've been in this racket for a while; I make websites for a living. I am just stunned with how much effort I had to exert.

If you're in the business of delivering content, why is google the most useful way to peer into your back catalogue? I don't even want to get into a long tail discussion; Google is poor on serendipity.

We could forgive this big mishap if the 'digital edition' wasn't being promoted so heavily on their site. As it currently stands, it's clearly intended as a kludge to solve the problem posed by the internet. For an organization with so much high quality content I have a hard time comprehending the kind of thinking that has led to their current condition. Your website shouldn't just reflect the newsstands, where only one issue is available at a time.

Let's not even talk about how crappy their blog section is.

This pains me because I want to keep reading the New Yorker for many years to come. I just love getting it in the mail every week, but can you just imagine how insanely great their iphone and ipad apps could be? Or better yet, a webkit-optimized (subscriber) site?

Instead, their archive languishes and fades into obscurity.

#

Thoughts on running a (software engineering) conference

#