brad flora.com


Archive for April, 2011


Updating Memetracker

Apr 2011
27

One of my favorite sites on the web is Techmeme. It’s a site that shows you the hottest trending stories across online media right now.  It’s run by a guy named Gabe Rivera who developed the initial technology, has launched a few similar sites for other topics (baseball, media, politics), has sold some ads, mostly on Techmeme, and hired a few people to boost coverage on the sites.  I have a lot of respect for what he’s built.

I’ve wanted to be able to create my own Techmeme-like sites for topics I’m interested in.  I’d love to be able to see what’s happening right now across Chicago media, for example.  In some ways Windy Citizen started as an attempt at “faking” a solution to that problem via crowdsourcing, since I lacked the ability to create something that could surface the top stories programmatically by itself.

A few years back though, someone built a Drupal module called Memetracker that lets you turn a Drupal install into a memetracker for any topic you choose.  It was written by Kyle Matthews as a Google Summer of Code project waaaay back in 2008.  You can see Kyle’s initial proposals and outlines here and here.

Kyle released a working version of Memetracker at the end of the summer and graciously helped me get an instance up and running.  His code sat on top of Drupal’s FeedAPI module.  You’d use that module to pull in feeds and then you’d use memetracker to figure out which feeds you wanted to include in your automated front page.  Then it would run a bunch of magic on the feed items and tell you which were the top stories across those feeds.
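To make the "magic" a little more concrete, here is a minimal sketch of one way feed items could be grouped into stories and ranked. It is not Memetracker's actual algorithm: the headline-overlap measure, the `threshold` value, the function names, and the sample items are all assumptions made up for illustration.

```python
import re

# Hypothetical feed items; a real install would pull these from its feeds.
SAMPLE_ITEMS = [
    {"source": "Paper A", "title": "City council approves parking meter deal"},
    {"source": "Blog B", "title": "Parking meter deal approved by city council"},
    {"source": "Station C", "title": "Cubs win home opener"},
]


def tokens(title):
    """Return the set of lowercase words in a headline, skipping very short words."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) > 3}


def jaccard(a, b):
    """Share of words two headlines have in common, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_items(items, threshold=0.3):
    """Greedy grouping: an item joins the first cluster whose seed headline is similar enough."""
    clusters = []
    for item in items:
        words = tokens(item["title"])
        for cluster in clusters:
            if jaccard(words, cluster["seed"]) >= threshold:
                cluster["items"].append(item)
                break
        else:
            # No existing cluster matched, so this item starts a new story.
            clusters.append({"seed": words, "items": [item]})
    return clusters


def top_stories(items, threshold=0.3):
    """Rank clusters by how many distinct sources covered the story."""
    clusters = cluster_items(items, threshold)
    return sorted(
        clusters,
        key=lambda c: len({i["source"] for i in c["items"]}),
        reverse=True,
    )
```

Calling `top_stories(SAMPLE_ITEMS)` puts the parking meter story first, because two different sources wrote about it. A real memetracker would likely look at links and full text rather than headlines alone.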

It worked! And it still does.  However, memetracker had some tricky Python dependencies, relied on the memory-hogging, cron-job-killing FeedAPI, and lacked nuance.  There was no way to assign weights to sources, for example.  No way to control the sort order of related stories.  No way to… you get the idea.
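As a rough illustration of what source weighting could look like, here is a small sketch. Nothing in it comes from Memetracker itself: the weight table, the default weight, and the function names are hypothetical.

```python
# Hypothetical weights: a bigger number means the source counts for more.
SOURCE_WEIGHTS = {"Paper A": 3.0, "Blog B": 1.0}
DEFAULT_WEIGHT = 1.0  # assumed weight for any source not listed above


def cluster_score(sources, weights=None, default=DEFAULT_WEIGHT):
    """Add up the weights of the distinct sources that covered one story."""
    if weights is None:
        weights = SOURCE_WEIGHTS
    return sum(weights.get(source, default) for source in set(sources))


def rank_clusters(clusters, weights=None, default=DEFAULT_WEIGHT):
    """Sort story names from highest to lowest weighted score.

    `clusters` maps a story name to the list of sources that covered it.
    """
    return sorted(
        clusters,
        key=lambda name: cluster_score(clusters[name], weights, default),
        reverse=True,
    )
```

With these made-up weights, a story covered only by "Paper A" (score 3.0) outranks one covered by two unweighted sources (score 2.0). Passing an empty weight table falls back to plain source counting, which flips that order.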

Over the last few years as I’ve worked on Windy Citizen, I’ve made a few attempts to get these problems fixed.  I’ve set up a few memetracker instances myself and have hired contractors to set up and work on others.

With the instances I’ve set up myself, the problem has always been that memetracker would cause the server to overload and crash and I could never get cron to run consistently.

So I tried hiring people who know more about that stuff to fix some of the outstanding issues in the code.  This has never worked out well.  Every developer that I show memetracker to tells me immediately that we need to rewrite it from scratch to run on AppEngine or AWS or something.  I understand that it’s easier for developers to work on stuff they built from the ground up, but that doesn’t help me much as the developer’s going to move on eventually and I’ll be left to support the work they did.  I don’t have time to learn Python so I can understand the memetracker they built.  I know Drupal, so I’d like to solve the outstanding bugs in the Memetracker module rather than have to learn Python from scratch.

The problem is that most of the developers I know who have experience with algorithmic aggregation would rather be tied up in a burlap bag full of weasels than touch or have anything to do with Drupal.  The minute they start grumbling is the minute I know it’s over.  Meanwhile, the Drupal people I know tend to be fellow themers and implementers and I never know if they can actually code or if they’re fellow fakers like me.

So I’ve now had 3 aborted attempts at paying someone to fix the bugs in memetracker and get it working.  No successful outcomes.  Meanwhile, over the last two years, I’ve gotten more comfortable with things like using the command line (rebooting our WC box daily does that to you) and have a better understanding of the LAMP stack and how those pieces fit together.

So this past weekend, I signed up for a Linode account, set up a LAMP stack (took me 6 hours :( ), got a Drupal 6 Pressflow up and running (took me 3 hours :( ), and then got a basic memetracker instance running on it.  That last bit took about 12 hours of trial and error and I am indebted to Chad Paulson for his encouragement and help.  Eventually we found the problem: I was importing the wrong Python module.  python-cluster != PyCluster. LOLWTF.
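For anyone hitting the same wall, a quick check like the sketch below shows which of the two similarly named packages Python can actually see. The import names are an assumption on my part: as far as I know, python-cluster installs a module called `cluster` and Pycluster installs one called `Pycluster`. The post does not say which of the two Memetracker expects, so check the module's own docs before installing either.

```python
import importlib.util

# Assumed import names for the two look-alike packages.
CANDIDATES = {
    "python-cluster": "cluster",
    "Pycluster": "Pycluster",
}


def installed_modules(candidates=None):
    """Map each package name to True if its module can be found, else False."""
    if candidates is None:
        candidates = CANDIDATES
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in candidates.items()
    }


if __name__ == "__main__":
    for package, found in installed_modules().items():
        print(package, "is installed" if found else "is NOT installed")
```

Run it with the same Python interpreter the site uses, since a package installed for a different interpreter will not show up here.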

Now I’ve got that running and am going to work through the things that need to be done.  In no particular order, they are:

I’ve started an issue queue over on the Memetracker page on Drupal.org for the first issue.  You can read my first update here. I’m hoping someone will come along and help me figure out the stuff I don’t get in that thread, but after two years, I realize I’m going to have to do this on my own so I have no expectations per se.

I will try to post progress updates as I go over here.  We shall see…

