First things first: the shiny.
A friend of mine posted a picture on Facebook today that looked a lot like this one. It's a picture of a social graph (the one above is mine). Each person I'm friends with on Facebook is represented by a dot (called a "node"), and two people are connected (there's an "edge" between them) if they're friends on Facebook. Node size comes from a measure called "betweenness centrality," which is roughly how often a person sits on the shortest path between two other people, so people who bridge otherwise separate groups end up biggest. The nodes are color-coded based on a magical grouping algorithm.
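For the curious, here is roughly how betweenness centrality gets computed. This is a minimal sketch of Brandes' algorithm for unweighted, undirected graphs in plain Python. I believe Gephi's implementation is based on the same algorithm, but I haven't checked its internals, and Gephi may scale the scores differently. The toy graph is hypothetical: two made-up triangles of friends joined through a single bridge person `x`, chosen to show why someone who connects two groups ends up with a big node.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a list of its neighbours.
    Returns raw (unnormalized) betweenness scores.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:            # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share
        # of the shortest paths that start at s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted twice (once from each endpoint).
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical toy graph: triangle {a, b, c} and triangle {d, e, f},
# connected only through the bridge node x.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness_centrality(adj)
print(scores["x"], scores["c"], scores["a"])  # 9.0 8.0 0.0
```

The bridge `x` scores highest because every shortest path between the two triangles has to pass through it, while `a` sits on no one's shortest path and scores zero.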
I should note that there were several friends whose privacy settings meant that their data didn't get exported, and so those people don't show up on the graph.
I knew that social graphs were fun things to play with; taking Science of the Web with Luis von Ahn and Brendan Meeder was definitely a memorable experience. Still, two things about my friend's graph piqued my curiosity. First, he had color-coded various clusters in the graph as "high school friends," "college friends," and other such groups. Second, and perhaps more interestingly, it was his own data: I didn't know there was a way to pull your social graph out of Facebook in a form that can be meaningfully analyzed. Apparently there is, and it's called Netvizz.
After that, the next hour was a blur of exploring the graph using Gephi, a powerful and (for the most part) easy-to-use graph exploration program that runs on Windows, Mac, and Linux. After I figured out why it wasn't playing nice with my computer (note: don't save your files in your temporary directory and try to open them from there; Gephi doesn't like that), I got to grouping, color-coding, and scaling. It was awesome.
I'm not going to walk through exactly what I did to generate the image above, for two reasons. First, if you're interested, you can follow the same guide I did, which walks you through it. Second, I did a bunch of random button-pushing, and I don't remember all of it. Instead, I'm going to focus on the conclusions.
The easiest way to lay out my thoughts on the resulting graph is to go through it group by group. My big overarching thought, though, is just how amazing and, well, scary graph-crunching can be. I did this in an hour on my laptop, and I spent most of that time zooming, highlighting, and generally exploring. It gives me a new appreciation for those who don tinfoil hats and curse the convenience and power of modern social networking. But I digress.
My high school experience was weird in a bunch of ways. Most of them aren't really relevant here, but one is particularly relevant to the social graph: I spent most of my freshman year of high school trying desperately to leave. The previous year, I had been rejected from the local magnet high school. The vast majority of my friends had gotten in and attended, leaving me with a grand total of five people in my year whom I knew going in on day one. I spent my time trying to prove that I was good enough to get in the next year, not on forming friendships, having fun, or other typical high school things.
I knew a fair number of people in the school from my Scout troop, but they were all older than I was. I didn't really help my situation by enrolling in AP Computer Science, where I was one of something like three people who weren't seniors. Fate also seemed to conspire against my making friends my own age: I was forcibly recruited to the Robotics Team (a funny story for another time; I would have joined anyway) and was the only freshman there.
The point of all this is that I started out with a higher-than-average share of high school friends who were older than I was. Apparently, that share stayed high enough that the clustering algorithm I ran on my graph seven years later picked it up. On the version of the graph with names attached to each node (not published, for the sake of privacy), the red cluster can best be described as "high school friends who were older." This encompasses many of my Boy Scout friends, that first glorious year of Robotics, and other people I met along the way (APCS, Algebra II, and Latin II aren't exactly typical freshman classes).
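As for the "magical grouping algorithm": I don't know exactly what Gephi ran under the hood for my graph, though as far as I know its Modularity tool is based on the Louvain method, which tries to maximize a score called modularity (roughly, how many more edges fall inside groups than you'd expect by chance). Below is a minimal sketch of only the first phase of Louvain, which greedily moves single nodes between groups while modularity strictly improves; the full method then collapses each group into one node and repeats. The function names and the eight-person toy graph (two tight friend groups joined by a single friendship) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def modularity(adj, comm):
    """Newman-Girvan modularity Q of a partition, computed exactly."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    inside = defaultdict(int)  # 2 * (number of edges inside each community)
    degree = defaultdict(int)  # total degree of each community
    for v, nbrs in adj.items():
        degree[comm[v]] += len(nbrs)
        for w in nbrs:
            if comm[w] == comm[v]:
                inside[comm[v]] += 1  # each internal edge is seen from both ends
    return sum(Fraction(inside[c], two_m) - Fraction(degree[c], two_m) ** 2
               for c in degree)


def local_moving(adj):
    """First phase of the Louvain method (hypothetical helper name)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    comm = {v: v for v in adj}           # every node starts in its own group
    tot = {v: len(adj[v]) for v in adj}  # total degree of each group
    moved = True
    while moved:
        moved = False
        for i in sorted(adj):
            k_i = len(adj[i])
            links = defaultdict(int)     # i's edges into each neighbouring group
            for w in adj[i]:
                links[comm[w]] += 1
            own = comm[i]
            k_own = links.get(own, 0)
            tot_own_without_i = tot[own] - k_i
            best, best_gain = own, 0
            for c in sorted(links):
                if c == own:
                    continue
                # Modularity gain of moving i from `own` to `c`, multiplied
                # by 2m^2 so it stays an exact integer.
                gain = (two_m * (links[c] - k_own)
                        - k_i * (tot[c] - tot_own_without_i))
                if gain > best_gain:
                    best, best_gain = c, gain
            if best != own:
                tot[own] -= k_i
                tot[best] += k_i
                comm[i] = best
                moved = True
    return comm


# Hypothetical toy graph: cliques {a, b, c, d} and {e, f, g, h},
# joined by the single edge d-e.
adj = {
    "a": ["b", "c", "d"], "b": ["a", "c", "d"], "c": ["a", "b", "d"],
    "d": ["a", "b", "c", "e"],
    "e": ["d", "f", "g", "h"],
    "f": ["e", "g", "h"], "g": ["e", "f", "h"], "h": ["e", "f", "g"],
}
comm = local_moving(adj)
groups = defaultdict(list)
for v, c in comm.items():
    groups[c].append(v)
print(sorted(groups.values()))  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]
print(modularity(adj, comm))    # 11/26
```

Lumping everyone into one group scores a modularity of exactly 0, so the two-group split (11/26, about 0.42) is what the greedy moves settle on. Keeping the gains as integers and Q as a `Fraction` makes this toy run deterministic, with no floating-point ties.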
Similarly, the darker blue blob at the top can be roughly described as "high school friends who are younger than I am." This is overwhelmingly people from the LHS Orchestra; we were, and still are, a pretty close-knit family. There are also various people from younger class years whom I met in other settings, like the Robotics Team in later years, Science Bowl, and other random encounters. The gigantic blue node is a close friend from high school who also ended up going to CMU; her connections to both CMU friends and LHS friends are what make her node so big.
Finally, the big ol' teal cluster is people who were my year. I had friends.
There isn't as much to say about my CMU subgraph. There are definitely clusters, but I would need to do some more digging to really uncover all of them. As far as I can tell, there are four major subgroups. The top-right cluster is CMU EMS, which is a tight-knit group by necessity: if we're going to save lives together, we'd better be friends. The top-left cluster, which somehow smashed itself into a long, flat blob, is Roboclub. The cluster in the left-middle seems to be a combination of ScottyLabs folks and the CMU CS/entrepreneurial community. Given how much those two groups tend to overlap, I'm not surprised that they ended up together. Finally, the big group in the bottom-right is my freshman-year floormates. Those were good times.
The little green group floating off to the left is an interesting artifact that I hadn't considered. Those are my co-workers from my brief stint as a rocket scientist. SpaceX does a fair amount of recruiting from CMU, so I'm not terribly surprised that team rocket is hanging off of the big CMU blob.
Old Friends and GLA
The purple group caught me completely off guard when I saw it. What were 30-plus people doing, connected almost completely to each other but to almost nobody else? The answer was immediately obvious when I turned on names: GUBERNATORIS ACADEMIA LATINA!
In the summer of 2008, I had the opportunity to attend the Governor's Language Academy for Latin, a three-week, state-sponsored summer academy devoted to studying Latin. It was awesome. For three weeks, 45 high school students from around the state, strictly limited to no more than two per high school, gathered on the campus of VCU and bonded over a common interest. When we got back to technology at the end of those three weeks, we all immediately added each other on Facebook, and we have remained surprisingly close. So that was the mysterious purple group.
But what was this green thing? It was a relatively large group, distinct enough not to have been absorbed into the high school bloc, yet surprisingly well connected to both the high school group and the CMU group. Again, names helped put it in perspective.
The green group is mostly my elementary and middle school friends. Facebook became relevant to my social sphere sometime in middle school, so we all added each other. Apparently I surrounded myself with some pretty smart people in my early years, or at least grew up in a pretty smart area: there are a ton of connections from my early childhood friends to Carnegie Mellon Computer Science and Engineering.
And The Rest
And, of course, there are the other groups. My family and family friends, who do most of their socializing off Facebook, still managed to form a color group of their own (zoomed 5x):
And lastly, there are the totally random connections: a few people from my high school internship, my co-net control from the Pittsburgh Marathon, and a few other people I met along the way (magnified 10x):
All in all, it was a pretty revealing experience to play around with my social graph like this. Not only was it awesome to look at my life through the lens of the connections I've made, it was also eye-opening. Again, I did this on my laptop in an hour. If you're already imagining what could be done if you scaled this up, then you finally have some conception of what "big data" really means.
And I think that's awesome.