Personal data integration (part 1)
I’ve been toying with the idea of attempting “semantic integration” of a lot of personal data in my life. I’ll be sure to share more later, but so far I’ve managed to pull together my September phone records, my email history, my contacts, my calendar and my Facebook friends (via the API, not something sketchy!) into a single triple-store.
Using this data, I was able to create this chart, which shows my friend network (I have removed myself and Brooke, since we’re connected to everyone and it ruins the layout). The people who I emailed, texted or called in September are shown in green.

You can see tight clusters of my friend groups. The tightest is the big hairball near the bottom that makes up much of Brooke’s Stanford GSB class, but also clear are groupings for my friends from MIT, Chapel Hill, Boston (post-MIT return), my San Francisco tech friends and my family. My family is the only group that is isolated from the rest of the graph — everyone else is connected, which is partly because I’ve introduced some of these groups to each other, and partly just because it’s a small world.
Also good to see is that almost every cluster has at least one green node (my family notably doesn’t, but that’s because my parents aren’t on Facebook), so I’ve generally done a good job of keeping in touch with at least a few people from different phases of my life.
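A minimal sketch of how the data behind a chart like this could be prepared, assuming the friend network is available as a list of name pairs. The function name, the sample names and the "gray" colour for people not contacted are all hypothetical; the post only says that hub nodes were removed and that contacted people are green.

```python
def build_chart_graph(friend_edges, hubs, contacted):
    """Drop hub nodes, then colour the remaining nodes by recent contact.

    friend_edges: iterable of (person, person) friendship pairs
    hubs:         people connected to everyone (removed so they don't distort the layout)
    contacted:    people emailed, texted or called in the period of interest
    """
    # Keep every person who is not a hub, even if all their edges went through a hub.
    nodes = {n for edge in friend_edges for n in edge} - hubs
    # Keep only undirected edges whose endpoints both survive.
    edges = {
        frozenset((a, b))
        for a, b in friend_edges
        if a != b and a not in hubs and b not in hubs
    }
    # "gray" for the non-contacted nodes is an assumption, not taken from the post.
    colors = {n: ("green" if n in contacted else "gray") for n in nodes}
    return edges, colors


# Hypothetical sample data.
sample_edges = [
    ("me", "ann"), ("me", "ben"), ("me", "cy"),
    ("brooke", "ann"), ("brooke", "cy"),
    ("ann", "ben"),
]
edges, colors = build_chart_graph(sample_edges, hubs={"me", "brooke"}, contacted={"ben"})
```

The resulting edges and colours can then be handed to whatever graph layout tool is used for drawing.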
There’s a lot of talk about breaking the silos in the enterprise and, in the semantic-web community, about data integration across the entire web. But right now, people don’t even have decent integration across their own personal information. The current proliferation of single-feature applications encourages you to store different aspects of your life in different places. The advantage, of course, is that something highly specialized is much more pleasant to use, but the disadvantage is that there’s no way to query across these aspects. I’m interested in experimenting with ways that help people “break the silos” with their own information, in the hope that this will both yield useful applications and help us get a better grip on the bigger problems.
I now have code to keep my triple-store synced with my friend network, my contacts, my phone records, my email and my calendar. I can construct queries across all of this (who did I forget to call on their birthday? Who have I seen recently who went to Stanford?). I’ll be sharing this code at some point, but I want to see how far I can take this. I’m also interested in hearing from anyone who has tried similar experiments and wants to collaborate.
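To illustrate what a cross-source query can look like, here is a minimal sketch that uses a plain Python set of (subject, predicate, object) tuples in place of a real triple-store. The predicate names ("birthday", "attended", "with", "on", "type"), the people and the dates are all invented for illustration and are not the schema or data used for this post.

```python
from datetime import date

# Hypothetical triples merged from contacts, phone records and calendar.
TRIPLES = {
    ("alice", "birthday", date(2008, 9, 12)),
    ("bob", "birthday", date(2008, 9, 20)),
    ("carol", "birthday", date(2008, 10, 3)),
    ("alice", "attended", "Stanford"),
    ("carol", "attended", "MIT"),
    ("call1", "type", "PhoneCall"),
    ("call1", "with", "bob"),
    ("call1", "on", date(2008, 9, 20)),
    ("event1", "type", "CalendarEvent"),
    ("event1", "with", "alice"),
    ("event1", "on", date(2008, 9, 28)),
}


def match(triples, s=None, p=None, o=None):
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def forgotten_birthdays(triples, year, month):
    """People with a birthday in the given month whom I did not call on that day."""
    missed = []
    for person, _, bday in match(triples, p="birthday"):
        if (bday.year, bday.month) != (year, month):
            continue
        # Joins contact data (birthday) with phone records (calls).
        called = any(
            (call, "type", "PhoneCall") in triples and (call, "on", bday) in triples
            for call, _, _ in match(triples, p="with", o=person)
        )
        if not called:
            missed.append(person)
    return sorted(missed)


def seen_recently_from(triples, school, since):
    """People who attended `school` and appear in a calendar event on or after `since`."""
    alumni = {s for s, _, _ in match(triples, p="attended", o=school)}
    seen = set()
    for event, _, person in match(triples, p="with"):
        if person not in alumni or (event, "type", "CalendarEvent") not in triples:
            continue
        if any(d >= since for _, _, d in match(triples, s=event, p="on")):
            seen.add(person)
    return sorted(seen)
```

With the sample data, `forgotten_birthdays(TRIPLES, 2008, 9)` returns `["alice"]` and `seen_recently_from(TRIPLES, "Stanford", date(2008, 9, 1))` returns `["alice"]`. In a real triple-store the same joins would more likely be written as a query in a language such as SPARQL.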
So, anyone have any thoughts on other sources of personal data or questions you might want to ask once it’s integrated?
October 14th, 2008 at 4:43 pm
Just realized, I think my IM conversation logs are also something to incorporate. I end up staying in touch with a lot of people that way.
October 14th, 2008 at 4:59 pm
Nice stuff Toby. IM conversations? I know a bit about that
October 14th, 2008 at 8:03 pm
This is awesome... definitely looking forward to more revelations.
October 14th, 2008 at 11:31 pm
So good, I am a fan
Are there some comments you could add (positive or negative) on how it might be to combine these connections with text analysis of calendar, email and IM content (as another set of connections or more info about what’s going on in that hairball)?
October 14th, 2008 at 11:34 pm
Awesome job… my idea for this is to mash up triples with changing time and geospatial info in this network of friends. For example, when you put “Meeting with Tom in SF Tuesday” on your calendar, it could bring up other friends who might be in the same city, or other events you might have bookmarked as interesting that are happening Tuesday in SF… I’m really curious what apps you use for your online life, calendaring etc., and if/how you can get RDF triples out of all of those!
@agbiotec on Twitter
October 15th, 2008 at 12:40 am
Toby,
So cool. Like Ntino, I am curious about how you generated the data
October 15th, 2008 at 11:43 am
I am developing a reverse auction website which will track product selections and reverse bid to get an optimal price. There are parallels between product selections (e.g. what part is similar to this part, what buying pattern is similar, what is suitable for this search pattern). So if it looks like code can be shared or re-used, then contact me.
So your friend network would be a parallel to a product network, based on searches and traffic selections: you like fast cars, therefore you like adventure holidays, hence scuba gear. For example, if P (person) relates to S (search pattern) with a certain probability and S relates to BL (buyer’s list), then P relates to BL with a certain probability. Clearly there are hundreds of possibilities, such as P is Young, Young relates to pop music.
Extensions would be Product relations to qualities (e.g. colour, traditional).
This is a real business and I am doing the groundwork now.
October 15th, 2008 at 12:52 pm
Where did you get your phone data? For a cell phone you might find http://skydeck.com/ useful. We parse bills/usage from your carrier and you can get the data back in JSON.
October 15th, 2008 at 1:19 pm
Hey, this is awesome. Is there a place where you have listed the steps you followed to generate the map?
October 20th, 2008 at 12:06 pm
Looking forward to more information about this. Thanks for sharing. Eugene
November 21st, 2008 at 1:48 pm
Seconding Chirag: this is cool, and I want details. Step-by-step instructions would be awesome, but even a list of software you used would be better than nothing.
December 7th, 2008 at 1:41 pm
Nice post. Thank you for the info. Keep it up.
January 28th, 2009 at 3:20 pm
My only complaint is that my name should be in a bigger font with, maybe, animated gifs of shooting stars or dancing hamsters around it.
February 9th, 2009 at 2:06 am
anybody in the EU who is interested in developing technology for advanced management of personal information should consider submitting a proposal for funding under Strategic Objective ICT-2009.4.3 of the European Commission’s Framework Programme 7. Deadline for proposals is 3 November 2009. Feel free to write to stefano.bertolo@ec.europa.eu for more info.
April 6th, 2009 at 9:57 am
Hello,
Just found you via a Google ad in my Gmail. But I am curious what software you used to map out the various relationships?
Thanks, time to read some more posts as I have time.
April 15th, 2009 at 2:20 pm
If you ever want to read a reader’s feedback, I rate this post 4/5. Decent info, but I just have to go to that damn yahoo to find the missed parts. Thanks, anyway!
September 5th, 2009 at 6:18 pm
I agree with the comments! Adding this to my favorites!
September 13th, 2009 at 2:30 pm
Great topic! It will be interesting to read how things develop.
September 23rd, 2009 at 2:42 am
Thanks :) There is something interesting here :)
September 24th, 2009 at 4:03 am
Was it really interesting?
November 11th, 2009 at 7:37 pm
If you are looking for any gogo hamsters then you can definitely find them here.
November 26th, 2009 at 10:50 am
Too much gaming is bad for your health; you will get fat! There are loads of other things you can do in life. But still an awesome blog.
November 29th, 2009 at 9:08 pm
It is good to see you make a post on this topic, I have to book mark this site. Just keep up the good work.
December 25th, 2009 at 1:09 am
awesome post, I am wondering how you generate this map.