When Will Big Data Spill Over?

These data-mining firms, if you will, will eventually cave and start selling to everyone.

What will we do with the data?

What will future for-profit mining operations be focused on?



The Embarrassment of Past Posts: SEARCH SUCKS

I just can’t stand how stupid I sound, even one year ago: what I wrote, what it was about, the words I used, the style, the punctuation.  I can’t go more than about six months back in these posts before what’s there makes me uneasy.  Really, really uneasy.  Embarrassed and stupid.

Meanwhile, the reason I originally believed in blogging, back around 2005, is still the reason I believe the Web, and personal archiving in particular, can be something better than a vanity exercise.

Someone who argues with me a lot recently said that they had stopped blogging when they realized they were just contributing “noise.” I took this as a jab, as if I’m making noise too.  After thinking about it, I’ve concluded that I am not, at least not compared to the vast majority of sites that get the click-throughs. On the few searches where I appear prominently, I am the opposite of noise.  I give out helpful information.

Meanwhile, I do also talk about all sorts of non-helpful things.  I even make joke posts.  Is this bad for the Web?  Is it noise?

No.  Because we have plenty of room.  We don’t have a space problem, we have an SEO problem.  The fact that satisfaction with search results is plummeting is not the result of excess content.  You’ll notice it’s often the same sites that let us down.  The problem is that search is not improving.  Search is asleep at the wheel while the link-bait space has exploded.  Here we are.  Answers.666, about.666.com, just-666.com: sites that often offer wildly inaccurate information, mostly voted into place by people with no accountability!

The cluttering of the Web used to be something we could blame on people who make useless websites or spam identities.  But not anymore.  Search is failing to help us find better content.  And it’s really tempting to assume that’s not an accident on Search’s part.



HashTags AreStupidAndSoAreSpacesSoWeShould AbolishSpaces

IfEveryoneJustTypedLikeThisAndWeEssentially AbolishSpaces AsOurPrimary WordSeparators WeWouldntNeed Hashtags BecauseEverythingWouldAlreadyBeSearchableTextStrings. WeCouldUseSpacesToIsolate KeyWords AndPotentially TrendingTopics. This WouldAlsoSaveCharactersWhen Tweeting EtcAndInGeneralItWouldSaveSpaceBecauseIt’s Compression. AbolishSpaces aka #abolishspaces

Revisiting Microformats/Semantic Web for Future Prediction Info

What if the content/facts of all wikipedia articles were semantically linked by a prediction modeling application?

Of course all of this linking would need to be done by the community. But I have a great deal of faith in the wikipedia community.

All that really needs to happen is for links in sentences that contain dates like “2054” or “June 11” or “05/21/1976” to be declared as being a prediction, fact, speculation, etc. (for now, via a rel=”prediction” or time=”Etc” type of thing). I think it would have to work along with, and within, human language syntax for now, because I doubt people want to qualify every word they write with semantic markup; but requiring lines that contain a date to follow some rules isn’t too crazy.

Another piece is needed to tie in the actual information, but that could just be a link to the article the prediction appears in, and even that is implicit, since the article is where the markup lives.  It’s a start.

In other words, “Show me predictions for 2054 based on Wikipedia info” could give you the articles that contain predictions for 2054, and you could easily get to those articles from the results.  Not as granular as “linked data” should be, but right now the Web is basically all about making it easier to look at selfies and bad journalism.
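To make the idea concrete, here’s a sketch in Python of how a tool might consume such markup. The rel="prediction" and data-year attributes are my hypothetical convention from above, not any real standard, and the sample links are invented:

```python
from html.parser import HTMLParser

# Toy article text where date-bearing sentences have been wrapped in links
# declared as predictions or facts (hypothetical convention, see above).
SAMPLE = """
<p><a rel="prediction" data-year="2061" href="/wiki/Halley%27s_Comet">Halley's
Comet will next appear in 2061.</a></p>
<p><a rel="fact" data-year="1969" href="/wiki/Apollo_11">Apollo 11 landed on
the Moon in 1969.</a></p>
"""

class PredictionFinder(HTMLParser):
    """Collect the hrefs of links declared as predictions for a given year."""
    def __init__(self, year):
        super().__init__()
        self.year = str(year)
        self.hits = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("rel") == "prediction" and a.get("data-year") == self.year:
            self.hits.append(a.get("href"))

# "Show me predictions for 2061" against the toy article:
finder = PredictionFinder(2061)
finder.feed(SAMPLE)
print(finder.hits)  # ['/wiki/Halley%27s_Comet']
```

A real version would crawl article dumps instead of a string, but the query shape (“predictions for year X, give me the articles”) is the same.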

I think this could have a lot of research power.

Schema.org: New Semantic Markup Supported by Google and Bing (and Yahoo! (if yahoo search isn’t just bing))

The ‘Semantic Web’ is not nearly as hot a topic as it was a few years ago, but if you remember, some of the efforts being made back in the old days (2008?) had to do with embedding semantic identifiers into regular old HTML.  The two examples that come to mind are RDFa and Microformats.  I haven’t heard a lot of buzz about embedded ‘linked data’ in HTML lately, but I heard today that a new project, called schema.org, has been launched to let developers add markup that helps search services glean meaning from pages.  Apparently, Google, Microsoft and Yahoo! are all on board.

I guess we should call this Keywords 2.0

Anyway, they have a whole taxonomy of ‘things’ laid out.  Check out “The Type Hierarchy” page.  A great start.
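For the curious, the markup itself is just plain HTML with a few extra attributes (itemscope, itemtype, itemprop, which are the real schema.org microdata attributes). Here’s a toy page describing a Movie from their type hierarchy, plus a minimal Python extractor; the parser is my own rough sketch, not anything official:

```python
from html.parser import HTMLParser

# A toy page marked up with schema.org microdata for the Movie type.
PAGE = """
<div itemscope itemtype="http://schema.org/Movie">
  <span itemprop="name">Avatar</span>
  <span itemprop="genre">Science fiction</span>
</div>
"""

class ItempropReader(HTMLParser):
    """Pull out itemprop name/value pairs from microdata markup."""
    def __init__(self):
        super().__init__()
        self.current = None   # the itemprop we are currently inside, if any
        self.props = {}

    def handle_starttag(self, tag, attrs):
        self.current = dict(attrs).get("itemprop")

    def handle_data(self, data):
        if self.current and data.strip():
            self.props[self.current] = data.strip()
            self.current = None

reader = ItempropReader()
reader.feed(PAGE)
print(reader.props)  # {'name': 'Avatar', 'genre': 'Science fiction'}
```

This is exactly the kind of thing a crawler can turn into structured facts instead of guessing from raw text, which is presumably why the search companies want it.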

I guess this means that a lot of SEO people are gonna start getting work again. It’ll be interesting to see whether people actually start putting this stuff into their CMSs.  I suspect not.  I suspect that the kinds of companies with data so rich that they could just rebuild the hooks their apps use as they render HTML are already benefiting enough in organic search that they won’t find a need to clutter up their code with this stuff.  I mean, I find it very unlikely that a site like Disney’s would get out-ranked by some spammer just because the spammer used these newer HTML attributes.

Then again, the fact that the major players are on board makes me wonder whether there isn’t finally a reason that’s profitable to search companies to start getting rid of all the garbage from SERPs.  Touch-screen finger fatigue?  Even so, it’s all the damn spammers in Eastern Europe that’ll have the resources to recode everything, at least in the near future.

Above all, I’m glad to see any attempt at making information more granular.  And deep down, I still want the universal distributed database we were all so excited about back in the Web 2.0 days, when the Semantic Web seemed like it was on the horizon, before Facebook and the mobile app-o-sphere took over.

What do we call this current era?  The API-o-sphere?  The Walled-garden-o-sphere?  Maybe we should just call it Facebook.

Intrigued and disappointed at the same time.


Tech Services: Industry Areas to Watch (Pre-July 2009)

The following is a bunch of predictions.  Mark my words.  Three areas to pull out your wallet for.

  • Personal Web Hosting/Cloud/Sync/Backup Services – I’m not sure what to call this space, but I think we’ll be seeing a lot of it.  I don’t believe these kinds of services will be bundled with mobile accounts anytime soon, but that’s clearly where things are headed. The definition is this: add-on, ISP-like services that make mobile and desktop apps work together more effectively.  This would include backup services and services that bridge the gaps across the various hardware networks we use.
  • Genealogy – The Baby Boomers love this stuff, and actually so do humans in general.  Who doesn’t want to know their own family history?  And with DNA analysis becoming more and more standardized, I think that Social-Media-Driven Genealogical Information will probably be mashed together with known hereditary data to create really compelling information services for average people.  The word “Rich” comes to mind but that’s really in the hands of designers and visionaries.  Imagine what’s going to happen in this space.  It blows my mind.
  • Library Sciences Related Anything – The so-called “Public Library” is probably about to explode into something much more tangled up with our daily lives.  I believe that tax-funded public libraries are getting closer and closer to being able to use cutting-edge information technology to serve the public.  The abolition of hard-copy card catalogs went slowly, but we’re in the age of Moore’s Law. It’s no stretch of the imagination that soon there will be title-to-ISBN translators that cross language barriers, and so on.  But that’s just the beginning.  Imagine the public library as a place that has cached, categorized databases from all sorts of sources, and librarians as people helping you mash data together (while you’re still at home in your underwear or on a train heading to work).  This idea is so hard for some people to see. I could go on for pages about the possibilities.  And for you asshole cynics, remember: Facts Cannot Be Copyrighted. “(b) In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” Libraries are worth so much to us as people.  And when they merge into a global archive of ‘verified’ sources, we’ll really start to see the Web’s potential.

What The Semantic Web Needs to Really Take Off

This is a draft version.  Suggestions welcome.

Short answer: People. 

What the Semantic Web (now officially called any number of other things besides that) needs in order to become mainstream, in my opinion, is people and the connections between them. The phrase “The Social Graph” comes to mind, a la Brad Fitzpatrick’s once famous, now all-but-forgotten manifesto, which even Tim Berners-Lee eventually commented on.

The Semantic Web would catch on if it were seen as even remotely useful by the young people who are most likely to build the next big thing on the web.

The beautiful thing about the Web 2.0 era is that highly useful tools can sprout up overnight, simply because of the desires of more or less ordinary people with no credentials or company affiliation. Everyone knows someone who’s a programmer.  The next big social software application just might come from the bedroom of a teenager.  There is hardly any barrier to entry anymore.  This is why Web 2.0 happened.  A new tool or service doesn’t need a business plan and a data center to launch and go viral.

The trajectory of innovation over the last five years or so, the “Web 2.0” years, has been about capitalizing on people, the content they create, their interests, and the value added by crowd-sourcing.  The benefits in the social media space are clear from the perspectives of both normal end-users and giant companies: on the user side, filtering noise and finding relevance; on the giant-company side, gathering metrics, targeting messages and acquiring free content.  The SemWeb standards have a lot to offer the social media realm, dare I say probably even more than CSS with rounded corners does (I hope I’m not offending anyone here).

But the way things are today, for most programmers, implementing SemWeb standards is a lot of extra work with no immediate benefit. Why not just use MySQL or cook up a new XML format?  

So why are these standards being completely ignored by the coders on the street?  RSS took off.  Why not FOAF? I think it’s because there’s no useful directory of URIs for people.  There are lots of SemWeb geeks who have URIs, but the kids on MySpace and FaceBook don’t have URIs or FOAF files.  And those kids’ eyeballs and participation are worth real money!

One fine day, back in 2006, Tim Berners-Lee came down from the mountain and gave us a commandment (or at least he logged into his blog and made a suggestion):

“Do you have a URI for yourself? If you are reading this blog and you have the ability to publish stuff on the web, then you can make a FOAF page, and you can give yourself a URI.”

Then, apparently fifteen minutes after the first post was published, Berners-Lee really dug into the importance of URIs in a post called Backward and Forward links in RDF just as important:

“One meme of RDF ethos is that the direction one choses for a given property is arbitrary: it doesn’t matter whether one defines “parent” or “child”; “employee” or “employer”. This philosophy (from the Enquire design of 1980) is that one should not favor one way over another. One day, you may be interested in following the link one way, another day, or somene else, the other way.”

For those of you who don’t yet understand the idea of the Semantic Web, here’s the deal.  If there’s one web address that represents each person, place, thing or idea, it becomes possible to crawl the Web (documents as well as databases) looking for links to that person, place or thing. And if those links contain tags that specify their meaning, the web-at-large begins to look more like a giant database.  This is the “Web of Data” (in contrast to the “Web of Documents” we know and love).  This is what people call the Semantic Web. So what’s stopping people from being in the “Web of Data” (AKA Semantic Web)?  Like Tim Berners-Lee suggested, we need URIs for people.  That’s where it all starts.  Once there are URIs for people, and there are semantic links (ones that contain tags explaining what they mean) pointing at those URIs, we can start making tools that use that data.
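For the curious, a FOAF file really is small. Here’s a sketch in Python that builds a minimal one and reads it back; the person and the example.org URIs are invented, while the foaf: and rdf: namespaces are the real ones:

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A minimal FOAF document: one person with a URI, who "knows" two others.
doc = """<rdf:RDF xmlns:rdf="{rdf}" xmlns:foaf="{foaf}">
  <foaf:Person rdf:about="http://example.org/people/andrew#me">
    <foaf:name>Andrew</foaf:name>
    <foaf:knows rdf:resource="http://example.org/people/beth#me"/>
    <foaf:knows rdf:resource="http://example.org/people/carl#me"/>
  </foaf:Person>
</rdf:RDF>""".format(rdf=RDF, foaf=FOAF)

# A crawler's-eye view: parse out the person's URI and who they know.
root = ET.fromstring(doc)
person = root.find("{%s}Person" % FOAF)
me = person.get("{%s}about" % RDF)        # the URI that *is* this person
friends = [k.get("{%s}resource" % RDF)
           for k in person.findall("{%s}knows" % FOAF)]
print(me)
print(friends)
```

Links pointing at that `#me` URI, tagged with what they mean, are exactly the raw material the Web of Data needs.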

This is a fairly simple concept, and Berners-Lee makes it sound simple enough.  Sure, we’ll all just give ourselves URIs and voilà, the Social Graph will go semantic.  That sounds great, but there are a few problems with leaving it at that.

  • Most ordinary people do not have websites or hosting of their own and instead rely on social networking services’ profile pages for their web presence.  This means most people have no way of easily publishing themselves to the Web of Data.
  • For-profit social networking services have a conflict of interest when it comes to providing the Web-at-large with useful, granular “Social Graph” data. Instead we see APIs that give approved developers limited access to data.  No love for the average joe like me who is not a programmer.
  • The Web currently has no trustworthy repository for facts about ordinary people.  Trustworthy means not-for-profit at the very least.  The closest thing we have is Wikipedia, but Wikipedia does not allow entries on ordinary, non-notable people.  (Keep in mind that Wikipedia publishes the facts in its ‘info boxes’ in RDF, one of the core standards of what we have been calling ‘The Semantic Web.’)

We need to start thinking of the Web more like we think of a public library, but completely decentralized and with infinite shelf space.  I think Wikimedia, the organization behind Wikipedia, is the best bet for a trusted librarian for all the information about normal people.

I think what is really needed right now is a non-profit-run directory of people, possibly modeled after Wikipedia, and especially after the companion DBpedia project, which publishes the facts in Wikipedia to the Semantic Web.  Really, because of Wikimedia’s established trust, they would be the ideal organization to do this.  Wikipedia could simply have another layer which reveals non-notable results, or ‘all results.’

The Semantic Web is Dead (Linked Data is the Word)

As a heavy consumer of information about leading technologies, I am proud to say that, at the time of this blog post’s creation, I am ahead of the game in declaring a change in the language we use to refer to the next phase of web evolution.

The term “web” has never been stronger. The “internet” goes on as something we mention almost every day. And the technologies that comprise the realm of what we have been calling semantic web, mainly markup standards, aren’t going anywhere.

But “semantic web” just fell out of favor as a candidate for a useful term in our language.  The moment this became obvious to me was a few weeks ago, when I heard that Tim Berners-Lee spoke at TED and didn’t mention ‘the semantic web.’  A few weeks later I saw the video for myself and felt a certain sadness, or abandonment, when TBL talked about the geekiest dream ever, one that he created, without using the name I thought we had all agreed on for it: The Semantic Web.  Instead, he used a different name for the most awesome library system ever conceived.  He called it “Linked Data.”

If you are a Semantic Web apologist like me, you might feel slightly deflated by a sudden change in terminology. I’m sorry.  I’m sure TBL is sorry too.

But the reality is that “Semantic Web” is always going to be confused with Natural Language Processing, a field of technology that is growing fast in its own right.

No sustained buzz has really caught on around “the semantic web” as a catchphrase, beyond us geeks who are already sold on the idea.  Instead, we’ve recently heard more and more announcements (usually from search companies) that include the word semantic, as if the mere use of the word means the company is doing something right.

The battle we’ve been fighting as SemWeb advocates is largely a battle for widespread awareness. TBL has said himself that the phrase semantic web wasn’t the best choice of words.   

I’m sure TBL spent at least an afternoon considering what he might say to the audience at TED, which arguably consists of some of the most influential people in the world.  I’ve concluded that he intentionally abandoned the phrase, in preparation for a brighter future in which the SemWeb technologies are no longer so easily confused with other technologies.  We’ve changed our name.

If you feel the re-branding is unfair, consider who has more right to the word semantic: the Natural Language people or the Interchangeable Data Format people?

We lose.

Sorry.  We need to move on. 

The Semantic Web is now called Linked Data.  It’s official.  Take a deep breath, change your notes.  And let’s move on as Linked Data enthusiasts, not Semantic Web enthusiasts.

I will lead this effort by removing the category of “The Semantic Web” from this site and replacing it with “Linked Data.”  I’ll do it later this week.  I need some time to say goodbye.

Semantic Web is Taking Forever, Right?

As a hardcore Linked-Data/Semantic-Web enthusiast for some time now, say since pre-2007 (back then, I didn’t know what to call it, but I understood it was possible), I can’t help but feel sometimes like it’s never going to happen.  Sometimes a non-silo Web seems like an idealistic fantasy.  Sometimes it seems like nothing is happening.  During the first half of 2007, the amount of excitement in the Sem-Web category of my feed-reader was high.  Since then, however, the excitement level seems to have diminished quite a bit.  Am I right?

I want to offer a few condolences and some evidence that the Semantic Web is not dead. In fact, I believe it’s still going to “happen.”

  1. Tim Berners-Lee spoke at TED this year, apparently urging people to unlock their data, according to GigaOm (TED, please publish this video soon, OK?). TED has a quickly growing amount of influence in the mainstream, from what I can tell.  This is good outreach.
  2. JavaScript support for querying more than one URL/site/database at a time is coming to a browser near you very soon, according to John Resig via this talk at Google. We’ve seen a lot of new APIs allowing programmers to access certain data from certain places, but more promising to me than these limited, proprietary APIs is how HTML itself is becoming more ‘semantic,’ if for no other reason than that it lets coders do more interesting and elegant things with CSS and JavaScript. Where this is heading, I think, is toward a future where pages are basically designed to be scraped: a sort of Microformats revolution (albeit totally rag-tag). Once the cat is out of the bag, I really believe embedded HTML semantics will become more and more standardized because of the incremental benefits to the publishers of the content.  What I’m talking about here is mainly classes and IDs in HTML.  Give it some time. Those things are basically Microformats waiting to happen.  Right?
  3. Last but not least, remember that the emergence of “Linked Data” will probably seem to explode at a certain point, even though the buzz seems to have slowed down in the echo chamber.  There’s a great little analogy I came across in which data are compared to buttons being threaded together, one random pairwise connection at a time.  How many random single connections need to be made before picking up one button brings all the others along?  The results are reassuring. Check it out over at the Data Evolution blog, the newest feed in my feed-reader.

    Zemanta: Real-Time Semantic Discovery & Blogging Tool

    Trying out Zemanta, a service for finding related resources. 

    They make Plugins for WordPress, TypePad and other blogging platforms, as well as extensions for both FireFox and IE.

    Currently, as I’m writing this, the Zemanta plugin is only giving me a “Loading Zemanta…” message… I figured Zemanta’s database would likely have plenty of articles about Zemanta.  Maybe not.

    We’ll see.  Very cool idea either way.


    I guess the first time I loaded my WordPress Dashboard’s Editing page, Zemanta took a little while to load… Ever since it’s been super fast.

    Pretty cool little Plugin. 


    Technology Predictions for 2009

    First of all, my last prediction-for-next-year was a little optimistic: I was predicting what people in the echo chamber have since started calling ‘cloud computing.’ I predicted that we’d see a lot of online services that blur the lines between what is ‘local’ and what is an online ‘service.’ Let me just defer that prediction one year and add it to the heap of what I see coming this year.  At least give me credit for making it my major prediction before the catchphrase ‘cloud computing’ came to the surface.

    1. Linux Will Come and Start Killing. Google Android, Ubuntu Mobile, and Asus’ recent release of Eee PCs running Linux all point, for me, to the fact that Linux is finally coming to a device near you.  Of course, Linux never went away, but I’m talking about real OS market share.  In addition, I wouldn’t be surprised if the coming popularity of Linux also dishes out a major hit to Microsoft, because I bet it’s easier to port software made for Linux to Mac OS X than to Windows, since OS X is built on Unix.  Just something to consider.  Also, if you haven’t been looking, take a look at Ubuntu.  It’s a pretty nice OS and will run on anything, maybe even your toaster.  And it’s free!
    2. AJaX Will Continue to Prevail as the Shiznit in Web Development (while Flash and others continue to die). Because of the nature of touch-screen interfaces, and because we will increasingly see the deployment of navigation and map-based services as well as virtual-world-type applications, where a scalable simulated 3-D space is used, I think AJaX is likely to remain the way things are done.  At this point, I’m starting to doubt the long-term success of Flash, AIR and Silverlight, because I think JavaScript can do what they do, better.
    3. Affordable Smartphones. Maybe this is a no-brainer, but when I say affordable I mean $100 or less.  I’m not predicting affordable connectivity for these devices just yet. I know gadget enthusiasts might hate me for saying this, but I think the handset race and the netbook race overlap heavily.  They are both fighting for the same causes: improvements to battery life, cheaper solid-state storage, cheaper mobile connectivity, competition in the OS market and the need for “thin” software, not to mention ‘cloud’ services…
    4. Ubiquity of Navigation Systems and/or GPS. From my understanding, cellular networks can already provide location info nearly as accurate as true GPS.  There’s no reason for the next wave of phones not to have on-board GPS capability, or something similar that offers driving directions, etc.
    5. Google Will Roll Out Geo-Targeted Advertising for Realz. Via GPS/Navigation devices probably, but even desktop search should see a shift in this way. Try searching for ‘pizza.’ You can see there’s big room for improvement there.
    6. Google Search to Shape Up or Start Shipping Out. Google may begin losing Search market-share in 2009 if they don’t play their cards right. Google’s Search Results haven’t changed noticeably since they started putting Wikipedia articles at the top of the stack a few years ago.  Personally, I think Google is intentionally not releasing major improvements to their results in order to avoid being an unofficial API for competing services. Again, search for ‘pizza.’ Then, add your postal code to the search. The funny thing is that Google already knows where you are, more or less, based on your IP address. Meanwhile, other search engines are actually better for many kinds of searches. Try Yahoo! for ‘pizza.’ Try Dogpile for finding an mp3. Google is capable of being better than these right now, in my opinion, but intentionally holding back, banking on the idea that their mindshare will carry them along until the next era, probably brought on by the ubiquity of GPS and Smartphones.  Even if Google loses a considerable amount of its Search traffic, it will continue to be the biggest hub of online metrics collection, as well as of course, online advertising, where Google makes all its money.  I don’t think Google is going anywhere any time soon.

    Kevin Kelly on the Next 5,000 Days of the Internet (TED, 2007):

    Kevin Kelly gave this talk at TED in 2007.  It’s worth watching.  

    He touches on a number of things, ranging from the history of the Internet and Moore’s Law to the future ubiquity of cloud computing and Kurzweil’s “Singularity.”

    He covers concepts like the Semantic Web, and the give-and-take between privacy and participation with relatively light language that any lay person should be able to understand.  This is an interesting and entertaining little presentation.  Thought I’d share.

    How Will I Organize My Tags? An App? MOAT? A Feature in Delicious?

    Here’s my dilemma. I have a ton of bookmarks in my Del.icio.us account.  I love using an online bookmarking system. But still, Delicious and the others’ systems for organizing bookmarks don’t really help with a need I bet most users have: tag optimization.

    What we need are tools for analyzing and perfecting the organization of bookmarks.  Every one of these systems (Delicious, Furl, StumbleUpon, etc.) has the same problem: user-submitted tags are buggy!  The engine of the platform needs to guide users toward better tagging.  Basically, we need built-in systems for finding the types of redundancies and other tag errors that we all have. We need debugging software, so our bookmarks can become good, clean representations of how web users feel about various web resources.  “Suggested Tags” and “Popular Tags” are great time-saving features, but I’d also like a tool for correcting tag cancer.

    These software offerings, if and when they finally exist, are going to make it increasingly easy to harmonize user-submitted value from folksonomies with the ‘Semantic Web,’ which is right around the corner.
    Some examples of areas where I think a robot could help users clean up tags:
    • Redundant Tags. Usually just alternate forms of the same word (like plural and singular), but also synonyms. Example: Image, Images, Picture, Pictures, Pix
    • Arbitrary Capitalization. HTML vs. html, etc.
    • Vagueness. Like los or awesome (wouldn’t it be safe to assume that all the things you bookmark are ‘awesome’ to you?).
    This is a screen-shot of my tagging screen from Delicious.  I added the red scribbling to point out just a few of the problems my tags have.
    Del.Icio.Us Tags Gone Wild

    On several occasions, I’ve set out to clean up my tags manually, but I’ve never made it very far.  It’s just too much work.
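A robot for this wouldn’t need to be fancy. Here’s a minimal sketch in Python of the first two cleanups (case folding and naive de-pluralization, plus a synonym table); the singularization rule and the synonym entries are invented for illustration, and a real tool would want something smarter:

```python
# Invented synonym table: collapse near-duplicate tags onto one canonical form.
SYNONYMS = {"picture": "image", "pix": "image", "photo": "image"}

def normalize(tag):
    """Fold case, strip a trailing plural 's', then apply the synonym table."""
    t = tag.lower()
    if t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]                    # naive singularization: Images -> image
    return SYNONYMS.get(t, t)

def clean(tags):
    """Return the deduplicated, normalized tag set."""
    return sorted({normalize(t) for t in tags})

print(clean(["Image", "Images", "Picture", "Pictures", "Pix", "HTML", "html"]))
# ['html', 'image']
```

Seven messy tags collapse to two, which is exactly the kind of pass I keep failing to do by hand.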

    Maybe the coming overhaul to Del.icio.us will add some of these needed features, although somehow I doubt it.

    I’ve heard of the MOAT (Meaning Of A Tag) project, and perhaps it could save us, but like many other ‘Semantic Web’ projects, I haven’t found a way, as a lay person, to utilize it.  At some point down the road, maybe someone will make a Delicious-MOAT-erizer web app that will clean up shop by proxy and make the metadata available to the Semantic Web.

    I Don’t Want FaceBook (Make That Comcast) To Buy Plaxo

    Update… this was actually news back in January.  Coincidentally, today it was announced that Comcast is buying Plaxo.  Goodbye Plaxo.  Nice knowin’ ya.

    Got the rumor tip from Scoble (there’s no real info there so don’t bother)

    Plaxo? Are you listening?  Keep doing what you’re doing, stay behind the scenes, work on enabling users to publish their own data, at will, in Semantic Standards as they become timely (now?) and stay independent of the little tug-of-war between closed, albeit increasingly API-enabled social apps.   You’re better than them!  Hang in there and you’ll be worth way more!  Don’t turn to the dark side!

    Competition for traffic will get everyone using RDF and Microformats soon enough…  Semantics are like SEO 2.0… The next bandwagon everyone will want to pay way too much for.

    Plaxo, you’re in the perfect spot to make money on this.  Think Virtual Private Networks, Semantic Publishing to the Web, and Semantic Productivity Tools at home.


    Usefulness of DataPortability.org’s Rel=Me project

    In the suggested reading section of the page for the DIY rel=”me” project over at dataportability.org’s wiki, there’s a link to this blog post, which attempts to explore the usefulness of rel=”me” to the regular old web user.  The article is slightly tunnel-visioned on what you can or can’t do with your browser to exploit Microformats.  Of course, being able to detect locations or personal contact info through a browser extension is useful, and I’m all for it, but beyond a few obvious exceptions like those, the Semantic Web, Microformats included, won’t be much use to us at the level of the browser.  We will still need web-based portals, or “libraries,” or “repositories,” or “catalogs,” or what have you, to connect to in order to really take advantage of this stuff.  Semantic markup on pages is great; RSS is an example of how a little bit of semantics can go a long way.  But what’s of greater significance is the idea of the Web of Data, where resources are “semantically” interconnected, by leveraging information that’s mapped to the domain of knowledge where it’s useful, and where the relationships between resources are also specified in a machine-understandable way.

    Rel=”me” is the equivalent of saying, “The person represented by this URL is the same person as the person represented by this other URL.”  Taking that into consideration, imagine how this would affect the experience of searching the “Web of Documents.”  I argue that if enough of us implement rel=”me” (or other Microformats, or RDFa) in our HTML pages, we will empower the Googles and Yahoos to take advantage of the knowledge expressed by this markup.  So let’s do it!
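To make that concrete, here’s a sketch in Python of what a crawler (or a Google) could harvest from a profile page that uses rel=”me”; the profile URLs are made up:

```python
from html.parser import HTMLParser

# A toy profile page: two rel="me" identity links, one ordinary link.
PROFILE = """
<a rel="me" href="http://twitter.com/andrew_example">my twitter</a>
<a rel="me" href="http://example.org/blog">my blog</a>
<a href="http://example.org/a-friend">a friend (no rel=me)</a>
"""

class RelMeCollector(HTMLParser):
    """Collect every URL this page claims is the same person."""
    def __init__(self):
        super().__init__()
        self.identities = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # rel can hold several space-separated values, e.g. rel="me nofollow"
        if tag == "a" and "me" in a.get("rel", "").split():
            self.identities.append(a["href"])

collector = RelMeCollector()
collector.feed(PROFILE)
print(collector.identities)
# ['http://twitter.com/andrew_example', 'http://example.org/blog']
```

Multiply that by every profile page on the Web, and the identity data the search engines would be sitting on writes itself.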

    Quotes from the Article I mentioned:

    “…So assuming that you went through the trouble to write up your HTML with rel=me, what next, where is that information actually consumed. I don’t think the 2 most popular browsers (IE 7 and Firefox 2) at this time have native support for XFN, I hear Firefox 3 is suppose to have native microformat support but I haven’t looked for it and if it is there, it isn’t immediately obvious to me. The closest thing I can find is a Firefox plugin called Operator. Operator is a microformat capable reader and for the most part seems to be able to consume most of the above microformat standards except rel=me, kind of odd but kind of understandable…”

    “…At this time, I can honestly say that XFN rel=me proliferation is limited and experimental at best. It would take a while for mass adoption to happen and requires a lot of user education, adoption by popular social sites like Facebook, MySpace, etc, and native browser support…”


    I commented there, and when I take the time to write out a long comment that isn’t something I’ve already said here in so many words, I like to steal my own comment and put it here for anyone who reads my blog.  My response:

    I felt like I had to chime in and point out that the point of MicroFormats or RDFa isn’t really to make an overnight change in how we use the Web. It’s to create a backbone of linked data so that as Search Engines and other “Libraries” begin to have stores of these relationships between documents and other resources available to work with, they can begin to improve their services. It will be nice when Search is only partly based on scanning for text-strings or combinations of words.

    If you were looking for Andrew in Sebastopol, CA, how would you do it? Perhaps you’d google “Andrew Sebastopol CA…”
    But what if you could specify that you are looking for a person?
    What if you could specify geocoding info or otherwise specify that Sebastopol is a town in Northern California?
    What if you could filter your results by the time web pages were created, or filter by domain specifications (show me wiki articles first, or show me all MySpace profiles), or filter by type of site (say, show me blogs only)?  And finally, and this is where rel=”me” comes in: what if, once you had selected from your query a person named Andrew who lives in Sebastopol, CA, you could ask your search results for every other document that is an expression of the same person?  This is what it’s all about.  It works because links work backward.  In other words, you can already say “show me all the pages that link to this thing,” but what about being able to say “show me all the pages linking to this Twitter page using rel=”me”” or, better yet, “show me all the pages linked to with rel=”me” from any page that links to this Twitter page with rel=”me”?”  And so on.
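Once a search engine has these links stored, the query I’m describing is just graph traversal: “every other document that is an expression of the same person” is the set of pages reachable from a starting page by following rel=”me” links in either direction. A minimal Python sketch, with an invented set of URLs standing in for a real crawl:

```python
# Forward rel="me" edges as a search engine might store them after a crawl.
rel_me = {
    "http://blog.example/andrew": ["http://twitter.example/andrew"],
    "http://twitter.example/andrew": ["http://photos.example/andrew"],
    "http://photos.example/andrew": [],
    "http://blog.example/someone-else": [],
}

def same_person(start):
    """Every URL connected to `start` by rel='me' links, in either direction."""
    # Build the reverse edges too, because links work backward.
    backward = {url: [] for url in rel_me}
    for src, targets in rel_me.items():
        for t in targets:
            backward.setdefault(t, []).append(src)
    seen, stack = set(), [start]
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        stack.extend(rel_me.get(url, []))
        stack.extend(backward.get(url, []))
    return seen

print(sorted(same_person("http://twitter.example/andrew")))
```

Starting from the Twitter page, the traversal pulls in the blog (which links to it) and the photos page (which it links to), while the unrelated page stays out.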

    The Web is becoming a library. By adding microformats and other semantic markup to our documents, we are making it possible for decent “card-catalogues” to be built, whether they’re being built by google, yahoo! or the guy down the street.

    DataPortability In Motion Podcast

    A weekly roundtable discussion about the DataPortability Project specifically, and data portability efforts in general.  The show is produced and hosted by J. Trent Adams and Steve Greenberg.

    The podcast is HERE


    I recommend Episode 7 


    We kick off episode 7 of the DataPortability: In-Motion Podcast with the news of the week that MySpace launched “Data Availability” with Yahoo!, eBay, Photobucket, and Twitter. Following immediately on their heels was the announcement that Facebook is releasing “Facebook Connect”, an extension of their 3rd party API providing deeper access to their users’ data.


    We’re also joined by Brady Brim-Deforest, founder of Human Global Media, talking about the DataPortability Legal Entity Taskforce. He provides a good overview and update on the process underway to formalize the project under a recognized legal banner.

    The featured interview segment is with Danny Ayers, Semantic Web Developer at Talis. He touches on moving from document linking, through microformats, to feature-rich RDF modeling to identify portable data. He also dispels the myth that it’s hard to migrate from a standard SQL data representation into addressable semantic objects.

    Danny regularly posts on the following sites:

    Also mentioned in the episode:


    Planet RDF

    Apture! Multiple Resources From One Link. This is Really Cool.

    I heard about this through Lawrence Lessig’s blog. Professor Lessig is taking the month of May off, and off the grid, which I applaud him for.

    What this web app does is allow you to make links that, through the free Apture service for your site, link to numerous resources, all previewable via the same sort of javascript popup you get from Snap or the ZitGist “zLinks” plugin.

    You must see this in action. This is inspiring. It shows how much more dynamic web pages can and will be in the near future. I’m a bit sick of the over-use of javascript, ajax, whatever you want to call it. It tends to be resource-heavy on your machine. This is an exception.

    I wonder if these guys are going to implement any Semantic technologies into the data they store… I wonder if they’re going to make deals with bookmarking services like del.icio.us… All my words could automatically be links to mini-libraries of items I’ve bookmarked! It’d look a little ugly given the current style conventions but hey. Let’s change those.

    It’s interesting to me to ponder how this non-semantic-web service, because it’s also a library/bookmarking tool, could become hugely useful to the Semantic Web as it snatches up web users’ resources/web-bibliographies.

    Oh man. This is a hot item!

    “wrote an interesting post today” SEO, Evil Robots and One Sad Outcome of Non-Semantics in Spam-Control/Search

    I’ve mentioned before how increasingly the ‘Live Web’ or ‘Blogosphere’ (or whatever you want to call this thing) is being infiltrated by Robot Blogs. What they appear to be doing is crawling the web and scraping excerpts of blog posts and reposting the excerpts, linking back to where it came from. They usually say:

    “[KeyWord] wrote an interesting post today”

    Since they link back to the blog post they scraped, they show up as a trackback in the comments area of the original post. This way, the unsuspecting blogger is linking to the fake blog. The fake blogs seem to be set up in an attempt at monetizing traffic via adsense ads.

    I googled the phrase “wrote an interesting post today” and the top hit was (I probably am the top hit now) some blogger talking about filtering any comment that contains the phrase “wrote an interesting post today.”

    I had decided to change my little tagline thingy to this exact phrase as a sort of inside joke for bloggers, but found myself wondering if being associated with that phrase will adversely affect my findability.  Perhaps Search Engines or Spam Filters will begin to look out for that phrase?

    Already, I bet there are tons of bloggers who filter out comments containing words like “viagra” or “casino,” assuming that there is absolutely no context in which these words could be used in a legitimate discussion. The fact that I am using those words here is proof that there is such a thing as a legitimate discussion which contains them.

    Filtering for a word or phrase seems to me to be a slippery slope, especially if we’re talking about Search Engines, since they act as our main interface to the Web.
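The filter I’m worried about amounts to a few lines of code, which is exactly why it’s tempting and exactly why it misfires. A sketch of the naive phrase filter, with my example phrases as the blocklist (these aren’t anyone’s actual filter rules):

```python
# A naive blocklist filter: flag any comment containing a blocked phrase,
# with no regard for context.
BLOCKED_PHRASES = ["wrote an interesting post today", "viagra", "casino"]

def is_spam(comment):
    """True if the comment contains any blocked phrase, context be damned."""
    text = comment.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

# A classic splog trackback is caught...
print(is_spam("KeyWord wrote an interesting post today on the subject of..."))
# ...but so is a legitimate discussion that merely mentions the same word.
print(is_spam("Bloggers now filter any comment mentioning Viagra."))
print(is_spam("Great post, thanks!"))
```

The second result is the slippery slope: the filter can’t distinguish spam from a post *about* spam, which is precisely the accident I’m hoping Google doesn’t have with this blog.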

    Google: Please don’t hate me because I said Viagra. I’m not a spammer.

    PHP Application Turns MySpace Friends Into CSV – View/Mine in Excel Spreadsheet Etc

    My friend threw together an app that scrapes your MySpace contacts and puts useful info into a reusable format.


    UPDATE: It’s also available as a Torrent via The Pirate Bay. Please consider seeding this. It’s a tiny, tiny file.

    Here’s the Read Me info I just put together to go with it:

    and change those.
    Save the file.
    Upload these two files to your server.
    Point your web browser to http://where-you-put-the-file-on-your-server/ms_test.php
    and what will result is a CSV file of all your MySpace friends and their demographic information. Also included are the URLs to “send message” etc., and some other useful things.
    View the source of the page and copy it into a plain-text file.
    Name the text file with the extension .csv
    Now you should be able to work with your MySpace friends in Excel
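Once you’ve saved that .csv file, Excel is only one option; anything that reads CSV can work with it. A small Python sketch, with invented column names and sample rows standing in for the scraper’s real output (check your file’s header row for the actual fields):

```python
# Read the saved friends CSV and work with it programmatically.
import csv
import io

# Stand-in for the saved file; the real data comes from ms_test.php.
sample = io.StringIO(
    "name,age,location,send_message_url\n"
    "Tom,32,Santa Monica,http://example.invalid/msg?id=1\n"
    "Jane,27,Portland,http://example.invalid/msg?id=2\n"
)

friends = list(csv.DictReader(sample))
print(len(friends))                              # → 2
print(friends[0]["name"], friends[0]["location"])  # → Tom Santa Monica
```

For a real file you’d replace the `io.StringIO` stand-in with `open("friends.csv", newline="")`.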

    There is nothing malicious about this simple application. No viruses, spyware etc. It only does what it’s supposed to do: scrape your friends so you can more easily work with your social network data.

    If you are of the camp that feels that people scraping their own MySpace contacts is unethical, I suggest you consider that all the pages are already publicly available, and the data they contain is rendered in HTML that can be freely accessed.  This is just a tool to make it easier to separate the useful data from the clutter.

    Finally, this is possibly against MySpace’s Terms Of Service, so use at your own risk.