When Will Big Data Spill Over?

These mining firms, if you will, will eventually cave to selling to everyone.

What will we do with the data?

What will future for-profit mining operations be focused on?



HashTags AreStupidAndSoAreSpacesSoWeShould AbolishSpaces

IfEveryoneJustTypedLikeThisAndWeEssentially AbolishSpaces AsOurPrimary WordSeparators WeWouldntNeed Hashtags BecauseEverythingWouldAlreadyBeSearchableTextStrings. WeCouldUseSpacesToIsolate KeyWords AndPotentially TrendingTopics. This WouldAlsoSaveCharactersWhen Tweeting EtcAndInGeneralItWouldSaveSpaceBecauseIt’s Compression. AbolishSpaces aka #abolishspaces
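For the curious, here is a minimal Python sketch of how a reader or search tool might unpack text written this way. The function names and the "two words or fewer means keyword" rule are my own assumptions for illustration, not part of any real hashtag system.

```python
import re

def split_camel(chunk):
    """Split a run-together chunk into words at each capital letter.

    Naive on purpose: an acronym like "HTML" would come out as four
    one-letter words.
    """
    return re.findall(r"[A-Z][^A-Z]*|[^A-Z]+", chunk)

def keyword_chunks(text, max_words=2):
    """Return the space-isolated chunks short enough to count as keywords.

    Assumption: a chunk of max_words words or fewer was isolated on purpose.
    """
    return [c for c in text.split() if len(split_camel(c)) <= max_words]

title = "HashTags AreStupidAndSoAreSpacesSoWeShould AbolishSpaces"
print(split_camel("AbolishSpaces"))
print(keyword_chunks(title))
```

The point of the sketch is only that the capital letters carry the same information the spaces used to, which frees the spaces up to mark the chunks you want found.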

Revisiting Microformats/Semantic Web for Future Prediction Info

What if the content/facts of all wikipedia articles were semantically linked by a prediction modeling application?

Of course all of this linking would need to be done by the community. But I have a great deal of faith in the wikipedia community.

All that really needs to happen is for links in sentences that contain dates like “2054” or “June 11” or “05, 21, 1976” to be declared as a prediction, fact, speculation, or some other specified type (for now, via a rel=”prediction” or time=”Etc” type of thing). I think it would have to work alongside ordinary human-language syntax for now, because I doubt people want to qualify every word they write with semantic markup, but requiring lines that contain a date to follow some rules isn’t too crazy.

Another piece is needed to tie in the actual information, but it could just be a link back to the article in which the statement appears, and even that is implicit, because the article is where the data is coming from.  It’s a start.

In other words, “Show me predictions for 2054 based on wikipedia info” could give you articles that contain predictions for 2054. And you can easily get to those articles from the results.  Not as granular as “linked data” should be, but right now, the web is basically all about making it easier to look at selfies and bad journalism.
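To make the idea a bit more concrete, here is a rough Python sketch of the query side, using only the standard library. Everything about the markup is an assumption on my part: rel values like "prediction", "fact" and "speculation" are not part of any existing standard, and the sample HTML and article paths are made up.

```python
import re
from html.parser import HTMLParser

class DatedClaimParser(HTMLParser):
    """Collects <a> elements whose rel attribute marks them as a dated claim."""

    KINDS = {"prediction", "fact", "speculation"}  # hypothetical rel values

    def __init__(self):
        super().__init__()
        self.claims = []   # finished claims: dicts with kind, href, text
        self._open = None  # the claim currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        kinds = set((attrs.get("rel") or "").split()) & self.KINDS
        if kinds:
            self._open = {"kind": sorted(kinds)[0],
                          "href": attrs.get("href"),
                          "text": ""}

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            self.claims.append(self._open)
            self._open = None

def claims_for_year(html, year, kind="prediction"):
    """Return marked-up claims of one kind whose text mentions the year."""
    parser = DatedClaimParser()
    parser.feed(html)
    parser.close()
    pattern = re.compile(rf"\b{year}\b")
    return [c for c in parser.claims
            if c["kind"] == kind and pattern.search(c["text"])]

# Made-up sample markup, not taken from any real article.
SAMPLE = """
<p>Some expect <a rel="prediction" href="/wiki/Example_A">a crewed base by 2054</a>.</p>
<p>The station <a rel="fact" href="/wiki/Example_B">opened on June 11, 1976</a>.</p>
<p><a href="/wiki/Example_C">An unmarked link that mentions 2054</a></p>
"""

for claim in claims_for_year(SAMPLE, 2054):
    print(claim["href"], "->", claim["text"])
```

Only the link that is both marked as a prediction and mentions 2054 comes back, and its href is the path to the article, which is about as granular as this scheme gets.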

I think this could have a lot of research power.

Schema.org: New Semantic Markup Supported by Google and Bing (and Yahoo! (if yahoo search isn’t just bing))

The ‘Semantic Web’ is not nearly as hot a topic as it was a few years ago, but if you remember, some of the efforts being made back in the old days (2008?) had to do with embedding semantic identifiers into regular old HTML.  The two examples that come to mind are RDFa and Microformats.  I haven’t heard a lot of buzz about embedded ‘linked data’ in HTML lately, but I heard today that a new project, called schema.org, has been launched to enable developers to add markup to sites which will help search services glean meaning from markup.  Apparently, Google, Microsoft and Yahoo! are all on board with this project.

I guess we should call this Keywords 2.0

Anyway, they have a whole taxonomy of ‘things’ laid out.  Check out “The Type Hierarchy” page.  A great start.

I guess this means that a lot of SEO people are gonna start getting work again. It’ll be interesting to me to see if people start actually putting this stuff into their CMSs.  I suspect not.  I suspect that the kinds of companies that have such rich data that they can just rebuild the hooks they use as their apps render HTML will already be benefitting enough in organic search that they won’t find a need to actually clutter up their code with this stuff.  I mean I find it very unlikely that a site like Disney’s would get out-ranked by some spammer because the spammer used these newer HTML attributes.

Then again, the fact that the major players are on board with this makes me wonder if there isn’t a reason that’s profitable to search companies to finally start getting rid of all the garbage from SERPs.  Touch-screen finger fatigue?  Even so, it’s all the damn spammers in eastern Europe that’ll have the resources to recode everything, at least in the near future.

Above all, I’m glad to see any attempt at making information more granular.  And deep down, I still want the universal distributed database we were all so excited about back in web2.0  when the semantic web seemed like it was on the horizon, before facebook and the mobile app-o-sphere took over.

What do we call this current era?  The API-o-sphere?  The Walled-garden-o-sphere?  Maybe we should just call it Facebook.

Intrigued and disappointed at the same time.


Tech Services Industries Areas to Watch (Pre-July 2009)

The following is a bunch of predictions.  Mark my words.  Three areas to pull out your wallet for.

  • Personal Web Hosting/Cloud/Sync/Backup Services – I’m not sure what to call this space that I think we’ll be seeing a lot of.  I don’t believe that these kinds of services will be bundled with mobile accounts anytime soon, but that’s clearly where things will end up eventually. The definition is this: Add-On ISP-like services that make mobile and desktop apps work together more effectively.  This would include backup services and services that bridge gaps across the various hardware networks we use.
  • Genealogy – The Baby Boomers love this stuff, and actually so do humans in general.  Who doesn’t want to know their own family history?  And with DNA analysis becoming more and more standardized, I think that Social-Media-Driven Genealogical Information will probably be mashed together with known hereditary data to create really compelling information services for average people.  The word “Rich” comes to mind but that’s really in the hands of designers and visionaries.  Imagine what’s going to happen in this space.  It blows my mind.
  • Library Sciences Related Anything – The so-called “Public Library” is probably about to explode into something much more tangled with our daily lives.  I believe that tax-funded Public Libraries are increasingly getting closer to being able to easily use cutting edge Information Technology to serve the public.  The abolition of hard-copy card catalogs went slowly.  But we’re in the age of Moore’s Law. It’s no stretch of the imagination that soon there will be title-to-isbn translators that cross language barriers and so on… But that’s just the beginning.  Imagine the Public Library as a place that has cached, categorized databases from all sorts of sources, and Librarians as people helping you to mash data together (while you’re still at home in your underwear or on a train heading to work) …This idea is so hard to see for some people. I could go on for pages about the possibilities.  And for you asshole cynics, remember: Facts Cannot Be Copyrighted. “(b) In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” …Libraries are worth so much to us as people.  And when they merge into a global archive of ‘verified’ sources, we’ll really start to see the Web’s potential.

What The Semantic Web Needs to Really Take Off

This is a draft version.  Suggestions welcome.

Short answer: People. 

What the Semantic Web (now officially called any number of other things besides that) needs in order to become mainstream, in my opinion, is people and the connections between them. The phrase “The Social Graph” comes to mind a la Brad Fitzpatrick’s once famous, but now all but forgotten manifesto which even Tim Berners-Lee eventually commented on.

The Semantic Web would catch on if it was seen as even remotely useful by the young people who are most likely going to be building the next big thing on the web.

The beautiful thing about the Web2 era is that highly useful tools can sprout up overnight simply because of the desires of more or less ordinary people with no credentials or affiliation with a company. Everyone knows someone who’s a programmer.  The next big social software application just might come from the bedroom of a teenager.  There is hardly any barrier to access anymore.  This is why Web 2.0 happened.  A new tool or service doesn’t need a business plan and a data center to launch and go viral.

The trajectory of innovation throughout the last five years or so, the “Web 2.0” years, has been around capitalizing on people, the content they create, their interests, and the value added by crowd-sourcing.  The benefits in the social media space are clear from both the perspective of normal end-users, as well as giant companies. Mostly, these benefits are about filtering noise and finding relevance on the user-side and on the giant company side, gathering metrics, targeting messages and acquiring free content.   The SemWeb standards have a lot to offer the Social Media realm, dare I say, probably even more than CSS with rounded corners does (I hope I’m not offending anyone here).  

But the way things are today, for most programmers, implementing SemWeb standards is a lot of extra work with no immediate benefit. Why not just use MySQL or cook up a new XML format?  

So why are these standards being completely ignored by the coders on the street?   RSS took off.  Why not FOAF? I think it’s because there’s no useful directory of URIs for people.  There are lots of SemWeb geeks who have URIs, but the kids on MySpace and FaceBook don’t have URIs or FOAF files.  And those kids’ eyeballs and participation are worth real money!

One fine day, back in 2006, Tim Berners-Lee came down from the mountain and gave us a commandment (or at least he logged into his blog and made a suggestion):

“Do you have a URI for yourself? If you are reading this blog and you have the ability to publish stuff on the web, then you can make a FOAF page, and you can give yourself a URI.”

Then, apparently fifteen minutes after the first post was published, Berners-Lee really got at the importance of URIs in a post called “Backward and Forward links in RDF just as important”:

“One meme of RDF ethos is that the direction one choses for a given property is arbitrary: it doesn’t matter whether one defines “parent” or “child”; “employee” or “employer”. This philosophy (from the Enquire design of 1980) is that one should not favor one way over another. One day, you may be interested in following the link one way, another day, or somene else, the other way.”

For those of you who don’t yet understand the idea of the Semantic Web, here’s the deal.  If there’s one web-address that represents each person, place, thing or idea, it becomes possible to crawl the Web (documents as well as databases) looking for links to that person, place or thing. And if those links contain tags which specify the meaning of the links, the web-at-large begins to look more like a giant database.  This is the “Web of Data” (in contrast to the “Web of Documents” we know and love).  This is what people call The Semantic Web. So what’s stopping people from being in the “Web of Data” (AKA Semantic Web)?  Like Tim Berners-Lee suggested, we need URIs for people.  That’s where it all starts.  Once there are URIs for people, and there are semantic links (ones that contain tags explaining what they mean) pointing at those URIs, we can start making tools that use that data.
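If it helps to see it in miniature, here is a small Python sketch of what “links that say what they mean” look like once collected. It is a toy, not a real RDF library: the people URIs and the “employer” property are made up (example.org), and only the FOAF “knows” URI is a real vocabulary term. Each link is stored once and can be followed in either direction, which is the point of the Berners-Lee quote above.

```python
from collections import defaultdict

class TripleStore:
    """A toy store of (subject, predicate, object) links between URIs."""

    def __init__(self):
        self._by_subject = defaultdict(list)
        self._by_object = defaultdict(list)

    def add(self, subject, predicate, obj):
        # Index every link from both ends so neither direction is favored.
        self._by_subject[subject].append((predicate, obj))
        self._by_object[obj].append((predicate, subject))

    def outgoing(self, uri):
        """Links that start at this URI."""
        return list(self._by_subject.get(uri, []))

    def incoming(self, uri):
        """Links that point at this URI."""
        return list(self._by_object.get(uri, []))

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"      # real FOAF property
EMPLOYER = "http://example.org/terms/employer"      # hypothetical property
ALICE = "http://example.org/people/alice#me"        # made-up person URIs
BOB = "http://example.org/people/bob#me"
ACME = "http://example.org/orgs/acme"               # made-up organization

store = TripleStore()
store.add(ALICE, FOAF_KNOWS, BOB)
store.add(ALICE, EMPLOYER, ACME)
store.add(BOB, EMPLOYER, ACME)

print(store.outgoing(ALICE))  # what Alice's page says about Alice
print(store.incoming(ACME))   # who points at Acme, found by crawling
```

Nothing here works until Alice and Bob each have a URI to point at, which is exactly the gap described above.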

This is a fairly simple concept.  And Berners-Lee makes it sound simple enough.  Sure, we’ll all just give ourselves URIs and voilà, the Social Graph will go Semantic.  That sounds great but there are a few problems with leaving it at that.

  • Most ordinary people do not have websites or hosting of their own and instead rely on Social Networking Services’ profile pages for their web presence.  This means that most people have no way of easily publishing themselves to the Web of Data.
  • For-Profit Social Networking services have a conflict of interest with regard to providing the Web-at-large with useful, granular “Social Graph” data. Instead we see APIs that give approved developers limited access to data.  No love for the average joe like me that is not a programmer.     
  • The Web currently has no trustworthy repository for facts about ordinary people.  Trustworthy means not-for-profit at the very least.  The closest thing we have is Wikipedia, but Wikipedia does not allow entries on ordinary, non-notable people.  (keep in mind that the facts in Wikipedia’s ‘info boxes’ are already published as RDF, one of the core Standards of what we have been calling ‘The Semantic Web’)

We need to start thinking of the Web more like we think of a Public Library, but completely decentralized and with infinite shelf-space.  I think WikiMedia, the organization behind the Wikipedia is the best bet for a trusted librarian for all the information about normal people.

I think what is really needed right now is a non-profit run directory of people, possibly even modeled after the Wikipedia, especially when it comes to the concurrent DBPedia project, which publishes the contents of  Wikipedia facts to the Semantic Web.  Really I think because of WikiMedia’s established trust, they would be the ideal organization to do this.  Wikipedia could simply have another layer which reveals non-notable results or ‘all results.’

Teh Semantic Web is Dead (Linked Data is the Word)

As a major intaker of information about leading technologies, I am proud to say that at the time of the creation of this blog post, I am ahead of the game as far as declaring a change in the language we use to refer to the next phase of web evolution.

The term “web” has never been stronger. The “internet” goes on as something we mention almost every day. And the technologies that comprise the realm of what we have been calling semantic web, mainly markup standards, aren’t going anywhere.

But semantic web just fell out of favor as a [candidate for a] useful euphemism in our language.  The moment this became obvious to me was a few weeks ago  when I heard that Tim Berners-Lee spoke at TED and didn’t mention ‘the semantic web.’  A few weeks later I saw the video for myself and felt a certain sadness or abandonment when TBL talked about the geekiest dream ever, one that he created, without using the name I thought we had all agreed on for it, The Semantic Web.  Instead, he used a different euphemism for the most awesome library system ever conceived.  He called it “Linked Data.”

If you are a Semantic Web apologist like myself you might feel slightly deflated by a sudden change in terminology. I’m sorry.  I’m sure TBL is sorry too.  

But the reality is that “Semantic Web” is always going to be confused with Natural Language Processing, which is also a field of technology that is growing fast in its own right.  

No sustaining buzz has really caught on with “the semantic web,” as a catch phrase, beyond us geeks that are already sold on the idea.  Instead, we’ve recently heard more and more announcements (made usually by search companies) that include the word semantic as if the mere use of the word means that the company is doing something right.

The battle we’ve been fighting as SemWeb advocates is largely a battle for widespread awareness. TBL has said himself that the phrase semantic web wasn’t the best choice of words.   

I’m sure TBL spent at least an afternoon considering what he might say to the audience at TED, which arguably consists of some of the most influential people in the world.  I’ve concluded that he intentionally abandoned the phrase, in preparation for a brighter future in which the SemWeb technologies are no longer so easily confused with other technologies.  We’ve changed our name.

If you feel the re-branding is unfair, consider who has more right to the word semantic, the Natural Language people or the Interchangeable Data Format people?  

We lose.

Sorry.  We need to move on. 

The Semantic Web is now called Linked Data.  It’s official.  Take a deep breath, change your notes.  And let’s move on as Linked Data enthusiasts, not Semantic Web enthusiasts.

I will lead this effort by removing the category of “The Semantic Web” from this site and replacing it with “Linked Data.”  I’ll do it later this week.  I need some time to say goodbye.

Semantic Web is Taking Forever, Right?

As a hardcore Linked-Data/Semantic Web Enthusiast for some time now, say since pre-2007 (back then, I didn’t know what to call it but I understood that it was possible), I can’t help but feel sometimes like it’s never going to happen.  Sometimes a non-silo Web seems like an idealistic fantasy.  Sometimes it seems like nothing is happening.  During the first half of 2007, the amount of excitement in the Sem-Web Category of my feed-reader was high.  Since then, however, the excitement level seems to have diminished quite a bit.  Am I right?

I want to offer a few condolences and some evidence that the Semantic Web is not dead. In fact, I believe it’s still going to “happen.”

  1.  Tim Berners-Lee spoke at TED this year, apparently urging people to unlock their data, according to GigaOm (TED, please publish this video soon, OK?). TED has a quickly growing  amount of influence in the mainstream from what I can tell.  This is good outreach. 
  2. JavaScript support for querying more than one URL/Site/Database at a time is coming to a browser near you very soon, according to John Resig via this talk at Google. We’ve seen a lot of new APIs allowing programmers to access certain data from certain places, but more promising to me than these limited and proprietary APIs that have been sprouting up is how HTML itself is increasingly becoming more ‘semantic,’ if for no other reason, because it allows coders to do more interesting and elegant things with CSS and JavaScript… Where this is heading, I think, is toward a future where pages are basically designed to be scraped, a sort of Microformat revolution (albeit totally rag-tag). Once the cat is out of the bag, I really believe embedded HTML semantics will become more and more standardised because of the incremental benefits resulting for the publishers of the content.  What I’m talking about here is mainly Classes and ID’s in HTML.  Give it some time. Those things are basically Microformats waiting to happen.  Right? 
  3. Last but not least, remember that the emergence of “Linked Data” will probably seem to explode at a certain point, even though the buzz seems to have slowed down in the echo chamber.  There’s a great little analogy I came across where data are compared to buttons being threaded together from one to the next, randomly and one connection at a time.  How many random single connections need to be made before picking up one button will bring all the others along?  The results are reassuring. Check it out over at the Data Evolution Blog, the newest feed in my Feed-Reader.
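The button analogy in the last point is easy to try for yourself. Below is a small Python simulation, written as my own sketch rather than anything taken from that blog: it ties random pairs of buttons together and reports how big the largest connected clump is. The button and thread counts are arbitrary choices, and the fixed seed just keeps runs repeatable.

```python
import random

def largest_cluster(num_buttons, num_threads, seed=0):
    """Tie random pairs of buttons together and return the size of the
    biggest connected clump (union-find keeps track of the clumps)."""
    rng = random.Random(seed)
    parent = list(range(num_buttons))
    size = [1] * num_buttons

    def find(x):
        # Follow parent pointers to the clump's root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest = 1
    for _ in range(num_threads):
        a, b = rng.sample(range(num_buttons), 2)  # two distinct buttons
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already in the same clump
        if size[root_a] < size[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        size[root_a] += size[root_b]
        largest = max(largest, size[root_a])
    return largest

if __name__ == "__main__":
    buttons = 1000
    for threads in (100, 300, 500, 700, 1000, 5000):
        clump = largest_cluster(buttons, threads)
        print(f"{threads:>5} threads -> largest clump: {clump} of {buttons}")
```

In random-graph theory the big clump appears fairly abruptly once the number of threads passes roughly half the number of buttons, so a long quiet stretch followed by a sudden jump is what you should expect to see when you run it.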

    Google Rolling Out ‘Semantic’ Results

    An interesting baby-step in Google improving Search Results (man are they ever holding out on us!)

    From Read/Write Web (Written by Marshall Kirkpatrick)

    Did Google Just Expose Semantic Data in Search Results?  Well did they?  No. The results pages don’t expose any “structured data”

    I really believe that Google is trying to avoid becoming everyone’s scrape-able Semantic Query Engine. There’s tons of at least semi-semantic data out there and google simply doesn’t present it to us.  They have it.  They understand it. They could give it to us. But they don’t.  I mean for crying out loud, imagine how difficult it must be for google to return image search results that are anywhere near as good as google’s image results are?   Does anyone really think that google is completely ignoring microformats or service-wide presentational semantic data (an example of this would be the html classes and ID’s assigned to elements on social network pages)?? Does anyone really think so?  While they’re looking at things like alt tags and nofollow tags and everything else?  Would google just ignore piles and piles of metadata? No.  Would they decide to not let us use it?  I think so.  

    I think they’re doing a classic ‘roll-out’ thing, saving their best search technology for when they absolutely have to whip it out for competitive reasons.  This is cause to resent google to a certain extent I think.

    Zemanta: Real-Time Semantic Discovery & Blogging Tool

    Trying out Zemanta, a service for finding related resources. 

    They make Plugins for WordPress, TypePad and other blogging platforms, as well as extensions for both FireFox and IE.

    Currently, as I’m writing this, the Zemanta plugin is only giving me a “Loading Zemanta…” message… I figured Zemanta’s database would likely have plenty of articles about Zemanta.  Maybe not.

    We’ll see.  Very cool idea either way.


    I guess the first time I loaded my WordPress Dashboard’s Editing page, Zemanta took a little while to load… Ever since it’s been super fast.

    Pretty cool little Plugin. 


    Technology Predictions for 2009

    First of all, my last prediction-for-next-year was a little optimistic, as I was predicting what people in the echo chamber have since started calling ‘cloud computing…’ I predicted that we’d see a lot of online services that blur the lines between what is ‘local’ and what is an online ‘service.’  …let me just defer that prediction one year and add it to the heap of what I see coming this year.  At least give me credit for making it my major prediction before the catch-phrase ‘cloud computing’ came to the surface.

    1. Linux Will Come and Start Killing. Google Android, Ubuntu Mobile, Asus’ recent release of EEE PC’s running Linux, all point for me to the fact that Linux is finally coming to a device near you.  Of course, Linux never went away, but I’m talking about real OS Market share.  In addition, I wouldn’t be surprised if the coming popularity of Linux also dishes out a major hit to Microsoft because I bet it’s easier to port software made for Linux to Mac OS X than it is to port it to Windows since OS X is built on Unix.  Just something to consider.  Also, if you haven’t been looking, take a look at Ubuntu.  It’s a pretty nice OS and will run on anything, maybe even your toaster.  And it’s free!
    2. AJaX Will Continue to Prevail as the Shiznit in Web Development (while Flash and others continue to die).  Because of the nature of touch-screen interfaces and because we will increasingly see the deployment of Navigation and Map-based services as well as virtual world type applications, where a scalable simulated 3-D space is used, I think AJaX is likely to continue to become the way things are done.  At this point, I’m starting to doubt the long term success of Flash, AIR, Silverlight because I think Javascript can do what these things do better.
    3. Affordable Smartphones. Maybe this is a no-brainer, but when I say affordable I mean $100 or less.  I’m not predicting at this time affordable connectivity for these devices. I know gadget enthusiasts might hate me for saying this, but I think the Handset Race and the Netbook Race are very overlapped.  They are both fighting for certain causes together such as improvements to battery life, cheapening of Solid State storage, cheapening of Mobile Connectivity, the need for competition in the OS market and the need for “Thin” software, not to mention ‘Cloud’ services…
    4. Ubiquity of Navigation Systems and/or GPS. From my understanding, cellular networks are already able to provide location info nearly as accurate as true GPS.  There’s no reason for the next wave of phones to not have on-board GPS capability or something similar that offers driving directions etc.
    5. Google Will Roll Out Geo-Targeted Advertising for Realz. Via GPS/Navigation devices probably, but even desktop search should see a shift in this way. Try searching for ‘pizza.’ You can see there’s big room for improvement there.
    6. Google Search to Shape Up or Start Shipping Out. Google may begin losing Search market-share in 2009 if they don’t play their cards right. Google’s Search Results haven’t changed noticeably since they started putting Wikipedia articles at the top of the stack a few years ago.  Personally, I think Google is intentionally not releasing major improvements to their results in order to avoid being an unofficial API for competing services. Again, search for ‘pizza.’ Then, add your postal code to the search. The funny thing is that Google already knows where you are, more or less, based on your IP address. Meanwhile, other search engines are actually better for many kinds of searches. Try Yahoo! for ‘pizza.’ Try Dogpile for finding an mp3. Google is capable of being better than these right now, in my opinion, but intentionally holding back, banking on the idea that their mindshare will carry them along until the next era, probably brought on by the ubiquity of GPS and Smartphones.  Even if Google loses a considerable amount of its Search traffic, it will continue to be the biggest hub of online metrics collection, as well as of course, online advertising, where Google makes all its money.  I don’t think Google is going anywhere any time soon.

    rel=”spam” rel=”mal” MicroFormat for Spam? rel=”???”

    IDEA: A MicroFormat for when it may be necessary to link to a Malicious, Dangerous, or Unethical Site?

    Funny, the first thing that came to mind was using rel=”spam”  …but really what brought this up was a site that isn’t necessarily “Spam” in the traditional sense.  The site was a pyramid scheme, the operators of which were posting ads on my local craigslist for “social media” something or other.  This isn’t, by definition, Spam.

    The Wikipedia currently says:

    “Spamming is the abuse of electronic messaging systems to indiscriminately send unsolicited bulk messages…”

    The quantity is what makes spam spam, not the uselessness of what’s being promoted.

    Maybe rel=”mal” as in malicious??

    I’m not the only one thinking about this idea.

    Kevin Kelly on the Next 5,000 Days of the Internet-TED, 2007:

    Kevin Kelly gave this talk at TED in 2007.  It’s worth watching.  

    He touches on a number of things ranging from history of the Internet and Moore’s Law to the future ubiquity of Cloud Computing and Kurzweil’s “Singularity.”

    He covers concepts like the Semantic Web, and the give-and-take between privacy and participation with relatively light language that any lay person should be able to understand.  This is an interesting and entertaining little presentation.  Thought I’d share.

    How Will I Organize My Tags? An App? MOAT? A Feature in Delicious?

    Here’s my dilemma. I have a ton of bookmarks on my Del.icio.us account.  I love using an online bookmarking system. But still, Delicious and others’ systems for organizing bookmarks don’t really help with a need I bet most users have: Tag-Optimization.  

    What we need are tools for analyzing and perfecting the organizing of bookmarks.  Every one of these systems like Delicious, Furl, StumbleUpon etc, have the same problem: user-submitted tags are bug-y!!! The engine of the platform needs to guide the users toward better tagging!  Basically, we need built-in systems for finding the types of redundancies and other tag-errors that we all have. We need debugging software, so our bookmarks can become good, clean representations of how web-users feel about various web resources.  “Suggested Tags” and “Popular Tags” are great time-saving features but I’d like to also have a tool for correcting tag-cancer.  
    These software offerings, if/when they finally exist, are going to make it increasingly more easy to harmonize user-submitted value from folksonomies with the ‘Semantic Web,’ which is right around the corner.
    Some examples of areas where I think a robot could help users to clean up tags are:
    • Redundant Tags. Usually just alternate tenses of the same word (like the plural and singular form) but also synonyms. Example: Image, Images, Picture, Pictures, Pix
    • Arbitrary Capitalization. HTML vs html etc.
    • Vagueness. Like los or awesome (wouldn’t it be safe to assume that all the things you bookmark are ‘awesome’ to you?).
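    As a rough idea of what such a robot might do for the first two problems, here is a short Python sketch. It is only a sketch under my own assumptions: the synonym table is hand-made and hypothetical, the plural handling is deliberately naive, and nothing here talks to Delicious or any other real service.

```python
from collections import defaultdict

# Hypothetical, hand-made synonym table; a real tool would need a much
# better one, or would ask the user to confirm each merge.
SYNONYMS = {"picture": "image", "pic": "image", "pix": "image"}

def normalize(tag):
    """Reduce a tag to a comparison key: lowercase, crude singular, synonym."""
    key = tag.strip().lower()
    # Naive plural stripping: fine for "images", wrong for words like "news".
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return SYNONYMS.get(key, key)

def redundant_groups(tags):
    """Group tags that collapse to the same key; keep only real duplicates."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize(tag)].append(tag)
    return {key: members for key, members in groups.items() if len(members) > 1}

my_tags = ["Image", "Images", "Picture", "Pictures", "Pix",
           "HTML", "html", "python"]

for key, members in redundant_groups(my_tags).items():
    print(f"merge into '{key}': {members}")
```

    Vagueness is the hard one, since no simple rule knows that ‘awesome’ tells you nothing, so that part probably still needs a human or at least a list of tags to flag.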
    This is a screen-shot of my tagging screen from Delicious.  I added the red scribbling to point out just a few of the problems my tags have.
    Del.Icio.Us Tags Gone Wild

    On several occasions, I’ve set out to clean up my tags manually, but I’ve never made it very far.  It’s just too much work.

    Maybe the coming overhaul to Del.Icio.Us will add some of these needed features, although somehow I doubt it.

    I’ve heard of the MOAT (Meaning Of A Tag) Project, and perhaps this could save us, but like many other ‘Semantic Web’ projects, I haven’t found a way, as a lay person, to utilize it.  At some point down the road,  maybe someone will make a Delicious-MOAT-erizer Web-App that will clean-up-shop-by-proxy and make the metadata available to the Semantic Web.

    Semantic Web Isn’t As Semantic As NLP, Web 3.0 Hype Overload Too!


    This is a comment I posted on the Nodalities blog (or I think I posted it.  The form submit resulted in a blank page)


    It’s ironic, really, that the Semantic Web should struggle so much with semantics!

    The problem is that if we present a mixed, complicated, and difficult concept forward, the journalists and media commentators are not going to be able to sort out the tangle of meanings for us. They will present an (over)simplified, half-understood message to the rest of the world. When even a brilliant communicator like Tim Berners-Lee’s message gets scrambled, maybe it’s time to take stock in how we present the Semantic Web, especially to the general media. Maybe, a set of metaphors could help us present these:

    The semantic web is a platform (one we already use frequently)! The semantic web is a layer of connectivity (like a concentric ring around the web itself). The semantic web is a series (more than one thing) of enablers (it makes possible, rather than it does)


    I think there’s a big problem, obviously, with the phrase “Semantic Web.”

    It’s easy for Press to confuse the intentions of SemWeb with those of Natural Language Processing.

    Talk about ironic:

    NLP is really more about human “semantics” than The Semantic Web is.  SemWeb technologies are only really semantic by comparison to the HTML-Web, and they’re only really “semantic” from a MarkUp/database/programming point of view, and still, only in comparison to older/existing systems.

    As Tim Berners-Lee has pointed out, the name “Semantic Web” wasn’t the best choice of names, but it’s too late to change it.  “The Data Web” or “Web Of Data” or “Linked Data Web” or many other names for it would be more accurate and less conducive to misrepresentation than “The Semantic Web” is.  But “Semantic Web” has already stuck, and I doubt anyone is going to change it.  

    Fortunately, “The Semantic Web” sounds lofty enough for people to think it probably is going to be the next big thing.  Unfortunately, the name is deceiving to most people and the technologies would probably seem more or less trivial to them anyhow.  

    SemWeb is a movement that would ideally be taking place among inspired, pro-active developers, but unfortunately, devs are too comfortable with the tools at hand and there’s no visible, imminent market force in the development field pushing for movement beyond the same old skill-sets and practices on the ground, at least for most businesses and the programmers they hire.  For this reason, we should be glad about all the “Web 3.0” hype, and work to inform the press and the public between the lines where and when we can.

    For those of us that understand what “The Semantic Web” is, it is our duty to evangelize RDF and MicroFormats where and when we can.

    Soon we will be forgotten.  Soon we won’t need to call the “semantic web” anything so this conversation will be meaningless… I hope.

    I Don’t Want FaceBook Comcast To Buy Plaxo

    Update… this was actually news back in January.  Coincidentally, today it was announced that Comcast is buying Plaxo.  Goodbye Plaxo.  Nice knowin’ ya.

    Got the rumor tip from Scoble (there’s no real info there so don’t bother)

    Plaxo? Are you listening?  Keep doing what you’re doing, stay behind the scenes, work on enabling users to publish their own data, at will, in Semantic Standards as they become timely (now?) and stay independent of the little tug-of-war between closed, albeit increasingly API-enabled social apps.   You’re better than them!  Hang in there and you’ll be worth way more!  Don’t turn to the dark side!

    Competition for traffic will get everyone using RDF and Microformats soon enough…  Semantics are like SEO 2.0… The next bandwagon everyone will want to pay way too much for.

    Plaxo, you’re in the perfect spot to make money on this.  Think Virtual Private Networks, Semantic Publishing to the Web, and Semantic Productivity Tools at home.


    Usefulness of DataPortability.org’s Rel=Me project

    In the suggested reading section of the page for the DIY Rel=”Me” project over at dataportability.org’s wiki, there’s a link to this blog post, which is an attempt to explore the usefulness of rel=”me” to the regular old web user.  The article is slightly tunnel-visioned about what you can or can’t do with your browser to exploit MicroFormats.  Of course, being able to detect locations or personal contact info through a browser extension is useful and I’m all for it, but beyond a few obvious exceptions like those, The Semantic Web, MicroFormats included, won’t be much use to us at the level of the browser.  We will still need Web-based portals or “Libraries” or “repositories” or “Catalogs” or what have you, to connect to, in order to really take advantage of this stuff.  Semantic markup on pages is great. RSS is an example of how a little bit of semantics can go a long way.  But what’s of greater significance is the idea of the Web Of Data, where resources are “semantically” interconnected by leveraging information that’s mapped to the domain of knowledge where it’s useful, and where the relationships between resources are also specified in a machine-understandable way.

    Rel=”me” is the equivalent of saying “The person represented by this URL is the same person as the person represented by this other URL.”  Taking that into consideration, imagine how this would affect the experience of searching the “Web of Documents.”  I argue that if enough of us implement rel=”me” (or other microformats or RDFa) in our HTML pages, we will empower the Googles and Yahoos to take advantage of the knowledge expressed by this markup.  So let’s do it!
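
    To make that concrete, here is a minimal sketch of how a crawler might pick rel=”me” links out of a page, using only Python’s standard html.parser module. The sample page and its URLs are made up for illustration, and a real consumer would also need to fetch pages and resolve relative URLs.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collect href values of <a> and <link> elements whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.me_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated list of tokens, e.g. rel="me nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "me" in rel_tokens and href:
            self.me_links.append(href)


def find_rel_me(html_text):
    """Return the rel="me" link targets found in an HTML string, in page order."""
    parser = RelMeParser()
    parser.feed(html_text)
    return parser.me_links


# Hypothetical page content, for illustration only.
SAMPLE_PAGE = """
<p>Find me elsewhere:
  <a rel="me" href="http://twitter.example/andrew">Twitter</a>
  <a rel="me nofollow" href="http://myspace.example/andrew">MySpace</a>
  <a href="http://example.org/a-friend">A friend</a>
</p>
"""

if __name__ == "__main__":
    print(find_rel_me(SAMPLE_PAGE))
```

    The link to the friend’s page is skipped because it carries no rel=”me”, which is the whole point: the markup tells the machine which links are “also me” and which are not.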

    Quotes from the Article I mentioned:

    “…So assuming that you went through the trouble to write up your HTML with rel=me, what next, where is that information actually consumed. I don’t think the 2 most popular browsers (IE 7 and Firefox 2) at this time have native support for XFN, I hear Firefox 3 is suppose to have native microformat support but I haven’t looked for it and if it is there, it isn’t immediately obvious to me. The closest thing I can find is a Firefox plugin called Operator. Operator is a microformat capable reader and for the most part seems to be able to consume most of the above microformat standards except rel=me, kind of odd but kind of understandable…”

    “…At this time, I can honestly say that XFN rel=me proliferation is limited and experimental at best. It would take a while for mass adoption to happen and requires a lot of user education, adoption by popular social sites like Facebook, MySpace, etc, and native browser support…”


    I commented there, and when I take the time to write out a long comment that isn’t something I’ve already written in so many words here, I like to steal my own comment and put it here for anyone who reads my blog.  My response:

    I felt like I had to chime in and point out that the point of MicroFormats or RDFa isn’t really to make an overnight change in how we use the Web. It’s to create a backbone of linked data so that as Search Engines and other “Libraries” begin to have stores of these relationships between documents and other resources available to work with, they can begin to improve their services. It will be nice when Search is only partly based on scanning for text-strings or combinations of words.

    If you were looking for Andrew in Sebastopol, CA, how would you do it? Perhaps you’d google “Andrew Sebastopol CA…”
    But what if you could specify that you are looking for a person?
    What if you could specify geocoding info or otherwise specify that Sebastopol is a town in Northern California?
    What if you could filter your results by the time web pages were created, or filter by domain specifications (like show me wiki articles first or show me all MySpace profiles), or filter by type of site (say, show me blogs only)? And finally, and this is where rel=”me” comes in: once you have selected from your query a person named Andrew who lives in Sebastopol, CA, what if you could specify that you want to see every other document that is an expression of the same person? This is what it’s all about. It works because links work backward. In other words, you can already say “show me all the pages that link to this thing…” But what about being able to say “show me all the pages linking to this Twitter page that link using rel=”me”,” or better yet, “show me all the pages linked to with rel=”me” from any page that links to this Twitter page with rel=”me”,” and so on…
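
    The “and so on” part is just a walk over a graph. Here is a rough sketch, assuming some crawler has already produced a table of which pages link to which with rel=”me” (the REL_ME_LINKS table and every URL in it are hypothetical). It follows links both forward and backward until nothing new turns up.

```python
from collections import defaultdict, deque

# Hypothetical crawl result: each page maps to the URLs it links to with rel="me".
REL_ME_LINKS = {
    "http://blog.example/andrew": [
        "http://twitter.example/andrew",
        "http://myspace.example/andrew",
    ],
    "http://myspace.example/andrew": ["http://twitter.example/andrew"],
    "http://photos.example/andrew": ["http://blog.example/andrew"],
    "http://blog.example/someone-else": ["http://twitter.example/someone-else"],
}


def same_person_pages(start_url, rel_me_links):
    """Return every page reachable from start_url by following rel="me"
    links in either direction (forward links and backlinks)."""
    neighbours = defaultdict(set)
    for source, targets in rel_me_links.items():
        for target in targets:
            neighbours[source].add(target)  # forward link
            neighbours[target].add(source)  # backlink

    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for other in neighbours[page]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return sorted(seen - {start_url})


if __name__ == "__main__":
    for url in same_person_pages("http://twitter.example/andrew", REL_ME_LINKS):
        print(url)
```

    One caveat: this sketch trusts one-way links, so anyone could claim to be Andrew by pointing rel=”me” at his Twitter page. A stricter version would probably only count a pair of pages as the same person when they link to each other.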

    The Web is becoming a library. By adding microformats and other semantic markup to our documents, we are making it possible for decent “card-catalogues” to be built, whether they’re being built by google, yahoo! or the guy down the street.

    DataPortability In Motion Podcast

    A weekly roundtable discussion about the DataPortability Project in specific, and efforts involved in data portability in general. The show is produced and hosted by J. Trent Adams and Steve Greenberg.

    PodCast is HERE


    I recommend Episode 7 


    We kick off episode 7 of the DataPortability: In-Motion Podcast with the news of the week that MySpace launched “Data Availability” with Yahoo!, eBay, Photobucket, and Twitter. Following immediately on their heels was the announcement that Facebook is releasing “Facebook Connect”, an extension of their 3rd party API providing deeper access to their user’s data.


    We’re also joined by Brady Brim-Deforest, founder of Human Global Media, talking about the DataPortability Legal Entity Taskforce. He provides a good overview and update on the process underway to formalize the project under a recognized legal banner.

    The featured interview segment is with Danny Ayers, Semantic Web Developer at Talis. He touches on moving from document linking, through microformats, to feature-rich RDF modeling to identify portable data. Contrary to popular belief, he dispels the myth that it’s hard to migrate from a standard SQL data representation into addressable semantic objects.

    Danny regularly posts on the following sites:

    Also mentioned in the episode:


  1. Planet RDF

    Apture! Multiple Resources From One Link. This is Really Cool.

    I heard about this through Lawrence Lessig’s blog. Professor Lessig is taking the month of May off, and off the grid, which I applaud him for.

    What this web app does is allow you to make links that, through the free Apture service for your site, link to numerous resources, all previewable via the same sort of javascript popup you get from Snap or the ZitGist “zLinks” plugin.

    You must see this in action. This is inspiring. It shows how much more dynamic web pages can and will be in the near future. I’m a bit sick of the over-use of javascript, ajax, whatever you want to call it. It tends to be resource-heavy on your machine. This is an exception.

    I wonder if these guys are going to implement any Semantic technologies into the data they store… I wonder if they’re going to make deals with bookmarking services like del.icio.us… All my words could automatically be links to mini-libraries of items I’ve bookmarked! It’d look a little ugly given the current style conventions but hey. Let’s change those.

    It’s interesting to me to ponder how this non-semantic-web service, because it’s also a library/bookmarking tool, could become hugely useful to the Semantic Web as they snatch up web users’ resources/web-bibliographies.

    Oh man. This is a hot item!

    “wrote an interesting post today” SEO, Evil Robots and One Sad Outcome of Non-Semantics in Spam-Control/Search

    I’ve mentioned before how increasingly the ‘Live Web’ or ‘Blogosphere’ (or whatever you want to call this thing) is being infiltrated by Robot Blogs. What they appear to be doing is crawling the web and scraping excerpts of blog posts and reposting the excerpts, linking back to where it came from. They usually say:

    “[KeyWord] wrote an interesting post today”

    Since they link back to the blog post they scraped, they show up as a trackback in the comments area of the original post. This way, the unsuspecting blogger is linking to the fake blog. The fake blogs seem to be set up in an attempt at monetizing traffic via adsense ads.

    I googled the phrase “wrote an interesting post today” and the top hit was (I probably am the top hit now) some blogger talking about filtering any comment that contains the phrase “wrote an interesting post today.”

    I had decided to change my little tagline thingy to this exact phrase as a sort of inside joke for bloggers, but found myself wondering if being associated with that phrase will adversely affect my findability. Perhaps Search Engines or Spam Filters will begin to look out for that phrase?

    Already, I bet there are tons of bloggers who filter out comments containing words like “viagra” or “casino,” assuming that there is absolutely no context in which these words could be used in a legitimate discussion. The fact that I am using those words here is proof that there is such a thing as a legitimate discussion which contains them.

    Filtering for a word or phrase seems to me to be a slippery slope, especially if we’re talking about Search Engines, since they act as our main interface to the Web.
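
    Here is a tiny sketch of the kind of filter I’m talking about, just to show how it goes wrong. The phrase list and the sample comments are made up, and I’m not claiming any real spam plugin works exactly this way.

```python
# Hypothetical blocklist of the sort a blogger might configure.
BLOCKED_PHRASES = ["viagra", "casino", "wrote an interesting post today"]


def is_blocked(comment, blocked_phrases=BLOCKED_PHRASES):
    """Naive filter: reject a comment if it contains any blocked phrase."""
    text = comment.lower()
    return any(phrase in text for phrase in blocked_phrases)


spam = "Andrew wrote an interesting post today. Here's a quick excerpt..."
legit = "I filter comments that say viagra, but is that fair to real discussions?"

if __name__ == "__main__":
    print(is_blocked(spam))   # the robot trackback is caught
    print(is_blocked(legit))  # but so is a legitimate comment about filtering
```

    Both comments get rejected, because the filter only sees text strings and knows nothing about the context in which the words are used.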

    Google: Please don’t hate me because I said Viagra. I’m not a spammer.

    PHP Application Turns MySpace Friends Into CSV – View/Mine in Excel Spreadsheet Etc

    My friend threw together an app that scrapes your MySpace contacts and puts useful info into a reusable format.


    UPDATE: It’s also available as a Torrent via The Pirate Bay. Please consider seeding this. It’s a tiny, tiny file.

    Here’s the Read Me info I just put together to go with it:

    and change those.
    Save the file.
    Upload these two files to your server.
    Point your web browser to http://where-you-put-the-file-on-your-server/ms_test.php
    and what will result is a CSV file of all your MySpace friends and their demographic information. Also included are the URLs to “send message” etc, and some other useful things.
    View the source of the page and copy it into a PlainText text file
    Name the text file with the extension .csv
    Now you should be able to work with your myspace friends in Excel
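
    If you’d rather not use Excel, the same .csv file can be read with a few lines of Python. This is only a sketch: the column names below (name, age, location, profile_url) are my own placeholders, and the real ones depend on what the script actually outputs.

```python
import csv
import io

# Hypothetical sample; the real column names depend on what ms_test.php emits.
SAMPLE_CSV = """name,age,location,profile_url
Alice,29,"Sebastopol, CA",http://myspace.example/alice
Bob,34,"Santa Rosa, CA",http://myspace.example/bob
"""


def load_friends(csv_file):
    """Read friends from an open CSV file object into a list of dicts."""
    return list(csv.DictReader(csv_file))


def friends_in(friends, place):
    """Return names of friends whose location mentions the given place."""
    return [f["name"] for f in friends if place.lower() in f["location"].lower()]


# With a real file you would use: open("friends.csv", newline="")
friends = load_friends(io.StringIO(SAMPLE_CSV))

if __name__ == "__main__":
    print(friends_in(friends, "Sebastopol"))
```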

    There is nothing malicious about this simple application. No viruses, spyware etc. It only does what it’s supposed to do: scrape your friends so you can more easily work with your social network data.

    If you are of the camp that feels that people scraping their own myspace contacts is unethical, I suggest that you consider that all the pages are already available and the data they contain is rendered in HTML which can be freely accessed already. This is just a tool to make it easier to get the useful data separated from the clutter.

    Finally, this is possibly against MySpace’s Terms Of Service, so use at your own risk.

    Talis’ Podcast Goes the TWiT Route: The Semantic Web Gang

    Semantic Web Gang: Introductory Episode


    Some suggestions:

    1. During the conference call, use some sort of mixing program so the moderator can see who is talking at every moment during the recording via an audio level meter and make adjustments as needed.

    2. Whenever somebody new starts talking, quickly talk over them stating their name (it only obscures what they’re saying for one or two syllables so it’s easy for the listener to understand what they’re saying while taking in the metadata too)

    3. Have a rotating or otherwise changing schedule of guests like TWiT does.  The occasional random appearance by a CEO or two, or other dignitaries of the Web might help to keep the discussion interesting.

    4. Don’t be afraid to spend a few hundred pounds on a decent microphone and maybe a mixer or whatever is needed to improve the quality of the audio.  The audio of Talking With Talis has been piss poor since the beginning.  It would really serve you guys to improve on that.

    I think part of your mission is evangelism, so I hate to think you’re losing audience because of the poor audio quality.

    Looking forward to more!




    The Web Has Always Been a Semantic Web

    We started with the semantics of document structure. That’s what the World Wide Web is made of. It’s a giant network of HTML pages linking to each other. HTML (Hyper-Text Markup Language) documents have titles, links, headings and other elements that allow us to see web pages the way we do today. The whole idea of a “Hyper-Text” is referring to the power of a form of semantics. It is a matter of semantics that we see <a>this</a> as a link and <h1>this</h1> as a heading. It is the semantics of document structure, a.k.a. HTML, that have made it possible for documents like this one to link to others and for all of these pages that make up the Web to be rendered by our computers in more or less the same way.

    The Idea of “The Semantic Web” is really only necessary for the sake of comparison.

    So to sort out the semantics of what we’re talking about when we use the word “semantic” with regard to the Web, The Semantic Web refers to a movement toward not just semantics that define the structure of documents or pages, but semantics being applied to how information is made available over the Net.

    Recent trends in the Web’s growth are making computer-language standards for compartmentalizing domains of data. The Semantic Web is a movement toward not just using semantics for defining document structure, but using semantics to make declarations about the context in which a linked resource or bit of information can be useful.

    Yahoo! Says Search is “Killer App” for The Semantic Web

    Maybe it’s a bit silly to say that there’s “a Killer App (as in one)” for Semantics.  Nonetheless, Yahoo! announcing its search results will soon be taking advantage of Semantic Web Standards is definitely great news.  Quote from the Yahoo! Search Blog:

    “In the coming weeks, we’ll be releasing more detailed specifications that will describe our support of semantic web standards. Initially, we plan to support a number of microformats, including hCard, hCalendar, hReview, hAtom, and XFN. Yahoo! Search will work with the web community to evolve the vocabulary framework for embedding structured data. For starters, we plan to support vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, and others based on feedback. And, we will support RDFa and eRDF markup to embed these into existing HTML pages. Finally, we are announcing support for the OpenSearch specification, with extensions for structured queries to deep web data sources.”

    Hmmm.  Wasn’t I just saying something about Semantics having an effect on Search Results in the near future?  I guess Yahoo! doesn’t think that’s such a crazy idea.

    Imagine that.  Standards for defining the context in which information can be used actually being used to help search engines provide users with more relevant results.  What a concept!

    And on the SEO and SEM front, can you guess what Google, AOL, MSN and all the others are probably working on right now?

    Semantic Standards are Sustainable SEO for You, Your Business & Website

    Big Rant.

    Using HTML was once a smart move for findability online.   Seems obvious to us now, but in case you don’t realize how stupid people were during the initial growth of the Web back in the late nineties, imagine this: People used to send cease and desist or take-down letters to owners of other sites because the other sites were linking to them.

    “How dare you link to my site!  You have no right to mention my existence and if you do not remove the link, I will sue you!”

    In other words, we have a hard time looking beyond the current paradigm.  Right now that paradigm is something like, in order to be findable, spend a lot of time working with the wording of your site’s copy, and make sure your metadata and your document structure are written to reflect what search results you want to win.

    It’s funny though: Still, one of the best things you can do SEO wise is to have an RSS feed.  And in case you didn’t realize this, RSS is a Semantic Standard.  Apparently RSS 2.0 is a little convoluted (the adjustments made to the standard since its creation are not entirely in line with the Semantic Web school), but the original RSS stood for RDF Site Summary.  Blah blah blah.  Go look it up.
    A little bit of Semantics is potentially way better for your site’s visibility than a whole lot of Keyword tweaking.
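
    To see why a feed is so easy for a machine to digest, here is a small sketch that reads an RSS 2.0 feed with Python’s standard xml.etree.ElementTree module. The feed content and URLs are invented for the example; only the element names (channel, item, title, link, pubDate) come from RSS 2.0 itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example/</link>
    <description>Posts about the Semantic Web</description>
    <item>
      <title>Semantic Standards are Sustainable SEO</title>
      <link>http://blog.example/semantic-seo</link>
      <pubDate>Mon, 05 May 2008 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return each feed item as a dict of its title, link and publication date."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for entry in read_items(SAMPLE_FEED):
        print(entry["published"], entry["title"], entry["link"])
```

    No guessing about which text is the headline and which is the date. The feed says so, and that is the difference between a little bit of semantics and a whole lot of Keyword scanning.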

    FOAF, SIOC and the countless other Semantic Markups are a way for you to get your foot in the door now!  A bit like the people that realized early on that they needed to have a website at all in the first place.

    A little bit of early adoption of Semantics for your information could really pay off as we start moving toward a smarter Web.  And we are moving toward a smarter Web.  Who will be part of it when it reaches its tipping point for large scale adoption?  Will you or your business?  Or will you wait until some news report announces that the rest of the world has already gone semantic? I know I’ll be there.  I already am.

    Search Engines are trying to give users access to what they are looking for. So the process of SEO, when it consists of tweaking Keywords and/or document structure according to the latest rumors about whatever silly and temporary way Google currently seems to be making decisions about relevance, is always going to be flawed. As long as these SEO rumors are floating around, people will try to game the engines, and in turn they collectively increase the need for the engines to change their parameters. Repeat, repeat, repeat.  Search engines do not try to index sites based on the sites’ application of SEO techniques; engines index sites based on an attempt at creating an Information Architecture… This is hard to do because most websites aren’t presented in an architecture-y way.  So we’ve come full circle.  Feeds are an architecture-y way to present your content, so it’s no wonder they help with SEO.

    You might ask “So what’s next beyond RSS?  How can I make Google love me even more?”

    My answer is: “Stop lying to them with your SEO, and start helping them with Semantics”

    And just remember what happened when a little bit of semantics got put into effect?  Remember RSS?  Well the blogosphere basically happened and in turn the “Live Web,” Podcasting and all that.  Powerful stuff, and Web 2 is just the tip of the iceberg.

    A letter To Lawrence Lessig: Government Websites

    This post is aimed at one of my personal heroes, Professor Lawrence Lessig.

    Mr Lessig,

    First, I want to thank you for all the work you’ve done already to spread awareness about ‘Net Neutrality,’ the need for Intellectual Property reform, ‘Free Culture’ and so on. Your name comes up often as I do my part to help to change the way people think about the ownership of ideas and/or culture, no doubt because many of my thoughts on these matters are derivatives of yours. And finally, as an artist, thank you for helping me to see past my own possessive instincts, and to understand that my creative efforts are best honored if I aim for my work to become part of the Public Domain, because it is there that I can really contribute to the shape of our culture in the future. So thank you. Please keep up the good work.

    It occurred to me that you may be the perfect person to spearhead the solving of a problem our government has -a small problem with major consequences. Before I go on though, I just want to urge you not to take this letter the wrong way. I don’t mean to imply that you need people like me to help you to choose your battles. But I know of no one else in the public eye that is such an advocate for the people, and who also seems to understand the implications of digital communication via the Web. You are the only public figure I can think of that generally seems to take the people’s side in all the domains where this issue manifests itself: The need for transparency in government; The need for people to be able to navigate the law to some degree without the aid of lawyers; The importance and potential of the [Read/Write] Web, especially with regard to how it can and does make our Democracy more democratic; etc… You actually seem to understand what the Web is and why it is important, and I fear that many or most of our legislators, judges and executives do not. This is why I’m writing to you.

    The problem is that government websites generally lack consistency, search-ability, interactivity and general user-friendliness. On the surface, this may seem to many people like a minor problem. But from my point of view, it is one of the most important manifestations of how our government doesn’t work for the average person. This is a huge opportunity to improve how our democracy works for us.

    Here are some of my thoughts on this.

    1. Government websites generally have no interoperability between them. It seems to me that government websites should share a common information infrastructure as well as a common basic user interface and query system. If I am looking for information on something like a law on one government site, like say a county, I should be able to expand my search to include less local results, like say the state I am in, or narrow my search to only include more local results, like the City I am in. I think that government sites should be hierarchically connected wherever possible to say the very least.

    In general, I think it is time for all official government agency websites to become integrated.

    2. Government websites do not routinely take advantage of technologies that make it easy for us to get new information from them. With technologies like RSS and iCal, it seems that citizens should be able to access regular updates from all the government agencies that concern them. We should be able to anonymously subscribe to feeds of governmental news, events, changes in policy, Etc. Example: “Effective today: All automobiles must have headlights turned on when it is raining regardless of the time of day. See Vehicle Code XYZ Section abc.”

    3. By allowing existing laws to be un-findable, our government excludes us from even being able to understand what we have supposedly agreed upon through a democratic process.

    For instance, on many occasions, I have tried to find out the specifics of one law or another. I have gone to my City, County and State government websites hoping for my question to be answered by a quick search, but instead, I’ve found myself hours later with a ton of windows open still trying to figure out the answer to a specific question like “Is [something] against the law?”

    Again, to some people this may seem like a trivial complaint, but how in the world are we supposed to be law-abiding citizens if we cannot even be sure what the laws are? I believe that most people, in most communities in the USA have a very vague understanding of what is and isn’t legal. To many of us, The Law acts like some sort of urban mythology. We have no idea what the law actually says, and we cannot find the law if we want to learn what it actually says.

    I have even had conversations with law enforcement officers in which the officers assured me that I “Can’t do” something, but were unable to tell me what the law says, where it says it, whether it is a local, state or federal law that is in question, or where I could even begin to look to find out for myself. This is scary to me.

    I understand that Laws themselves are often confusing to lay persons. But I don’t understand why it is so hard to even find Laws in the first place. We have the technology to vastly improve this situation. It must be improved.

    4. Government websites generally have no place for public discussion or comment. There is also generally no universal protocol for asking the government(s) questions through the Web. Really, there is practically no way to reliably get facts about policy from government agencies in general. Since we clearly have the technology to make it possible for citizens to interact with and get information from government agencies, while keeping the expense to taxpayers very low, shouldn’t this be imperative?

    So those are some of my main ideas about the digital government interface. Perhaps it is time for it to become written into law that certain standards and improvements are implemented on all government websites. Indeed, if there are already legally binding standards in place for government websites, they need to be vastly improved.
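
    To illustrate the first point above, here is a rough sketch of what “expand my search to less local results” could look like if jurisdictions were hierarchically connected. Everything in it is a placeholder: the jurisdiction tree, the law index and the entries in it are invented for the example and are not real statutes.

```python
# Hypothetical jurisdiction tree: each jurisdiction names its parent.
PARENT = {
    "City of Sebastopol": "Sonoma County",
    "Sonoma County": "State of California",
    "State of California": "United States",
    "United States": None,
}

# Hypothetical law index keyed by jurisdiction (placeholder text, not real law).
LAWS = {
    "City of Sebastopol": ["Leash required in city parks"],
    "Sonoma County": ["Burn permits required in dry season"],
    "State of California": ["Headlights on when raining"],
    "United States": [],
}


def broaden(jurisdiction):
    """Yield the jurisdiction, then each wider one (city, county, state, ...)."""
    while jurisdiction is not None:
        yield jurisdiction
        jurisdiction = PARENT[jurisdiction]


def search(term, start):
    """Search the starting jurisdiction first, then progressively less local ones."""
    hits = []
    for place in broaden(start):
        for law in LAWS.get(place, []):
            if term.lower() in law.lower():
                hits.append((place, law))
    return hits


if __name__ == "__main__":
    print(search("headlights", "City of Sebastopol"))
```

    Each hit is labeled with the level of government it comes from, so a citizen would at least know whether a rule is local, state or federal.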

    If technologies like RSS along with Semantic Web technologies were taken advantage of by government agencies, they could lead to vast improvements in our ability to understand and take part in our democracy.
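
    As a sketch of how little it would take, here is one way an agency could publish its notices as an RSS 2.0 feed using nothing but Python’s standard library. The agency name, URL and notice are hypothetical, and build_feed is just a name I picked for the example.

```python
import xml.etree.ElementTree as ET


def build_feed(agency, site_url, notices):
    """Build a minimal RSS 2.0 document (as a string) from a list of notices.

    Each notice is a dict with "title" and "summary" keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{agency} updates"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = (
        f"Policy changes and notices from {agency}"
    )
    for notice in notices:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = notice["title"]
        ET.SubElement(item, "description").text = notice["summary"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical notice, echoing the headlight example from point 2.
NOTICES = [
    {
        "title": "Effective today: headlights required when raining",
        "summary": "See Vehicle Code XYZ Section abc.",
    }
]

if __name__ == "__main__":
    print(build_feed("Example DMV", "http://dmv.example/", NOTICES))
```

    Anyone could then subscribe anonymously with an ordinary feed reader, with no account and no mailing list.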

    Mr. Lessig, I wanted to write this to you because I don’t know where else to turn with these ideas. I hope you get this, and if you do, I hope you understand why I wrote this to you, rather than, say, The President or Santa Claus.

    Of course, I am more than willing to help with this cause in any way that I can.


    Andrew A. Peterson

    Trying to Manage My Social Graph – So Much Work!

    Some steps I’m taking to get all my contacts in order:


    • Syncing my phone with my Desktop Address Book using iSync.  (Unfortunately, my phone, a Nokia 6126 isn’t supported by iSync so I had to find a hack to make it work)
    • Installing Plaxo‘s extension for AddressBook on my computer so I can take advantage of Plaxo’s syncing service
    • Setting up Plaxo to retrieve as much info about the people I know as possible from the various online services on which we are connected.
    • Manually attempting to find redundancies in this master-list of connections and fix them.  I have multiple incomplete address cards for many people, often with each one containing different pieces of the puzzle.  Also, I want to include MySpace URLs and Blogs that aren’t discoverable using Plaxo or any of the other tools I have at my disposal. I guess I have to do this more or less manually.
    • Re-Syncing with Plaxo then Re-Syncing with my phone.
    • With a few exceptions, I’m not going to include people who I’ve only met online thru Social Networks Etc.
    • Publishing this data, minus private information like email addresses and phone numbers, in Semantic Formats like FOAF, XFN Etc.  I wonder how other people would feel about the ethics of this.  I’ve decided that since I’m not revealing anything that isn’t already findable on the public web, via a MySpace search etc, it isn’t unethical even though many of the people I know might not have thought through what the implications are of participating in the Social Web.  I am going to take special care to not reveal blog sites and other things that people I know are doing anonymously.
    • This will probably take me a few days at least and I’m not looking forward to the work.
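
    Two of the steps above, merging the incomplete duplicate cards and stripping private fields before publishing, could in principle be scripted. Here is a minimal sketch, assuming contacts have already been exported as simple dictionaries. The field names and sample people are made up, and matching on name alone is a crude assumption that would wrongly merge two different people with the same name.

```python
# Fields I would not publish. The field names themselves are hypothetical.
PRIVATE_FIELDS = {"email", "phone"}


def merge_cards(cards):
    """Merge cards that share a name (ignoring case and surrounding spaces).

    For each field, the first non-empty value seen wins.
    """
    merged = {}
    for card in cards:
        key = card["name"].strip().lower()
        target = merged.setdefault(key, {})
        for field, value in card.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


def public_view(card):
    """Return a copy of the card without private fields."""
    return {k: v for k, v in card.items() if k not in PRIVATE_FIELDS}


# Hypothetical duplicate cards, each holding a different piece of the puzzle.
CARDS = [
    {"name": "Jane Doe", "email": "jane@mail.example", "blog": ""},
    {"name": "jane doe ", "phone": "555-0100", "blog": "http://blog.example/jane"},
]

if __name__ == "__main__":
    for card in merge_cards(CARDS):
        print(public_view(card))
```

    The output of public_view is what would then get written out as FOAF or XFN. That part is left out here.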