These mining firms, if you will, will eventually cave and start selling to everyone.
What will we do with the data?
What will future for-profit mining operations be focused on?
I just can’t stand how stupid I sound even one year ago: what I wrote, what it was about, the words I used, the style, the punctuation. I can’t go more than six months back in these posts before what’s there makes me really, really uneasy. I feel embarrassed and stupid.
Meanwhile, the reason I originally believed in blogging, and have believed since around 2005, still stands: the Web, and especially personal archiving, can be better than some sort of vanity exercise.
Someone who argues with me a lot recently said that they had stopped blogging when they realized they were just contributing “noise.” I took this as a jab, as if I am making noise as well. After thinking about it a bit, I have concluded that I am not making noise, at least not compared to the vast majority of sites that get click-thrus. In the few searches where I appear prominently, I am the opposite of noise: I give out helpful information.
Meanwhile, I do also talk about all sorts of non-helpful things. I even make joke posts. Is this bad for the Web? Is it noise?
No. Because we have plenty of room. We don’t have a space problem, we have an SEO problem. The fact that satisfaction with search results is plummeting is not a result of excess content. You’ll notice it’s often the same sites that let us down. The problem is that search is not improving. Search is asleep at the wheel while the link-bait space has exploded. Here we are: Answers.666, about.666.com, just-666.com, sites offering often wildly inaccurate information, mostly voted into place by people with no accountability!
The cluttering of the Web used to be something we could blame on people who make useless websites or spam identities. But not anymore. Search is failing to help us find better content. And it’s really tempting to assume this is not an accident on the part of Search.
IfEveryoneJustTypedLikeThisAndWeEssentially AbolishSpaces AsOurPrimary WordSeparators WeWouldntNeed Hashtags BecauseEverythingWouldAlreadyBeSearchableTextStrings. WeCouldUseSpacesToIsolate KeyWords AndPotentially TrendingTopics. This WouldAlsoSaveCharactersWhen Tweeting EtcAndInGeneralItWouldSaveSpaceBecauseIt’s Compression. AbolishSpaces aka #abolishspaces
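For what it’s worth, the “compression” claim is easy to check. Here’s a throwaway Python sketch (the function name is mine) that abolishes spaces the way the joke describes:

```python
def abolish_spaces(text):
    """Join a sentence into one CamelCase string, the #abolishspaces way.

    Capitalizing each word preserves the word boundaries that the
    deleted spaces used to mark, and saves one character per space.
    """
    return "".join(word[:1].upper() + word[1:] for word in text.split())

original = "abolish spaces as our primary word separators"
squashed = abolish_spaces(original)
print(squashed)  # AbolishSpacesAsOurPrimaryWordSeparators
print(len(original), len(squashed))  # the squashed form is shorter
```

One character saved per space: real compression, as promised.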
What if the content/facts of all wikipedia articles were semantically linked by a prediction modeling application?
Of course all of this linking would need to be done by the community. But I have a great deal of faith in the wikipedia community.
All that really needs to happen is for links in sentences that contain dates like “2054” or “June 11” or “05, 21, 1976” to be declared as being a prediction, fact, speculation, or some other spec (for now, via a rel=”prediction” or time=”Etc” type of thing). I think it would have to work along with, and within, human-language syntax for now, because I doubt people want to qualify every word they write with semantic markup, but requiring lines that contain a date to follow some rules isn’t too crazy.
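As a sketch of the idea (the rel value and the markup below are my own invention, not anything Wikipedia actually supports), here is roughly how a crawler might pull date-tagged claims out of a page:

```python
import re

# Hypothetical markup: paragraphs whose links carry rel="prediction",
# plus a year, the way the post imagines Wikipedia might tag claims.
page = '''
<p><a rel="prediction" href="/wiki/Fusion_power">Fusion power</a>
may be commercially viable by 2054.</p>
<p><a href="/wiki/History_of_fusion">History of fusion</a> research
began in the 1940s.</p>
'''

def predictions_for(year, html):
    """Return paragraphs tagged rel="prediction" that mention `year`."""
    results = []
    for match in re.finditer(r'<p>(.*?)</p>', html, re.DOTALL):
        paragraph = match.group(1)
        if 'rel="prediction"' in paragraph and str(year) in paragraph:
            results.append(" ".join(paragraph.split()))
    return results

print(predictions_for(2054, page))
```

“Show me predictions for 2054” then becomes a simple query over crawled markup, with each hit linking back to the article it came from.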
Another piece is needed to tie in the actual information, but it could just be a link to the article in which it appears, and even that may be unnecessary, since that’s where the data is coming from anyway. It’s a start.
In other words, “Show me predictions for 2054 based on wikipedia info” could give you articles that contain predictions for 2054. And you can easily get to those articles from the results. Not as granular as “linked data” should be, but right now, the web is basically all about making it easier to look at selfies and bad journalism.
I think this could have a lot of research power.
The ‘Semantic Web’ is not nearly as hot of a topic as it was a few years ago, but if you remember, some of the efforts being made back in the old days (2008?) had to do with embedding semantic identifiers into regular old HTML. The two examples that come to mind are RDFa and Microformats. I haven’t heard a lot of buzz about embedded ‘linked data’ in HTML lately, but I heard today that a new project, called schema.org has been launched to enable developers to add markup to sites which will help search services glean meaning from markup. Apparently, Google, Microsoft and Yahoo! are all on board with this project.
I guess we should call this Keywords 2.0
Anyway, they have a whole taxonomy of ‘things’ laid out. Check out “The Type Hierarchy” page. A great start.
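For the curious, the markup looks roughly like this. The item type and property names below are real schema.org vocabulary, but the page and the crude extraction function are my own toy example; you can see why “Keywords 2.0” isn’t far off:

```python
import re

# A toy page marked up with schema.org microdata. "Movie", "name", and
# "director" are real schema.org types/properties; the page is invented.
page = '''
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span>Director: <span itemprop="director">James Cameron</span></span>
</div>
'''

def itemprops(html):
    """Crude extraction of itemprop name/value pairs from microdata markup."""
    return dict(re.findall(r'itemprop="([^"]+)"[^>]*>([^<]+)<', html))

print(itemprops(page))  # {'name': 'Avatar', 'director': 'James Cameron'}
```

The point is that a search service no longer has to guess from surrounding text whether “Avatar” is a movie, a username, or a yoga studio.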
I guess this means that a lot of SEO people are gonna start getting work again. It’ll be interesting to see whether people actually start putting this stuff into their CMSs. I suspect not. I suspect that the kinds of companies with data rich enough to rebuild these hooks as their apps render HTML will already be benefiting enough in organic search that they won’t see a need to clutter up their code with this stuff. I mean, I find it very unlikely that a site like Disney’s would get out-ranked by some spammer just because the spammer used these newer HTML attributes.
Then again, the fact that the major players are on board with this makes me wonder if there isn’t a reason that’s profitable to search companies to finally start getting rid of all the garbage from SERPs. Touch-screen finger fatigue? Even so, it’s all the damn spammers in eastern Europe that’ll have the resources to recode everything, at least in the near future.
Above all, I’m glad to see any attempt at making information more granular. And deep down, I still want the universal distributed database we were all so excited about back in web2.0 when the semantic web seemed like it was on the horizon, before facebook and the mobile app-o-sphere took over.
What do we call this current era? The API-o-sphere? The Walled-garden-o-sphere? Maybe we should just call it Facebook.
Intrigued and disappointed at the same time.
The following is a bunch of predictions. Mark my words. Three areas to pull out your wallet for.
This is a draft version. Suggestions welcome.
Short answer: People.
What the Semantic Web (now officially called any number of other things besides that) needs in order to become mainstream, in my opinion, is people and the connections between them. The phrase “The Social Graph” comes to mind, a la Brad Fitzpatrick’s once famous, but now all but forgotten manifesto, which even Tim Berners-Lee eventually commented on.
The Semantic Web would catch on if it was seen as even remotely useful by the young people who are most likely going to be building the next big thing on the web.
The beautiful thing about the Web2 era is that highly useful tools can sprout up overnight simply because of the desires of more or less ordinary people with no credentials or affiliation with a company. Everyone knows someone who’s a programmer. The next big social software application just might come from the bedroom of a teenager. There is hardly any barrier to access anymore. This is why Web 2.0 happened. A new tool or service doesn’t need a business plan and a data center to launch and go viral.
The trajectory of innovation over the last five years or so, the “Web 2.0” years, has been about capitalizing on people, the content they create, their interests, and the value added by crowd-sourcing. The benefits in the social media space are clear from the perspective of normal end-users as well as giant companies: on the user side, filtering noise and finding relevance; on the giant-company side, gathering metrics, targeting messages, and acquiring free content. The SemWeb standards have a lot to offer the Social Media realm, dare I say, probably even more than CSS with rounded corners does (I hope I’m not offending anyone here).
But the way things are today, for most programmers, implementing SemWeb standards is a lot of extra work with no immediate benefit. Why not just use MySQL or cook up a new XML format?
So why are these standards being completely ignored by the coders on the street? RSS took off. Why not FOAF? I think it’s because there’s no useful directory of URIs for people. There are lots of SemWeb geeks who have URIs, but the kids on MySpace and Facebook don’t have URIs or FOAF files. And those kids’ eyeballs and participation are worth real money!
One fine day, back in 2006, Tim Berners-Lee came down from the mountain and gave us a commandment (or at least he logged into his blog and made a suggestion):
“Do you have a URI for yourself? If you are reading this blog and you have the ability to publish stuff on the web, then you can make a FOAF page, and you can give yourself a URI.”
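A minimal FOAF file of the sort TBL means might look like the following. I’ve wrapped it in Python just so we can check that it parses; the person, homepage, and URIs are placeholders, while the rdf and foaf namespaces are the real ones:

```python
import xml.etree.ElementTree as ET

# A minimal FOAF file. "Andrew Example" and the example.org URIs are
# placeholders; the rdf/foaf namespace URLs are the real, standard ones.
foaf = '''<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/people/andrew#me">
    <foaf:name>Andrew Example</foaf:name>
    <foaf:homepage rdf:resource="http://example.org/"/>
    <foaf:knows rdf:resource="http://example.org/people/someone-else#me"/>
  </foaf:Person>
</rdf:RDF>'''

root = ET.fromstring(foaf)  # parses cleanly, so the RDF/XML is well-formed
name = root.find('.//{http://xmlns.com/foaf/0.1/}name').text
print(name)
```

The `rdf:about` URI is the point: it’s the one web address that stands for you, and `foaf:knows` is a semantic link from your URI to someone else’s.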
Then, apparently fifteen minutes after the first post was published, Berners-Lee really got at the importance of URIs in a post called Backward and Forward links in RDF just as important:
“One meme of RDF ethos is that the direction one chooses for a given property is arbitrary: it doesn’t matter whether one defines “parent” or “child”; “employee” or “employer”. This philosophy (from the Enquire design of 1980) is that one should not favor one way over another. One day, you may be interested in following the link one way; another day, or someone else, the other way.”
For those of you who don’t yet understand the idea of the Semantic Web, here’s the deal. If there’s one web address that represents each person, place, thing, or idea, it becomes possible to crawl the Web (documents as well as databases) looking for links to that person, place, or thing. And if those links contain tags which specify the meaning of the links, the web-at-large begins to look more like a giant database. This is the “Web of Data” (in contrast to the “Web of Documents” we know and love). This is what people call The Semantic Web. So what’s stopping people from being in the “Web of Data” (AKA Semantic Web)? Like Tim Berners-Lee suggested, we need URIs for people. That’s where it all starts. Once there are URIs for people, and there are semantic links (ones that contain tags explaining what they mean) pointing at those URIs, we can start making tools that use that data.
This is a fairly simple concept. And Berners-Lee makes it sound simple enough. Sure, we’ll all just give ourselves URIs and voilà, the Social Graph will go Semantic. That sounds great, but there are a few problems with leaving it at that.
We need to start thinking of the Web more like we think of a Public Library, but completely decentralized and with infinite shelf space. I think WikiMedia, the organization behind the Wikipedia, is the best bet for a trusted librarian for all the information about normal people.
I think what is really needed right now is a non-profit-run directory of people, possibly even modeled after the Wikipedia, and especially after the concurrent DBpedia project, which publishes Wikipedia’s facts to the Semantic Web. Really, because of WikiMedia’s established trust, I think they would be the ideal organization to do this. Wikipedia could simply have another layer which reveals non-notable results, or ‘all results.’
As a heavy consumer of information about leading technologies, I am proud to say that, at the time of this post’s creation, I am ahead of the game in declaring a change in the language we use to refer to the next phase of web evolution.
The term “web” has never been stronger. The “internet” goes on as something we mention almost every day. And the technologies that comprise the realm of what we have been calling semantic web, mainly markup standards, aren’t going anywhere.
But semantic web just fell out of favor as a [candidate for a] useful euphemism in our language. The moment this became obvious to me was a few weeks ago when I heard that Tim Berners-Lee spoke at TED and didn’t mention ‘the semantic web.’ A few weeks later I saw the video for myself and felt a certain sadness, or abandonment, when TBL talked about the geekiest dream ever, one that he created, without using the name I thought we had all agreed on for it, The Semantic Web. Instead, he used a different euphemism for the most awesome library system ever conceived. He called it “Linked Data.”
If you are a Semantic Web apologist like myself you might feel slightly deflated by a sudden change in terminology. I’m sorry. I’m sure TBL is sorry too.
But the reality is that “Semantic Web” is always going to be confused with Natural Language Processing, which is also a field of technology that is growing fast in its own right.
No sustaining buzz has really caught on around “the semantic web” as a catch phrase, beyond us geeks who are already sold on the idea. Instead, we’ve recently heard more and more announcements (usually made by search companies) that include the word semantic, as if the mere use of the word means the company is doing something right.
The battle we’ve been fighting as SemWeb advocates is largely a battle for widespread awareness. TBL has said himself that the phrase semantic web wasn’t the best choice of words.
I’m sure TBL spent at least an afternoon considering what he might say to the audience at TED, which arguably consists of some of the most influential people in the world. I’ve concluded that he intentionally abandoned the phrase, in preparation for a brighter future in which the SemWeb technologies are no longer so easily confused with other technologies. We’ve changed our name.
If you feel the re-branding is unfair, consider who has more right to the word semantic, the Natural Language people or the Interchangeable Data Format people?
Sorry. We need to move on.
The Semantic Web is now called Linked Data. It’s official. Take a deep breath, change your notes. And let’s move on as Linked Data enthusiasts, not Semantic Web enthusiasts.
I will lead this effort by removing the category of “The Semantic Web” from this site and replacing it with “Linked Data.” I’ll do it later this week. I need some time to say goodbye.
As a hardcore Linked-Data/Semantic Web Enthusiast for some time now, say since pre-2007 (back then, I didn’t know what to call it, but I understood that it was possible), I can’t help but feel sometimes like it’s never going to happen. Sometimes a non-silo Web seems like an idealistic fantasy. Sometimes it seems like nothing is happening. During the first half of 2007, the amount of excitement in the Sem-Web category of my feed-reader was high. Since then, however, the excitement level seems to have diminished quite a bit. Am I right?
I want to offer a few condolences and some evidence that the Semantic Web is not dead. In fact, I believe it’s still going to “happen.”
Trying out Zemanta, a service for finding related resources.
Currently, as I’m writing this, the Zemanta plugin is only giving me a “Loading Zemanta…” message… I figured Zemanta’s database would likely have plenty of articles about Zemanta. Maybe not.
We’ll see. Very cool idea either way.
I guess the first time I loaded my WordPress Dashboard’s editing page, Zemanta took a little while to load… Ever since, it’s been super fast.
Pretty cool little Plugin.
First of all, my last prediction-for-next-year was a little optimistic, as I was predicting what people in the echo chamber have since started calling ‘cloud computing…’ I predicted that we’d see a lot of online services that blur the lines between what is ‘local’ and what is an online ‘service.’ Let me just defer that prediction one year and add it to the heap of what I see coming this year. At least give me credit for making it my major prediction before the catch-phrase ‘cloud computing’ came to the surface.
He covers concepts like the Semantic Web, and the give-and-take between privacy and participation with relatively light language that any lay person should be able to understand. This is an interesting and entertaining little presentation. Thought I’d share.
Here’s my dilemma. I have a ton of bookmarks on my Del.icio.us account. I love using an online bookmarking system. But still, Delicious and others’ systems for organizing bookmarks don’t really help with a need I bet most users have: Tag-Optimization.
On several occasions, I’ve set out to clean up my tags manually, but I’ve never made it very far. It’s just too much work.
Maybe the coming overhaul to Del.Icio.Us will add some of these needed features, although somehow I doubt it.
I’ve heard of the MOAT (Meaning Of A Tag) Project, and perhaps this could save us, but like many other ‘Semantic Web’ projects, I haven’t found a way, as a lay person, to utilize it. At some point down the road, maybe someone will make a Delicious-MOAT-erizer Web-App that will clean-up-shop-by-proxy and make the metadata available to the Semantic Web.
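Just to make “Tag-Optimization” concrete, here’s the kind of clean-up pass I mean, as a Python sketch. The synonym map is made up; a real tool would have to learn it, or ask the user (that’s the hard part, and roughly what MOAT is after):

```python
# A sketch of the tag clean-up I wish Delicious would do for me.
# The synonym map is invented; a real tool would have to learn or ask.
SYNONYMS = {
    "semweb": "semantic-web",
    "semanticweb": "semantic-web",
    "js": "javascript",
}

def optimize_tags(tags):
    """Lowercase, deduplicate, and merge synonymous tags, keeping order."""
    cleaned = []
    for tag in tags:
        tag = tag.strip().lower()
        tag = SYNONYMS.get(tag, tag)
        if tag and tag not in cleaned:
            cleaned.append(tag)
    return cleaned

print(optimize_tags(["SemWeb", "semanticweb", "JS", "javascript", "RDF"]))
# ['semantic-web', 'javascript', 'rdf']
```

Trivial for one user’s synonym list; the interesting version maps everyone’s tags onto shared meanings, which is exactly the metadata the Semantic Web wants.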
Update… this was actually news back in January. Coincidentally, today it was announced that Comcast is buying Plaxo. Goodbye Plaxo. Nice knowin’ ya.
Got the rumor tip from Scoble (there’s no real info there so don’t bother)
Plaxo? Are you listening? Keep doing what you’re doing, stay behind the scenes, work on enabling users to publish their own data, at will, in Semantic Standards as they become timely (now?) and stay independent of the little tug-of-war between closed, albeit increasingly API-enabled social apps. You’re better than them! Hang in there and you’ll be worth way more! Don’t turn to the dark side!
Competition for traffic will get everyone using RDF and Microformats soon enough… Semantics are like SEO 2.0… The next bandwagon everyone will want to pay way too much for.
Plaxo, you’re in the perfect spot to make money on this. Think Virtual Private Networks, Semantic Publishing to the Web, and Semantic Productivity Tools at home.
In the suggested reading section of the page for the DIY Rel=”Me” project over at dataportability.org’s wiki, there’s a link to this blog post, which is an attempt to explore the usefulness of rel=”me” to the regular old web user. The article is slightly tunnel-visioned on what you can or can’t do with your browser to exploit MicroFormats. Of course, being able to detect locations or personal contact info through a browser extension is useful, and I’m all for it, but beyond a few obvious exceptions like those, the Semantic Web, MicroFormats included, won’t be much use to us at the level of the browser. We will still need Web-based portals, or “Libraries,” or “repositories,” or “Catalogs,” or what have you, to connect to in order to really take advantage of this stuff. Semantic markup on pages is great. RSS is an example of how a little bit of semantics can go a long way. But what’s of greater significance is the idea of the Web Of Data, where resources are “semantically” interconnected: information is mapped to the domain of knowledge where it’s useful, and the relationships between resources are also specified in a machine-understandable way.
Rel=”me” is the equivalent of saying “The person represented by this URL is the same person as the person represented by this other URL.” Taking that into consideration, imagine how this would affect the experience of searching the “Web of Documents.” I argue that if enough of us implement rel=”me” (or other microformats, or RDFa) in our HTML pages, we will empower the Googles and Yahoos to take advantage of the knowledge expressed by this markup. So let’s do it!
Quotes from the Article I mentioned:
“…So assuming that you went through the trouble to write up your HTML with rel=me, what next, where is that information actually consumed. I don’t think the 2 most popular browsers (IE 7 and Firefox 2) at this time have native support for XFN, I hear Firefox 3 is suppose to have native microformat support but I haven’t looked for it and if it is there, it isn’t immediately obvious to me. The closest thing I can find is a Firefox plugin called Operator. Operator is a microformat capable reader and for the most part seems to be able to consume most of the above microformat standards except rel=me, kind of odd but kind of understandable…”
“…At this time, I can honestly say that XFN rel=me proliferation is limited and experimental at best. It would take a while for mass adoption to happen and requires a lot of user education, adoption by popular social sites like Facebook, MySpace, etc, and native browser support…”
I commented there, and when I take the time to write out a long comment that isn’t something I’ve already written in so many words here, I like to steal my own comment and put it here for anyone who reads my blog. My response:
I felt like I had to chime in and point out that the point of MicroFormats or RDFa isn’t really to make an overnight change in how we use the Web. It’s to create a backbone of linked data so that as Search Engines and other “Libraries” begin to have stores of these relationships between documents and other resources available to work with, they can begin to improve their services. It will be nice when Search is only partly based on scanning for text-strings or combinations of words.
If you were looking for Andrew in Sebastopol, CA, how would you do it? Perhaps you’d google “Andrew Sebastopol CA…”
But what if you could specify that you are looking for a person?
What if you could specify geocoding info or otherwise specify that Sebastopol is a town in Northern California?
What if you could filter your results by the time web pages were created, or filter by domain specifications (show me wiki articles first, or show me all MySpace profiles), or filter by type of site (say, show me blogs only)? And finally, and this is where rel=”me” comes in, what if you could specify in your search results that you want to see every other document that is an expression of the same person, once you have selected, from your query, a person named Andrew who lives in Sebastopol, CA? This is what it’s all about. It works because links work backward. In other words, you can already say “show me all the pages that link to this thing,” but what about being able to say “show me all the pages linking to this Twitter page using rel=”me”,” or better yet, “show me all the pages linked to with rel=”me” from any page that links to this Twitter page with rel=”me”,” and so on?
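That “and so on” is just a transitive closure over rel=”me” links. A Python sketch, with a made-up link index standing in for what a crawler would have gathered from the real Web:

```python
from collections import deque

# A made-up index of rel="me" links a crawler might have gathered:
# page -> pages it links to with rel="me". The URLs are placeholders.
REL_ME = {
    "twitter.example/andrew": ["blog.example/andrew"],
    "blog.example/andrew": ["flickr.example/andrew", "twitter.example/andrew"],
    "flickr.example/andrew": [],
}

def same_person(start, index):
    """Follow rel="me" links transitively: every page reachable this way
    claims to represent the same person as `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in index.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

print(sorted(same_person("twitter.example/andrew", REL_ME)))
```

Start from one profile and you recover the whole cluster of pages that are “the same person,” which is exactly the search feature described above.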
The Web is becoming a library. By adding microformats and other semantic markup to our documents, we are making it possible for decent “card-catalogues” to be built, whether they’re being built by google, yahoo! or the guy down the street.
A weekly roundtable discussion about the DataPortability Project in specific, and efforts involved in data portability in general. The show is produced and hosted by J. Trent Adams and Steve Greenberg.
I recommend Episode 7
We kick off episode 7 of the DataPortability: In-Motion Podcast with the news of the week that MySpace launched “Data Availability” with Yahoo!, eBay, Photobucket, and Twitter. Following immediately on their heels was the announcement that Facebook is releasing “Facebook Connect”, an extension of their 3rd party API providing deeper access to their user’s data.
We’re also joined by Brady Brim-Deforest, founder of Human Global Media, talking about the DataPortability Legal Entity Taskforce. He provides a good overview and update on the process underway to formalize the project under a recognized legal banner.
The featured interview segment is with Danny Ayers, Semantic Web Developer at Talis. He touches on moving from document linking, through microformats, to feature-rich RDF modeling to identify portable data. He also dispels the myth that it’s hard to migrate from a standard SQL data representation into addressable semantic objects.
Danny regularly posts on the following sites:
- Talis: N2 Blog
- Talis: Nodalities Blog
- This Week’s Semantic Web
- DataPortability & Me Video
Also mentioned in the episode:
- Semantic Tech Conference (Danny is speaking)
- Talis: Semantic Platform
- Tim Berners-Lee’s Giant Global Graph post
- Tabulator data browser
Yahoo! is working on a Semantic Search platform. That’s all I know. I suspect that it will be cool.
Here’s an interview Paul Miller did with Peter Mika from Yahoo Research for the Talking With Talis Podcast.
I heard about this through Lawrence Lessig’s blog. Professor Lessig is taking the month of May off, and off the grid, which I applaud him for.
I wonder if these guys are going to implement any Semantic technologies into the data they store… I wonder if they’re going to make deals with bookmarking services like del.icio.us… All my words could automatically be links to mini-libraries of items I’ve bookmarked! It’d look a little ugly given the current style conventions but hey. Let’s change those.
It’s interesting to me to ponder how this non-semantic-web service, because it’s also a library/bookmarking tool, could become hugely useful to the Semantic Web as they snatch up web users’ resources/web-bibliographies.
Oh man. This is a hot item!
Great job, Danny. That’s funny.
I’ve mentioned before how, increasingly, the ‘Live Web’ or ‘Blogosphere’ (or whatever you want to call this thing) is being infiltrated by Robot Blogs. What they appear to be doing is crawling the web, scraping excerpts of blog posts, and reposting the excerpts with links back to where they came from. They usually say:
“[KeyWord] wrote an interesting post today”
Since they link back to the blog post they scraped, they show up as a trackback in the comments area of the original post. This way, the unsuspecting blogger is linking to the fake blog. The fake blogs seem to be set up in an attempt at monetizing traffic via adsense ads.
I googled the phrase “wrote an interesting post today” and the top hit was (I probably am the top hit now) some blogger talking about filtering any comment that contains the phrase “wrote an interesting post today.”
I had decided to change my little tagline thingy to this exact phrase, as a sort of inside joke for bloggers, but found myself wondering if being associated with that phrase will adversely affect my findability. Perhaps Search Engines or Spam Filters will begin to look out for that phrase?
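A phrase filter like the one that blogger described is trivial to write, which is exactly why it worries me. A Python sketch (the phrase list and function are mine, illustrating how blunt the approach is):

```python
# The kind of blunt phrase filter that blogger was using: one fixed
# phrase, no context, so legitimate comments can get caught too.
SPAM_PHRASES = [
    "wrote an interesting post today",
]

def looks_like_splog_trackback(comment):
    """Flag a comment if it contains any known splog boilerplate phrase."""
    lowered = comment.lower()
    return any(phrase in lowered for phrase in SPAM_PHRASES)

print(looks_like_splog_trackback(
    "KeyWord wrote an interesting post today on semantic markup..."))  # True
print(looks_like_splog_trackback(
    "Great post, I linked to it from my site."))  # False
```

Note that this very blog post would be flagged, since it quotes the phrase: context-free string matching can’t tell discussion of spam from spam.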
Already, I bet there are tons of bloggers who filter out comments containing words like “viagra” or “casino,” assuming that there is absolutely no context in which these words could be used in a legitimate discussion. The fact that I am using those words here is proof that there is such a thing as a legitimate discussion which contains them.
Filtering for a word or phrase seems to me to be a slippery slope, especially if we’re talking about Search Engines, since they act as our main interface to the Web.
Google: Please don’t hate me because I said Viagra. I’m not a spammer.
My friend threw together an app that scrapes your MySpace contacts and puts useful info into a reusable format.
DOWNLOAD IT HERE. (ZIP FILE)
UPDATE: It’s also available as a Torrent via The Pirate Bay. Please consider seeding this. It’s a tiny, tiny file.
Here’s the Read Me info I just put together to go with it:
and change those.
LEAVE THE QUOTES IN PLACE
Save the file.
Upload these two files to your server.
Point your web browser to http://where-you-put-the-file-on-your-server/ms_test.php
and what will result is a CSV file of all your MySpace friends and their demographic information. Also included are the URLs to “send message” etc., and some other useful things.
View the source of the page and copy it into a PlainText text file
Name the text file with the extension .csv
Now you should be able to work with your myspace friends in Excel
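Once you have the .csv, Excel isn’t the only option. Here’s a quick Python sketch of working with it; the column names are my guess at what the scraper emits, so adjust to match the real header row:

```python
import csv
import io

# A stand-in for the scraper's CSV output; real column names may differ.
raw = """friend_id,name,age,location
1001,Tom,32,"Santa Monica, CA"
1002,Andrew,29,"Sebastopol, CA"
"""

friends = list(csv.DictReader(io.StringIO(raw)))
print(len(friends))                  # 2
print([f["name"] for f in friends])  # ['Tom', 'Andrew']

# For example: everyone in a given town.
in_sebastopol = [f for f in friends if "Sebastopol" in f["location"]]
print(in_sebastopol[0]["name"])      # Andrew
```

For a real file, replace the `io.StringIO(raw)` stand-in with `open("friends.csv", newline="")`.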
There is nothing malicious about this simple application. No viruses, spyware etc. It only does what it’s supposed to do: scrape your friends so you can more easily work with your social network data.
If you are of the camp that feels that people scraping their own myspace contacts is unethical, I suggest that you consider that all the pages are already available and the data they contain is rendered in HTML which can be freely accessed already. This is just a tool to make it easier to get the useful data separated from the clutter.
Finally, this is possibly against MySpace’s Terms Of Service, so use at your own risk.