Scoop.it combines semantic analysis with human curation to help brands publish relevant content. Its technology crawls over 20 million pages across the web, analyzes them, and makes personalized content suggestions for users based on their areas of interest. Users can pick and choose the items they find interesting or relevant and publish them to their personal or company site. For Scoop.it, you are the content you publish.
Dealing with such an amount of Big Data, we were curious to understand how they manage it all and what emerging trends Marc Rougier is noticing.
How much is Scoop.it's Big Data semantic technology highlighting great content pieces and how much is it driving the discovery of great content curators?
We have a philosophy we call “humarithm”. Humarithm implies that nothing is published by algorithms, that everything is controlled by humans, but that those humans collaborate. We rely on various algorithms to help humans find content. We use statistical and semantic algorithms, and leverage our unique event database of user behaviours to correlate users to content to topics. These findings are then applied to what we call the “interest graph”: a living organization of metadata that helps us find and suggest content. (We don’t use this data to monetize our users, as we aren’t a media outlet. We use this data to qualify the meaning of content.)
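To make the idea concrete, here is a minimal, hypothetical sketch of an interest graph: user signals (a human curating a piece of content under certain topics) link users, content and topics, and content is then suggested to a user based on how many topics it shares with their interests. The class and method names are our own illustration, not Scoop.it's actual implementation.

```python
from collections import defaultdict

class InterestGraph:
    """Toy tripartite graph linking users, content items, and topics."""

    def __init__(self):
        self.user_topics = defaultdict(set)    # user -> topics they engage with
        self.topic_content = defaultdict(set)  # topic -> content tagged with it

    def record_signal(self, user, content, topics):
        """Record a human signal, e.g. a user curating a piece of content."""
        for topic in topics:
            self.user_topics[user].add(topic)
            self.topic_content[topic].add(content)

    def suggest(self, user, seen=()):
        """Rank unseen content by the number of topics shared with the user."""
        scores = defaultdict(int)
        for topic in self.user_topics[user]:
            for content in self.topic_content[topic]:
                if content not in seen:
                    scores[content] += 1  # one point per shared topic
        return sorted(scores, key=scores.get, reverse=True)

g = InterestGraph()
g.record_signal("alice", "post-1", ["big data", "curation"])
g.record_signal("alice", "post-2", ["big data"])
g.record_signal("bob", "post-3", ["curation", "seo"])
print(g.suggest("alice", seen={"post-1"}))
```

Note the suggestions come from human signals, not from crawling alone: every edge in the graph was created by a person's curation action, which is the "humarithm" point above.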
In what ways did the 3 V's of Big Data (volume, velocity and variety) change during the last 3 years at Scoop.it?
Well, Scoop.it is three years old so everything has changed. But the biggest change is of course volume, because we didn’t try to browse the web (bigger guys do this fairly well), but rather to exploit human signals. When we started we had limited signals, but as we’ve grown we’ve built a decent base of such signals.
Who is going to benefit more from the data volume increase - the content curators or the content creators?
Scoop.it is a tool that helps businesses publish content. Publishing content, aka having a content strategy, is a key requirement nowadays for businesses to exist online. Brand awareness, thought leadership, community engagement, traffic/SEO, lead generation... they all require lots of content. So we help businesses find, edit and distribute content. We are a publishing platform, but we are not a media outlet.
Publishing platforms that deploy a media business model leverage Big Data to predict what subjects they should publish about. In our case, the curator decides what he/she wants to publish (that's their marketing editorial line) and we leverage big data to find relevant content. In both cases, curators and creators have to fight humongous amounts of data. And how do they thrive in this crowded and noisy web? By having effective tools to help them leverage the data: either to know what to publish about (the creator’s case) or to find relevant content (the curator’s case). Curation and data analysis turn information overload from threat to opportunity.
Is content curation creating thought leaders or influential trend spotters?
Thought leaders exist independently from content curation. But content curation is part of their process. First, thought leaders are savvy; curating helps them stay on top of content. Second, they need to demonstrate their leadership by being avid publishers. Sharing is a much more valuable way to publish than writing. That's why thought leaders are often intense curators.
Since you have worked in various environments - both in major companies such as IBM and in startups such as Meiosys - how do you think companies' attitudes towards data analysis have evolved over the last 10 years?
10 years ago data analysis was already a powerful science, but reserved for very specialized experts in specific vertical industries (manufacturing, transportation, finance, etc). What we see today is that data analysis has seemingly become a subject for everyone. The underlying required infrastructure has been commoditized. The need to actively play the data game has apparently become mandatory across the industry spectrum, with verticals such as media, health or e-commerce generating strategic assets out of big data. As a result, more affordable and usable tools, as well as new experts and solutions such as BIME, have arisen.
Which industries are using the most content curation?
The media industry is obviously the most important curator. Successes such as the Huffington Post or Upworthy are great examples. But there are active curators in every industry. It's part of the marketing arsenal of any company that’s at all mature with regard to e-things (inbound marketing, SEO, etc). An enterprise that wants to gain top-of-mind on the internet needs to be an intense, multi-channel publisher; and they benefit from being curators. For example, there are very active curators in the Health, IT, Energy, Insurance, Automotive or Tourism industries. Education is also a massive provider of curators, since curating is about filtering, organizing and sharing knowledge.
How much do you think that virality is the result of well-designed curation targeting and a data analytics strategy rather than the outcome of a YouTube overnight success?
Spontaneous virality exists and that's one of the delights of the internet: instant, serendipitous glory. At the same time, engineering the virality of bad content is a seriously challenging task, so it's a good idea to start with intrinsically viral content. Repeated success in viral distribution only comes from a very thorough, organized and always-improving process. And the process itself relies on an increasing amount of increasingly real-time data. Miracles do exist; and good data scientists, tools and processes do more than help.
A big thanks to Marc Rougier for his time and for sharing his insights!