Content Curation Marketing, Content Scraping…What’s The Difference?

Content curation marketing is a great way to diversify your business blog’s content supply. Essentially, you borrow a little content from here and there, add links back to those sources, cite who you’re borrowing from, and frame it within your own context.

Content scraping is when a website automatically rips your writing off or a blog author knowingly copies your work, word for word, and republishes it on their blog. They might change a few words to throw you off and make themselves feel like they aren’t cheating the system, but the structure of the post and the points being made are yours.

So, does that mean it’s scraping when your content is stolen without attribution, but it’s content curation when the proper attribution is given? Some marketers and bloggers group content curation with content scraping. Some call it “overrated plagiarism” or “the arterial plaque of the Internet.” For all its detractors, though, content curation has just as many advocates, if not more.

Let’s look at why scraping and curation get tangled up, how you can make sure your curation strategies don’t make you act like a scraper, and how to really sock it to ’em if you find somebody scraping your content.

How Content Scraping Affects Your Website

To see how scraping hurts you, you have to understand why content curation helps you.

Attributed content is okay because it helps Google link your content back to you and someone else’s content back to them. Google typically gives all parties extra ranking credit if they share each other’s content, attribute properly, and are in good standing with Google’s indexers. Content syndication and curation usually help everyone involved look better by pooling everyone’s good reputations.

Scrapers hurt your website because Google sees the same content in two places, without any explanation for why and how it got there. When there’s no clear attribution, Google’s search indexing robots start to suspect something fishy, and they try to work out which site is the scraper by comparing post timestamps. If for some reason Google can’t determine on its own that your site originally made the post and the scraper site stole it from you, Google just dishes out punishments to everyone involved.

Syndication and content curation might be similar to scraping, but proper attribution makes all the difference. The sites you link to aren’t the only things that matter when it comes to curation; you have to keep an eye on spammy sites linking back to you. It’s a two-way street, which makes it all the more confusing.

Content Curation Marketing Practices That Won’t Make You Look Like a Scraper

Do you remember how often you had to cite your sources for school essays you wrote way back in the day? Citation ensures that your readers know where you found and borrowed someone else’s idea. Attribution serves exactly the same purpose. Clear attribution links back to the original source must be added to curated content, or you’re committing a flagrant act of scraping, punishable by total search engine delisting…basically, death.

Mashable has an excellent primer on how to be a valuable content curator. I love their concise points, but I’m going to try and wrap them all up in one little idea:

Curated content must be cited within the context of something you’ve created.

It sounds a little gimmicky, but it’s easy in practice. When you curate content, you shouldn’t just dump someone else’s text onto your site. You can borrow direct text in clearly defined quotations, or you can borrow ideas you found from someone else. You can post excerpts from a news piece, or snippets of someone’s points from their blog post, for example.

When you do this, you absolutely must link back to the sources, and you need to explain to your readers why you’re borrowing this content: what does it mean to you; why are you sharing it with them? Providing your own input helps to advance the conversation and add your voice to the content you’re sharing: that’s the whole point of content curation marketing.

Content Syndication: An Exception To The Rule

Content syndication sites are a special exemption to the rules of content curation, and you should use these sites to your benefit. Syndication sites take your content in its entirety and republish it, word for word, on their site. It’s not considered content scraping because syndicators provide extensive attribution links back to the author of the piece, the site it was originally hosted on, and all other relevant information. Syndicators also republish work that makes sense within their own context, so their audience is already qualified and will be interested in your content from the start.

Content syndication is a powerful tool you can use to increase your traffic and your conversions with minimal effort. Syndicators put your content directly in front of their audience, which could include thousands of people that don’t know who you are…yet. Links pointing directly back to your website are generously provided, and readers that find you through a syndication service can find you quickly and easily.

The biggest downside to syndication comes when the syndicator is larger and commands more authority than your own site. If they naturally rank better than you, your content might actually boost the syndicator’s site above your own, even though they’re using your content. You don’t necessarily want to boost your syndicator’s SERP rankings above your own because it could make your site look less authoritative by comparison.

Content Scrapers Making Your Life Miserable? Here’s How To Fight Back

When I looked around for advice on how to deal with scrapers, I found this great discussion from Webmaster World about a real-life scraper situation. There are a lot of great suggestions in this thread, and I’m going to add a few of my own tricks to give you a complete toolkit for dealing with annoying scrapers.

The Cease And Desist Escalation Method

If you find a scraper stealing your content for their webpage, you can look up any documented data on who owns the website with a simple WHOIS search. This can potentially net you the first and last name, IP address, web hosting company, and other vital info on the jerk that owns the offending website. Once you’ve got their info, use a preformatted C&D from PlagiarismToday and send a letter to the site owner, with a CC: to the site’s hosting company.
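If you’d rather script the WHOIS search than use a web form, here is a minimal Python sketch. It assumes the plain-text WHOIS protocol on TCP port 43, and it assumes `whois.iana.org` as the starting server; that server usually just points you to the registry that holds the real record, so you may need to repeat the lookup against the server it names. The function names are made up for this example, and many records hide the owner behind a privacy service, so treat the output as a lead, not a guarantee.

```python
import socket

WHOIS_PORT = 43  # standard port for the plain-text WHOIS protocol


def build_whois_query(domain: str) -> bytes:
    """Return the bytes sent to a WHOIS server: the domain followed by CRLF."""
    return domain.strip().lower().encode("ascii") + b"\r\n"


def whois_lookup(domain: str, server: str = "whois.iana.org",
                 timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the server's raw text reply.

    The default server is an assumption for this sketch; the reply often
    contains a "refer:" or "whois:" line naming the server to ask next.
    """
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_whois_query(domain))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (needs network access; "scraper-site.example" is a placeholder):
# print(whois_lookup("scraper-site.example"))
```

Look through the reply for registrar, registrant, and name server lines; the name servers often hint at the hosting company you’ll want to CC.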

Unfortunately, many scrapers already disregard standard legal processes, and if their hosting company doesn’t care either, you may have to take matters into your own hands.

Sneaky Trick #1: Blowing Up Scrapers’ Websites With Hidden Embeds

If you’ve got a pesky scraper that is copying all of your content verbatim but not hosting your images on their own site, they’re doing some seriously bad stuff to your site. They’re not only stealing your content, but they’re also “hotlinking” your images and leeching your web server’s monthly bandwidth. You can use this to your advantage and really mess with scrapers, though.

Make an image file on your privately-controlled server that is a simple 1×1 blank pixel. Embed that tiny pixel in the middle of your content, or in multiple places throughout the content as “bait.” If and when your scraper strikes again, check the source code for your stolen content on their site, and look for those pixels you embedded.
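Checking the scraper’s source code by eye gets tedious, so here is a rough Python sketch of that check. It assumes you have already saved the scraper’s page HTML into a string, and that your images live on a domain you pass in as `my_domain`; the class name, function name, and example domains are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_hotlinked_images(page_html, my_domain):
    """Return the image URLs in page_html that are served from my_domain."""
    collector = ImageSourceCollector()
    collector.feed(page_html)
    collector.close()
    hits = []
    for src in collector.sources:
        host = urlparse(src).hostname or ""  # relative URLs have no host
        if host == my_domain or host.endswith("." + my_domain):
            hits.append(src)
    return hits


# Hypothetical scraped page: one bait pixel from our server, two other images.
scraped = (
    '<p>Stolen post</p>'
    '<img src="https://myblog.example/img/bait.gif">'
    '<img src="/their-own-logo.png">'
    '<img src="https://other.example/ad.png"/>'
)
print(find_hotlinked_images(scraped, "myblog.example"))
```

Any URL in the result is an image the scraper is loading straight from your server, which is exactly what the bait pixel is meant to reveal.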

If they’re still there, your plan will work. Edit your invisible pixels out of your own content, then make another image file with the same filename and file type as the 1×1 pixel invisible square. Make it a massive image—thousands of pixels tall and wide, something as disruptive as possible. Replace the invisible pixel file with your new disruptive mega-picture, and make sure it matches the tiny pixel image’s file name.

Once you upload it, refresh your scraper’s website. If they’re hotlinking to you, your hidden pixels will make their content page blow up, and your stolen content becomes completely unreadable.

You can also set up an automatic anti-hotlinking countermeasure with some simple server-side file editing. Courtesy of brand designer David Airey, here’s a full write-up on how to protect your site from hotlinking by editing your .htaccess file.
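To give a feel for what that kind of countermeasure does, here is a small Python sketch of the underlying idea: look at the Referer header that arrives with each image request and refuse requests that come from pages on someone else’s site. This is not David Airey’s method and not actual .htaccess syntax; it only illustrates the decision such a rule typically makes. The host names and the function name are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list of sites that are allowed to embed your images.
ALLOWED_HOSTS = {"myblog.example", "www.myblog.example"}


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if an image request appears to come from a foreign page.

    An empty or missing Referer is let through, because direct visits and
    privacy tools often send none. A value that cannot be parsed into a host
    is treated as a hotlink.
    """
    if not referer:
        return False
    host = urlparse(referer).hostname
    return host not in allowed_hosts


print(is_hotlink("https://scraper.example/stolen-post"))  # foreign page
print(is_hotlink("https://www.myblog.example/my-post"))   # your own page
```

A server-side rule would then serve the real image when the check passes, and block the request or swap in a different image when it does not.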

Sneaky Trick #2: Impersonate Your Scraper and Get Them In Trouble

Another clever trick is to make a fake Google account, then hop on Google’s Support Forums impersonating your scraper. Make a bunch of threads asking “why did my site lose ranking? Why am I not getting any more search traffic?” Provide links to your scraper’s website, and basically disrupt Google’s support forums as much as possible under the guise of your scraper.

This will put your scraper’s site directly in front of Google moderators and engineers, and draw their attention to your scraper’s dirty dealings. This crafty method forces Google to deal with your scraper problems, and can potentially put the scraper on a fast-track to total delisting in search results.

Cool Trick #3: Help Google Fight Back By “Donating” Scrapers

As a final note, one member of that scraper discussion linked to a form Google made for content creators. Google’s own Matt Cutts tweeted back in August of 2011 “We need datapoints for testing,” and included a link to Google’s own scraper reporting page. If you find a scraper stealing your content, “donating” your scraper will help Google study scrapers’ tendencies and build a better search algorithm that identifies and punishes scrapers.

When it comes to content, borrowing content is okay in small amounts, but you really need to produce a lot of your own material. If writing and producing new content isn’t your forte, or you just don’t have time to do it as much as you’d like to, our professional content writing services can fill in the blanks in your business blogging calendar. Leave the writing to us, and save the content curation—the fun part—all to yourself!

Andrew Glasscock is currently based in Nashville, Tennessee. He graduated with a BA in English, specialized in Creative Writing, with a minor in Marketing this past May. Along with copywriting, he loves being an improv comedian, playing frisbee, and dogs.

Comments

  1. Andrew – Thanks for taking time to write this piece. I was intrigued by your point on Impersonating a Web Scraper. This is the first time I’ve heard this suggestion and I’m curious how much success you or your clients have seen from this approach.

    We have access to a lot of data that seems relevant to this discussion. Forgive me if this sounds self-promoting. Our content protection network detects and blocks a significant number of the more sophisticated web scrapers and hackers that duplicate content. These bots typically access a victim’s content from 100s to 1,000s of random IP addresses. They’ll republish the duplicated content on websites that will only be online for a short period of time.

    Because content and data published online can have a limited shelf-life, it’s the first few days after publishing new content that digital publishers want to protect the content from scrapers.

    How quickly does it take for Google to respond to your impersonation efforts?

    Thanks again for your post.
    Sean
    Distil Inc.

  2. You make a lot of great points here, Andrew! Really enjoyed this post.

    I think that something to add when distinguishing between scraping and syndication is that true, honest-to-goodness syndication sites will never just grab content from a website without permission. Permission is key. At Business 2 Community, before we will turn on a site’s feed, we ask for their express permission to do so. This way both parties can ensure that we’re on the same page.

    What we also find is that we will sometimes rank above an original site, but it’s usually when the exact title is searched. When other search terms and parameters are used, that’s not necessarily the case. :)
    Renee DeCoskey
