Skip to content
Giuseppe Pastore: Italian SEO Consultant
  • Home
  • About
    • Featured on
  • Contact
  • Blog
  • Italian

How to scrape Google Autocomplete with Excel

If you have read something on this blog in the past, you should already know I like scraping with Excel (here I managed to scrape Twitter followers); if not, well, I guess you’ll learn it now.
Without wasting time on what web scraping is, and supposing you already know why scraping Google Autocomplete would be useful for SEO and not only, I’m going straight to show you how to scrape it without coding just using Excel combined with:

  • SEO tools for Excel
  • Regular Expressions – here an useful cheatsheet
  • Xpath – here a very complex explanation that we can sum up in Xpath is a query language for selecting nodes from an XML document

Before we start, an important statement you can find here on Google’s Official Webmaster blog:

In the interest of maintaining the integrity of autocomplete as part of Search, we will be restricting unauthorized access to the unpublished autocomplete API as of August 10th, 2015

We all know Google don’t like scrapers, even if they do it:

.@mattcutts I think I have spotted one, Matt. Note the similarities in the content text: pic.twitter.com/uHux3rK57f

— dan barker (@danbarker) 27 febbraio 2014

So pay attention to what you do since you might easily have your IP temporarily or permanently banned from using the search engine.

That said, let’s go.

Step 0 – Excel set up

Build your Excel file like mine:
excel-1

In cell A2 you’ll type your query; in cell B2 the language code.

Please note I’m using an Italian version of Excel, I’ll try to use single quotes (‘) and commas (,) for the English version. If something doesn’t work try using double quotes (“) and semicolon (;).

Step 1 – Where to get the data from

This is very simple, you can send GET requests to this address

http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=yourquery

Here we have to notice 3 things:

  • output=toolbar: this query-string specifies you want results in XML format
  • hl=en: this asks for results in English language
  • q=yourquery: of course you have to change yourquery with an actual query (words concatenated using plus (+) symbol – for example my+query+is+silly

Step 2 – Downloading data

Just open your Excel file and use this function in cell D2, available with SEO tools for Excel:
=DownloadString('http://suggestqueries.google.com/complete/search?output=toolbar&hl='&B2&'&q='&A2)

It downloads the entire content of the page and you’ll get an answer like this (here I’m using poker as my query):

<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion><suggestion data="pokeradar"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker hands"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker rules"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker face"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerstars"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerus"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker games"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerap"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker table"/></CompleteSuggestion>
</toplevel>

Step 3 – Getting rid of XML syntax

We want to extract the text from that XML, so we start deleting the code we don’t need:
=RegexpReplace(
RegexpReplace(
DownloadString('http://suggestqueries.google.com/complete/search?output=toolbar&hl='&B2&'&q='&A2),
'((.*)toplevel>)?<(/?Complete)?suggestion((\s)data=)?>?()?',
''
),
'/>',
';')

We get something like this:

"poker hands";"poker";"poker rules";"poker face";"pokerstars";
"pokerus";"poker games";"pokerap";"poker table"

Isn’t it cool? Yes, but we want to separate each item of the list…

Step 4 – Separating items

In cell C2 you use LEFT and FIND functions:

=RegexReplace(LEFT(D2,FIND(D2,';',1)),'"','')

We’re finding the first semicolon in the string and keeping what’s at its left, also deleting the double quotes; in C2 you’ll have:

poker hands

Step 5 – Iterative extraction

Once we have extracted the first item, we remove it from the string so to obtain a string containing the last 9 terms; in cell D3 type:

=RIGHT(D2,LENGHT(D2)-FIND(';',D2,1))

With this formula we are finding the first semicolon and keeping only what is at it’s right; in D3 you should have:

"poker";"poker rules";"poker face";"pokerstars";
"pokerus";"poker games";"pokerap";"poker table"

So now in cell C3 you can extract the first term of this string (the second of the original one) by typing:

=LEFT(D3,FIND(D3,';',1))

From now on just drag down C3 and D3 till row 11 and you’ll have in cells C3-C11 each of items.

If you have done everything correctly, this magic will happen:

trump

Happy scraping (and if you want to thank me I’ll accept links and shares).

Follow me on Twitter @zen2seo

This bug in Google Search Console’s robots.txt testing tool can be really dangerous
Why art galleries don’t understand SEO
3 COMMENTS
  • Vanny
    August 13, 2018 at 12:29 pm
    Reply

    Caro Signor Giuseppe
    volevo farle i miei complimenti per l’arguzia con cui affrontata il problema “scraping google suggestions” semplice ed efficacie.
    io programmo in basic “visual basic 2010 per la precisione”(che immagino per lei saranno “mattoncini lego”.. 🙂
    e vorrei chiederle se secondo lei potrei riuscire anche io a fare scraping passando per una windows form.
    ho navigato quasi ininterrottamente per giorni alla ricrca di un modo ed ho quasi perso le speranze, ma vista la sua competenza ho pensato di domandarle un consiglio.
    attualmente posso salvare la richiesta(le prime parole chiave) in un file txt
    e per me sarebbe il massimo riuscire a salvare le 20 o 30 risposte in un altro file txt.
    purtroppo non conosco nulla della tecnologia SEO 🙁

    secondo lei quale sarebbe la strada piu giusa da seguire per ottenere una macchinetta utile allo scopo?
    ci tenevo a precisarle che userei un timer per limitare le richieste in modo da non danneggiare google e non farmi bannare( il che mi preoccupa molto:).

    la ringrazio molto per la sua attenzione.
    Vanny

  • Sandy Miller
    February 3, 2019 at 9:46 am
    Reply

    Hi Giuseppe,

    Still not so sure on how these scrapped data is useful.
    Other than scrapping for users – which you have talked about in a previous post, what exactly would the kind of data above be useful for? Keywords?

  • yasin
    April 25, 2019 at 1:36 pm
    Reply

    hi, thanks for your tutorial. can you send me the excel file?

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • Why art galleries don’t understand SEO
  • How to scrape Google Autocomplete with Excel
  • This bug in Google Search Console’s robots.txt testing tool can be really dangerous
  • The first guide to Canonicalized link building
  • 10 SEO Experts haven’t the complete answer but this is needed to be an SEO (maybe)

Categories

  • Link Building
  • Personal
  • SEO
  • Social Media
  • Tools
Giuseppe Pastore © 2020
  • Twitter
  • LinkedIn