How to scrape Google Autocomplete with Excel

Oct 20, 2016 | Tools
How to scrape Google Autocomplete with Excel

If you have read something on this blog in the past, you should already know I like scraping with Excel (here I managed to scrape Twitter followers); if not, well, I guess you’ll learn it now.
Without wasting time on what web scraping is, and supposing you already know why scraping Google Autocomplete would be useful for SEO and not only, I’m going straight to show you how to scrape it without coding just using Excel combined with:

Before we start, an important statement you can find here on Google’s Official Webmaster blog:

In the interest of maintaining the integrity of autocomplete as part of Search, we will be restricting unauthorized access to the unpublished autocomplete API as of August 10th, 2015

We all know Google don’t like scrapers, even if they do it:

So pay attention to what you do since you might easily have your IP temporarily or permanently banned from using the search engine.

That said, let’s go.

Step 0 – Excel set up

Build your Excel file like mine:
excel-1

In cell A2 you’ll type your query; in cell B2 the language code.

Please note I’m using an Italian version of Excel, I’ll try to use single quotes (‘) and commas (,) for the English version. If something doesn’t work try using double quotes (“) and semicolon (;).

Step 1 – Where to get the data from

This is very simple, you can send GET requests to this address

http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=yourquery

Here we have to notice 3 things:

  • output=toolbar: this query-string specifies you want results in XML format
  • hl=en: this asks for results in English language
  • q=yourquery: of course you have to change yourquery with an actual query (words concatenated using plus (+) symbol – for example my+query+is+silly

Step 2 – Downloading data

Just open your Excel file and use this function in cell D2, available with SEO tools for Excel:
=DownloadString('http://suggestqueries.google.com/complete/search?output=toolbar&hl='&B2&'&q='&A2)

It downloads the entire content of the page and you’ll get an answer like this (here I’m using poker as my query):

<?xml version="1.0"?>
<toplevel>
<CompleteSuggestion><suggestion data="pokeradar"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker hands"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker rules"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker face"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerstars"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerus"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker games"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="pokerap"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="poker table"/></CompleteSuggestion>
</toplevel>

Step 3 – Getting rid of XML syntax

We want to extract the text from that XML, so we start deleting the code we don’t need:
=RegexpReplace(
RegexpReplace(
DownloadString('http://suggestqueries.google.com/complete/search?output=toolbar&hl='&B2&'&q='&A2),
'((.*)toplevel>)?<(/?Complete)?suggestion((\s)data=)?>?()?',
''
),
'/>',
';')

We get something like this:

"poker hands";"poker";"poker rules";"poker face";"pokerstars";
"pokerus";"poker games";"pokerap";"poker table"

Isn’t it cool? Yes, but we want to separate each item of the list…

Step 4 – Separating items

In cell C2 you use LEFT and FIND functions:

=RegexReplace(LEFT(D2,FIND(D2,';',1)),'"','')

We’re finding the first semicolon in the string and keeping what’s at its left, also deleting the double quotes; in C2 you’ll have:

poker hands

Step 5 – Iterative extraction

Once we have extracted the first item, we remove it from the string so to obtain a string containing the last 9 terms; in cell D3 type:

=RIGHT(D2,LENGHT(D2)-FIND(';',D2,1))

With this formula we are finding the first semicolon and keeping only what is at it’s right; in D3 you should have:

"poker";"poker rules";"poker face";"pokerstars";
"pokerus";"poker games";"pokerap";"poker table"

So now in cell C3 you can extract the first term of this string (the second of the original one) by typing:

=LEFT(D3,FIND(D3,';',1))

From now on just drag down C3 and D3 till row 11 and you’ll have in cells C3-C11 each of items.

If you have done everything correctly, this magic will happen:

trump

Happy scraping (and if you want to thank me I’ll accept links and shares).


Comment