You’ve made it. After a lot of effort, your product is finally taking off. When your first users signed up you followed up with each one personally, you listened to what they said about your service, and you made every change needed to keep them completely satisfied. Word of mouth began to pay dividends, and now you have several thousand users with more signing up every day. But can you keep up this personal follow-up with every user? How will you know what’s working best and what needs to change?
Of course you’re aware that there are web analytics products out there like Google Analytics, MixPanel and GoSquared. In fact, you may well have already incorporated one of these analytics solutions. The problem is that although you’re sending plenty of user data to these tools, you still can’t get the reports you need. What’s more, you’d like to cross-reference the web data with information from your CRM and your invoicing system, and your web analytics tools don’t let you do that.
Don’t panic. You’re brilliant at Excel. So you ask your development team to build a small integration that exports the data to Excel, where your pivot tables and near-magical formulas can extract the information you’re looking for.
From Excel to Big Data
After a while you realize that there is simply too much data, and that it’s impossible to work with it all in Excel. Making aggregates or even correlations is not too difficult, but as soon as you want to analyze churn (users who have unsubscribed) or make a sales forecast, things get complicated and your beloved Excel starts to feel the strain.
As you like to stay up to date, you’ve surely heard that there are people out there using something called “Big Data” to solve this type of problem. You may not really know that much about what “Big Data” actually is, but lately you’ve even seen it mentioned in news broadcasts, so it must be something important.
Unfortunately Big Data is still very complicated to use. Working with vast quantities of data, or with real-time data arriving at breakneck speed, normally requires several servers running specialized applications, each needing custom configuration for every use case.
When news broadcasts talk about Big Data they tend to focus on its fantastic applications (which are all real), but they don’t mention the vast amount of work required to set up a Big Data system, maintain it, and grow it as your volume of data increases.
In practice, having a dedicated Big Data system in your organization means you need a specialist on your team who dedicates a large part of his or her time to the system. And worst of all, even if you only want to look at the analysis results two or three times a day, you still have to pay for your servers around the clock so that they’re available whenever you need them.
Google BigQuery as a solution
Here’s where Big Data analytics as a service comes in. Instead of installing and managing the servers on your own, a provider offers you a Big Data system configured, optimized and operating in what is known as the cloud. All you have to do is send them your data and ask for the operations you want, when you need them, either in real time or in batches.
This category includes Google BigQuery, a large-scale data analytics service. You send the data from your systems to BigQuery, and BigQuery stores it until you need to consult it. When you want to run an analysis, BigQuery provides a mechanism that lets you run any query and obtain the results in seconds, regardless of your volume of data.
The best thing is that the administration cost is zero. You pay a minimal price to store your information in BigQuery, plus a sum for each query you run, with the first terabyte of data queried each month free. If you don’t know what a terabyte is, the practical upshot is that most BigQuery users pay only for storage, not for queries, because the amount of data that can be queried free each month is very generous.
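To get a feel for how generous that free monthly terabyte is, here’s a quick back-of-the-envelope calculation in Python; the one-kilobyte-per-row figure is purely an assumption for illustration:

```python
# Back-of-the-envelope: how many rows fit in BigQuery's free monthly query terabyte.
# The 1 KB-per-row size is an assumed average for a typical analytics event.
TERABYTE = 10**12          # bytes (decimal units, as cloud providers usually bill)
bytes_per_row = 1_000      # assumed average size of one event row

rows_per_free_tb = TERABYTE // bytes_per_row
print(rows_per_free_tb)    # one billion rows scannable per month at no cost
```

Under that assumption, you could scan a billion rows a month before paying a cent for queries, which is why most users end up paying only for storage.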
The power of BigQuery is based on three fundamental pillars:
● An internal data structure based on columns which allows the data to be stored and consulted in a very efficient way.
● Exhaustive use of the cloud, running every query in parallel in order to achieve the greatest speed. It’s not uncommon for a single query to run in parallel across hundreds of servers, each one processing a small slice of the data, before the final answer is assembled. Google does this by using its vast infrastructure to achieve response times that would have seemed incredible just a few years ago.
● The SQL query language. Although Google BigQuery’s infrastructure is not that of a conventional database, the query language it uses is exactly the same, so if you’ve used SQL in the past, you’ll find BigQuery quite simple to use.
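To illustrate the kind of SQL you might write, here’s a minimal sketch using Python’s built-in sqlite3 module as a local stand-in; BigQuery runs at a vastly larger scale and its dialect has its own extensions, and the table and column names here are invented:

```python
import sqlite3

# Stand-in for a BigQuery table; table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?)",
    [("u1", "/home"), ("u1", "/pricing"), ("u2", "/home"), ("u3", "/home")],
)

# A standard aggregate query of the kind you'd also send to BigQuery.
rows = conn.execute(
    "SELECT page, COUNT(DISTINCT user_id) AS visitors "
    "FROM pageviews GROUP BY page ORDER BY visitors DESC"
).fetchall()
print(rows)  # [('/home', 3), ('/pricing', 1)]
```

The point is familiarity: the GROUP BY / COUNT pattern you already know carries over, while BigQuery handles distributing the work across servers for you.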
You can send practically any type of data to Google BigQuery: data from your CRM, your analytics, your invoicing system, or your order-shipping tracking, and then cross-reference it all to extract information and patterns. On the technical side, BigQuery can ingest data in CSV format (data organized in rows and columns, as in Excel) or in JSON format (structured, nested data that can represent complex information hierarchies). This data can be sent periodically (every hour, once a day…) or streamed, so it can be analyzed in real time (where “real” means within a few seconds). If your requirements are in the realm of milliseconds, you’ll need another solution.
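As a sketch of what those two formats look like in practice, the snippet below builds the same hypothetical invoicing record as a CSV payload and as a line of newline-delimited JSON, using only the Python standard library; all field names are invented for illustration:

```python
import csv
import io
import json

# A hypothetical invoicing record; field names are invented.
record = {"invoice_id": "INV-042", "customer": "Acme", "amount_eur": 99.5}

# CSV: flat rows and columns, as in an Excel sheet.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record.keys()))
writer.writeheader()
writer.writerow(record)
csv_payload = buf.getvalue()

# JSON: one object per line, which additionally allows nested structures.
json_payload = json.dumps({**record, "lines": [{"sku": "PRO-PLAN", "qty": 1}]})

print(csv_payload)
print(json_payload)
```

CSV is the natural fit for flat exports from spreadsheets and billing systems, while JSON lets you keep nested detail (like the invoice line items above) in a single record.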
As we mentioned earlier, the queries are made using SQL, and can be done directly through the web console or automatically by integrating it into your systems.
BigQuery can be integrated with practically any modern programming language. Google provides libraries for Java, .Net, PHP, Python, Ruby, JavaScript, Objective-C and Go. If your favorite programming language is not on this list, you can also use its REST API and send your data from any environment.
Practical applications
And don’t think analytics as a service is only good for analyzing online data. People are using BigQuery to solve problems in genomics, the Internet of Things, user acquisition in online/offline marketing campaigns, and fleet estimates for transport companies. As long as your system can send data to BigQuery, you can analyze it regardless of its context.
In terms of costs: if you used BigQuery for a detailed analysis of the traffic on a website with one million page views a month, storing that data would cost around 10 dollars a month per year of data retained (you can always export and delete old data you’re not going to use). The cost of queries depends mainly on how you use the system, but typically shouldn’t be much more than five dollars a month. In total, around 15 dollars a month to have this solution running.
Of course there are users with much greater analytical needs, particularly when working in environments other than websites where the volumes of information tend to be greater; in this case the costs increase proportionately, but always remain fairly reasonable for the service being provided.
Lastly, once you get results you’ll certainly want to visualize your data. BigQuery itself doesn’t include any visualization tools, but it integrates directly with Excel, from which you can send queries and share the results using your preferred charts. And if you need more sophisticated visualizations, you can always connect BigQuery to third-party tools like Tableau, Looker or Bime.
This is one example of the new possibilities that open up when two emerging technologies like cloud and Big Data come together, putting within reach of any SME capabilities that were previously available only to companies with far more resources.
By Javier Ramírez
Javier Ramírez is the founder of teowaki, where he offers consulting services on Big Data, development in the cloud, and data analytics as a service through https://datawaki.com. He is also an authorized trainer for Google Cloud Platform and can help you if you need training in Google’s cloud platform.