Data Scientists, the “unicorn” of data: what are they, what do they do and how will they change the world?

11 March 2015

BBVA API Market


“A person who is better at statistics than any software engineer and better at software engineering than any statistician.” This is how Josh Wills, Director of Data Science at Cloudera, defined data scientists at a conference he gave in 2012 entitled “The life of a Data Scientist”. In a world where specialization is an essential value, this professional profile has become a sort of Michelangelo of the 21st century applied to data.

Big companies like Google, IBM, Facebook, HP, Oracle, Amazon and LinkedIn navigate the world of Big Data every day in search of competitive advantages. The key to this process is Data Science: improved algorithms that make it possible to cut costs, improve recommendation or search systems, modernize industrial processes, control levels of risk… and, of course, transform the business model of any company.

The data scientist must have knowledge of applied science, extensive experience in his industry, and scientific training (supervised and unsupervised learning…). This allows him to produce creative solutions based on judgment. “The data scientist goes far beyond the PowerPoint we are used to in the world of innovation: his responsibility begins with the design of a prototype with the technologies best suited to the problem at hand (Hadoop, MongoDB, Spark, Python, R), and ends with supervising its implementation in production,” says Sergio Álvarez Teleña, Head of Global Strategies & Data Science at BBVA, Global Markets.

José Antonio Guerrero, data scientist at the Hospital Universitario Virgen del Rocío in Seville and one of the leading experts in the sector, believes that this professional profile requires one “to have experience in one or more sectoral areas, in order to be able to interact with company managers, raise hypotheses, help interpret results and implement the analysis of information.”

Markets and sectors demanding data scientists

Because of this special blend of experience and knowledge, finding professionals who can meet the market’s challenges is difficult. So much so that the industry refers to them as “unicorns”. However, demand for information and training in this field is increasing. A simple search on the jobs platform Indeed shows how far interest in this discipline has come since 2011.

Figure: Indeed Job Trends graph for "data scientist" and "data science" jobs

The truth is that the Big Data sector has been growing by more than 50% annually: 59% in 2011 and 58% in 2012. Market forecasts suggest that the sector will reach 38 billion dollars in sales in 2015 and 45 billion in 2016, and will break all records in 2017 with a total of 50 billion.
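As a quick check on these figures, the following minimal sketch computes the year-over-year growth implied by the three forecast values. Only the sales numbers come from the text; the function and variable names are illustrative assumptions.

```python
# Forecast sales quoted in the article, in billions of US dollars.
forecast_sales = {2015: 38, 2016: 45, 2017: 50}


def year_over_year_growth(series):
    """Return {year: percent growth versus the previous year in the series}."""
    years = sorted(series)
    return {
        year: (series[year] - series[prev]) / series[prev] * 100
        for prev, year in zip(years, years[1:])
    }


growth = year_over_year_growth(forecast_sales)
for year, pct in growth.items():
    print(f"{year}: {pct:.1f}% growth over the previous year")
```

On these forecasts the implied growth is roughly 18% for 2016 and 11% for 2017, so the projected totals describe a market that keeps expanding but at a slower pace than in 2011 and 2012.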

The US, the UK and Israel, the three hottest spots in international innovation, are and will remain the countries most strongly committed to these profiles, financing teams and projects. In Spain it is more difficult, except for companies dedicated to providing these services. “The fear of the unknown and comfort are the two main reasons why this country doesn’t evolve at the pace of other developed economies… but it will come – we have a surplus of talent,” predicts Álvarez.

In the United States, for instance, the sectors using Data Science are: analytics, software development, consulting, energy, financial services, education and research, advertising and media, infrastructure and new talent recruiters. Below is a graph with the distribution of this discipline’s demand in the US:

Data from data-scientists-count.silk.co

Despite this demand, there is a shortage of educational qualifications. “The best alternative is to start with basic training in statistics or computer engineering and complement it with postgraduate studies, in order to obtain a broader spectrum of skills,” says Guerrero. For Álvarez, “the best place in the world to learn all this is the UK PhD Centre for Financial Computing & Analytics, a very elitist institution in terms of talent that can only be accessed with a grant from the British government.”

The tools of a data scientist

The greatest virtues of a professional who wants to specialize in Data Science are creativity and the ability to find optimal solutions to problems and to develop libraries and tools that improve the business. To do this, it is essential to master different programming languages, such as R, C++, Python, Matlab and Pascal. “My first choice is usually R. It has a huge set of general and specialized packages that cover most of the needs for the analysis of information,” explains Guerrero.

C++ is often used to compensate for the shortcomings of R, such as its handling of objects in memory or its speed on loop-type operations. Python developers can find a very interesting professional field in deep learning projects. There are also other options such as Pascal, a quick solution for complex data structures.

And why is this useful? Success cases

The Big Data and Data Science environment has become a closed circle, where leaks that might give competitors an advantage are avoided. In many cases, companies develop successful solutions that they don’t publicize for fear of being copied. Companies like Google, Microsoft, Facebook and Twitter, as well as financial institutions across half the planet, are investing billions to lead their markets.

Today, more than half of stock exchange transactions are executed by algorithms that require no human intervention. “I started in an equities department in Europe, and now I have a team of data scientists with whom we control innovation in trading and e-commerce for all of BBVA’s assets globally,” says Álvarez. “Many describe this commitment to innovation as a remarkable success in change management. In fact, my unit has been the only initiative submitted by Global Markets for internal awards for excellence. My goal now is to spread the knowledge of Data Science to other areas, so that together we can squeeze the most out of this new discipline,” he adds.

Social networks like Facebook analyze the interaction of users within their platform to determine the brand or product image of companies. This analysis is vital to the business model of many companies, so its commercial exploitation is a huge source of funding.

Microsoft, for example, used Data Science to improve Kinect, the system that allows users to play video games with their body. The multinational turned to the scientific community to improve its body gesture recognition system. Alfonso Nieto Castañón, a Spaniard who currently works at Boston University, twice won the challenge launched by the Redmond-based company. Along the same lines, data scientists have developed applications for sign language recognition and for rehabilitation exercises in the field of medicine.

Where is Big Data headed?

The future of Data Science lies in what we might call the Trinity of Big Data: volume, variety and velocity. Until recently, one of the priorities of any data analysis solution was the management of large volumes of data. Today, the heterogeneity of the data (text, image, video, conversations in forums, social networking and mobile applications…) and the speed at which it is generated pose far greater problems.

Therefore, “we must be prepared to work with NoSQL databases and distributed storage systems,” explains Guerrero. In addition, companies increasingly demand real-time solutions. “We are currently developing online incremental learning techniques to solve situations in areas such as online advertising or automatic trading, or in the security field to detect network intrusions,” he adds.
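To make the idea of online incremental learning more concrete, here is a minimal sketch in plain Python: a logistic regression that is updated one event at a time with stochastic gradient descent and never stores past data. It is not the technique Guerrero's team uses, which the article does not describe; the class name, the simulated stream and its labeling rule are all assumptions made for illustration.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """Update the model with a single (features, label) pair."""
        error = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def simulated_stream(n_events, seed=0):
    """Hypothetical event stream: the label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    for _ in range(n_events):
        x = [rng.random(), rng.random()]
        yield x, 1 if x[0] + x[1] > 1.0 else 0


model = OnlineLogisticRegression(n_features=2, learning_rate=0.5)
for features, label in simulated_stream(5000):
    model.learn_one(features, label)  # each event is seen once, then discarded

print(model.predict_proba([0.9, 0.9]), model.predict_proba([0.1, 0.1]))
```

Because the model only keeps its weights, memory use stays constant however long the stream runs, which is what makes this family of methods attractive for the real-time settings mentioned above.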

Other advances in recent years include the rise of parallelizable solutions. Companies are increasingly relying on OpenMP to take advantage of multithreaded processors; on Hadoop and MapReduce for multinode clusters; and, more recently, on Spark, which makes it possible “to manage in-memory data a hundred times faster than Hadoop for algorithmic computing,” explains Guerrero.
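The MapReduce model mentioned above can be illustrated with a small word count. The sketch below runs in a single Python process and does not use the Hadoop or Spark APIs; the function names and sample documents are assumptions chosen for illustration. In a real cluster, the map and reduce calls would run in parallel on different nodes and the framework would handle the grouping step.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


documents = [
    "big data needs data scientists",
    "data scientists need big ideas",
]

mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(word_counts)
```

Each document is mapped independently and each key is reduced independently, which is the property that lets the same computation be spread across many machines.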

The future is in the hands of data scientists, and businesses not able to surf the great wave of Big Data will lose the great battle of knowledge.

BBVA – Follow us on @BBVAAPIMarket
