Companies all over the world, across a wide variety of industries, have been going through what people call a digital transformation. That is, businesses are taking traditional processes such as hiring, marketing, pricing, and strategy, and using digital technologies to make them dramatically better.
Data Science has become an integral part of those transformations. With Data Science, organizations no longer have to make their important decisions based on hunches, best guesses, or small surveys. Instead, they can analyze large amounts of real data and base their decisions on data-driven facts. That’s really what Data Science is all about: creating value through data.
This trend of integrating data into core business processes has grown significantly; according to Google Trends, search interest in Data Science has more than quadrupled over the past five years. Data is giving companies a sharp edge over their competitors. With more data and better Data Scientists to use it, companies can acquire information about the market that their competitors might not even know exists. It’s become a game of data or perish.
Google search popularity of “Data Science” over the past 5 years. Generated by Google Trends.
In today’s ever-evolving digital world, staying ahead of the competition requires constant innovation. Patents have gone out of style, while Agile methodology and catching new trends quickly are very much in.
Organizations can no longer rely on their rock-solid methods of old. When a new trend like Data Science, Artificial Intelligence, or Blockchain comes along, it needs to be anticipated and adopted quickly.
The following are the 4 hottest Data Science trends for 2020. These are trends that have gathered increasing interest this year and will continue to grow in 2020.
(1) Automated Data Science
Even in today’s digital age, Data Science still requires a lot of manual work: storing data, cleaning it, visualizing and exploring it, and finally modeling it to get actual results. That manual work is just begging for automation, hence the rise of automated Data Science and Machine Learning.
Nearly every step of the Data Science pipeline has been or is in the process of becoming automated.
Automated data cleaning has been heavily researched over the past few years. Cleaning big data often takes up most of a Data Scientist’s expensive time. Both startups and large companies such as IBM now offer automation and tooling for data cleaning.
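To make that concrete, here’s a minimal sketch of the kind of rote work these tools automate. The file and column names are hypothetical, and this is plain pandas rather than any vendor’s product:

```python
import pandas as pd

# Hypothetical raw customer extract; the file and columns are made up for illustration
df = pd.read_csv("customers.csv")

# The sort of repetitive cleanup that automated tools take off a Data Scientist's plate
df = df.drop_duplicates()                                               # remove exact duplicate rows
df["email"] = df["email"].str.strip().str.lower()                       # normalize text fields
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")  # fix mixed date formats
df["age"] = df["age"].fillna(df["age"].median())                        # impute missing numeric values
```

Multiply these few lines by dozens of messy tables and the appeal of automating them is obvious.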
Another large part of Data Science, known as feature engineering, has undergone significant disruption. Featuretools offers a solution for automatic feature engineering. On top of that, modern Deep Learning techniques such as Convolutional and Recurrent Neural Networks learn their own features without the need for manual feature design.
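As a rough illustration, here’s what automatic feature engineering looks like with Featuretools’ Deep Feature Synthesis. The toy tables are hypothetical, and the exact API may differ between library versions:

```python
import pandas as pd
import featuretools as ft

# Hypothetical toy data: customers and their transactions
customers = pd.DataFrame({"customer_id": [1, 2]})
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "customer_id": [1, 1, 2],
    "amount": [100.0, 25.0, 50.0],
    "time": pd.to_datetime(["2019-02-01", "2019-03-15", "2019-07-04"]),
})

# Register the tables and the one-to-many relationship between them
es = ft.EntitySet(id="retail")
es = es.entity_from_dataframe(entity_id="customers", dataframe=customers,
                              index="customer_id")
es = es.entity_from_dataframe(entity_id="transactions", dataframe=transactions,
                              index="transaction_id", time_index="time")
es = es.add_relationship(ft.Relationship(es["customers"]["customer_id"],
                                         es["transactions"]["customer_id"]))

# Deep Feature Synthesis automatically builds aggregates such as SUM(transactions.amount)
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="customers")
print(feature_defs)
```

One call to `ft.dfs` produces a whole matrix of candidate features that would otherwise be written by hand.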
Perhaps the most significant automation is occurring in the Machine Learning space. Both DataRobot and H2O have established themselves in the industry by offering end-to-end Machine Learning platforms, giving Data Scientists an easy handle on data management and model building. AutoML, a method for automatic model design and training, also boomed in 2019 as these automated models began to match and even surpass state-of-the-art hand-designed ones. Google, in particular, is investing heavily in Cloud AutoML.
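For a feel of what these platforms automate, here’s a minimal sketch using H2O’s open-source AutoML. The training file and the `churned` target column are hypothetical:

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Hypothetical training data with a binary target column named "churned"
train = h2o.import_file("train.csv")
train["churned"] = train["churned"].asfactor()  # treat the target as categorical
features = [c for c in train.columns if c != "churned"]

# AutoML trains and cross-validates a whole leaderboard of models on its own
aml = H2OAutoML(max_models=10, max_runtime_secs=600, seed=1)
aml.train(x=features, y="churned", training_frame=train)

print(aml.leaderboard)                   # models ranked by performance
predictions = aml.leader.predict(train)  # the best model, ready to use
```

The model selection, hyperparameter tuning, and ensembling that used to take days all happen inside that single `train` call.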
In general, companies are investing heavily in building and buying tools and services for automated Data Science, anything to make the process cheaper and easier. At the same time, this automation also caters to smaller and less technical organizations, which can leverage these tools and services to access Data Science without building out a team of their own.
(2) Data Privacy and Security
Privacy and security are always sensitive topics in technology. All companies want to move fast and innovate, but losing the trust of their customers over privacy or security issues can be fatal. So they’re forced to make it a priority, at least to the bare minimum of not leaking private data.
Data privacy and security have become incredibly hot topics over the past year, as the issues have been magnified by enormous public breaches. Just recently, on November 22, 2019, an exposed server with no security was discovered on Google Cloud. The server contained the personal information of 1.2 billion unique people, including names, email addresses, phone numbers, and LinkedIn and Facebook profile information. Even the FBI came in to investigate. It’s one of the largest data exposures of all time.
How did the data get there? Who does it belong to? Who is responsible for the security of that data? It was on a Google Cloud server, which really anyone could have created.
Now, we can rest assured that the whole world won’t be deleting their LinkedIn and Facebook accounts after reading the news, but it does raise some eyebrows. Consumers are becoming more and more careful about whom they give their email addresses and phone numbers to.
A company that can guarantee the privacy and security of its customers’ data will find it far easier to convince customers to share more data (by continuing to use its products and services). It also ensures that, should the government enact laws requiring security protocols for customer data, the company is already well prepared. Many companies are opting for SOC 2 compliance as proof of the strength of their security.
The entire Data Science process is fueled by data, but most of that data isn’t anonymous. In the wrong hands, it could be used to fuel global catastrophes and upset everyday people’s privacy and livelihoods. Data isn’t just raw numbers; it represents and describes real people and real things.
As we see Data Science evolve, we’ll also see the transformation of the privacy and security protocols surrounding data. That includes processes, laws, and different methods of establishing and maintaining the safety, security, and integrity of data. It won’t be a surprise if cybersecurity becomes the new buzzword of the year.
(3) Super-sized Data Science in the Cloud
Over the years that Data Science has grown from a niche to its own full-on field, the data available for analysis has also exploded in size. Organizations are collecting and storing more data than ever before.
The volume of data that a typical Fortune 500 company might need to analyze has gone far past what a personal computer can handle. A decent PC might have something like 64GB of RAM, an 8-core CPU, and 4TB of storage. That works just fine for personal projects, but not so well when you work for a global company such as a bank or retailer with data covering millions of customers.
That’s where cloud computing enters the field. Cloud computing offers anyone, anywhere, access to practically limitless processing power. Cloud vendors such as Amazon Web Services (AWS) offer servers with up to 96 virtual CPU cores and up to 768 GB of RAM. These servers can be set up in an auto-scaling group, where hundreds of them can be launched or stopped without much delay: computing power on demand.
A Google Cloud data center
Beyond just compute, cloud providers also offer full-fledged platforms for Data Analytics. Google Cloud offers BigQuery, a serverless, scalable data warehouse that gives Data Scientists the ability to store and analyze petabytes of data in a single platform. BigQuery also connects to other GCP services for Data Science: Cloud Dataflow for building streaming data pipelines, Cloud Dataproc for running Hadoop or Apache Spark on the data, and BigQuery ML for building Machine Learning models on those huge datasets.
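As a sketch of how this looks in practice, the snippet below trains a model inside BigQuery with BigQuery ML, so the data never leaves the warehouse. The project, dataset, table, and column names are all hypothetical:

```python
from google.cloud import bigquery

# Hypothetical GCP project; authentication is assumed to be configured
client = bigquery.Client(project="my-project")

# BigQuery ML trains the model with plain SQL, right where the data lives
query = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  tenure_months,
  monthly_spend,
  churned AS label
FROM `my_dataset.customers`
"""
client.query(query).result()  # block until the training job finishes
```

Even on a petabyte-scale table, the heavy lifting happens on Google’s infrastructure, not on the Data Scientist’s laptop.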
Everything from data to processing power is growing. As Data Science matures, we might eventually see Data Science done purely in the cloud due to the sheer volume of the data.
(4) Natural Language Processing
Natural Language Processing (NLP) has made its way firmly into Data Science after huge breakthroughs in Deep Learning research.
Data Science first began as the analysis of purely raw numbers, since these were the easiest to handle and collect in spreadsheets. Any kind of text usually needed to be categorized or somehow converted into numbers before it could be processed.
Yet it’s quite challenging to compress a paragraph of text into a single number. Natural language and text contain so much rich data and information, and we used to miss out on it because we lacked the ability to represent that information as numbers.
Huge advancements in NLP through Deep Learning are fueling the full-on integration of NLP into regular Data Analysis. Neural Networks can now extract information from large bodies of text incredibly quickly. They’re able to classify text into different categories, determine its sentiment, and analyze the similarity of text data. In the end, all of that information can be stored in a single numerical feature vector.
As a result, NLP becomes a powerful tool in Data Science. Huge datastores of text, not just one-word answers but full-on paragraphs, can be transformed into numerical data for standard analysis. We’re now able to explore datasets that are far more complex.
For example, imagine a news website that wants to see which topics are gaining more views. Without advanced NLP, all one could go off of would be the keywords, or maybe just a hunch as to why a particular title worked well versus another. With today’s NLP, we’d be able to quantify the text on the website, comparing entire paragraphs of text or even webpages to gain much more comprehensive insights.
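As a minimal illustration of the text-to-numbers idea, here’s a classical TF-IDF baseline in scikit-learn; the Deep Learning models discussed above would simply produce richer vectors. The article snippets are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical article snippets
articles = [
    "The central bank raised interest rates again this quarter.",
    "Rates were hiked once more as inflation stayed high.",
    "The home team won the championship in overtime.",
]

# Turn each paragraph into a numerical feature vector
vectors = TfidfVectorizer(stop_words="english").fit_transform(articles)

# Compare the paragraphs as numbers: the two finance stories score
# higher with each other than with the sports story, which shares no terms
print(cosine_similarity(vectors).round(2))
```

Swap the TF-IDF vectorizer for a Deep Learning sentence encoder and the same downstream analysis applies to far richer representations.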
For a technical overview of the most important advancements in NLP over the past few years, you can check out the guide by Victor Sanh.
Data Science as a whole is growing. As its capabilities grow, it’s embedding itself into every industry, both technical and non-technical, and every business, both small and large.
As the field evolves over the long term, it wouldn’t be a surprise to see it democratized at a large scale, becoming available to many more people as a tool in our software toolbox.