Posted on: August 3, 2018
Most of us have heard of the Netflix Prize, the recommendation-system challenge that offered $1,000,000 to anyone who could substantially improve the company's recommendations. But before we get too far, let's cover the basics. What is a recommendation system? Simply put, it is a system that gives us recommendations based on the data it has collected from us, and from other users like us, over time. Today these systems work in areas such as movies, music, news, research articles, search queries, restaurants, hashtags, and more.
Almost every website we visit these days collects some data, and most of those websites are free to use. In reality, they are not free: we pay them not with money but with data. The websites either use this data to provide better recommendations themselves or sell it to other websites that want to improve their own recommendations using massive repositories of data on individuals. Often, when we search for something on the web, the most relevant information or links appear at the top. These results are tailored to the individual and differ from user to user. We rarely give it a second thought, but this is the work of recommendation systems, which feed on our data and decide which results fit us best.
These days, job boards have started using recommendation systems and similar technologies to show better results. The reason is simple. Thousands of companies post jobs and job details, while millions of candidates and job seekers post their resumes on the same websites. The better a website matches jobs to the right seekers, the better its chances of a conversion, and the more popular it becomes through referrals and word of mouth. Traditional methods only work well on small amounts of data. Job portals previously used decision trees (a series of yes-or-no questions that help match a person to a job posting) and did their best with a set of predefined conditions. They relied on such a rigid method because data scraping was still slow, memory and processing power were costly, and machine learning had yet to boom.
Today, the scenario has changed. Static rules are no longer used to match people to jobs; dynamic algorithms are in place instead. These algorithms study a candidate's resume, extract its data, convert it into a structured form, and then try to match it to the resumes already in the database. Once the algorithm finds the closest match, it fetches the job recommendations that were shown to that closest match and displays them to the new candidate. When a recommendation results in a conversion and the candidate confirms it, the company or job posting moves higher in the priority list. This closest-match approach forms an interconnected web that changes every second as more and more data pours into the system. With every job posting that is matched correctly and converted, the system retrains itself on that feedback, an idea related to reinforcement learning.
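The closest-match procedure described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: resumes are reduced to bag-of-words counts, similarity is plain cosine similarity, and the feedback step is a simple conversion counter rather than a real learning algorithm. The class `JobRecommender`, its methods, and the sample candidates and job ids are all hypothetical names invented for this example.

```python
import math
from collections import Counter


def vectorize(text):
    """Crude 'structured form': lowercase bag-of-words counts for a resume."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class JobRecommender:
    """Hypothetical closest-match recommender; not any job board's real system."""

    def __init__(self):
        self.resumes = {}             # candidate id -> resume vector
        self.shown_jobs = {}          # candidate id -> job ids shown to them
        self.conversions = Counter()  # job id -> number of confirmed conversions

    def add_candidate(self, candidate_id, resume_text, shown_jobs):
        self.resumes[candidate_id] = vectorize(resume_text)
        self.shown_jobs[candidate_id] = list(shown_jobs)

    def recommend(self, resume_text):
        if not self.resumes:
            return []  # nothing to match against yet
        query = vectorize(resume_text)
        closest = max(self.resumes, key=lambda cid: cosine(query, self.resumes[cid]))
        # Stable sort: jobs with more conversions move up, ties keep their order.
        return sorted(self.shown_jobs[closest], key=lambda job: -self.conversions[job])

    def record_conversion(self, job_id):
        self.conversions[job_id] += 1


rec = JobRecommender()
rec.add_candidate("c1", "python machine learning pandas", ["ml-engineer", "data-analyst"])
rec.add_candidate("c2", "java spring backend microservices", ["backend-dev"])

print(rec.recommend("python pandas statistics"))  # ['ml-engineer', 'data-analyst']
rec.record_conversion("data-analyst")
print(rec.recommend("python pandas statistics"))  # ['data-analyst', 'ml-engineer']
```

A production system would likely use richer resume features and a proper ranking model, but the shape is the same: find the nearest existing resume, reuse its recommendations, and let conversions reorder them.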
Although most people think of recommendation systems as machine learning, artificial intelligence, and complicated algorithms running on huge servers, they tend to forget the data that drives it all. Netflix provided a training data set of more than one hundred million ratings that almost five hundred thousand users had given to approximately eighteen thousand movies. Can you even imagine how massive that data set was? But is it the largest data set ever used to boost artificial intelligence? No. The 1000 Genomes Project makes its 260 TB of human genome data publicly available, and you can use it to run your own algorithms. Data sets like these are growing every day, and in the era of big data, our databases will soon shift from terabytes to petabytes. In this booming age of data science, you need to get to the root by collecting enough data first. And it is a continuous process: you need to keep collecting, cleaning, and restructuring data so that the intelligent systems and machine learning algorithms your data scientists write can run better, build more efficient models, and provide superior recommendations.
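To put the Netflix figures in perspective, here is a back-of-the-envelope calculation. It uses the rounded numbers quoted above (one hundred million ratings, five hundred thousand users, eighteen thousand movies), so the results are approximations, not exact statistics of the real data set.

```python
# Rounded figures from the text above; the real counts differ slightly.
num_ratings = 100_000_000
num_users = 500_000
num_movies = 18_000

possible_pairs = num_users * num_movies       # every user rating every movie
density = num_ratings / possible_pairs        # share of pairs actually rated
ratings_per_user = num_ratings / num_users    # average ratings per user

print(f"possible user-movie pairs: {possible_pairs:,}")         # 9,000,000,000
print(f"share of pairs with a rating: {density:.2%}")           # 1.11%
print(f"average ratings per user: {ratings_per_user:.0f}")      # 200
```

Even a data set this large covers only about one percent of all possible user-movie pairs, which is one reason collecting more data matters so much.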
So you have a website where you post jobs and help connect companies to candidates. Scraping job boards is not enough. You also need to scrape tech websites and news portals to learn the latest buzzwords in the industry, and you need to make sure that the job postings you scrape come from trustworthy companies. When applicants come to your website, they will submit a resume, which you will process. Beyond that, try to collect other information that might help your algorithm. Have everyone fill in a standard data form, but leave an empty text box too, so that people can write something special about themselves. You never know which piece of data or which attribute in the database will lead to a breakthrough, so it is best to have all the data.
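As a rough sketch of the "standard form plus free text box" idea, the record below keeps a few structured fields alongside an open-ended one, and a helper checks the free text against buzzwords gathered from tech sites. The field names, the `CandidateProfile` class, the `buzzwords_in` helper, and the sample buzzword list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateProfile:
    """Hypothetical form record: structured fields plus one free-text box."""
    name: str
    years_experience: int
    skills: List[str]
    about_me: str = ""  # the empty text box left for anything special


def buzzwords_in(profile, buzzwords):
    """Return the single-word buzzwords that appear in the free-text box."""
    words = {w.strip(".,;:!?()").lower() for w in profile.about_me.split()}
    return sorted(words & {b.lower() for b in buzzwords})


profile = CandidateProfile(
    name="A. Candidate",
    years_experience=4,
    skills=["python", "sql"],
    about_me="I built Kubernetes pipelines and contribute to open-source blockchain tools.",
)
scraped_buzzwords = ["Kubernetes", "blockchain", "serverless"]  # sample values only

print(buzzwords_in(profile, scraped_buzzwords))  # ['blockchain', 'kubernetes']
```

This simple word match misses multi-word phrases such as "machine learning"; it is only meant to show how free text can become an extra attribute for the algorithm.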
Make sure that you hire data scientists and engineers with experience in building recommendation systems so that your ideas can come to life faster. When you are trying to improve a pre-existing recommendation system, keep the following points in mind:
While recommendation systems may not be new to job portals, their underlying technologies keep changing as the sources of data change and as websites become able to gather and scrape more data.