Most of us know about the Netflix recommendation system challenge (the Netflix Prize), which offered $1,000,000 to anyone who cracked it. But before we get too far, let’s cover our bases. What is a recommendation system? Simply put, it is a system that gives us recommendations based on the data it has collected from us, and from other users like us, over time. Today these systems work in areas like movies, music, news, research articles, search queries, restaurants, hashtags, and more.
Almost every website we visit these days, most of them free to use, collects some data. In reality, these websites are not free. We pay them, not with money, but with data. The websites use this data to provide better recommendations, or sell it to other websites that want to improve their own recommendations using massive repositories of data on individuals. Often, when we search for something on the web, we find the most relevant information or links at the top. These results are tailored to the individual and differ from user to user. Although we rarely give it a second thought, this is the work of recommendation systems that feed on our data and decide which results fit us best.
How do job recommendation systems work?
These days, job boards have started using the same kind of recommendation systems and related technologies to show better results. The reason is simple. Thousands of companies post jobs and job details, while millions of candidates and job seekers post their resumes on these websites. The better a website matches jobs to the right seekers, the better its chances of a conversion, and the more popular it becomes through referrals and word of mouth. Traditional methods only work well on smaller amounts of data. Job portals previously used decision trees (a system based on yes-or-no answers to a series of questions that help match a person to a job posting) and did their best with a set of predefined conditions. Such an archaic method was adopted because data scraping was still slow, memory and processing power were costly, and machine learning had yet to boom.
Today, the scenario has changed. Static rules are no longer applied to match people to jobs. Instead, dynamic algorithms are in place. These algorithms study a candidate's resume, extract data from it, convert it into a structured form, and then try to match it to the resumes already in the database. Once the system finds the closest match, it fetches the job recommendations that it had shown to that closest match and displays them. Once a recommendation results in a conversion and the candidate confirms it, the company or job posting is moved higher in the priority list. This closest-matching approach forms an interconnected web that changes every second as more and more data pours into the system. With every job posting that is matched correctly and converted, the system retrains itself on that feedback, an idea related to reinforcement learning.
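The closest-match idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular job board implements it: resumes are compared as simple bags of words with cosine similarity, and the names `known_resumes`, `shown_jobs`, `conversions`, and `recommend_jobs` are all hypothetical. A production system would use far richer structured features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_jobs(new_resume, known_resumes, shown_jobs, conversions):
    """Find the stored resume closest to the new one and reuse its jobs.

    Jobs with more recorded conversions are moved higher in the list.
    """
    new_vec = Counter(tokenize(new_resume))
    best_id = max(
        known_resumes,
        key=lambda cid: cosine_similarity(
            new_vec, Counter(tokenize(known_resumes[cid]))
        ),
    )
    # sorted() is stable, so jobs with equal counts keep their original order.
    ranked = sorted(shown_jobs[best_id], key=lambda job: -conversions[job])
    return best_id, ranked


# Hypothetical toy data, invented purely for illustration.
known_resumes = {
    "cand_1": "Python developer with Django and PostgreSQL experience",
    "cand_2": "Registered nurse with ICU and emergency care experience",
}
shown_jobs = {
    "cand_1": ["Backend Engineer", "Data Engineer"],
    "cand_2": ["ICU Nurse", "Clinical Coordinator"],
}
conversions = Counter({"Data Engineer": 3})  # past successful applications

best_id, jobs = recommend_jobs(
    "Backend developer skilled in Python, Flask and PostgreSQL",
    known_resumes,
    shown_jobs,
    conversions,
)
print(best_id, jobs)
```

Each confirmed conversion would increment the matching entry in `conversions`, which is the simplest possible version of the feedback loop: the more often a posting converts, the higher it is shown next time.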
What role does data play?
Although most people think of recommendation systems as machine learning, artificial intelligence, and complicated algorithms running on huge servers, they often forget the data that drives it all. Netflix provided a training data set of more than one hundred million ratings that almost five hundred thousand users had given to approximately eighteen thousand movies. Can you even imagine what a massive data set that was? But is it the largest data set ever used for boosting artificial intelligence? No. The 1000 Genomes project makes its 260 TB of human genome data publicly available, and you can use it to run your own algorithms. Data sets like these are growing every day, and in the era of big data, our databases will soon shift from terabytes to petabytes. In this booming age of data science, you need to get to the root by collecting enough data first. And it is a continuous process: you need to keep collecting, cleaning, and restructuring data, so that the intelligent systems and machine learning algorithms your data scientists write can run better, build more efficient models, and provide superior recommendations.
How can websites improve their recommendation systems?
So you have a website where you post jobs and help connect companies to candidates. Scraping job boards alone is not enough. You need to scrape tech websites and news portals to learn the latest buzzwords in the industry. You need to make sure that you are scraping job postings from trustworthy companies. When applicants come to your website, they will submit their resumes, which you will process. Beyond that, try to collect other information that might help your algorithm. Have everyone fill in a standard data form, but leave an empty text box too, so that people can write something special about themselves. You never know which piece of data or which attribute in the database will help you reach a breakthrough. So it is best to have all the data.
Make sure that you hire data scientists and engineers with experience in building recommendation systems, so that your ideas can come to life faster. When you are trying to improve a pre-existing recommendation system, keep the following points in mind:
- Decide: How do you want your recommendation system to improve? Do the results need to be more accurate, diverse, fast or something else?
- Hypothesize: What is the easiest change you can make to your system that would improve the metrics you selected above? This could be a new pre-processing step or a completely new algorithm with some post-filtering. It might even be the integration of your current data source with a completely new one, such as web pages that you have been scraping.
- Experiment: Implement your hypothesis from the previous step. Run the algorithm and see how the results differ from before. For some metrics, you could run this as an offline test on a static dataset. For other metrics (or once the offline tests work out), you’ll want to run an A/B test. In an A/B test, you show one group of users (typically 50%) one scenario and the rest the other, and see how it plays out. In this case, you would let half your users use your new algorithm and half the older one, to test which one works better with real traffic.
- Measure: Are the metrics improving? If yes, it is time to head to your data engineers and get the infrastructure in place to run the new algorithm. If not, it is time to go back to the second step.
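
The Experiment and Measure steps above can be sketched roughly as follows. This is a minimal illustration, assuming conversion rate is the chosen metric: users are split 50/50 by hashing their ID so that each user always sees the same variant, and the two conversion rates are compared with a standard two-proportion z-test. The function names, the experiment label, and the counts are all hypothetical.

```python
import hashlib
import math


def assign_variant(user_id, experiment="new_recommender"):
    """Deterministically put a user in the 'old' or 'new' group (about 50/50)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "new" if int(digest, 16) % 2 == 0 else "old"


def two_proportion_z_test(conv_old, n_old, conv_new, n_new):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    pooled = (conv_old + conv_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_new - p_old) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: 5,000 users per group, invented for illustration.
z, p_value = two_proportion_z_test(conv_old=200, n_old=5000,
                                   conv_new=260, n_new=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("user_42 sees:", assign_variant("user_42"))
```

A positive `z` with a small p-value suggests the new algorithm converts better than the old one; what counts as "small" (0.05 is a common convention) and how long to run the test are decisions to make before looking at the numbers.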
While recommendation systems may not be new to job portals, their underlying technologies are constantly changing as the sources of data change, along with how much data the websites can gather and scrape.