Applying Topic Modelling on JobsPikr Data to Create Robust Job Matching Service


By: Preetish Panda August 11, 2017

Hiring the right candidate and matching skills to the job requirement is a major bottleneck for any organisation. The problem is huge, and different companies are continuously working to mitigate it. According to this TechCrunch post, there has been a proliferation of new startups in the recruitment space, with investors doling out large sums of capital. If we look at how companies source candidate profiles, it is evident that, apart from the careers section on their own site, there are thousands of job boards and recruitment agencies through which they can reach prospects. While the initial wave of job marketplaces and boards solved the problem of creating a talent pool for companies, another problem emerged when the aggregators started to scale in terms of both recruiters and applicants: matching the right candidate with the right employer. Needless to say, a job board’s competitive advantage lies in providing a highly accurate matching service. To achieve that, we need to consider the following primary factors:

  • Ability to populate fresh job listings
  • Building strong algorithms for matching of the candidates with the jobs

In this post, we will explore how that can be done using the job data provided by JobsPikr. For those who might not be familiar with JobsPikr, it is a job data delivery platform powered by PromptCloud’s proprietary machine learning technique, which performs automated crawls on a daily basis to extract job data directly from company sites.

Because the job data delivered by JobsPikr comes from the source of truth, i.e., company websites, it is more accurate, fresher and more comprehensive. The data quality is therefore far superior to what you get by crawling other job boards. This aspect of JobsPikr handles the first factor mentioned above. More on this topic can be found in our previous post.

Coming to the crux of this post, i.e., a solid algorithm for job matching, we first need to understand the data fields delivered by JobsPikr. Here are the data fields that you would get from a typical job listing:

  • URL
  • Job post date
  • Company name
  • Job title
  • Job text

Matching using ‘job title’ is quite straightforward, but the ‘job text’ is a gold mine for us in this context. This brings us to the concept of Topic Modelling and how it can be applied to the ‘job text’. Note that this post doesn’t cover the complete job matching process; rather, it focuses on Topic Modelling, which can be used to strengthen an existing matching process.

What is Topic Modelling?

In the simplest terms, Topic Modelling is an unsupervised machine learning technique that can be used to analyse, annotate, organise and search large volumes of unlabelled text. It is particularly useful for finding latent patterns in a large collection of text by extracting clusters of words that are closely related and frequently occur together. For example, a good topic model would create the following word cluster for an Education-related topic: “learn”, “teacher”, “college”, “career”, and the following for a Business-related topic: “finance”, “corporate”, “investment”, “acquisition”. Essentially, topics are made up of clusters of words, and documents are made up of topics. The key point is that a document won’t belong to a single topic; it is a mixture of different topics, each contributing a share of its words. Here is an illustration:

Topic modelling flow

The illustration below depicts the relationship between words, topics and documents:

Topic modelling relationship
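
To make the "documents are mixtures of topics" idea concrete, here is a minimal Python sketch. The two topics, their word probabilities and the 70/30 document mixture are made-up numbers chosen for illustration, not output from a real model. The probability of seeing a word in a document is the sum, over topics, of the topic's share in the document multiplied by the word's probability in that topic.

```python
# Hypothetical topics: each one is a probability distribution over words.
topics = {
    "education": {"learn": 0.4, "teacher": 0.3, "college": 0.2, "career": 0.1},
    "business": {"finance": 0.35, "corporate": 0.25, "investment": 0.2,
                 "acquisition": 0.1, "career": 0.1},
}

# A hypothetical document: 70% education, 30% business (shares sum to 1).
doc_mixture = {"education": 0.7, "business": 0.3}


def word_probability(word, mixture, topics):
    """P(word | document) = sum over topics of P(topic | document) * P(word | topic)."""
    return sum(share * topics[name].get(word, 0.0) for name, share in mixture.items())


vocabulary = sorted({word for dist in topics.values() for word in dist})
doc_word_probs = {w: word_probability(w, doc_mixture, topics) for w in vocabulary}

# Print the words from most to least likely in this document.
for word, p in sorted(doc_word_probs.items(), key=lambda item: -item[1]):
    print(f"{word:12s}{p:.3f}")
```

Note that “career” appears in both topics, so both contribute to its probability in the document. A topic model works in the opposite direction: it sees only the documents and estimates the topics and the mixtures.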

Although you can find a lot of articles on this subject, here is a great paper to learn more.

How to Apply Topic Modelling to JobsPikr Data

There are various algorithms (Explicit semantic analysis, Latent semantic analysis, Latent Dirichlet allocation, etc.) and libraries available for topic modelling that can be applied to a text corpus. As discussed earlier, the ‘job text’ delivered by JobsPikr contains the job description along with the qualifications and other details available in a job listing. We can treat the ‘job text’ of each listing as a document and perform topic modelling to find the underlying topics of job listings from different geographies, industries and companies. For example, we applied LDA (Latent Dirichlet allocation) to a sample of Amazon job listings to arrive at the following topics:

Rplot - Topic modelling Job Listings
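
For readers who want to experiment, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python with only the standard library. It is not the code used to produce the plot above; the function name `fit_lda`, the hyperparameter values and the four toy job texts are assumptions made for illustration. In practice you would more likely use an established topic modelling library and real ‘job text’ with proper tokenisation and stop-word removal.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] estimates
    P(topic k | doc d) and topic_word[k][v] estimates P(word v | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Take the token out of the counts...
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][v] -= 1
                topic_totals[k] -= 1
                # ...and resample its topic from the conditional distribution.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][v] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][v] += 1
                topic_totals[k] += 1

    # Turn the smoothed counts into probability estimates.
    doc_topic = [
        [(doc_topic_counts[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(topic_word_counts[k][v] + beta) / (topic_totals[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


# Toy stand-ins for the 'job text' field (made up for this example).
job_texts = [
    "aws jenkins git scripting deployment pipeline aws",
    "git jenkins automation scripting aws monitoring",
    "team leadership budget stakeholder planning strategy",
    "strategy planning team budget leadership hiring",
]
docs = [text.split() for text in job_texts]
doc_topic, topic_word, vocab = fit_lda(docs, num_topics=2)

for k, dist in enumerate(topic_word):
    top_words = sorted(zip(vocab, dist), key=lambda pair: -pair[1])[:4]
    print(f"topic {k}:", ", ".join(word for word, _ in top_words))
```

The two returned tables correspond to the two quantities a fitted model gives you: the probability of each word under each topic, and the share of each topic in each document.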

As you can see, four different topics have been formed based on the content of ‘job text’: one for subsidiaries such as Shopbop and East Dane, another for equal opportunity employment, and a third and fourth for engineering and management related job listings. Once the topics are identified, we can estimate the probability of each word being generated from each topic, as well as the per-document-per-topic probabilities (the estimated proportion of words in a document generated from a topic). This means the various sections of a candidate profile (e.g. skills, experience, education) can be scored against the same topics and compared with the topic probabilities of each job listing. The resulting estimate is a measure of how close the candidate profile is to the job listing. For example, someone with skills such as scripting, Git, AWS and Jenkins would have a high probability for a topic that can be labelled “DevOps”, and this topic would also be the largest contributor to a document (the job text of a listing) that asks for a DevOps engineer. Similarly, the various sections of the candidate profile can be given weights, and their closeness scores combined, to arrive at the most suitable opportunity.
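
The matching idea described above can be sketched as follows. This is one possible approach under stated assumptions, not a complete matching service: the `TOPICS` table is a hand-written stand-in for a fitted model, the job texts, profile sections and section weights are hypothetical, topic shares for new text are estimated with a simple EM loop that keeps the topic-word probabilities fixed, and cosine similarity is used as the closeness measure.

```python
import math

# Hypothetical topic-word probabilities, standing in for a fitted model.
TOPICS = {
    "devops": {"aws": 0.2, "jenkins": 0.2, "git": 0.2, "scripting": 0.2,
               "deployment": 0.1, "monitoring": 0.1},
    "management": {"team": 0.25, "budget": 0.2, "planning": 0.2,
                   "strategy": 0.2, "stakeholder": 0.15},
}


def infer_mixture(text, topics=TOPICS, iterations=50, floor=1e-6):
    """Estimate P(topic | text) with the topic-word probabilities held fixed."""
    tokens = text.lower().split()
    names = list(topics)
    theta = {k: 1.0 / len(names) for k in names}
    if not tokens:
        return theta  # nothing to learn from; keep the uniform mixture
    for _ in range(iterations):
        expected = {k: 0.0 for k in names}
        for w in tokens:
            # E-step: how responsible is each topic for this word?
            joint = {k: theta[k] * topics[k].get(w, floor) for k in names}
            total = sum(joint.values())
            for k in names:
                expected[k] += joint[k] / total
        # M-step: topic shares are the average responsibilities.
        theta = {k: expected[k] / len(tokens) for k in names}
    return theta


def cosine(p, q):
    """Cosine similarity between two topic mixtures with the same keys."""
    dot = sum(p[k] * q[k] for k in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


# Hypothetical job listings and candidate profile.
jobs = {
    "DevOps Engineer": "aws jenkins git deployment monitoring scripting team",
    "Program Manager": "team budget planning stakeholder strategy",
}
profile = {
    "skills": "scripting git aws jenkins",
    "experience": "deployment monitoring aws team",
}
section_weights = {"skills": 0.6, "experience": 0.4}  # assumed weights

job_mixtures = {title: infer_mixture(text) for title, text in jobs.items()}
section_mixtures = {name: infer_mixture(text) for name, text in profile.items()}

# Weighted closeness of the whole profile to each job listing.
scores = {
    title: sum(section_weights[name] * cosine(section_mixtures[name], mixture)
               for name in profile)
    for title, mixture in job_mixtures.items()
}
best_match = max(scores, key=scores.get)

for title, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{title:18s}{score:.3f}")
```

With this toy data the DevOps listing scores highest, because both the skills and experience sections are dominated by the “devops” topic. The section weights are a design choice; a real service would tune them against observed hiring outcomes.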

Over to You

As personalisation is a key factor in any business domain, the recruitment industry must also adopt it by deploying cutting-edge technologies. One facet of personalisation is showing candidates the right job opportunities via machine learning techniques. Currently, there are numerous open source ML frameworks released by tech giants that any company can use to solve its business problems.

In this post, we covered why job matching is crucial to any job board and how the data delivered by JobsPikr can be instrumental in building a solution that gives aggregators a competitive edge. Now it’s time for you to explore options that can help you make your job board highly relevant to both job seekers and employers.
