So, what is semantic search?
Although the term “semantic search” is used in different contexts these days, the important thing to understand is that it means searching not just by keywords, but also by logic and the context in which a keyword is used. This can mean several things. A plain keyword search returns all content containing that keyword. Since a word can be used in dozens of different ways, we will very likely have to filter the results manually to find the most suitable matches.
What if we searched not just for keywords, but also for the context in which they are used, or the words a keyword is usually associated with? For example, suppose we want to find all the wars that the USA has participated in. Searching for just the words “USA” and “war” together may not be very useful on its own. Building a set of related words and writing custom rules that look for certain subsets of them within close proximity is likely to give finer results. This is a crude example, but semantic search works along similar lines: it uses all the information available to find matches, not just a few keywords.
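The proximity rule described above can be sketched in a few lines of Python. This is only an illustration of the crude example, not a full semantic search: the word sets `COUNTRY` and `CONFLICT`, the function names, and the window of five words are all assumptions made for the sketch.

```python
import re

# Hypothetical word sets for the "USA" + "war" example.
COUNTRY = {"usa", "america", "american"}
CONFLICT = {"war", "invasion", "conflict"}

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within_proximity(text, group_a, group_b, window=5):
    """True if a word from group_a occurs within `window` tokens of a word from group_b."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w in group_a]
    pos_b = [i for i, w in enumerate(words) if w in group_b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

near = "American troops entered the war in 1917."
far = ("The USA exported wheat last year, long after the trade war "
       "between two other nations had ended.")

print(within_proximity(near, COUNTRY, CONFLICT))
print(within_proximity(far, COUNTRY, CONFLICT))
```

Both sentences contain a country word and a conflict word, so a plain keyword search would return both. The proximity rule keeps only the first, where the two words sit close together.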
How are recruitment data and job listings gathered?
Today, the international job market is growing at a faster pace than ever. Gone are the days when companies advertised jobs in daily newspapers. Bigger or more prominent companies post job listings on their own websites, while the rest turn to social media or job boards. There are even different types of job boards today: some cater only to startups, others to a particular niche.
Getting data from so many sources is a difficult task. On top of that, most companies hire for many types of roles, and many hire across different geographical regions. This is why web scraping has become the go-to way to gather job data in large quantities. However, the data needs to be filtered by region, industry sector, role and more. This filtering is usually done by simple keyword matching.
To scrape the data, one first needs a list of the websites to be scraped. Second, the URL patterns (regular expressions) that identify the pages on each website containing job feeds or listings need to be worked out. Only then can the actual job posts be scraped one by one using various filters. Job posts are also updated every day, and sometimes several times a day, so they need to be scraped frequently to keep your job feed up to date.
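As a minimal sketch of the second step, a regular expression can separate job-listing pages from the other URLs found on a site. The URL scheme below (a `/careers/` or `/jobs/` path followed by a single slug) and the `example.com` addresses are assumptions for illustration; every real website needs its own pattern.

```python
import re

# Hypothetical pattern: https://<host>/careers/<slug> or https://<host>/jobs/<slug>
JOB_URL_PATTERN = re.compile(r"^https?://[^/]+/(?:careers|jobs)/[\w-]+/?$")

def job_page_urls(urls):
    """Keep only the URLs that look like individual job-listing pages."""
    return [u for u in urls if JOB_URL_PATTERN.match(u)]

crawled = [
    "https://example.com/careers/data-scientist-123",
    "https://example.com/about",
    "https://example.com/jobs/42/",
    "https://example.com/blog/jobs/hiring-tips",
]

for url in job_page_urls(crawled):
    print(url)
```

Only the first and third URLs pass the filter. The blog post is rejected because `/jobs/` does not come directly after the host name.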
What role does semantic search play in scraping of job data?
Not all websites have structured data. If you scrape ten different job boards to aggregate job data, you might find ten different formats, or more, in which the data is stored. Semantic search makes this easier by giving you more than just keywords to search for when scraping jobs.
When doing a semantic search, you search not for a single keyword but for a set of keywords, some of which may need to appear within a particular proximity of each other. Other rules of this kind can be created to find matches. Unlike plain keyword matching, these rules can give you a percentage match, say 70% or 80%, which signifies how close a result is to what you are searching for. Then, depending on your risk tolerance, you can set a boundary of, say, 75% or 80% and take everything that scores higher as the results of your semantic search.
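The percentage match and the boundary can be sketched as follows. This is a deliberately crude, rule-based score and not an actual NLP model: the profile `DATA_SCIENCE_PROFILE`, its word groups, and the function names are assumptions made for illustration. Each group counts as matched if any of its words appears in the post, and the score is the share of groups matched.

```python
import re

# Hypothetical profile for data science jobs: each set is one rule,
# satisfied when any of its words appears in the post.
DATA_SCIENCE_PROFILE = [
    {"data", "analytics"},
    {"python", "r", "sql"},
    {"machine", "statistical", "statistics"},
    {"model", "models", "modeling"},
]

def match_percentage(text, profile):
    """Percentage of word groups in the profile that the text satisfies."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    hits = sum(1 for group in profile if words & group)
    return 100.0 * hits / len(profile)

def filter_posts(posts, profile, threshold=75.0):
    """Keep the posts whose match percentage reaches the boundary."""
    return [p for p in posts if match_percentage(p, profile) >= threshold]

posts = [
    "Analytics engineer: build statistical models in Python.",
    "Office manager needed to handle data entry and scheduling.",
    "Quantitative analyst using SQL and machine learning.",
]

for post in posts:
    print(match_percentage(post, DATA_SCIENCE_PROFILE), post)
```

The first post never uses the phrase “data science” yet satisfies every group, while the second contains the keyword “data” but matches only one group in four. With the boundary at 75%, only the first post is kept.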
What difference does it make as compared to using traditional methods?
The benefits are simple. Ten companies may be hiring for data science positions, but not all of them will have worded their job posts in the same manner. Semantic search lets you find all the data science listings, even when a job is described a little differently from convention, so you do not miss out on jobs. Another plus is that since the computer does the filtering for you, less manual labor is needed to check for errors. And if you keep using the semantic search engine over a long period of time, you can zero in on two things: the best algorithms for your semantic search and the boundary percentage that suits your problem best.
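One simple way to zero in on the boundary percentage, assuming you have a handful of results that a person has already marked as relevant or not, is to try several candidate boundaries and keep the one that agrees with those judgments most often. The function name, the candidate values, and the sample data below are all made up for illustration.

```python
def best_threshold(scored, candidates=(50, 60, 70, 75, 80, 90)):
    """Pick the candidate boundary that classifies the most checked results correctly.

    scored: list of (match_percentage, is_relevant) pairs checked by a person.
    On a tie, the first (lowest) candidate wins.
    """
    def correct(threshold):
        return sum((score >= threshold) == relevant for score, relevant in scored)
    return max(candidates, key=correct)

# Hypothetical manually checked results.
checked = [
    (95, True), (85, True), (78, True),
    (72, False), (65, False), (40, False),
]

print(best_threshold(checked))
```

As more results are checked over time, rerunning this kind of comparison shows whether the boundary should move.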
In an ideal situation, semantic search would work even if a job posting is in an entirely different language. That will have to wait for the algorithms and technology to mature, although many of them are already in development. Keyword search and the bag-of-words approach have been used widely in the past. But in today’s world, where people actively use machine learning to solve problems or build systems that learn from their mistakes, it is better to use semantic search than keyword search.
Semantic search is the way to go for filtering content. Keyword search is easy to trick and can break down when it finds too many matches for a particular keyword. Semantic search, by contrast, can find outliers that share the same context but do not contain that particular keyword. It can be done using various natural language processing (NLP) algorithms, and although none of them is perfect, they are getting better as they analyze more and more data.
If semantic search can be applied properly, it can revolutionize the recruitment sector since it would become much easier to find jobs in any place, in any sector, and for any post. It would be great for candidates as well as job boards and recruitment companies. It would bring companies and candidates closer and make recruitment an easier nut to crack altogether. With time, and with better algorithms, semantic search is transforming recruitment, one step at a time.