I have a dataset in CSV format. Most of the columns are numeric, with very little text.
A sample of my dataset:
Test_name, Execution_date, User_name, Total_tests, Passed_tests, Failed_tests
test1, 01-02-2020, Marshall, 15, 3, 12
test2, 06-11-2021, Bruce, 6, 5, 1
test3, 08-10-2023, Mathers, 5, 3, 2
test4, 07-06-2023, Three, 2, 1, 1
I want to build a semantic search engine that can answer questions like the following:
Which user has executed the maximum number of tests?
On what date did a particular user execute tests?
Which date had the maximum number of failed tests?
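To make the expected answers concrete, here is a minimal sketch that computes them directly from the sample rows using only Python's standard library. It is not a semantic search; it only shows what a correct answer to each question looks like. The comma-separated header, the helper name `load_rows`, the variable names, and the choice of "Bruce" as the particular user are assumptions made for illustration. Dates are kept as plain strings because the sample does not say whether they are day-month or month-day.

```python
import csv
import io

# Sample rows from the question; the comma-separated header is an assumption.
SAMPLE = """Test_name, Execution_date, User_name, Total_tests, Passed_tests, Failed_tests
test1, 01-02-2020, Marshall, 15, 3, 12
test2, 06-11-2021, Bruce, 6, 5, 1
test3, 08-10-2023, Mathers, 5, 3, 2
test4, 07-06-2023, Three, 2, 1, 1
"""

NUMERIC_COLUMNS = ("Total_tests", "Passed_tests", "Failed_tests")


def load_rows(text):
    """Parse the CSV text into dicts, converting the count columns to int."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        for column in NUMERIC_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)

# Which user has executed the maximum number of tests?
total_by_user = {}
for row in rows:
    user = row["User_name"]
    total_by_user[user] = total_by_user.get(user, 0) + row["Total_tests"]
top_user = max(total_by_user, key=total_by_user.get)

# On what date did a particular user (here: Bruce) execute tests?
bruce_dates = [r["Execution_date"] for r in rows if r["User_name"] == "Bruce"]

# Which date had the maximum number of failed tests?
failed_by_date = {}
for row in rows:
    date = row["Execution_date"]
    failed_by_date[date] = failed_by_date.get(date, 0) + row["Failed_tests"]
worst_date = max(failed_by_date, key=failed_by_date.get)

print(top_user, bruce_dates, worst_date)
```

For the four sample rows, the answers I expect are Marshall, 06-11-2021, and 01-02-2020 respectively.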
What I have tried:
I tried Amazon Kendra, Azure Cognitive Search, ChatGPT, and Bard, and I also built my own semantic search engine from scratch, but none of them performs well on numeric data. They all seem to be geared toward text.
One idea I have: if I convert the numbers to text, will the models be able to answer these questions better?
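One possible reading of this idea, sketched below, is to turn each row into a plain sentence before indexing it. The function name `row_to_sentence` and the sentence template are hypothetical, and I have not verified that this improves results.

```python
def row_to_sentence(row):
    """Render one CSV row as a sentence (hypothetical template)."""
    return (
        f"{row['User_name']} ran {row['Test_name']} on {row['Execution_date']}: "
        f"{row['Total_tests']} tests in total, {row['Passed_tests']} passed, "
        f"{row['Failed_tests']} failed."
    )


# First row of the sample dataset.
row = {
    "Test_name": "test1",
    "Execution_date": "01-02-2020",
    "User_name": "Marshall",
    "Total_tests": 15,
    "Passed_tests": 3,
    "Failed_tests": 12,
}
sentence = row_to_sentence(row)
print(sentence)
```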
So, for my requirement, what would you suggest I try? Please shed some light on this.