I have a dataset in CSV format. Most of the columns are numeric, with very little text.

A sample of my dataset:


Test_name, Execution_date, User_name, Total_tests, Passed_tests, Failed_tests
test1, 01-02-2020, Marshall, 15, 3, 12
test2, 06-11-2021, Bruce, 6, 5, 1
test3, 08-10-2023, Mathers, 5, 3, 2
test4, 07-06-2023, Three, 2, 1, 1




I want to build a semantic search engine that can answer questions like the following:

Which user has executed the maximum number of tests?
On what date did a particular user execute tests?
On which date was the maximum number of failed tests recorded?

What I have tried:

I tried Amazon Kendra, Azure Cognitive Search, ChatGPT, and Bard, and I also built my own semantic search engine from scratch, but none of them performs well on numeric data. They are all geared towards text.

One idea I have: if I convert the numbers to text, will the models be able to answer these questions better?

So for my requirement, what would you suggest I try? Please shed some light on this.
Comments
[no name] 24-Nov-23 14:09pm    
Simple LINQ / SQL query. How do you sharpen a pencil with a chain saw?
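
A minimal sketch of the plain-query approach suggested above, using Python's built-in sqlite3 module on the four sample rows from the question. The table name "runs" and the choice to sum per user and per date are assumptions, not something stated in the thread.

```python
import csv
import io
import sqlite3

# Sample rows copied from the question; the table name "runs" is an assumption.
SAMPLE = """Test_name,Execution_date,User_name,Total_tests,Passed_tests,Failed_tests
test1,01-02-2020,Marshall,15,3,12
test2,06-11-2021,Bruce,6,5,1
test3,08-10-2023,Mathers,5,3,2
test4,07-06-2023,Three,2,1,1
"""

def load(conn, text):
    """Create the table and insert every CSV row, converting counts to int."""
    conn.execute(
        "CREATE TABLE runs (Test_name TEXT, Execution_date TEXT, User_name TEXT, "
        "Total_tests INTEGER, Passed_tests INTEGER, Failed_tests INTEGER)"
    )
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = [(r[0], r[1], r[2], int(r[3]), int(r[4]), int(r[5])) for r in reader]
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, SAMPLE)

# Which user has executed the maximum number of tests?
top_user = conn.execute(
    "SELECT User_name, SUM(Total_tests) AS total FROM runs "
    "GROUP BY User_name ORDER BY total DESC LIMIT 1"
).fetchone()

# On what date did a particular user execute tests? (user passed as a parameter)
bruce_dates = [row[0] for row in conn.execute(
    "SELECT Execution_date FROM runs WHERE User_name = ?", ("Bruce",)
)]

# Which date has the maximum number of failed tests?
worst_date = conn.execute(
    "SELECT Execution_date, SUM(Failed_tests) AS failed FROM runs "
    "GROUP BY Execution_date ORDER BY failed DESC LIMIT 1"
).fetchone()

print(top_user)     # ('Marshall', 15)
print(bruce_dates)  # ['06-11-2021']
print(worst_date)   # ('01-02-2020', 12)
```

Each of the three questions from the post maps to one aggregate query, so the numbers are computed exactly rather than retrieved by similarity.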
Apoorva666 24-Nov-23 23:44pm    
We don't know exactly what the user query is going to look like; the question can be framed in any way, which is why I'm looking for a tool with semantic search capabilities. For example, "Which user has executed the maximum number of tests" can also be asked as "who ran the maximum number of test operations".
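
To illustrate the paraphrase problem described above, here is a rough sketch that maps differently worded questions onto a small set of fixed intents. The synonym table, the intent names and the 0.75 threshold are all hypothetical; a real system would more likely use embeddings or a language model for this step, with keyword overlap standing in for it here.

```python
import re

# Hypothetical synonym table: normalises paraphrases to one canonical word.
SYNONYMS = {
    "who": "user", "ran": "executed", "run": "executed", "execute": "executed",
    "most": "maximum", "highest": "maximum", "max": "maximum",
    "test": "tests", "failures": "failed", "fail": "failed",
    "day": "date", "when": "date",
}

# Hypothetical intents: each one is recognised by a set of canonical keywords.
INTENTS = {
    "top_user_by_tests": {"user", "executed", "maximum", "tests"},
    "dates_for_user": {"date", "executed"},
    "date_with_most_failures": {"date", "maximum", "failed"},
}

def route(question):
    """Return the intent whose keywords are best covered by the question, or None."""
    tokens = {SYNONYMS.get(t, t) for t in re.findall(r"[a-z]+", question.lower())}
    # Score = fraction of the intent's keywords that appear in the question.
    scores = {name: len(words & tokens) / len(words) for name, words in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= 0.75 else None

print(route("Which user has executed the maximum number of tests?"))
print(route("who ran the maximum number of test operations"))
```

Both calls resolve to the same intent, which could then be answered by one fixed query over the CSV. Questions that match no intent return None, so they can be rejected instead of answered wrongly.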
[no name] 25-Nov-23 12:09pm    
You're looking for "one size fits all" with an inadequate specification. You also keep asking the same questions ... expecting others to do the research and "learning" for you. And unless you accept that (currently) you need to "bin" your "numeric data", you will keep asking the same pointless questions.
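
As a rough illustration of what "binning" the numeric data could look like, the sketch below turns each row into a sentence with a labelled failure-rate bucket that a text-based search tool can index. The thresholds (0.25 and 0.60), the labels and the sentence wording are assumptions chosen only for this example.

```python
from bisect import bisect_right

# Hypothetical bin edges on the failure rate (failed / total) and their labels.
BIN_EDGES = [0.25, 0.60]
BIN_LABELS = ["low", "medium", "high"]

# Sample rows from the question:
# (test name, date, user, total tests, passed tests, failed tests)
ROWS = [
    ("test1", "01-02-2020", "Marshall", 15, 3, 12),
    ("test2", "06-11-2021", "Bruce", 6, 5, 1),
    ("test3", "08-10-2023", "Mathers", 5, 3, 2),
    ("test4", "07-06-2023", "Three", 2, 1, 1),
]

def failure_bin(failed, total):
    """Map a failure rate to a text label using the bin edges above."""
    rate = failed / total if total else 0.0
    return BIN_LABELS[bisect_right(BIN_EDGES, rate)]

def describe(row):
    """Render one row as a sentence, with the numeric rate replaced by its bin."""
    name, date, user, total, _passed, failed = row
    label = failure_bin(failed, total)
    return (f"{user} ran {name} on {date}: {total} tests in total, "
            f"{failed} failed, failure rate {label}")

for row in ROWS:
    print(describe(row))
```

The binned label gives a text search engine a word to match on ("high failure rate"), but exact questions such as "maximum" still need the raw numbers and an aggregate query.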
Apoorva666 26-Nov-23 5:30am    
Look, I've done a lot of research in this regard. I've spent months on this. If you read my question properly, you'll see that I've clearly stated "what I've tried". I'm still a student. I'm not experienced like you guys. I've got no mentors or anyone. My questions may seem silly or repetitive to you, but that is how we newbies who are confused about a lot of things question & eventually learn. And I'm asking for guidance, not the complete code which I can copy-paste, run in an IDE & get the perfect expected output. If you don't understand my question & if you're interested in helping out, do let me know what I need to be more clear about. If you don't have an answer, move on. No need to be so cold.
[no name] 26-Nov-23 15:15pm    
"Semantic Search" requires "richly structured source data" (e.g. XML). Yours is not "structured" in that sense.

1 solution

I don't think you understand what semantic analysis is.

I'd suggest starting with the Stanford University publication Learning Word Vectors for Sentiment Analysis. Then use its Large Movie Review Dataset to create your own model.

In other words, your dataset cannot be used for semantic analysis.
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


