Our next speaker in the Know Thy Speaker series is Kuldeep Jiwani.
His talk is titled: Topological space creation and Clustering at BigData scale
What got you into Data Science/AI?
I see Data Science as having the potential to move the entire human race upwards in the knowledge pyramid, and so to benefit the intellectual growth of the whole population. In the knowledge pyramid, Data Science and AI form one of the topmost layers. If machines can take over and automate the majority of the difficult, mundane jobs, then the people freed from those jobs would have the opportunity to learn more and move up the knowledge chain. This influx would push the people already higher in the pyramid to climb further, encouraging innovation and in turn raising the knowledge pyramid to new heights.
Popularly, Data Science is seen as a human replacer, but I believe it is a human enabler. This motivates me to create new things, and it is the reason I got into Data Science / AI.
What do you think is the biggest challenge faced by Data Scientists today?
In today's world there is an increasing need for Data Science in multiple areas such as text, speech and image. But traditional Machine Learning techniques need labelled data, which is hard to obtain in the majority of industries. Most data scientists struggle to find a labelled dataset to train their models. With time this problem is going to grow, as new data types and new use cases arrive and labelling data by hand becomes a nearly impossible task.
So the need of the hour is one of two things. The first option is to create techniques that generate labels on their own, the way Word2Vec, Doc2Vec, etc. use supervised deep learning but build the labels themselves. The other option is to move to unsupervised and Artificial Intelligence techniques that can discover information on their own, such as the relation between input data and a label.
So building systems that can literally auto-train without human input is a formidable challenge for Data Scientists.
What got you interested in presenting at this conference?
We recently applied Machine Learning techniques to one of the tougher industries for ML, namely Cyber Security. Here even the problem definition is not fixed, as security is a subjective topic and can mean different things to different people. The diversity of data that should be monitored also encompasses many kinds of logs, and for a normal enterprise this can easily run to many TBs of data to be analysed constantly. On top of that, security breaches are rare events, and hence no labelled data is available.
We want to share with everybody the technical challenges we faced in solving such a problem via Unsupervised Machine Learning techniques, so that others can take ideas from it and innovate further in Data Science to take the community to higher levels. There couldn't have been a better platform than ODSC, which is popular amongst the majority of the Data Science community, hence we chose to share it here.