The research process has many aspects, and finding and using the right dataset is among the most important, especially for projects in Natural Language Processing (NLP). In most NLP projects, working with real data is crucial for an effective implementation and better results.
Developing a dialog management system is one such project. To build a conversational agent that can handle complex dialog turns and sustain a human-like conversation, the dataset used to train the models must itself contain conversational patterns. Message logs from messengers such as WhatsApp or Facebook, or from online customer service chats, are good candidates. The dataset should also reflect the characteristics of the project domain. For example, a chatbot for a tourism company must be able to handle common customer requests about tourist destinations and similar topics.
In my work, I plan to use real human-to-human messages in the Azerbaijani language. The dataset is noisy: it contains misspellings, the usual noise of internet text, and incomplete sentences. The agglutinative nature of Azerbaijani, in which the same word can appear in several morphological forms, must also be taken into account. On the other hand, the dataset already has a structure well suited to chatbots, in the form of questions and answers.
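To make the noise and the question-answer structure concrete, here is a minimal Python sketch of how such messages might be cleaned and paired. It is only an illustration under my own assumptions, not the actual pipeline: the function names (`az_lower`, `clean_message`, `pair_turns`), the `(speaker, text)` input format, and the rule that every change of speaker forms one prompt/response pair are all hypothetical. It does not correct misspellings or handle morphology; it only shows the kind of light normalization that Azerbaijani chat text needs, including the dotted and dotless "i", which Python's default `lower()` does not handle correctly for this alphabet.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character in a row
SPACE_RE = re.compile(r"\s+")


def az_lower(text):
    """Lowercase with Azerbaijani casing rules: I -> ı and İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def clean_message(text):
    """Light normalization for noisy chat text (assumed rules, not a standard)."""
    text = URL_RE.sub(" ", text)          # drop links
    text = az_lower(text)
    # Shorten elongated letters ("salaaaam" -> "salaam"). Two repeats are kept
    # because genuine double letters exist in Azerbaijani (e.g. "saat").
    text = REPEAT_RE.sub(r"\1\1", text)
    return SPACE_RE.sub(" ", text).strip()


def pair_turns(messages):
    """Build (prompt, response) pairs from chronological (speaker, text) tuples.

    Assumption: a pair is formed whenever the speaker changes between two
    consecutive messages; consecutive messages by the same speaker are skipped.
    """
    pairs = []
    for (prev_speaker, prev_text), (speaker, text) in zip(messages, messages[1:]):
        if prev_speaker != speaker:
            prompt, response = clean_message(prev_text), clean_message(text)
            if prompt and response:
                pairs.append((prompt, response))
    return pairs


if __name__ == "__main__":
    chat = [
        ("customer", "Salaaaam,   qiymət neçədir?"),
        ("agent", "20 manatdır"),
        ("agent", "Başqa sualınız var?"),
    ]
    for prompt, response in pair_turns(chat):
        print(prompt, "->", response)
```

A real pipeline would likely need more than this, for example merging consecutive messages from the same speaker and some form of morphological analysis, but those choices depend on the data and are left open here.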