Background
Human-machine conversation is one of the most important topics in artificial intelligence (AI) and has received much attention across academia and industry in recent years. However, dialogue systems are still in their infancy: they usually converse passively, uttering words in response rather than on their own initiative, unlike human-human conversation. Therefore, we set up this competition on a new conversation task, named knowledge-driven dialogue, where machines converse with humans based on a built knowledge graph. It aims at testing machines' ability to conduct human-like conversations.
About the Challenge
1. Task Description
Given a dialogue goal g and a set of topic-related background knowledge M = f1, f2, ..., fn, a participating system is expected to output an utterance ut for the current conversation H = u1, u2, ..., ut-1 that keeps the conversation coherent and informative under the guidance of the given goal. During the dialogue, a participating system is required to proactively lead the conversation from one topic to another. The dialogue goal g is given in the form "Start -> Topic_A -> Topic_B", which means the machine should lead the conversation from any start state to topic A and then to topic B. The given background knowledge includes knowledge related to topic A and topic B, as well as the relations between these two topics.
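As a reading aid, here is a minimal sketch of the system interface implied by the task; the function name respond and its type annotations are our own illustration, not an official API.

```python
from typing import List, Tuple

# A knowledge fact f_i as an (entity, property, value) triple.
Fact = Tuple[str, str, str]

def respond(goal: List[str], knowledge: List[Fact], history: List[str]) -> str:
    """Return the next utterance u_t.

    goal      -- the dialogue goal g, e.g. ["Start", "Topic_A", "Topic_B"]
    knowledge -- the background knowledge M = f_1, ..., f_n
    history   -- the conversation so far, H = u_1, ..., u_{t-1}
    """
    # A real system would pick a relevant fact from `knowledge` and steer
    # the conversation toward the next unvisited topic in `goal`;
    # this stub is schematic only.
    raise NotImplementedError
```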
2. Dataset
The background knowledge provided in the dataset is collected from the domain of movies and stars, including information such as box office figures, directors, and reviews, organized in the form of {entity, property, value} triples. The topics given in the dialogue goal are entities, i.e., movies or stars.
The dataset includes 30k sessions and about 120k dialogue turns, of which 100k turns are used for training, 10k for development, and 10k for testing. You can download the dataset from the Dataset page after signing up.
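For concreteness, below is an invented example of one session's inputs; the entities and values are purely illustrative, not taken from the released data, whose exact file format is specified on the Dataset page.

```python
# Invented example of one session's inputs (entities/values are illustrative;
# the real data is in the movies-and-stars domain described above).
goal = ["Start", "The Green Mile", "Frank Darabont"]       # Start -> Topic_A -> Topic_B
knowledge = [
    ("The Green Mile", "director", "Frank Darabont"),      # relates Topic_A to Topic_B
    ("The Green Mile", "box_office", "$286.8 million"),
    ("Frank Darabont", "birthplace", "Montbeliard, France"),
]
history = ["Hi! Watched anything good recently?"]
# A system should now produce u_t that steers the chat toward "The Green Mile".
```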
3. Evaluation Metrics
3.1 Automatic Evaluation Metrics
(1) F1: char-based F-score of output responses against golden responses, the main metric for dialogue systems.
(2) BLEU: word-based precision of output responses against golden responses, the auxiliary metric for dialogue systems.
(3) DISTINCT: diversity of the output responses, the auxiliary metric for dialogue systems.
Based on the evaluation results, we will rank all systems on the leaderboard. A sketch of how these metrics can be computed is given below.
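The following is a minimal sketch of character-based F1 and DISTINCT-n, assuming whitespace tokenization for DISTINCT; the official implementation (tokenization, casing, aggregation over the test set) may differ, and BLEU is best computed with a standard toolkit rather than re-implemented here.

```python
from collections import Counter
from typing import List

def char_f1(pred: str, gold: str) -> float:
    """Character-based F-score of a predicted response against the golden one.
    Overlap is counted with multiplicity via a bag-of-characters intersection."""
    pred_chars, gold_chars = Counter(pred), Counter(gold)
    overlap = sum((pred_chars & gold_chars).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def distinct_n(responses: List[str], n: int = 2) -> float:
    """DISTINCT-n: ratio of unique n-grams to total n-grams across all output
    responses; higher means more diverse responses."""
    total, unique = 0, set()
    for resp in responses:
        tokens = resp.split()  # whitespace tokenization is an assumption here
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0
```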
3.2 Human Evaluation
The top 10 models on the leaderboard will be evaluated by humans on criteria including coherence, consistency, and proactivity. The final rankings and winners will depend on the human evaluation results.
4. Baseline Systems
Open-source baseline systems will be provided; please refer to the source code and task description for details. Baidu AI Studio provides a free GPU cluster and a retrieval-based baseline.
Participation Info
1. Eligibility
The challenge is open to all individuals, research institutions, colleges, universities, and enterprises in related fields.
2. Registration
Please click the register button in the top right corner to sign up. If you have any questions, please email us or scan the QR code on the right.
*Teams that register and submit valid results will receive a commemorative T-shirt for each member.
3. Registration Deadline
March 31st, 2019
Timeline
Feb 25: Registration opens; partial data release
Mar 31: Registration closes; full data release; test1 data available
May 13: Test2 data available
May 20: Testing results submission due
May 25: Top 10 systems' code submission due
Jun 15: Final results announcement; system report submission
Jul 31: Camera-ready submission deadline
Aug 24: Workshop and award ceremony
Oct 2: NLPCC 2019
Awards Setting
The challenge will award one First Prize, two Second Prizes, and two Third Prizes. Winners will receive award certificates issued by CCF & CIPS. The prizes and travel grants for attending the workshop and award ceremony are sponsored by Baidu Inc.
First Prize (1): ¥30,000 + award certificate
Second Prize (2): ¥20,000 + award certificate
Third Prize (2): ¥10,000 + award certificate
*Notes:
1. All prizes are inclusive of taxes.
2. The award requires participants to provide their system reports (including method descriptions, system code & data, references, etc.) and name lists of team members.