Aim: To improve the accuracy of Chinese word segmentation.
Background: With the development of Internet technology, people want to obtain useful medical information online, but doing so remains technically difficult for non-specialists. At the same time, medical infrastructure cannot keep pace with patient demand, doctor-patient conflicts have not been fundamentally resolved, and difficulty in obtaining a consultation remains widespread. With the arrival of the era of big data and artificial intelligence, medical question answering (Q&A) systems have come into use.
Objective: To give users the correct answer as quickly as possible, a medical Q&A system must execute efficiently. The accuracy of Chinese word segmentation directly affects the execution efficiency of Q&A. Improving segmentation accuracy can therefore fundamentally improve the accuracy of medical Q&A and shorten the answering time.
Methods: A Chinese word segmentation algorithm based on Bi-LSTM-CRF is improved using natural language processing techniques. On the same medical Q&A dataset, the medical Q&A system is trained and tested with three commonly used segmentation algorithms and with the segmentation algorithm designed in this paper.
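As an illustration of how a sequence-labeling segmenter of this kind frames the task, the sketch below converts between segmented words and per-character tags. It assumes the common B/M/E/S scheme (begin, middle, end, single-character word), which this abstract does not specify; the function names and the example sentence are hypothetical and are not taken from the paper's dataset.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character, zero or more middle characters, last character
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word boundary follows this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical medical question, already segmented by hand.
example_words = ["糖尿病", "患者", "能", "吃", "水果", "吗"]
example_tags = words_to_tags(example_words)
restored = tags_to_words("".join(example_words), example_tags)
```

Under this framing, a tagger is trained to predict one tag per character, and the predicted tags are decoded back into words as in `tags_to_words`.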
Results: The experiments show that the Chinese word segmentation algorithm proposed in this paper improves the accuracy of medical Q&A and can improve its execution efficiency.
Conclusion: With the same answer-similarity calculation and matching process, the choice of word segmentation method directly affects the downstream performance of medical Q&A. The better the segmentation, the higher the accuracy of the answers returned by medical Q&A. The improved LSTM-CRF segmentation algorithm designed in this paper achieves good segmentation accuracy during training. Compared with the HMM segmentation algorithm, which performs best among the other three algorithms, it improves segmentation accuracy, and the resulting Q&A returns correct answers with relatively high accuracy. Although segmentation accuracy on the medical dataset improved, the time complexity did not decrease much. In terms of the receiver operating characteristic (ROC) curve, the LSTM-CRF combined network segmentation algorithm designed in this paper performs better in medical Q&A than the other commonly used segmentation algorithms, with a larger area under the curve (AUC).
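The final comparison is in terms of the receiver operating characteristic (ROC) curve and the area it encloses with the coordinate axes (AUC). As a minimal sketch of how such an area can be computed, the code below builds ROC points from binary labels and confidence scores and integrates them with the trapezoidal rule. The labels and scores are made-up values for illustration only, not results from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, sweeping the threshold downward."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # consume all items tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Illustrative data: 1 = the returned answer was correct, 0 = it was not.
demo_labels = [1, 1, 0, 1, 0, 0]
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
demo_auc = auc(roc_points(demo_labels, demo_scores))
```

A larger value means the curve lies closer to the top-left corner, which is the sense in which one segmentation algorithm is said to perform better than another here.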