Keywords: mammary gland; sampling; algorithms; classification; diagnosis (medical)
Abstract: Currently, no automated means of detecting abnormal mammograms exists. While knowledge discovery through data mining and data analytics tools is widespread in many industries, the healthcare industry as a whole lags far behind. Providers are only beginning to recognize the value of data mining as a tool to analyze patient care and clinical outcomes. The authors' research investigates the use of genetic algorithms to classify unstructured mammography reports, which can later be correlated with the images for extraction and testing. In mammography, much effort has been expended to characterize findings in radiology reports. Various computer-assisted technologies have been developed to help radiologists detect cancer; however, the algorithms still lack high sensitivity and specificity, and must be trained against a set of cases with known pathologies in order to refine them further and improve their validity. In a large database of reports and corresponding images, automated tools are needed just to determine which data to include in the training set. Validating these data is another issue. Radiologists disagree with one another over the characteristics and features that constitute a normal mammogram and over the terminology to use in the associated radiology report. Abnormal reports follow the lexicon established by the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS), but even within these reports there is a high degree of variability in the text and in the interpretation of its semantics. The focus has been on classifying abnormal or suspicious reports, but even this process needs further layers of clustering and gradation so that individual lesions can be classified more effectively. The tools that are needed will not only help identify problem areas but also support risk assessment and other knowledge discovery applications.