Authors: Ma Ruimin; Ma Minyan; Wang Haochang. Date: 2011-01-01
1. School of Computer and Information Technology, Northeast Petroleum University
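Abstract:
Most models for biomedical named entity recognition rely on a single machine learning algorithm and recognize entities poorly. This paper proposes a method that fuses conditional random fields (CRFs) with a maximum entropy (Maxent) classifier. The method exploits the correlation and complementarity between the base classifiers, combines their outputs with an effective feature set, and relearns on the result to obtain a fused model. Experiments show that the fused model compares well with the single classifiers and with systems from the JNLPBA workshop, reaching an F-measure of 70.7%, which indicates that the fusion method is effective.
Keywords: conditional random fields; maximum entropy; classifier fusion; feature extraction; biomedical named entity recognition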
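The fusion step described in the abstract (relearning on the outputs of the base classifiers together with extra features) resembles stacked generalization. The following is a minimal sketch of that idea only, not the authors' implementation: the abstract does not specify the meta-learner or the feature set, so the lookup-table learner, the two surface features, the fallback rule, the function names, and the toy labels are all assumptions made for illustration. The base CRF and Maxent taggers are represented by hypothetical, pre-computed label sequences.

```python
from collections import Counter, defaultdict


def meta_features(token, crf_label, maxent_label):
    """Key the fusion learner conditions on (hypothetical feature choice)."""
    has_digit = any(ch.isdigit() for ch in token)
    starts_upper = token[:1].isupper()
    return (crf_label, maxent_label, starts_upper, has_digit)


def train_fusion(tokens, crf_preds, maxent_preds, gold):
    """Learn, for each feature key, the gold label seen most often with it."""
    counts = defaultdict(Counter)
    for tok, c, m, g in zip(tokens, crf_preds, maxent_preds, gold):
        counts[meta_features(tok, c, m)][g] += 1
    return {key: ctr.most_common(1)[0][0] for key, ctr in counts.items()}


def fuse(model, tokens, crf_preds, maxent_preds):
    """Relabel tokens with the fused model."""
    fused = []
    for tok, c, m in zip(tokens, crf_preds, maxent_preds):
        # Assumption: fall back to the CRF label for unseen combinations.
        fused.append(model.get(meta_features(tok, c, m), c))
    return fused


# Toy data: labels are invented for illustration, not taken from the paper.
tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
gold = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]
crf_preds = ["B-DNA", "I-DNA", "O", "O", "O", "O"]
maxent_preds = ["B-protein", "O", "O", "O", "B-cell_type", "I-cell_type"]

fusion_model = train_fusion(tokens, crf_preds, maxent_preds, gold)
fused_labels = fuse(fusion_model, tokens, crf_preds, maxent_preds)
```

In this toy case each base tagger is right where the other is wrong, so the fused labels recover the gold sequence. In practice the fusion learner would be trained on held-out predictions rather than on the same data it relabels, and a probabilistic learner would normally replace the lookup table.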
Foundation: Natural Science Foundation of Heilongjiang Province (F200603)