Information extraction with Python


Jiawei Chen / English


machine learning, natural language processing, data analysis


This talk presents a named entity recognition (NER) system that extracts attributes and values, such as person, company, place, or time, from various kinds of text data. I will show how to combine several Python tools to build it. First, the annotation tool BRAT, which is written in Python, is used to create a custom annotated corpus. Second, Python is used to drive CRFsuite and train a Conditional Random Fields (CRF) model that labels our text data. The labeling results are then analyzed further with pandas and scikit-learn.
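As a rough illustration of the first two steps, the sketch below converts BRAT standoff annotations into token-level BIO tags and simple per-token feature dictionaries, which is the general shape of input a CRF trainer expects. It is a minimal sketch and not the system from the talk: the function names, the regex tokenizer, the feature set, and the sample sentence with its annotations are all assumptions made up for this example, and discontinuous BRAT spans are simply skipped.

```python
import re

def parse_brat_ann(ann_text):
    """Read text-bound ('T') lines of a BRAT .ann file into (label, start, end)."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relations, events, and notes are ignored in this sketch
        _ann_id, type_and_offsets, _surface = line.split("\t", 2)
        if ";" in type_and_offsets:
            continue  # discontinuous spans are not handled here
        label, start, end = type_and_offsets.split(" ")
        spans.append((label, int(start), int(end)))
    return spans

def to_bio(text, spans):
    """Tokenize with a simple regex and assign BIO tags from character spans."""
    tokens, tags = [], []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for label, start, end in spans:
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags

def token_features(tokens, i):
    """Hypothetical feature set: one dict per token, with a little context."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Invented sample document and annotations, for illustration only.
text = "Jiawei Chen joined Acme in Tokyo."
ann = (
    "T1\tPerson 0 11\tJiawei Chen\n"
    "T2\tCompany 19 23\tAcme\n"
    "T3\tPlace 27 32\tTokyo\n"
)

tokens, tags = to_bio(text, parse_brat_ann(ann))
features = [token_features(tokens, i) for i in range(len(tokens))]

for token, tag in zip(tokens, tags):
    print(token, tag)
```

The resulting `features` and `tags` sequences could then be handed to a CRF trainer, and the predicted tags loaded into a pandas DataFrame for per-label error analysis.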


A search engineer who enjoys studying machine learning and natural language processing.


search engineer