Sentence and word tokenization with Python NLTK

2018. 11. 6. 16:41

Splitting text into sentences and words with the NLTK library.

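import nltk

# sent_tokenize and word_tokenize need the Punkt models; download them once with nltk.download('punkt') (newer NLTK releases ask for 'punkt_tab' instead).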

sent = "This is an unstructured data analysis. Prof. Geum is happy to teach this. United States"

s = nltk.tokenize.sent_tokenize(sent)

print(s)

w = nltk.tokenize.word_tokenize(sent)

print(w)

# Difference between tokenize and split: with split(), the period stays attached to the preceding word in a single token.

print(sent.split())
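To see the difference side by side without downloading any models, here is a minimal sketch. It swaps in `wordpunct_tokenize`, a regex-based tokenizer from NLTK, instead of `word_tokenize`, so it is not the same tokenizer used above; the sample sentence is made up for illustration.

```python
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample sentence, not the one from the post
text = "This is a data analysis."

# str.split() only cuts on whitespace, so the period stays glued to the last word
split_tokens = text.split()

# wordpunct_tokenize separates runs of word characters from runs of punctuation
regex_tokens = wordpunct_tokenize(text)

print(split_tokens)
print(regex_tokens)
```

The last element of `split_tokens` still carries the period, while `regex_tokens` ends with the word and the period as two separate tokens.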

# Named entities are split into words cleanly.

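# A multi-word name such as "United States" should be merged into one token; word_tokenize on its own returns 'United' and 'States' separately.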
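One possible way to get "United States" as a single token is NLTK's `MWETokenizer`, which merges listed multi-word expressions in an already tokenized list. This is only a sketch under assumptions: the expression list and the sample sentence are hypothetical, and plain `split()` is used for the first pass so the example runs without the Punkt models.

```python
from nltk.tokenize import MWETokenizer

# Hypothetical list of multi-word expressions to merge; extend as needed
mwe_tokenizer = MWETokenizer([("United", "States")], separator=" ")

# Hypothetical sample sentence, split on whitespace for the first pass
tokens = "Prof. Geum visited the United States".split()

# Adjacent tokens that match a listed expression are joined with the separator
merged = mwe_tokenizer.tokenize(tokens)

print(merged)
```

The default separator is an underscore, which would give `United_States`; passing `separator=" "` keeps the name readable.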


Deeppp