체크개발자's Blog



Python Crawling - Big Data Visualization

체크개발자 2017. 12. 16. 14:52

Google Chrome extension: Quick Javascript Switcher


pip install pygame 

pip install simplejson

pip install pytagcloud


Korean morphological analysis

pip install konlpy

https://www.lfd.uci.edu/~gohlke/pythonlibs
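
Before running the scripts in this post, it can help to confirm which of these packages are actually importable. The following is a minimal sketch and not part of the original setup: the helper name `missing_modules` is hypothetical, and the import names (for example `jpype` for the JPype1 package) are assumed to match the installs listed here.

```python
import importlib.util

def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names assumed for the packages installed above.
REQUIRED = ["pygame", "simplejson", "pytagcloud", "konlpy", "jpype"]

missing = missing_modules(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Anything reported as not installed can then be installed with pip, or from a downloaded wheel in the case of JPype1.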




# pip install JPype1-0.6.2-cp36-cp36m-win_amd64.whl
# pip install pygame
# pip install simplejson
# pip install konlpy
# pip install JPype1
"""
https://www.lfd.uci.edu/~gohlke/pythonlibs/#jpype
Download the JPype1 wheel from here and install it.
"""

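from collections import Counter
from konlpy.tag import Twitter  # renamed to Okt in KoNLPy 0.5.0 and later
import pytagcloud

# Read the crawled blog text. Pass encoding='utf-8' to open() if the file
# is not in the platform's default encoding.
with open('blog_data.txt') as f:
    data = f.read()

nlp = Twitter()
nouns = nlp.nouns(data)          # extract nouns only
count = Counter(nouns)           # noun -> number of occurrences
tags2 = count.most_common(40)    # the 40 most frequent nouns
taglist = pytagcloud.make_tags(tags2, maxsize=80)
# 'Korean' is not a font bundled with pytagcloud; it has to be registered
# in pytagcloud's fonts.json with a Korean-capable TTF beforehand.
pytagcloud.create_tag_image(taglist, 'wordcloud.jpg', size=(900, 600), fontname='Korean', rectangular=False)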
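
Noun extraction on real blog text can return one-character nouns that take up space in the cloud without saying much. The sketch below is an assumption, not the method used in this post: the hypothetical helper `top_nouns` drops short nouns before counting and returns the same `(word, count)` list shape that `most_common` produces, so the result could be passed to `pytagcloud.make_tags`. The sample list is made up and stands in for the output of `nlp.nouns(data)`.

```python
from collections import Counter

def top_nouns(nouns, n=40, min_len=2):
    """Count nouns of at least `min_len` characters and return the top `n`."""
    filtered = [noun for noun in nouns if len(noun) >= min_len]
    return Counter(filtered).most_common(n)

# Stand-in for nlp.nouns(data); these sample values are made up for illustration.
sample_nouns = ["파이썬", "것", "크롤링", "파이썬", "수", "시각화", "파이썬", "크롤링"]

print(top_nouns(sample_nouns, n=2))
```

Whether a length filter is appropriate depends on the text; some one-character nouns are meaningful, so `min_len=1` keeps everything.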





#-*- coding:utf-8 -*-

"""
pip install lxml
pip install wordcloud
pip install pytagcloud
pip install pygame
pip install simplejson
"""
from collections import Counter
import pytagcloud
import webbrowser
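"""
Counter - given a list containing many words,
creating a Counter object from it
converts it to a dict-like form, e.g.
korea 5
america 12

pytagcloud - library for generating word cloud charts
webbrowser - module for opening a web browser
"""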
words = list()
words.extend(['korea' for t in range(8)])
words.extend(['beautiful' for t in range(3)])
words.extend(['flower' for t in range(7)])
words.extend(['cloud' for t in range(23)])
words.extend(['rose' for t in range(8)])
words.extend(['lily' for t in range(4)])
words.extend(['apple' for t in range(5)])
words.extend(['orange' for t in range(9)])
words.extend(['rainbow' for t in range(45)])
words.extend(['rain' for t in range(14)])
words.extend(['snow' for t in range(27)])
words.extend(['puppy' for t in range(5)])
words.extend(['one' for t in range(5)])
words.extend(['lion' for t in range(5)])
words.extend(['cat' for t in range(52)])
words.extend(['tree' for t in range(51)])
words.extend(['pink' for t in range(45)])
words.extend(['brown' for t in range(35)])
words.extend(['gold' for t in range(25)])
words.extend(['silver' for t in range(15)])
words.extend(['green' for t in range(15)])
words.extend(['president' for t in range(5)])
words.extend(['moon' for t in range(25)])
words.extend(['river' for t in range(15)])
words.extend(['sun' for t in range(5)])
words.extend(['sunflower' for t in range(5)])
words.extend(['two' for t in range(19)])
words.extend(['three' for t in range(21)])
words.extend(['four' for t in range(17)])
words.extend(['five' for t in range(16)])
words.extend(['six' for t in range(9)])
words.extend(['seven' for t in range(8)])
words.extend(['nine' for t in range(54)])
words.extend(['ten' for t in range(45)])
words.extend(['june' for t in range(56)])
words.extend(['july' for t in range(57)])

print(words)
count = Counter(words)
print(count)
# Take up to the 100 most frequent of the words above
# (there are only 36 distinct words, so all of them) and draw the chart
tag = count.most_common(100)
taglist = pytagcloud.make_tags(tag, maxsize=100)
print(taglist)

# Draw the chart
pytagcloud.create_tag_image(taglist, "image1.jpg", size=(900, 600),
                            fontname='Nobile', rectangular=False)
webbrowser.open("image1.jpg")
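
The Counter behaviour described in the script's docstring can be seen in isolation with a smaller word list. This is a minimal sketch using only the standard library; the words and counts are taken from the list in the script, trimmed to three entries.

```python
from collections import Counter

# Build a small word list; ['cat'] * 52 is equivalent to ['cat' for t in range(52)].
words = ['cat'] * 52 + ['rain'] * 14 + ['korea'] * 8

count = Counter(words)          # dict-like mapping: word -> number of occurrences
print(count['cat'])             # look up the count of a single word
print(count.most_common(2))     # the two most frequent (word, count) pairs

# Asking for more entries than there are distinct words returns all of them.
print(len(count.most_common(100)))
```

The list of `(word, count)` pairs from `most_common` is what the script hands to `pytagcloud.make_tags`.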










WordCloud


http://pinkwink.kr/1029
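
As an alternative to pytagcloud, the `wordcloud` package from the install list accepts a word-frequency mapping directly. The following is a minimal sketch, assuming the package is installed; the helper name `render_wordcloud`, the image size, and the sample words are illustrative choices, not something taken from the linked post.

```python
from collections import Counter

def render_wordcloud(frequencies, path, font_path=None):
    """Render a {word: count} mapping to an image file with the wordcloud package."""
    # Imported lazily so the counting part runs even where wordcloud is not installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=900, height=600, background_color="white",
                      font_path=font_path)
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)

# Sample words reused from the pytagcloud script, trimmed for brevity.
words = ['cat'] * 52 + ['tree'] * 51 + ['snow'] * 27 + ['rain'] * 14
frequencies = dict(Counter(words))
print(frequencies)
```

Calling `render_wordcloud(frequencies, "image2.png")` would then write the image. For Korean text, `font_path` would need to point at a Korean-capable TTF file, since the default font is unlikely to cover Hangul.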

