doc2vec Python Example

Learn doc2vec in Python by example, and use a pretrained doc2vec model for text clustering. You will see how each document is labelled and what the doc2vec input looks like. Doc2vec is an unsupervised algorithm that learns fixed-length vector representations of documents and phrases.

Example:-

from gensim.test.utils import common_texts
common_texts[0:3]
[['human', 'interface', 'computer'],
 ['survey', 'user', 'computer', 'system', 'response', 'time'],
 ['eps', 'user', 'interface', 'system']]

Example:-

from gensim import corpora
dictionary = corpora.Dictionary(common_texts)
print(dictionary)
Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...)
print(dictionary.token2id)
{'computer': 0, 'human': 1, 'interface': 2, 'response': 3, 'survey': 4, 'system': 5, 'time': 6, 'user': 7, 'eps': 8, 'trees': 9, 'graph': 10, 'minors': 11}
corpus = [dictionary.doc2bow(text) for text in common_texts]
corpus
[[(0, 1), (1, 1), (2, 1)],
 [(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],
 [(2, 1), (5, 1), (7, 1), (8, 1)],
 [(1, 1), (5, 2), (8, 1)],
 [(3, 1), (6, 1), (7, 1)],
 [(9, 1)],
 [(9, 1), (10, 1)],
 [(9, 1), (10, 1), (11, 1)],
 [(4, 1), (10, 1), (11, 1)]]
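
Each pair in the corpus is (token_id, count). As a quick sanity check, the ids can be mapped back to words through the dictionary. This is a small illustrative snippet that reuses the dictionary and corpus built above:

# Map each (token_id, count) pair back to a readable (word, count) pair
for bow in corpus[:3]:
    print([(dictionary[token_id], count) for token_id, count in bow])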

Example:-

from gensim.models.doc2vec import Doc2Vec, TaggedDocument
documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(common_texts)]
documents
[TaggedDocument(words=['human', 'interface', 'computer'], tags=[0]),
 TaggedDocument(words=['survey', 'user', 'computer', 'system', 'response', 'time'], tags=[1]),
 TaggedDocument(words=['eps', 'user', 'interface', 'system'], tags=[2]),
 TaggedDocument(words=['system', 'human', 'system', 'eps'], tags=[3]),
 TaggedDocument(words=['user', 'response', 'time'], tags=[4]),
 TaggedDocument(words=['trees'], tags=[5]),
 TaggedDocument(words=['graph', 'trees'], tags=[6]),
 TaggedDocument(words=['graph', 'minors', 'trees'], tags=[7]),
 TaggedDocument(words=['graph', 'minors', 'survey'], tags=[8])]
Notice how each document is labelled: every tokenised document is wrapped in a TaggedDocument with a unique integer tag, which is exactly the input format doc2vec expects.
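
With the documents tagged, a doc2vec model can be trained on them directly. The following is a minimal sketch that reuses the documents list from above; the vector_size, window, min_count and epochs values are illustrative, not tuned:

from gensim.models.doc2vec import Doc2Vec

# Train a small model on the tagged corpus (illustrative hyperparameters)
model = Doc2Vec(documents, vector_size=20, window=2, min_count=1, epochs=40)

# Infer a vector for a new, unseen document
vector = model.infer_vector(['human', 'computer', 'interface'])
print(vector)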

Example:-

import gensim
import gensim.downloader as api

# Download the text8 corpus through the gensim downloader
dataset = api.load("text8")
data = [d for d in dataset]

The training data for doc2vec must be a list of TaggedDocument objects.

def create_tagged_document(list_of_list_of_words):
    for i, list_of_words in enumerate(list_of_list_of_words):
        yield gensim.models.doc2vec.TaggedDocument(list_of_words, [i])

train_data = list(create_tagged_document(data))
print(train_data[:1])
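
Once train_data is ready, a model can be trained on it and saved to disk so it can later be loaded as a pretrained model. A rough sketch, reusing gensim and train_data from above; the hyperparameters and the file name "doc2vec" are illustrative:

# Build a doc2vec model over the tagged text8 corpus (illustrative settings)
model = gensim.models.doc2vec.Doc2Vec(vector_size=50, min_count=2, epochs=10)
model.build_vocab(train_data)
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

# Save the trained model so it can be reloaded later
model.save("doc2vec")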

Using a Pretrained doc2vec Model for Text Clustering:-

from sklearn import metrics
from sklearn.cluster import Birch
import gensim.models as g
import codecs

# Path to the pretrained doc2vec model and to the file of test documents
model = "doc2vec"
test_docs = "data"
start_alpha = 0.01
infer_epoch = 100

# Load the pretrained model
m = g.Doc2Vec.load(model)

# Read the test documents: one document per line, whitespace-tokenised
test_docs = [x.strip().split() for x in codecs.open(test_docs, "r", "utf-8")]
print(test_docs)

# Infer a vector for every test document
X = []
for d in test_docs:
    X.append(m.infer_vector(d, alpha=start_alpha, steps=infer_epoch))

# Cluster the document vectors with Birch
k = 3
brc = Birch(branching_factor=50, n_clusters=k, threshold=0.1, compute_labels=True)
brc.fit(X)
clusters = brc.predict(X)
labels = brc.labels_

print("clusters:")
print(clusters)

# Evaluate cluster quality with the silhouette score
silhouette_score = metrics.silhouette_score(X, labels, metric='euclidean')
print("silhouette_score:")
print(silhouette_score)
clusters:
[1 0 0 1 1 2 1 0 1 1]
silhouette_score:
0.17644188
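
The silhouette score ranges from -1 to 1, so a value around 0.18 indicates only weakly separated clusters. To inspect which document landed in which cluster, the labels can be zipped back onto the documents. A small illustrative snippet, reusing clusters and test_docs from the code above:

# Print each document's cluster label next to the first few tokens of the document
for label, doc in zip(clusters, test_docs):
    print(label, " ".join(doc[:8]), "...")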
