Embedding in PyTorch
Question
I have checked the PyTorch tutorial and questions similar to this one on Stack Overflow.
I am confused: does the embedding in PyTorch (nn.Embedding) make similar words closer to each other, so that I just need to give it all the sentences? Or is it just a lookup table, and I need to code the model myself?
Recommended answer
nn.Embedding holds a Tensor of dimension (vocab_size, vector_size), i.e. the size of the vocabulary by the dimension of each embedding vector, together with a method that does the lookup.
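To make the "lookup table" description concrete, here is a minimal sketch. The sizes (1000 and 128) are illustrative assumptions; it only checks that the layer's weight has the stated shape and that calling the layer returns the same values as indexing rows of that weight.

```python
import torch
from torch import nn

vocab_size, vector_size = 1000, 128  # illustrative sizes, not from a real vocabulary
embedding = nn.Embedding(vocab_size, vector_size)

# The layer's only parameter is the lookup table itself.
weight_shape = tuple(embedding.weight.shape)

# Calling the layer on indices selects the matching rows of the table.
indices = torch.tensor([3, 4])
looked_up = embedding(indices)
same_as_rows = torch.equal(looked_up, embedding.weight[indices])

print(weight_shape, tuple(looked_up.shape), same_as_rows)
```

Nothing in this lookup knows about word meaning; it is plain row selection.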
When you create an embedding layer, the Tensor is initialised randomly. The similarity between similar words should only appear once you train it, unless you have overwritten the values of the embedding with those of a previously trained model, like GloVe or Word2Vec, but that's another story.
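For the pretrained case, one possible approach is nn.Embedding.from_pretrained. The sketch below uses a tiny hand-written tensor as a hypothetical stand-in for vectors loaded from GloVe or Word2Vec; loading the real files is not shown.

```python
import torch
from torch import nn

# Hypothetical stand-in for pretrained vectors (3 words, 3 dimensions each).
pretrained = torch.tensor([[0.1, 0.2, 0.3],
                           [0.4, 0.5, 0.6],
                           [0.7, 0.8, 0.9]])

# freeze=True keeps the vectors fixed during training; use freeze=False to fine-tune them.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

vector = embedding(torch.tensor([1]))
print(vector)
```

In this case the lookup returns the supplied row unchanged instead of a random vector.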
So, once you have defined the embedding layer and defined and encoded the vocabulary (i.e. assigned a unique number to each word in the vocabulary), you can use the instance of the nn.Embedding class to get the corresponding embeddings.
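The encoding step could look roughly like this. The toy corpus, the whitespace tokenisation and the name word_to_ix are assumptions made for illustration, not part of PyTorch.

```python
import torch
from torch import nn

sentences = ["the cat sat", "the dog sat"]  # toy corpus (assumption)

# Assign a unique integer to each distinct word, in order of first appearance.
word_to_ix = {}
for sentence in sentences:
    for word in sentence.split():
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

# One row per vocabulary word; 8 dimensions is an arbitrary choice.
embedding = nn.Embedding(len(word_to_ix), 8)

# Encode a sentence as indices, then look up one vector per word.
ids = torch.tensor([word_to_ix[w] for w in "the dog sat".split()])
vectors = embedding(ids)
print(word_to_ix, tuple(vectors.shape))
```

A real pipeline would usually also reserve indices for unknown words and padding.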
For example:
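import torch
from torch import nn
# 1000 words in the vocabulary, each mapped to a 128-dimensional vector
embedding = nn.Embedding(1000, 128)
# Look up the vectors for word indices 3 and 4; the result has shape (2, 128)
embedding(torch.LongTensor([3, 4]))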
This returns the embedding vectors corresponding to words 3 and 4 in your vocabulary. As no model has been trained, they will be random.
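To illustrate that any similarity comes from training and not from the layer itself, here is a small sketch with a made-up objective that pulls the vectors of words 3 and 4 together. The loss is an assumption chosen for brevity, not a real language-model loss; in practice the embedding is trained as part of a larger model.

```python
import torch
from torch import nn

torch.manual_seed(0)
embedding = nn.Embedding(10, 4)
before = embedding.weight.detach().clone()

optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

# Toy objective (assumption): squared distance between the vectors of words 3 and 4.
vecs = embedding(torch.tensor([3, 4]))
loss = (vecs[0] - vecs[1]).pow(2).sum()
loss.backward()
optimizer.step()

after = embedding.weight.detach()

# Only the rows that were looked up receive a gradient, so only they move.
changed_rows = (before != after).any(dim=1).nonzero().flatten().tolist()
dist_before = (before[3] - before[4]).norm().item()
dist_after = (after[3] - after[4]).norm().item()
print(changed_rows, dist_before, dist_after)
```

After one step the two vectors are closer than before, while every other row of the table is untouched.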
