ValueError: Unable to set entity for token 27 which is included in more than one span in entities
Problem description
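I'm trying to convert my dataset to .spacy format by first turning each text into a Doc and then adding it to a DocBin. The complete dataset file is available on Google Docs.
I run the following function: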
```python
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

def converter(data, outputFile):
    nlp = spacy.blank("en")  # load a new spacy model
    doc_bin = DocBin()  # create a DocBin object
    for text, annot in tqdm(data):  # data in previous format
        doc = nlp.make_doc(text)  # create doc object from text
        ents = []
        for start, end, label in annot["entities"]:  # add character indexes
            # supported modes: strict, contract, expand
            span = doc.char_span(start, end, label=label, alignment_mode="strict")
            # to avoid having the traceback;
            # TypeError: object of type 'NoneType' has no len()
            if span is None:
                pass
            else:
                ents.append(span)
        doc.ents = ents  # label the text with the ents
        doc_bin.add(doc)
    doc_bin.to_disk(f"./{outputFile}.spacy")  # save the docbin object
    return f"Processed {len(doc_bin)}"
```
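The comment in the inner loop lists the three alignment modes of Doc.char_span. As documented by spaCy, "strict" requires the character offsets to fall exactly on token boundaries, "contract" keeps only the tokens completely inside the range, and "expand" keeps every token the range touches. The sketch below is a simplified pure-Python model of that mapping, not spaCy's implementation: the helper char_range_to_tokens and its (start_char, end_char) token list are hypothetical. It shows why, in "strict" mode, a misaligned annotation yields None and is silently skipped by the loop above.

```python
from typing import List, Optional, Tuple

# Hypothetical helper: a simplified model of Doc.char_span's alignment modes.
# It is NOT spaCy's code; tokens are given as (start_char, end_char) pairs.
def char_range_to_tokens(
    tokens: List[Tuple[int, int]], start: int, end: int, mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Return (first_token, last_token + 1), or None if no span can be formed."""
    if mode == "strict":
        # Both character offsets must coincide with token boundaries.
        firsts = [i for i, (s, _) in enumerate(tokens) if s == start]
        lasts = [i for i, (_, e) in enumerate(tokens) if e == end]
        if not firsts or not lasts or lasts[0] < firsts[0]:
            return None
        return firsts[0], lasts[0] + 1
    if mode == "contract":
        # Keep only tokens lying completely inside the character range.
        covered = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
    elif mode == "expand":
        # Keep every token that overlaps the character range at all.
        covered = [i for i, (s, e) in enumerate(tokens) if s < end and e > start]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not covered:
        return None
    return covered[0], covered[-1] + 1
```

For example, with the token offsets of "I like cheese" ([(0, 1), (2, 6), (7, 13)]), the range (0, 4) gives None in "strict" mode, covers only "I" in "contract" mode, and covers "I like" in "expand" mode.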
After running the function on the dataset, I get this traceback:
ValueError: [E1010] Unable to set entity information for token 27 which is included in more than one span in entities, blocked, missing or outside.
After searching the dataset file for the text that triggers this traceback, I found the following:
```python
[('HereLongText..(abstract)',
  {'entities': [('0', '27', 'SpecificDisease'),
                ('80', '93', 'SpecificDisease'),
                ('260', '278', 'SpecificDisease'),
                ('615', '628', 'SpecificDisease'),
                ('673', '691', 'SpecificDisease'),
                ('754', '772', 'SpecificDisease')]})]
```
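To locate the offending annotations, a small check can scan each example's entities for overlapping character ranges before conversion. This is a sketch: find_overlaps is a hypothetical helper, and it casts the offsets with int() because the excerpt above stores them as strings, whereas Doc.char_span expects integer character offsets. It assumes non-empty spans.

```python
from typing import Iterable, List, Tuple, Union

Offset = Union[int, str]
Entity = Tuple[int, int, str]

def find_overlaps(
    entities: Iterable[Tuple[Offset, Offset, str]]
) -> List[Tuple[Entity, Entity]]:
    """Return pairs of (start, end, label) annotations whose ranges overlap.

    Exact duplicates are reported as overlaps as well.
    """
    # Normalize offsets to int and sort by start so each span only needs
    # to be compared with the spans that begin after it.
    spans = sorted((int(start), int(end), label) for start, end, label in entities)
    overlaps: List[Tuple[Entity, Entity]] = []
    for i, (s1, e1, l1) in enumerate(spans):
        for s2, e2, l2 in spans[i + 1:]:
            if s2 >= e1:
                # Every later span starts even further right: no more overlaps.
                break
            overlaps.append(((s1, e1, l1), (s2, e2, l2)))
    return overlaps
```

Running it over every annot["entities"] in the dataset points to the examples that need cleaning. For spans aligned with alignment_mode="strict", overlapping character ranges are exactly the ones that share a token; with "expand", two character-disjoint annotations can still grow into the same token, so this check would miss that case.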
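I don't know how to fix this.
Recommended answer
I think this should make your problem clear. Here is a slightly modified version of your code that raises the same error: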
```python
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

def converter(data, outputFile):
    nlp = spacy.blank("en")  # load a new spacy model
    doc_bin = DocBin()  # create a DocBin object
    for text, annot in tqdm(data):  # data in previous format
        doc = nlp.make_doc(text)  # create doc object from text
        ents = []
        for start, end, label in annot["entities"]:  # add character indexes
            # supported modes: strict, contract, expand
            span = doc.char_span(start, end, label=label, alignment_mode="strict")
            # to avoid having the traceback;
            # TypeError: object of type 'NoneType' has no len()
            if span is None:
                pass
            else:
                ents.append(span)
        doc.ents = ents  # label the text with the ents
        doc_bin.add(doc)
    doc_bin.to_disk(f"./{outputFile}.spacy")  # save the docbin object
    return f"Processed {len(doc_bin)}"

data = [("I like cheese",
         {"entities": [
             (0, 1, "Sample"),
             (0, 1, "Sample"),  # Same thing twice
         ]})]

converter(data, "out.txt")
```
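Note that in this example there are two annotations for exactly the same span. If you delete one of them, the error goes away.
You are probably getting the error because some of your annotations overlap, which doc.ents does not allow: each token can belong to at most one entity. Also note that "token 27" in the message is a token index in the Doc, not a character offset.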
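One way to resolve the overlaps is to keep only a non-overlapping subset of the annotations. Inside converter, spaCy's own helper does this on Span objects: doc.ents = filter_spans(ents), after from spacy.util import filter_spans, which prefers the longest span and, on ties, the one that starts first. The sketch below applies the same greedy rule to the raw (start, end, label) tuples, so the dataset can be cleaned before conversion; drop_overlapping is a hypothetical helper, not part of spaCy.

```python
from typing import Iterable, List, Tuple

Entity = Tuple[int, int, str]

def drop_overlapping(entities: Iterable[Entity]) -> List[Entity]:
    """Keep a non-overlapping subset of (start, end, label) annotations.

    Same greedy rule as spacy.util.filter_spans, but on character offsets:
    longer spans win, and ties go to the span that starts first.
    """
    # Longest first; among equal lengths, the smaller start first.
    ordered = sorted(entities, key=lambda ent: (-(ent[1] - ent[0]), ent[0]))
    kept: List[Entity] = []
    for start, end, label in ordered:
        # Keep the span only if it shares no character with a span already kept.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    # Return the survivors in document order.
    return sorted(kept, key=lambda ent: ent[0])
```

Applied to the example above, the duplicate (0, 1, "Sample") collapses to a single annotation. Dropping spans does discard labels, so if overlapping entities are intended, spaCy v3's span groups (doc.spans) are the place to store them instead of doc.ents.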
