The number of differences in a column
Question
I would like to retrieve a column showing how many letters differ in each row. For instance:
If one row has the value "test" and another row has the value "testing ", then the difference between "test" and "testing " is 4 letters, so the column would hold the value 4.
I have thought about it, but I don't know where to begin.
id || value || category || differences
--------------------------------------------------
1 || test || 1 || 4
2 || testing || 1 || null
11 || candy || 2 || -3
12 || ca || 2 || null
In this scenario and context, there is no difference between "Test" and "rest".
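The "Test"/"rest" remark suggests that only the change in length matters. If so, the desired column can be computed with a window function. The sketch below is an illustration under several assumptions. It uses Python's built-in sqlite3 module only so that it runs self-contained, and it needs a SQLite build with window-function support (3.25 or later). The table name `words` and the function `length_differences` are hypothetical. It also assumes that "the other row" means the next `id` within the same `category`. SQL Server also provides `LEAD`, but its `LEN` ignores trailing spaces, whereas SQLite's `LENGTH` counts them.

```python
import sqlite3

def length_differences(rows):
    """Return (id, value, category, differences) tuples, where differences is
    LENGTH(next value in the same category) - LENGTH(this value), or None for
    the last row of each category (LEAD yields NULL there)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (id INTEGER, value TEXT, category INTEGER)")
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
    query = """
        WITH ordered AS (
            SELECT id, value, category,
                   LEAD(value) OVER (PARTITION BY category ORDER BY id) AS next_value
            FROM words
        )
        SELECT id, value, category,
               LENGTH(next_value) - LENGTH(value) AS differences
        FROM ordered
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question. "testing " keeps the trailing space from the
# example text, which is what makes the first difference 4 rather than 3.
rows = [(1, "test", 1), (2, "testing ", 1), (11, "candy", 2), (12, "ca", 2)]
for row in length_differences(rows):
    print(row)
```

With this data the result reproduces the `differences` column in the table above, including the NULLs for the last row of each category.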
Recommended answer
I think what you are looking for is a measure of edit distance, rather than just a count of prefix similarity, and there are a few common algorithms for that. Levenshtein's method is one that I've used before, and I've seen it implemented as TSQL functions. The answers to this SO question suggest a couple of TSQL implementations that you might be able to take and use as-is.
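(Do take the time to test the code and understand the method, though, rather than just copying the code and using it. That way you can make sense of the output if something goes wrong. Otherwise you may be taking on technical debt that you will have to pay back later.)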
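To show what Levenshtein's method computes, here is a minimal dynamic-programming sketch in Python. It is not one of the TSQL implementations referred to above. To keep it runnable without SQL Server, it registers the function with SQLite through the stdlib `sqlite3.Connection.create_function`. The SQL name `levenshtein`, the table `words` and the helper `distances_to_next` are all hypothetical. The sketch also assumes that each row is compared with the next `id` in the same `category`.

```python
import sqlite3

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic Levenshtein distance)."""
    if a is None or b is None:
        return None  # propagate SQL NULL, e.g. when there is no next row
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

def distances_to_next(rows):
    """Hypothetical helper: edit distance from each value to the next value
    in the same category, computed inside SQLite via a registered function."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("levenshtein", 2, levenshtein)
    conn.execute("CREATE TABLE words (id INTEGER, value TEXT, category INTEGER)")
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
    query = """
        WITH ordered AS (
            SELECT id, value,
                   LEAD(value) OVER (PARTITION BY category ORDER BY id) AS next_value
            FROM words
        )
        SELECT id, value, levenshtein(value, next_value) AS distance
        FROM ordered
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

rows = [(1, "test", 1), (2, "testing ", 1), (11, "candy", 2), (12, "ca", 2)]
print(levenshtein("test", "rest"))  # 1: one substitution
print(distances_to_next(rows))
```

An edit distance is never negative, so "candy" to "ca" comes out as 3, not -3. It also treats "Test" and "rest" as different, which is worth checking against what the question actually asks for.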
Exactly which distance calculation you want will depend on how you want to count certain things. For instance, do you count a substitution as one change, or as a delete plus an insert? And if your strings are long enough for it to matter, do you want to consider substring moves, and so forth?
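The first of those choices can be made concrete with a configurable substitution cost. The sketch below assumes a hypothetical `edit_distance` function with a `sub_cost` parameter. It is ordinary Levenshtein when `sub_cost=1`. With `sub_cost=2` a substitution costs the same as a delete plus an insert, which gives the insert/delete-only distance. It does not model substring moves or transpositions, which would need a different algorithm.

```python
def edit_distance(a, b, sub_cost=1):
    """Levenshtein-style distance with a configurable substitution cost.
    sub_cost=1: a substitution counts as one change (plain Levenshtein).
    sub_cost=2: a substitution costs as much as a delete plus an insert,
    so only insertions and deletions effectively count."""
    prev = list(range(len(b) + 1))  # cost of building each prefix of b from ""
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of deleting all of a[:i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                                 # delete ca
                curr[j - 1] + 1,                             # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost), # substitute
            ))
        prev = curr
    return prev[-1]

print(edit_distance("test", "rest"))              # 1
print(edit_distance("test", "rest", sub_cost=2))  # 2
```

For pure insertions, such as "test" to "testing ", both settings give the same answer. They only diverge when characters have to be replaced.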
