在 MySQL 表中查找重复记录

本教程探讨如何分别使用 SELECT DISTINCTSELECT COUNT 语句检查 MySQL 表中的重复项并检索重复项的数量。

使用 SELECT DISTINCT 语句检查 MySQL 表中的重复项

数据库表中有重复记录的影响可能从轻微的不便到灾难。幸运的是,MySQL 有一些漂亮的关键字可以组合起来扫描表中的重复项。

此外,我们可以计算重复记录的出现次数,并在必要时将其删除。

让我们使用名为 employee_details 的示例表创建一个 test_company 数据库。

CREATEDATABASEtest_company;USEtest_company;CREATETABLEemployees_details(idINTAUTO_INCREMENT,nameVARCHAR(255)NOTNULL,titleVARCHAR(255)NOTNULL,salaryINT,PRIMARYKEY(id));

该表填充了值,包括重复值。

INSERTINTOemployees_details(name,title,salary)Values('James Maddison','Computer Engineer',80000),('Matthew Defoe','Software Architect',150000),('Daniel Jameson','Software Engineer II',95000),('Jules Reddington','Senior Software Engineer',120000),('Carlos Rodriguez','Data Engineer',100000),('Matthew Defoe','Software Architect',150000),('Daniel Jameson','Software Engineer II',95000),('Jules Reddington','Senior Software Engineer',120000);SELECT*FROMemployees_details;

输出:

+----+------------------+--------------------------+--------+
| id | name             | title                    | salary |
+----+------------------+--------------------------+--------+
|  1 | James Maddison   | Computer Engineer        |  80000 |
|  2 | Matthew Defoe    | Software Architect       | 150000 |
|  3 | Daniel Jameson   | Software Engineer II     |  95000 |
|  4 | Jules Reddington | Senior Software Engineer | 120000 |
|  5 | Carlos Rodriguez | Data Engineer            | 100000 |
|  6 | Matthew Defoe    | Software Architect       | 150000 |
|  7 | Daniel Jameson   | Software Engineer II     |  95000 |
|  8 | Jules Reddington | Senior Software Engineer | 120000 |
+----+------------------+--------------------------+--------+
8 rows in set (0.00 sec)

此表中有三个重复项,由于表的大小较小,因此很容易发现。我们对较大的表使用 SELECT DISTINCT 语句从表中检索唯一记录。

根据官方文档,SELECT DISTINCT 语句仅检索输出记录的一个实例,即使它出现多次。

因此,当 SELECT DISTINCT 语句返回的记录数小于表中的总记录数时,我们可以确定存在重复。

-- Retrieving only distinct records.
SELECTDISTINCTname,title,salaryFROMemployees_details;

输出:

+------------------+--------------------------+--------+
| name             | title                    | salary |
+------------------+--------------------------+--------+
| James Maddison   | Computer Engineer        |  80000 |
| Matthew Defoe    | Software Architect       | 150000 |
| Daniel Jameson   | Software Engineer II     |  95000 |
| Jules Reddington | Senior Software Engineer | 120000 |
| Carlos Rodriguez | Data Engineer            | 100000 |
+------------------+--------------------------+--------+
5 rows in set (0.00 sec)

请注意,查询中不包含 id 列,因为 MySQL 将 id 列标识为唯一记录。

因此,在查询中包含 id 列将返回所有记录(包括重复项)作为唯一记录。

从结果集中,我们可以推断出有三 (3) 条重复记录,因为查询返回五 (5) 条不同的记录并且表中有八 (8) 条记录。

使用 SELECT COUNT 语句查找 MySQL 表中重复出现的次数

现在,在检测到重复项的存在后,我们可以使用 SELECT COUNT 语句来查找重复项的出现次数。

SELECTnameAS'employee name',COUNT(*)ASOccurrenceFROMemployees_detailsGROUPBYnameHAVINGOccurrence>1;

输出:

+------------------+------------+
| employee name    | Occurrence |
+------------------+------------+
| Daniel Jameson   |          2 |
| Jules Reddington |          2 |
| Matthew Defoe    |          2 |
+------------------+------------+
3 rows in set (0.001 sec)

这将检索重复的记录和表中每条记录的重复数。正如预期的那样,有三 (3) 条重复记录。

SELECT COUNT 语句的使用将通过此官方参考进一步讨论。

使用 INNER JOIN 语句查看 MySQL 表中的重复记录

我们可以在目标表和 SELECT DISTINCT 查询之间使用 INNER JOIN 查询来查看主记录旁边的重复记录。

SELECTI.id,O.name,O.title,O.salaryFROMemployees_detailsASIINNERJOIN(SELECTDISTINCTname,title,salaryFROMemployees_details)ASOONI.name=O.name;

输出:

+----+------------------+--------------------------+--------+
| id | name             | title                    | salary |
+----+------------------+--------------------------+--------+
|  1 | James Maddison   | Computer Engineer        |  80000 |
|  2 | Matthew Defoe    | Software Architect       | 150000 |
|  6 | Matthew Defoe    | Software Architect       | 150000 |
|  3 | Daniel Jameson   | Software Engineer II     |  95000 |
|  7 | Daniel Jameson   | Software Engineer II     |  95000 |
|  4 | Jules Reddington | Senior Software Engineer | 120000 |
|  8 | Jules Reddington | Senior Software Engineer | 120000 |
|  5 | Carlos Rodriguez | Data Engineer            | 100000 |
+----+------------------+--------------------------+--------+
8 rows in set (0.001 sec)