196. Delete Duplicate Emails

Thursday, July 11, 2024


Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+

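id is the primary key (the column with unique values) of this table. Each row of this table contains one email. The emails will not contain uppercase letters.

Write a solution to delete all duplicate emails, keeping only one row per email: the one with the smallest id.

For SQL users, note that you should write a DELETE statement, not a SELECT statement.

For Pandas users, note that you should modify the Person table in place.

After your script runs, the answer shown is the Person table. The driver first compiles and runs your code and then displays the Person table. The order of the final Person table does not matter.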

Example:

Input:

Person table:

+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+

Output:

+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

Explanation: john@example.com appears twice. We keep the row with the smallest id, id = 1.
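To make the expected behavior concrete, here is a small plain-Python sketch that computes which rows should survive on the example input. It is not one of the original solutions. The function name `dedupe_keep_min_id` and the list-of-`(id, email)`-tuples representation are illustrative assumptions.

```python
def dedupe_keep_min_id(rows):
    """Return the surviving rows: for each email, only the row with the smallest id."""
    min_id_by_email = {}
    for row_id, email in rows:
        # Track the smallest id seen so far for this email.
        if email not in min_id_by_email or row_id < min_id_by_email[email]:
            min_id_by_email[email] = row_id
    # Keep a row only if it is the minimum-id row for its email.
    return [(row_id, email) for row_id, email in rows
            if min_id_by_email[email] == row_id]


person = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]
print(dedupe_keep_min_id(person))
# [(1, 'john@example.com'), (2, 'bob@example.com')]
```

Because id is a primary key, the minimum id for each email identifies exactly one row. That is why "keep the smallest id" is an unambiguous rule.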

MySQL

# Write your MySQL query statement below
DELETE p1
FROM Person p1
JOIN Person p2
ON p1.email = p2.email AND p1.id > p2.id;

Runtime: 1011 ms
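The MySQL statement joins Person to itself and deletes every row p1 for which some row p2 has the same email and a smaller id. The row with the smallest id for an email has no such partner, so it survives. SQLite, available through Python's built-in `sqlite3` module, does not support MySQL's multi-table `DELETE p1 FROM ... JOIN ...` syntax. The sketch below therefore expresses the same condition as a correlated `EXISTS` subquery. This is a substituted formulation for illustration, not the MySQL statement itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")],
)

# Same condition as the MySQL self-join: delete a row if another row
# shares its email and has a smaller id.
conn.execute(
    """
    DELETE FROM Person
    WHERE EXISTS (
        SELECT 1 FROM Person AS p2
        WHERE p2.email = Person.email AND p2.id < Person.id
    )
    """
)

remaining = conn.execute("SELECT id, email FROM Person ORDER BY id").fetchall()
print(remaining)  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```

The order in which rows are deleted cannot change the result. The minimum-id row is never deleted, so it is always present to "cover" its duplicates.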

Pandas

import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Sort the DataFrame by 'id' to ensure the smallest 'id' for each email comes first
    person.sort_values(by='id', inplace=True)
    
    # Drop duplicates based on the 'email' column, keeping the first occurrence
    person.drop_duplicates(subset='email', keep='first', inplace=True)

Memory: 70.15 MB, runtime: 454 ms
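The sort in the Pandas solution matters. `drop_duplicates(keep='first')` keeps whichever row comes first in the current row order, not necessarily the one with the smallest id. The plain-Python sketch below uses a hypothetical helper, `keep_first_per_email`, and no pandas. It mimics the two steps on an input deliberately given out of id order.

```python
def keep_first_per_email(rows, sort_by_id=True):
    """Mimic sort_values('id') followed by drop_duplicates('email', keep='first')."""
    if sort_by_id:
        # Sorting first guarantees the smallest id is the first occurrence.
        rows = sorted(rows, key=lambda row: row[0])
    seen = set()
    kept = []
    for row_id, email in rows:
        # keep='first': the first row for each email wins; later ones are dropped.
        if email not in seen:
            seen.add(email)
            kept.append((row_id, email))
    return kept


unsorted = [(3, "john@example.com"), (1, "john@example.com"), (2, "bob@example.com")]
print(keep_first_per_email(unsorted, sort_by_id=False))
# [(3, 'john@example.com'), (2, 'bob@example.com')]  (wrong row kept)
print(keep_first_per_email(unsorted))
# [(1, 'john@example.com'), (2, 'bob@example.com')]
```

As a side effect, sorting in place also reorders the Person DataFrame. This is acceptable here because the problem states that the final row order does not matter.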

PostgreSQL

-- Write your PostgreSQL query statement below

DELETE FROM Person
WHERE id NOT IN (
    SELECT MIN(id)
    FROM Person
    GROUP BY email
);

Memory: 0.00 MB, runtime: 290 ms
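The PostgreSQL version first computes the smallest id for each email. It then deletes every row whose id is not among those minimums. The same statement also runs unchanged in SQLite, so the sketch below replays it on the example data with Python's built-in `sqlite3` module. Here SQLite stands in for PostgreSQL, so this is an illustration rather than a PostgreSQL test. MySQL, by contrast, generally rejects a DELETE whose subquery reads from the table being deleted from (error 1093). That is why a self-join, or wrapping the subquery in a derived table, is the usual MySQL approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")],
)

# The subquery alone: the smallest id for each email.
min_ids = conn.execute("SELECT MIN(id) FROM Person GROUP BY email ORDER BY 1").fetchall()
print(min_ids)  # [(1,), (2,)]

# Same DELETE as the PostgreSQL solution: remove every non-minimum row.
conn.execute(
    """
    DELETE FROM Person
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM Person
        GROUP BY email
    )
    """
)

remaining = conn.execute("SELECT id, email FROM Person ORDER BY id").fetchall()
print(remaining)  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```

`NOT IN` is safe here because `MIN(id)` over a primary-key column never yields NULL. A NULL in the subquery result would make `NOT IN` match no rows.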
