How do I link two tables, People
and Messages
, when there is no single column in the Messages
table that corresponds to the primary key of the People
table?
I have two tables: one containing email messages called Messages
and one of people called People
.
People
| person_id | email_address |
| -------- | ---------------- |
| 001 | `[email protected]` |
| 002 | `[email protected]` |
| 003 | `[email protected]` |
| 004 | `[email protected]` |
| 005 | `[email protected]` |
Messages
| msg_id | from_addr | to_addr | cc_addr | bcc_addr |
| ------ | ----------------- | ------------------------------------ | ----------------- | ----------- |
| 001 | `[email protected]` | `[email protected]`, `[email protected]` | `[email protected]` | NULL |
| 002 | `[email protected]` | `[email protected]` | NULL | NULL |
| 003 | `[email protected]` | `[email protected]` | `[email protected]` | `[email protected]` |
I will be programmatically passed a person_id
(say 005
for [email protected]
) and want to be able to query the Messages
dataset to get all "involved" emails–where a [email protected]
was sent, received, was cc’d or bcc’d in the email message. So with the dataset above, the query ought to return the rows with msg_id
001
and 003
.
I don’t know to map the relation between the People
and Messages
tables.
Say the primary key of of the People
table is person_id
. There is no single column that I can link that to in the Messages
table to act as a foreign key.
I cannot determine how to represent this data correctly.
Am I doing this wrong? What is a better way to represent these two tables with more tables structured differently?
After normalising to FNF the data can be represented as follows.
Messages_FNF
| msg_id | from_addr | to_addr | cc_addr | bcc_addr |
| ------ | ----------------- | ----------------- | ----------------- | ---------------- |
| 001 | `[email protected]` | `[email protected]` | `[email protected]` | NULL |
| 001 | `[email protected]` | `[email protected]` | `[email protected]` | NULL |
| 002 | `[email protected]` | `[email protected]` | NULL | NULL |
| 003 | `[email protected]` | `[email protected]` | `[email protected]` | `[email protected]` |
However, the primary key (that uniquely identifies each row in the table) for Messages_FNF
would have to be a composite key combining msg_id
, from_addr
, to_addr
, cc_addr
, bcc_addr
.
So how do I link the People
and Messages
tables together using Primary and Foreign keys?
Or must I further normalise or partition these tables such that the People
table and Messages_FNF
have a meaningful matching field – allowing for easy joins? If so, how?
I’ve read that the cross join lets you join tables that do not have a common field but I worry that it will result in a VERY large table with unnecessary rows and be computationally expensive.
Is linking the tables with a PK->FK the right way of structuring the data in this problem?
2
Answers
The Email_Address on People Table is same as From_Address on Messages Table, correct? If it is correct, you can simply join on these two fields. To join the tables, you don’t necessarily join on only primary key. You can join on any related fields.
To put OP out of their misunderstanding on the main point:
To "link tables together" does not need going via keys — not necessarily any key in any table. The most common linkages in a 1:many join are from Foreign key in the many side to Primary key in the 1 side. That’s why those relationships carry those names.
That Primary, Foreign key business is a result of ‘normalising’ the data model, aimed at exactly avoiding ‘update anomalies’ and making joining straightforward.
Joins going via non-key fields are sometimes call ‘cross joins’ to distinguish them — but they’re really all just joins, and you can use sql keyword
join ... on ...
.OP has now revised the q to show
Messages_FNF
. To repeat: no need to go via keys, so you could joinPeople.email_address
to each/any of the three fields holding email addresses.Whilst this design is in 1NF (under a very generous interpretation), it is hardly a solution with all that content repeated, including repeating a load of useless
NULL
s. You’ll suffer update anomalies (see the wikipage); you’ll also indeed get a blowup in numbers of rows.You need to vertically split off the columns that have repeating (or nullable) fields. Retain
from_addr
in theMessages
table (because it must have exactly one from address). Create three extra tables each with a compound key of{message_id, to_addr}
etc, in whichmessage_id
is a Foreign key to (REFERENCES
)Messages
.Now your sql can’t use
join ... on ...
because you’re looking for a match in any of the four tables, not all. You’ll need a disjunctivewhere ... = ... or ... = ... or ...
query. Exercise left for the reader: that’s a straightforward piece of sql.Addit: in response to comment
Do they already exist in
People
table? You didn’t say that in the original q. I get messages from all sorts of people I’ve never emailed before; I certainly don’t know who else might get bcc’d on messages I receive.So if you have a business rule that anybody’s email address appearing anywhere on any message is to automatically get an entry in the
People
table, yes your auxiliary tables could carrymsg_id, person_id
. You’ll still have "rows upon rows" ofperson_id
.You’ll still have to query all those
Messages
auxiliary datasets to search forperson_id
; and it still won’t be a whole Primary key of any table. (You seem to be worried about efficiency/effort: I’d guess not much saving, on modern DBMSs.)