skip to Main Content

I’m trying to design my first complex object-oriented PHP application and I’m confused about how to use objects to represent data from my database. More specifically I’m confused about how to combine multiple objects without making a huge number of database queries.

A simple example: let’s say I have two database tables:

products: - product_id
          - manufacturer_id
          - name
          - price

manufacturers: - manufacturer_id
               - name
               - factory_address
               - factory_zipcode
               - factory_country
               - // [50 additional columns]

I represent these using two classes:

class Product {}
class Manufacturer {}

Now let’s say I want to create a page which lists 1000 products.

I create an array of 1000 Product objects using a ProductFactory class or similar, something like this:

class ProductFactory {
    function allProducts() {
        $results = db_query("SELECT * FROM products");
        foreach ($results as $result) {
            $product_objects[] = new Product($result);
        }
        return $product_objects;
    }
}

That’s fine. That’s only a single database query and now I have all the product data – great!

But what if I want to create a slightly different page which also displays 1000 products, but also includes data from the manufacturers table?

Sure, I could change my query to use a JOIN to return columns from the manufacturer table as part of my product query, but that doesn’t seem very flexible to me, because:

  1. I might not always need manufacturer data, so loading it in the ProductFactory seems wasteful if I only use it on a single page and nowhere else.
  2. The manufacturer data might be very extensive, and I don’t want 50 extra columns bloating my query result when all I need is the manufacturer’s name.
  3. If I fetched product and manufacturer data in a single query I don’t see how I could elegantly create two separate objects (Product and Manufacturer) from the query results without it being super messy.

I guess I could add a second method to my ProductFactory class, something like:

function allProductsWithManufacturerData() {
    // SELECT * FROM products JOIN manufacturers...
}

But that seems hacky and inflexible. People using my code would not know that they need to specifically call this method if they want to include manufacturer data in the Product objects.

Alternatively I could load the manufacturer data only when it’s explicity requested:

echo $productObject->getManufacturer()->name;

That seems like a fairly flexible approach to me. But in my example above, that would result in 1000 extra queries (one for each product on the page) which would be horrible for performance.

So it seems like I’m being forced into a choice:

  1. Choose to fetch all the data you could ever possibly need (using a single JOIN query)
  2. Or… choose to fetch the bare minimum data and then use extra queries to fetch data as required.

Are these my only choices or am I missing something? Is there a design pattern I don’t know about that I can use to leverage more flexibility here?

2

Answers


  1. Here is how I solve this issue 🙂

    Let’s assume you already got your 1000 products, with manufacturers that only have ID (which you got by selecting from only products table).

    $manufacturers = []; 
    // I would actually use a Collection class, but that's not important here
    
    foreach ($products as $product) {
        $manufacturers[] = $product->getManufacturer();
    }
    
    $manufacturerMapper->populate($manufacturers);
    

    Then the data mapper for manufacturers makes a single select with WHERE manufacturer_id IN (....) condition and "fills in" all of the objects using the setters.

    What this relays on is the following behavior:

    $foo = new Thing;
    $bar = $foo;
    $bar->setStuff('blah');
    var_dump($foo->getStuff() === 'blah'); /// TRUE
    

    The list manufacturers that you passes to the mapper are the same manufacturers that already exist inside the product entities.

    The flaw

    This is not a perfect solution, because you have to handle "the manufacturers repeat" situation. While you are dealing with 1:1 case, it’s not all that tricky (you just need to make an "index" inside the mapper to know all entries with matching PK).

    But this does not actually scale all that well.

    The "more correct" way is to use something called Identity map that acts as something like a runtime-cache either inside your mappers or your repositories. The first time I encountered the concept was when I as reading about how to make "proper repository".

    Core of the solution is creation of this behavior:

    $first = $repo->get(45);
    $second = $repo->get(45);
    
    $first->setTitle('Lorem Ipsum');
    var_dump($second->getTitle() === 'Lorem Ipsum'); // TRUE
    

    This looks a lot like the previous bit, but the important past is that the repository is NOT returning an entity with same data, but instead is returning the same entity.

    I personally implemented this behavior at the mapper-level and not in the repositories themselves, but YMMV.

    If you are just starting with OOP, ignore this "overkill" solution and stick with the first part 🙂 This extended version requires some boilerplate code.

    Login or Signup to reply.
  2. Just change you query to select all from products and only select name from manufacturer with join

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search