Building a NoSQL E-Commerce Data Model

170220221645089578.png

E-commerce is capturing an ever increasing market share of shoppers and, in addition to the requirements of managing a physical store like tracking inventory, e-commerce storefronts have additional challenges such as getting the product to the customer, using a payment processor, and building relationships with customers.

All of these tasks require specialized infrastructure, which in turn interact with your database. A lot of your storefront’s efficiency and effectiveness rests on the decisions you make about structuring and storing data. In this article, we’ll explore the pros and cons of a NoSQL database for your e-commerce application and use the most popular document store option, MongoDB, in our examples.

Note: This data modeling exercise is a response to the Hacker News comments about a previous post we published about building a scalable e-commerce data model.

 

NoSQL Databases and MongoDB

MongoDB supports ACID (atomicity, consistency, isolation, and durability) transactions, and its cloud offering is PCI DSS (Payment Card Industry Data Security Standard) compliant. It’s one of the best NoSQL options for e-commerce.

MongoDB also saves documents in a JSON-like format, which means it’s very simple to convert queries and results into a format your frontend code understands. This removes the need to write object-relation mappers (ORMs). Further, MongoDB scales extremely well; it’s designed to be distributed across multiple machines. NoSQL databases don’t come with enforced schemas. Introducing new fields is fast because the database doesn’t need to be told you’re going to change the format or even the existence of a field. It simply saves what you tell it, no fuss.

While this is great for growing businesses where products and checkout workflows evolve rapidly, it does require lots of care to ensure you don’t create data that will confuse future developers and application versions.

Before starting to model your system, be aware that MongoDB doesn’t start with access control. Don’t let this default catch you by surprise. Enable access control before you write real data! PCI compliance requires that some cardholder information is encrypted at rest and in motion. You can enable Transport Layer Security/Secure Sockets Layer (TLS/SSL) encryption to secure network traffic.

We’ll cover at-rest encryption later in the article, but before going live with any site using MongoDB, go over the security checklist to make sure you’re up to date on best practices.

 

Data Modeling a Digital Storefront with NoSQL

There’s an almost endless array of features you can build out for an online store, but for the purposes of this article, I’ll focus on modeling just a few: your product catalog, user information, payments, and orders.

Product Catalog

In order to sell things, you need to record what you have so clients can browse before commiting to a purchase. The bare essentials for a product document are the price and a unique identifier like a sku. It’s also common to include a description, the quantity you have available, and an image file.

All examples are run from in the mongo shell and should be adopted to a driver for your application.

db.inventory.insertOne( { 
    item: "journal", 
    price: 9.99, 
    qty: 25, 
    size: { h: 14, l: 21, w: 1 }, 
    features: "Beautiful, handmade journal.",
    categories: ["writing", "bestseller"],
    image: "items/journal.jpg"  
} )

This will return a verification message like this:

{
    "acknowledged" : true,
    "insertedId" : ObjectId("600e814359ba901629a14e13")
}

If you don’t supply MongoDB with an _id field, it creates one using an ObjectId. The _id field is a unique, immutable, indexed identifier. To make your example more lifelike and show the flexibility of MongoDB, use the updateOne command and the returned _id to accommodate two related products with slightly different features, and throw in a promotional offer.

db.inventory.updateOne( 
    { _id: ObjectId("600e814359ba901629a14e13") }, 
    {
        $unset: {image: 1, size: 1, qty: 1, price: 1},
        $set : { 
            item: "journal", 
            features: ["Inner pocket", "Durable cover"],
            skus: [ { 
                sku: "154A",
                price: {
                    base: NumberDecimal(9.99),
                    currency: "USD"
                },
                quantity: 20,
                options: {
                    size: { h: 14, l: 21, w: 1 },
                    features: ["72 sheets of premium lined paper"],
                    colors: ["brown", "red"],
                    ruling: "wide",
                    image: "images/journal1.jpg"
                }
            }, {
                sku: "154B",
                price: {
                    base: NumberDecimal(14.99),
                    currency: "USD",
                    discount: NumberDecimal(4.00)
                },
                quantity: 15,
                options: { 
                    size: { h: 18, l: 22, w: 2 },
                    features: ["140 sheets of premium paper"],
                    colors: ["brown"],
                    ruling: "unlined",
                    image: "images/journals.jpg"
                }
            } ]
        }
    }
)

Notice that we changed the structure of the features field from a string to an array. This highlights the powerful (and dangerous) flexibility of NoSQL databases—you can radically change how data is formatted without a peep from the database. In that regard, command line SQL is scary, but command line MongoDB is terrifying.

We also settled on a format for monetary data. Choose one and stick to it throughout your application.

You can get a head start on product modeling by scouting the data required by major providers like Google Marketplace and Amazon Marketplace. You can also check out Fabric’s product information management (PIM) system to learn how an e-commerce platform like Fabric handles product modeling.

Query Design Practices

The goal with MongoDB is to structure your schema so your application returns all the data it needs in a single read query. Adding complexity in MongoDB often involves adding embeddable documents to a record. Indexing is important for efficiently scanning for information. Without an index, MongoDB must perform a collection scan, examining every document in a collection before returning the result. Index the fields you’ll use to regularly search for products.

 

Users

Once you have an idea of what products you want to offer, think about how to model your users. You want to store information that will allow you to build an ongoing relationship with a customer, such as being able to display items relevant to them. If you’re selling physical products, every customer must have contact information.

Here’s a simple user profile for a store that needs to ship a physical product. Because you want to enforce unique email addresses for users and use that field to look them up, you’ll index that field and make it unique. You could accomplish this with a command like the following:

db.customers.createIndex(
  { email: 1 },
  { unique: true } 
)

But this example will use the built in _id field instead. Remember that the _id field is indexed, unique, and immutable. If a user changes their email, you’ll want to revalidate it. In this case, immutability is an asset; if a user wants to update their email address, MongoDB will demand a new collection entry, and you can be sure that your emailVerified field gets reset to false.

db.customers.insertOne( { 
    _id: "journalfanatic@e/mail.com",
    fname: "Journal", 
    lname: "Fanatic",
    hashedAndSaltedPassword: "$1$O3JMY.Tw$AdLnLjQ/5jXF9.MTp3gHv/",
    emailVerified: false,
    address: {
        country: "United States",
        street1: "99 Main Street",
        street2: "Apt #3",
        city: "Boston",
        state: "MA",
        zip: "74586" 
        }
} )

Customer profiles are a tricky part of e-commerce because they represent people. Unlike products, people are out of your control and constantly changing. They change addresses, want packages delivered to work, and forget their passwords.

This is when you have to start thinking about protecting personally identifiable information (PII) and start working toward Payment card industry (PCI) compliance. Becoming PCI compliant is a big undertaking and often one of the compelling cases for using a third-party provider like Fabric that is already compliant.

Let’s update our simple example to start handling these issues. First you’ll update the zip code, then add a new delivery address and a date modified field.

db.customers.updateOne( 
    { _id: "journalfanatic@e/mail.com" },
    { $set: { 
        "address.zip": "60601",
        shippingAddress: { 
            street1: "50 Work Street",
            street2: "Floor 16",
            city: "Chicago",
            state: "IL",
            zip: "60601" } },
     $setOnInsert: { dateModified: new Date() }
    }
)

Next, help your model accomodate a password reset request with an expiration for thirty minutes from now (and update the dateModified field). For performance reasons, you need to be sure that embedded documents don’t infinitely expand over time. If a user resets their password a hundred times, and each time it’s added to the customer document, pulling information about that customer for a simple action like a login will be very resource intensive. It’s better to separate out events that can be repeated many times, like orders, site visits, and password resets into their own collection and use document references to link other collections.

To make a reset collection:

db.reset.insertOne({ 
    customerId: "journalfanatic@e/mail.com",
    token: "493654",
    status: "requested",
    expiration: new Date(ISODate().getTime() + (1000 * 60 * 30)),
    requestTime: new Date()
} )

These tables can get much more complicated very quickly, especially once you start tracking user sessions, presenting customized content, or selling products to teams of users and want to track utilization and distribution of resources.

Validation

Validation might not be necessary for your product models, but it’s absolutely a requirement for customer-related fields. Addresses, phone numbers, emails, and credit card numbers have to be validated to ensure you’re not opening your site to malicious users and to prevent mistakes that stop customers from obtaining your products.

MongoDB offers tools for validating schema at the collection level. You’ll also want to implement application-level validation. Validating uniqueness in MongoDB is only possible by creating an index and requiring it to be unique. A customers collection already exists in our example, so use runCommand to add some validation.

db.runCommand( {
   collMod: "customers",
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "hashedAndSaltedPassword", "address" ],
      properties: {
         _id: {
            bsonType: "string",
            pattern: '.+@.+..+',
            description: "must match the email pattern"
         },
         hashedAndSaltedPassword: {
            bsonType: "string",
            minLength: 10,
            description: "must be a string ten characters long, and is required"
         },
         address: {
            bsonType: "object",
            required: [ "street1", "zip" ],
            properties: {
                zip: { 
                    pattern: "^[0-9]{5}(?:-[0-9]{4})?$",
                    description: "must match pattern and is required"
                },
                street1: { 
                    bsonType: "string",
                    description: "must be a string and is required" },
                street2: { 
                    bsonType: "string",
                    description: "must be a string" }
            },
            description: "must be an object, and is required"
         }
      }
   } },
} )

As you can see, writing validation in MongoDB is cumbersome. We only validated six fields, and the code took twice as many lines as creating the record. Libraries like Mongoose can make validation easier, while headless commerce APIs from third parties like Fabric will handle data validation for you.

No matter how you decide to implement validation, it’s vital for the long-term health of your application that you audit your data and create solid schema and application-level validation for any user-entered fields.

 

Payments

Modeling your payments is where the security of your application is most important. It’s also the first place where you really need to reference another table.

It can be tempting to include payments in the customer table by adding a nested document, but it’s not wise. Payment information has specific and extremely important security concerns; it’s simpler to audit your PCI compliance practices if payments are not mixed in with a lot of other data.

Let’s start with a naive (and dangerous!) implementation, then discuss some of the security implications that can improve the model.

db.payments.insertOne( { 
	customerId: "journalfanatic@e/mail.com",
	status: "verified",
	gateway: "stripe",
	type: "credit",
	amount: NumberDecimal(1.00),
	card: {
		brand: "Visa",
		pan: "4242424242424242",
		expirationMonth: 1,
		expirationYear: 2090,
		cvv: 123
	}
} )

This is a starting point, but e-commerce experts will panic seeing this schema. All the information necessary to use this card in your database in plaintext, which might be convenient but creates huge legal liabilities for your business. The PCI DSS is very clear, don’t store cardholder data unless it’s necessary, and never store authentication data like the verification code.

Encryption

MongoDB offers enterprise clients the ability to encrypt data files and perform automatic client-side encryption. Any user is able to use manual client-side encryption to encrypt individual fields, then set schema validations to enforce the encryption. Encrypted fields are stored as binary data (subtype 6).

With manual encryption, the driver for your chosen language handles applying an appropriate encryption key, as well as the encryption and decryption operations.

Following the PCI DSS guidelines, you don’t have to encrypt your database fields if you’re only storing the last four digits of the card. But be aware that any time you’re bundling the Primary Account Number (PAN) with other personally identifiable information, whether at rest or in motion, all that data must be encrypted. Let’s try again.

db.payments.remove( { } )
db.payments.insertOne( { 
    customerId: "journalfanatic@e/mail.com",
    status: "verified",
    gateway: "stripe",
    type: "credit",
    amount: NumberDecimal(1.00),
    card: {
        brand: "Visa",
        panLastFour: "4242",
        expirationMonth: 1,
        expirationYear: 2090,
        cvvVerified: true
    }
} )

Now you’ve removed the authentication data and replaced it with a Boolean, and truncated the card number so you’re not required to encrypt other information like the expiration date. In this example, you’ve handled one type of payment option (credit cards), but there are many others, each with their own security requirements and data models. Research what you’ll need, and the best practices for handling and storing that information.

Tokenization

A great way to remove some of the complexity of PCI compliance is to check if your payment processor (like Stripe) offers a tokenization service. Tokenization) replaces all the information used in payment processing with a token string that your systems can use to track a payment. Tokenization is great as it offloads PCI compliance requirements to your payment processor, reducing your liability and simplifying your code.

Here’s an example of a tokenized model:

db.payments.insertOne( { 
	customerId: "journalfanatic@e/mail.com",
	status: "awaitingVerification",
	gateway: "stripe",
	type: "token",
	token: "card_1IDHBZFdjJUqVVV2gPlbz8BC"
} )

 

Orders

Orders are where the rubber meets the road. Your customer has a profile that enables you to send them something, and you have products to send them and the ability to accept payments. The order tracks your product through the process of leaving your possession and entering the customer’s hands. Orders are another data model for tracking human behavior, so they can get messy. Customers can abandon and modify orders and request refunds.

This step in the example brings together all the moving parts of your model: products are purchased by customers with payments. Our model will include things that exist in other collections, like shipping addresses and discounts.

In a SQL database, this information would exist as references to existing tables, but remember that in MongoDB you want all the relevant data to be returned in a single read command. It’s often more efficient to make a copy of relevant information and store it in the orders collection. That way, when a user checks on an order, you don’t have to read through multiple collections to display the information you want.

db.orders.insertOne( { 
	customerId: "journalfanatic@e/mail.com",
	paymentId: "600e6f37aa2232f59e273082",
    paymentStatus: "paid",
	status: "shippedAwaitingDelivery",
	currency: "USD",
	totalCost: NumberDecimal(39.85)
	items: [ 
        { sku: "154B",
            quantity: "2",
            price: NumberDecimal(14.99),
            discount: NumberDecimal(1.00),
            preTaxTotal: NumberDecimal(27.98),
            tax: NumberDecimal(1.00),
            total: NumberDecimal(28.98),
		}, { sku: "154A",
            quantity: "1",
            price: NumberDecimal(9.99),
            preTaxTotal: NumberDecimal(9.99),
            tax: NumberDecimal(.87),
            total: NumberDecimal(10.86)
        } 
    ],
	shipping:  {
        address:{
            street1: "50 Work Street",
            street2: "Floor 16",
            city: "Chicago",
            state: "IL",
            country: "USA",
            zip: "60601"
        },
        origin: {
            street1: "1 Penn Ave",
            street2: "",
            city: "New York",
            state: "NY",
            country: "USA",
            zipCode: "46281"
        },
        carrier: "USPS",
        tracking: "123412341234"
    }
} )

 

Modeling the Whole

Here are all the parts of your NoSQL e-commerce data model so far:

nosql-ecommerce-data-model

It’s an excellent start, but as I stated at the beginning of this guide, there are many facets of an e-commerce store, and we’ve barely brushed the surface. Other valid activities to model could include adding product distribution channels, handling image optimization, internationalization, additional data validation, product categorization, customer customization, subscriptions, recurring payments, taxes, and returns.

When modeling your data for e-commerce, mistakes can be costly, and customers will rarely appreciate the amount of effort that goes into creating a finely tuned database. With that in mind, you can consider offloading the management of your e-commerce database to a service like Fabric. You can make use of low-overhead API requests to store your data and let someone else keep up with industry best practices while you focus on improving your product.


Topics: Commerce
John Gramila

Tech advocate and writer @ fabric. Developer @ Tough Pixels.

Learn more about fabric