
Practical Data Augmentation Techniques for Predictive Models

When analyzing a customer's propensity to buy a product, their likelihood of responding to a campaign, or related use cases, four kinds of information are worth assembling to build an excellent predictive model:

  1. Who are my buyers? This is typically demographic data, with information such as customer age and gender.
  2. Why do they buy? This is usually psychographic data, with information such as whether they enjoy traveling or outdoor activities.
  3. What did they buy? This is typically transaction data, captured in the commerce system.
  4. How did they buy? This is usually behavioral data, obtained from the commerce system or from log analysis.
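
To make the four categories concrete, here is a minimal Python sketch of assembling them into one record per customer. It assumes every source is a dictionary keyed by a shared customer ID, and all field names (`age`, `likes_travel`, `total_spend`, `sessions_last_30d`, and so on) are hypothetical placeholders rather than a prescribed schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class CustomerProfile:
    customer_id: str
    age: int                # who: demographics (hypothetical field)
    gender: str
    likes_travel: bool      # why: psychographics (hypothetical field)
    likes_outdoors: bool
    total_spend: float      # what: transactions
    orders: int
    sessions_last_30d: int  # how: behavioral / log data (hypothetical field)


def build_profile(customer_id, demographics, psychographics, transactions, behavior):
    """Join the four (hypothetical) sources on customer_id into one flat record."""
    demo = demographics[customer_id]             # required: raises KeyError if missing
    psych = psychographics.get(customer_id, {})  # optional sources default to empty
    txns = transactions.get(customer_id, [])
    beh = behavior.get(customer_id, {})
    return CustomerProfile(
        customer_id=customer_id,
        age=demo["age"],
        gender=demo["gender"],
        likes_travel=psych.get("likes_travel", False),
        likes_outdoors=psych.get("likes_outdoors", False),
        total_spend=sum(t["amount"] for t in txns),
        orders=len(txns),
        sessions_last_30d=beh.get("sessions_last_30d", 0),
    )


profile = build_profile(
    "c1",
    demographics={"c1": {"age": 38, "gender": "M"}},
    psychographics={"c1": {"likes_travel": True}},
    transactions={"c1": [{"amount": 120.0}, {"amount": 80.0}]},
    behavior={"c1": {"sessions_last_30d": 12}},
)
print(asdict(profile))
```

Defaulting a missing psychographic answer to `False` is itself an assumption; a real model may be better served by an explicit "unknown" value, so it can tell "doesn't like travel" apart from "we don't know".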

Demographic data shows who your customers are in terms of age, sex, education, occupation, income, marital status, religion, family size, and so on. A simple way to think about this is a profile such as: a married man aged 35 to 45, with children, a household income of $75k or more, and concerns about his weight.

Psychographic data helps you understand why customers purchase. Psychographic information might include your buyer's habits, hobbies, spending habits, beliefs, personality characteristics, attitudes, lifestyle, and values. The fundamental goal of adding this information is to understand how our product, service, or brand fits into customers' lives. Their everyday values, the challenges they face, their hesitations, and their attitudes typically shape buying behavior. A simple way to think about this is a user who has an active lifestyle, cares about his appearance, eats healthily, favors quality over price, values a balance between work and personal life, and enjoys hanging out with friends.

Let's look at an example of a retail company that wants to promote its new high-end trail walking shoe. Based on demographics, the company would probably target people aged 28 to 35 with an average annual income of $75,000. That information alone does not mean much, because many people in that demographic may prefer to sit at home and watch TV and have no interest in walking a trail.

This is where psychographic information helps. Psychographics may reveal that your buyer has a stable career, enjoys traveling, goes to the gym on weekdays, is a big football fan, and hangs out at a sports bar on weekends.
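
A small sketch of the trail-shoe example, assuming hypothetical candidate records with `age`, `income`, `enjoys_travel`, and `gym_weekdays` fields. The $70,000 income cutoff and the rule that travel or regular gym visits signal interest in trail walking are illustrative assumptions, not rules taken from the example itself.

```python
# Hypothetical candidate records; names and values are illustrative only.
candidates = [
    {"name": "A", "age": 30, "income": 78_000, "enjoys_travel": True, "gym_weekdays": True},
    {"name": "B", "age": 33, "income": 74_000, "enjoys_travel": False, "gym_weekdays": False},
    {"name": "C", "age": 29, "income": 76_000, "enjoys_travel": True, "gym_weekdays": False},
    {"name": "D", "age": 45, "income": 90_000, "enjoys_travel": True, "gym_weekdays": True},
]


def matches_demographics(c, min_age=28, max_age=35, min_income=70_000):
    """Demographic target: age band plus an assumed income floor."""
    return min_age <= c["age"] <= max_age and c["income"] >= min_income


def matches_psychographics(c):
    """Assumed rule: travel or weekday gym visits signal an active lifestyle."""
    return c["enjoys_travel"] or c["gym_weekdays"]


demo_only = [c["name"] for c in candidates if matches_demographics(c)]
combined = [
    c["name"] for c in candidates
    if matches_demographics(c) and matches_psychographics(c)
]
print(demo_only)  # ['A', 'B', 'C']
print(combined)   # ['A', 'C']
```

The demographic filter alone keeps B, who fits the age and income profile but shows no active-lifestyle signal; adding the psychographic filter drops B.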

Combining demographics and psychographics gives you a clearer picture of who buys your product and why. Buying decisions are influenced by a person's beliefs, principles, and attitudes, which is one of the primary reasons to augment your data with both demographic and psychographic profiles.

Sourcing demographics data

There are multiple ways of sourcing demographic data, which falls broadly into population demographics and personal demographics. Population demographics take the ZIP code a customer lives in and use aggregate information about the people in that ZIP code as a proxy for the customer's own demographics. This is a decent approximation compared with having no demographic data at all. Such data can be obtained from census data or bought from vendors such as ESRI.
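
The ZIP-code proxy amounts to a lookup join. Below is a standard-library sketch under stated assumptions: `zip_demographics` stands in for a table derived from census or vendor data, the ZIP codes and figures in it are placeholders rather than real statistics, and the field names are hypothetical.

```python
# Placeholder ZIP-level aggregates: stand-ins for census or vendor tables, not real figures.
zip_demographics = {
    "00001": {"median_age": 36.0, "median_household_income": 82_000},
    "00002": {"median_age": 34.5, "median_household_income": 91_000},
}

# Values used when a customer's ZIP code has no aggregate data.
MISSING = {"median_age": None, "median_household_income": None}


def attach_zip_proxy(customers, zip_table):
    """Copy each customer's ZIP-level aggregates onto the record as proxy features."""
    enriched = []
    for cust in customers:
        proxy = zip_table.get(cust["zip"])
        enriched.append({**cust, **(proxy or MISSING), "has_zip_proxy": proxy is not None})
    return enriched


customers = [
    {"customer_id": "c1", "zip": "00001"},
    {"customer_id": "c2", "zip": "00002"},
    {"customer_id": "c3", "zip": "00003"},  # no ZIP-level data available
]
rows = attach_zip_proxy(customers, zip_demographics)
for row in rows:
    print(row)
```

Customers whose ZIP code is missing from the table keep `None` values plus a `has_zip_proxy` flag, so a downstream model can handle "no proxy available" explicitly instead of those rows being silently dropped.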

Alternatively, one can buy individual-level profiles from third-party providers such as Experian. The advantage of this data is that it is personalized; the challenge is uniquely matching each customer to their corresponding record.
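
The mapping challenge can be illustrated with a deliberately simple matcher. This sketch assumes exact matching on a normalized key of first name, last name, and 5-digit ZIP code; the record layout and the `segment` field are hypothetical. Real identity resolution usually draws on more fields (such as address or email) and on fuzzy comparison, so treat this as an illustration of why matches can be missing or ambiguous, not as a recipe.

```python
import re
from collections import defaultdict


def _norm(text):
    """Lowercase and keep letters only, so "O'Neil" and "ONEIL" compare equal."""
    return re.sub(r"[^a-z]", "", text.lower())


def match_key(first, last, zip_code):
    """Hypothetical match key: normalized first and last name plus the 5-digit ZIP."""
    return (_norm(first), _norm(last), zip_code.strip()[:5])


def match_records(customers, vendor_records):
    """Split customers into unique matches, ambiguous matches, and no match."""
    index = defaultdict(list)
    for rec in vendor_records:
        index[match_key(rec["first"], rec["last"], rec["zip"])].append(rec)
    matched, ambiguous, unmatched = {}, [], []
    for cust in customers:
        hits = index.get(match_key(cust["first"], cust["last"], cust["zip"]), [])
        if len(hits) == 1:
            matched[cust["id"]] = hits[0]
        elif hits:
            ambiguous.append(cust["id"])  # several vendor records share the key
        else:
            unmatched.append(cust["id"])
    return matched, ambiguous, unmatched


customers = [
    {"id": "c1", "first": "Ann", "last": "O'Neil", "zip": "00001"},
    {"id": "c2", "first": "John", "last": "Smith", "zip": "00002"},
    {"id": "c3", "first": "Maria", "last": "Lopez", "zip": "00003"},
]
vendor_records = [
    {"first": "ANN", "last": "ONEIL", "zip": "00001-1234", "segment": "outdoor"},
    {"first": "John", "last": "Smith", "zip": "00002", "segment": "sports"},
    {"first": "John", "last": "Smith", "zip": "00002", "segment": "travel"},
]
matched, ambiguous, unmatched = match_records(customers, vendor_records)
print(list(matched), ambiguous, unmatched)  # ['c1'] ['c2'] ['c3']
```

Two vendor records share John Smith's key, so he lands in `ambiguous` instead of being silently assigned one of them.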

Sourcing psychographics data

This kind of data is always a challenge to procure. It may come from research agencies, but in recent years most of it has come from social media sites such as Facebook. A customer's likes of specific brands, celebrities, sports, and other Pages carry meaning: depending on the pages they visit or like, one can characterize them as people who like to travel, enjoy pop music, love nightlife, like sci-fi movies, and so on.
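
Characterizing people from their likes can start as a simple rule-based mapping. In this sketch the page categories, the `CATEGORY_TO_TRAIT` table, and the `min_likes` threshold are all hypothetical; they illustrate turning raw likes into traits, not any platform's actual category scheme.

```python
from collections import Counter

# Hypothetical mapping from liked-page categories to psychographic traits.
CATEGORY_TO_TRAIT = {
    "travel_agency": "likes_travel",
    "airline": "likes_travel",
    "pop_artist": "likes_pop_music",
    "nightclub": "loves_nightlife",
    "scifi_film": "likes_scifi",
}


def psychographic_traits(liked_categories, min_likes=2):
    """Return the traits supported by at least `min_likes` liked pages."""
    counts = Counter(
        CATEGORY_TO_TRAIT[c] for c in liked_categories if c in CATEGORY_TO_TRAIT
    )
    return {trait for trait, n in counts.items() if n >= min_likes}


likes = ["airline", "travel_agency", "nightclub", "bakery"]
print(psychographic_traits(likes))  # {'likes_travel'}
print(psychographic_traits(likes, min_likes=1))
```

Requiring at least two supporting likes (an arbitrary default here) avoids labeling someone a nightlife lover on the strength of a single page; unmapped categories such as `bakery` are simply ignored.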

Simple segmentation examples

Armed with this kind of information, retailers can better anticipate and serve their customers' needs. I will cover this area in a separate article, but here is a summary. For example, retailers can:

  1. Segment their customers (for example, with k-means clustering) by demographic and psychographic profile, then analyze each segment to understand what its members want and how they want it. This helps retailers serve and target each segment based on its unique needs.
  2. Use these segments to understand what other products customers might be interested in, and to answer questions such as: Should I open a new store? How far would customers be willing to travel to reach it?
  3. Personalize product and communication characteristics based on each segment's characteristics: the product color, size, the picture shown with the product, the best time to call, the best mode of contact, and so on.
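
The segmentation in step 1 can be sketched with k-means. The version below is plain Lloyd's algorithm written with the standard library, run on min-max scaled features so that income does not dominate age or the 0/1 trait flags. The customer tuples (age, income, likes_outdoors, likes_nightlife) are hypothetical; in practice you would more likely use a library implementation such as scikit-learn's `KMeans` and choose `k` by inspecting the resulting segments.

```python
import random


def min_max_scale(rows):
    """Scale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical customers: (age, income, likes_outdoors, likes_nightlife).
raw = [
    (30, 78_000, 1, 0),
    (32, 80_000, 1, 0),
    (29, 76_000, 1, 0),
    (55, 120_000, 0, 1),
    (58, 125_000, 0, 1),
    (52, 118_000, 0, 1),
]
points = min_max_scale(raw)
centroids, labels = kmeans(points, k=2)
print(labels)
```

With these two clearly separated groups, the three younger, outdoor-oriented customers end up in one cluster and the three older, nightlife-oriented customers in the other; which cluster receives label 0 depends on the random initialization.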

In summary, predictive models benefit from aggregating additional information about customers that captures who they are and why they want to buy. Such information tends to improve the accuracy of the model and helps the retailer personalize its services for its end customers. Remember that a model is only as good as the input it learns from: the better the information captures the buyer's intent, the better the model will be.