Data is fundamental to starting data-driven activities. Valuable data is often spread across many different production systems, where it is hard to access. The critical task is to determine which data is relevant enough to collect. At this point, the number of data sources is limited, and you often retrieve the data from your own systems. It can be challenging to retrieve data that resides in functional silos. Because you are reporting on historical performance, timing is not a big challenge: data is often transported in a daily job to a primitive data warehouse. The quality of the data should be kept in mind but is not a huge concern at this point, because the sources are trusted and the variety of the data is limited. There are no real big data challenges to face yet.
Determine which data needs to be visible.
Source and report critical data.
The focus in the Reporting stage was on making data insightful for users by making it available in reports and operational dashboards. In the Analyzing stage, you go deeper into the data to find insights that are not visible on the surface. Where you previously collected data that was already available, you might now want to create additional sources of data. You will start to collect data that is relevant not only for reports, but also for further analysis. Collecting more data, and from different sources, brings its own challenges. Data quality might become an issue if the sources are too unreliable. Monitor data quality and assign responsibility for specific domains to ensure it never drops below a minimum. Security and privacy should be a major concern and a priority. User data should be anonymized and secured.
Determine which data is relevant for deeper analysis.
Collect and analyze relevant data.
Monitor and ensure data quality.
Define a strategy for keeping data secure and private.
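The monitoring and privacy steps above can be made concrete with a small sketch. It assumes that quality is measured as completeness (the share of records with all required fields filled) and that user IDs are replaced by a keyed hash. The threshold `MIN_COMPLETENESS`, the field names, and the sample records are hypothetical. A keyed hash is strictly pseudonymization rather than full anonymization, so treat it as one possible building block, not a complete privacy strategy.

```python
import hashlib
import hmac

# Hypothetical minimum share of complete records a data domain must maintain.
MIN_COMPLETENESS = 0.95


def completeness(records, required_fields):
    """Return the fraction of records in which every required field is filled."""
    if not records:
        return 0.0
    complete = sum(
        1
        for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return complete / len(records)


def pseudonymize(user_id, secret_key):
    """Replace a user ID with a keyed hash so records can still be joined."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative sample data; two of the four records have a missing field.
records = [
    {"user_id": "u1", "country": "NL", "order_total": 20.0},
    {"user_id": "u2", "country": "", "order_total": 35.5},
    {"user_id": "u3", "country": "DE", "order_total": 12.0},
    {"user_id": "u4", "country": "BE", "order_total": None},
]

score = completeness(records, ["user_id", "country", "order_total"])
quality_ok = score >= MIN_COMPLETENESS  # the domain owner is alerted when False

# Placeholder key for illustration; a real key would come from a secret store.
key = b"replace-with-a-managed-secret"
safe_records = [{**r, "user_id": pseudonymize(r["user_id"], key)} for r in records]
```

Running a check like this per data domain gives the responsible owner a single number to watch, while analysts only ever see the hashed identifiers.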
Large amounts of data are being collected and analyzed. It is now time to bring this to a true big data scale to feed the data-driven models you are creating. Collecting user behavior on websites provides massive amounts of raw data. The key is refining this to user \textit{intent}. What does it mean when someone lingers on a product page? Are they doubting the price or the product model? If you find the intent of a user, you can act on it by personalizing their experience.
Collect data like user behavior on a bigger scale.
Use customer data to personalize the experience.
Focus on data quality.
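As a first step from raw behavior toward intent, lingering on a page can be approximated by dwell time. The sketch below assumes page-view events of the form (user, page, timestamp in seconds) and treats the gap until the same user's next event as time spent on the page. The event format, the `LINGER_SECONDS` threshold, and the sample data are hypothetical. Dwell time is only a raw signal: it shows that someone lingered, not why.

```python
from collections import defaultdict

# Hypothetical threshold: dwell time at or above this counts as "lingering".
LINGER_SECONDS = 60


def dwell_times(events):
    """Sum seconds per (user, page) from page-view events.

    Each event is (user, page, timestamp_seconds). Time on a page is taken
    as the gap until the same user's next event; the last event of each
    user has no successor and is therefore skipped.
    """
    by_user = defaultdict(list)
    for user, page, ts in events:
        by_user[user].append((ts, page))

    totals = defaultdict(float)
    for user, views in by_user.items():
        views.sort()  # order each user's views by timestamp
        for (ts, page), (next_ts, _) in zip(views, views[1:]):
            totals[(user, page)] += next_ts - ts
    return dict(totals)


# Illustrative sample events.
events = [
    ("anna", "/product/42", 0),
    ("anna", "/product/42/reviews", 95),
    ("anna", "/cart", 110),
    ("ben", "/product/42", 5),
    ("ben", "/home", 15),
]

totals = dwell_times(events)
lingering = sorted(k for k, secs in totals.items() if secs >= LINGER_SECONDS)
```

In this sample, only one user lingers on the product page. Separating price doubt from model doubt would need further signals, such as which page the user visits next.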
Data is already being collected on a massive scale. You probably collect everything there is to collect about your customers. Keep looking out for new opportunities to learn more. Third-party data is another way to get access to valuable data. External companies that specialize in data collection can sell you market data that complements your own. Third-party data is often expensive, so make sure it is going to be worthwhile. Data governance becomes very important once you start granting users across the organization access to data. You want employees to have access to all the relevant data, but you also have security concerns. With many access points and a large data volume, a clear governance strategy is necessary.
Look for new data opportunities, including from third-party sources.
Create a clear data governance strategy that scales.
Data is one of the most valuable assets, if not the most valuable asset, in your possession. You also have the capabilities to capitalize on those assets. Because data is so valuable, you are always looking for new data sources. Companies like Google, which make money through advertising, go to great lengths to build a better profile of their users. Google invests in free software, such as Google Chrome and Android, partially to collect more information that can be used in an advertising profile. Unstructured data is also retrieved from voice and images. Image recognition is already used to make images interpretable to computers.
Continue to look for new data sources from new products.
Leverage alternative unstructured data sources, such as voice and images.