Knowledge Management as a Keystone in your Data Science & Analytics Strategy
Knowledge Management in Analytics
In 2021, I was asked to prepare a roadmap for a company just beginning their Data Science journey. Knowledge management would be one of the keys to a healthy data science and analytics strategy for 2021 and beyond. Knowledge management (as defined for the context of this article) is the process and tools necessary to capture, disseminate, and present information generated throughout the organization, whether that be lessons learned, best practices, locations of data, project management information, tickets, or a whole host of other artifacts.
Documentation, and more broadly knowledge management, is not a sexy topic. But the unbridled frustration that occurs when analysts, data scientists, and machine learning engineers spend hours looking for data in random databases and obscure tables cannot be overstated. It's just infuriating. Most Data Scientists might acknowledge the importance of developing a robust knowledge management solution but never talk specifically about how they might deploy such a project within their organization. Typically, there are a few reasons why most companies get knowledge management wrong with regard to analytics:
Return on investment is not immediately apparent
It is easy to borrow against future resources, and the consequences are perceived to be low
Documentation is boring
Documentation is difficult to maintain over time
The losses can be huge over time. Consider a single Data Scientist who makes $125,000 per year, roughly $67 per hour assuming about 1,870 working hours per year. Without a robust knowledge management solution, your analytics organization could be spending dozens of hours per week looking for data strewn about the business, struggling to generate queries, searching for data dictionaries, trying to figure out the transformations applied to a dataset, and so on. It is easy to see that tens of thousands of dollars can be lost per year simply by not having the tools to do the work efficiently.
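To put a rough number on it, here is a back-of-the-envelope sketch of that loss. The team size, hours lost per week, and working weeks are illustrative assumptions, not measurements; plug in your own figures.

```python
# Back-of-the-envelope estimate of the annual cost of time lost hunting
# for data. All inputs are illustrative assumptions, not benchmarks.

HOURLY_RATE = 125_000 / 1_870   # ~$67/hour at ~1,870 working hours/year
HOURS_LOST_PER_WEEK = 5         # assumed hours each person spends searching for data
TEAM_SIZE = 4                   # assumed number of data professionals
WEEKS_PER_YEAR = 48             # assumed working weeks per year

annual_cost = HOURLY_RATE * HOURS_LOST_PER_WEEK * TEAM_SIZE * WEEKS_PER_YEAR
print(f"Estimated annual cost of poor knowledge management: ${annual_cost:,.0f}")
# -> roughly $64,000 per year for a team of four
```

Even with conservative inputs, the figure lands well into the tens of thousands of dollars per year.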
Gaps in documentation (or its complete absence) are a compounding problem; by the time you notice there is an issue, the gradient has exploded! Therefore, companies who start off on the right foot will have an easier time maintaining and reaping the benefits of knowledge management. All that said, if you have not started yet, starting is the next best step.
Where do you start?
It is probably easiest, and most useful, to begin with an entity-relationship (ER) diagram. These artifacts are likely the most important pieces of documentation that can be made available to data professionals. ER diagrams come in all sorts of shapes, sizes, and complexities. However, the nature of these artifacts remains the same: these are the dictionaries that can help determine the location of data and how that data relates to other data objects and sources. There are tons of templates out there to model your work off of, but I like to use draw.io. The software is free, requires no license, and is simple to use, much like Visio or other flowcharting software.
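As a rough illustration of what an ER diagram captures, the sketch below models two tables and the relationship between them in plain Python before you ever open draw.io. The table and column names here are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A table (or other data object) as it would appear on an ER diagram."""
    name: str
    columns: list[str] = field(default_factory=list)

@dataclass
class Relationship:
    """A labeled edge between two entities, such as a foreign-key link."""
    source: Entity
    target: Entity
    label: str  # e.g. cardinality and join key

# Hypothetical marketing tables, purely for illustration
customers = Entity("crm.customers", ["customer_id", "email", "segment"])
touches = Entity("crm.campaign_touches", ["touch_id", "customer_id", "campaign"])

edge = Relationship(customers, touches, "1-to-many on customer_id")
print(f"{edge.source.name} --[{edge.label}]--> {edge.target.name}")
```

Whatever tool you draw it in, the diagram boils down to these two ingredients: the data objects and the labeled edges between them.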
I like to treat ER diagrams like catalogues. They should be structured in a way that allows the user to pose a question or search with a subject matter in mind. For example, a data professional in Marketing might want to look for data related to 'marketing'. Therefore, maybe your 'data catalogue' starts simply with a single subject node: 'Marketing'.
Each catalogue will have different structures, names, themes, and so on, but, overall, I find this the easiest way to help someone find data related to a concept. One might even draw an analogy to a graph structure. Next, move to a 'source', or maybe even a 'location', of data.
The second layer becomes extremely important because it is a catalyst for understanding exactly where data of this type is stored and accessed. It is important to be somewhat verbose in this layer of the graph; the location of the data in question should be quite clear. Finally, though this layer is optional, I like to expand the graph out to the tables where the data exists, as in the sketch below.
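To make the three layers concrete, here is a minimal sketch of the subject → source → table hierarchy as a nested Python dictionary. The source and table names are hypothetical placeholders.

```python
# Subject -> source/location -> tables. All names are hypothetical.
data_catalogue = {
    "Marketing": {
        "CRM (production replica)": [
            "crm.customers",
            "crm.campaign_touches",
        ],
        "Web analytics warehouse": [
            "web.sessions",
            "web.conversions",
        ],
    },
}

def find_tables(catalogue: dict, subject: str) -> list[str]:
    """Walk from a subject node out through its sources to the leaf tables."""
    return [table
            for tables in catalogue.get(subject, {}).values()
            for table in tables]

print(find_tables(data_catalogue, "Marketing"))
# -> ['crm.customers', 'crm.campaign_touches', 'web.sessions', 'web.conversions']
```

The same structure drops straight into a draw.io diagram: subjects in the center, sources one ring out, and tables at the edge.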
The outer parts of the graph are where the complexity can become cumbersome (but it is worth it). This is the basic structure I tend to follow when working on a project such as an ER diagram. The version I like to use is not textbook quality, but it is a framework I have adapted over time while serving in many different roles and companies.
A data dictionary is nice: but now what?
After creating your data dictionary, some teams might be done! There will be situations where there is no need to continue. However, in some instances the journey might continue on to other knowledge management tools:
Develop a knowledge center in a tool like SharePoint or Confluence: find a way to consolidate all of the artifacts related to analytics on a single platform (think 'one-stop shopping')
Start a series of training sessions or podcasts that encourage data literacy throughout your company. Generate and disseminate knowledge in ways that enable you, as a data professional, to be more effective
Find time to communicate current projects, their status, and requests for feedback to the rest of your organization
When does it all end?
Well… the job isn't ever done! But there is a point where maintenance is not as burdensome. Largely, the end game will be determined by each unique situation. Take steps to spread the work out amongst a few different team members, if possible. Another idea might be to set a rotation where a day or two per month is devoted solely to collecting knowledge, documenting that knowledge, and writing a brief summary that lets other teams know there have been updates.
What is the value in the end?
The easy part about writing this article is that there is little to be debated. Organizations that collect, store, disseminate, and maintain their knowledge can have a competitive edge in creating a sustainable business. Given the growth of data science, business intelligence, and analytics within modern companies, it only makes sense to better organize the information generated by this burgeoning profit center.
Specifically, in the context of analytics operations, there are some key value propositions:
Efficiency in developing queries, models, data models, and algorithms can help reduce go-to-market time, thereby potentially increasing return on your investment
Your analysts, scientists, and engineers will be less likely to grow frustrated hunting for important data
You can better baseline future projects and initiatives with a quick reference on what went right and wrong in past developments