Case Study:
SDG&E – Data Lake
Business Need
San Diego Gas & Electric (SDG&E) required a solution to collect, transform, and load large amounts of company data from multiple ERP systems, along with project management, construction management, and resource management data. The company needed to identify “authoritative sources” of data to ensure a single source of truth, while keeping the data in its native formats, in order to eliminate data silos, reduce technical debt, enable advanced analytics, and support data management at scale. The new system had to be built within the company’s existing technology infrastructure, integrate with existing ERP systems, and require no new licensing.
The Approach
Bayen Group met with the program team to learn about the problem and gain an in-depth understanding of how each existing system worked. During a discovery period, Bayen Group collaborated with over 200 stakeholders, users, and managers to learn how current methods were falling short, namely that SDG&E’s existing investment could not scale with its needs. The groups discussed at length what a new cross-platform system would look like and how it would function. They determined that, given the vast amounts of disparate data, a highly sophisticated solution would be needed to access any type of data at any time.
The Solution
The Bayen Group team proposed the creation of a Data Lake: a centralized data repository with the capacity to ingest, transform, and curate data for each user’s individual needs. Developing the Data Lake involved identifying the origin of all project data at the utility company and ensuring the system would allow for improved data accessibility and analysis. This included creating a scalable technical design that would enable ingestion of data from multiple ERP systems (Cloud-to-Cloud) while maintaining data security in transit and at rest. The Bayen Group team decided on a medallion architecture (also known as a “multi-hop” architecture), a data design pattern used to logically organize data in a lake, with the goal of incrementally improving the structure and quality of data as it flows through each layer of the architecture (from Bronze ⇒ Silver ⇒ Gold layer tables). Put simply, the data is gathered (Bronze), transformed to fit the user’s need (Silver), and delivered to the user for analysis (Gold).
Additionally, the project addressed the data variety challenge, ensuring that the Data Lake could accommodate different types and formats of data. The technical team established robust data governance practices to maintain data quality and compliance with regulations. Training the utility company’s staff to use the Data Lake effectively was also crucial, as was planning for the ongoing support and maintenance of the Data Lake infrastructure.
Throughout the project, Bayen Group engaged stakeholders and kept them informed of progress and any challenges that arose. This helped manage expectations and ensured that the Data Lake aligned with the company’s strategic goals. The final stages of the project involved thorough testing and validation of the Data Lake to confirm that it met all requirements and was ready for deployment.
Benefits
The Data Lake serves as a foundational platform for SDG&E and keeps its leadership abreast of critical information on projects that require substantial capital. Previously, viewing data from more than one source was cumbersome and time-consuming, because each system had its own way of storing, processing, and presenting information. The Data Lake now lets users see the big picture of large-scale projects, allowing them to make well-informed decisions on resource allocation, budgeting, time frames, and more. It allows disparate systems to share data seamlessly, ensuring that every user is provided the same information: a single source of truth. The ability to see all relevant project data, make educated decisions, and trust that the information is sound has saved, and will continue to save, SDG&E considerable time and money.