The Step-by-Step Flow, Challenges and Solutions in the Journey to Data Analytics

It would not be wrong to say that today’s economy is a data economy. In every walk of life,
activities and transactions generate information, and that data is stored. Stored data can provide
tactical reports, i.e. reports about what has already happened, which are of limited use in
decision making. The branch of data science that has gained popularity recently is Data Analytics,
because it not only gives you insights into what happened but, on the basis of certain logic,
predicts what may happen or what to expect. This is critical for businesses in today’s VUCA
environment, not only to survive but to grow. Data Analytics is therefore a subject in high demand
from business, and this article walks you through all aspects of it.

Data Engineering and Analytics is a complex subject; it has several components involving a diverse
set of technologies and processing. The Data Engineering activities are described below.

There are three steps, or components, of end-to-end Data Engineering. It is a journey from data
sources through data processing and transformation, Data Warehouse/Data Lake set-up and data
upload, finally getting the Data Warehouse/Lake ready for the various analytics activities:

  1. ETL/ELT Software Engineering: This is the first step in Data Engineering and a very
    important one; the success or failure of Data Analytics depends on this core step. In this
    software engineering process, data is extracted from the various sources, then transformed
    and loaded, or sometimes loaded and then transformed. Imagine your organization has multiple
    data sources such as Oracle, DB2, SQL Server and text files supporting multiple
    applications/products. The Data Architect defines what data, and how much of it, will be
    extracted from these sources, transformed and uploaded into the data lake/warehouse. The
    transformation steps could involve data cleansing, formatting, removing duplicates, etc.
    Tools like Pentaho, Google Dataflow, Azure Data Factory and AWS Glue are commonly used for
    ETL, whereas tools like Talend, Airflow and Hevo Data are commonly used for ELT. (A minimal
    sketch of such a pipeline is shown after this list.)
  2. Data Engineering for Data Lakes and Warehouses: The Data Architect has to design the Data
    Warehouse or Data Lake and decide on the best technology choices for it. It is important to
    establish the data relationships, understand the data attributes and understand how the data
    will be consumed, as inputs to the DWH or Data Lake design. Once the warehouse/lake is ready,
    with data loaded through the ETL/ELT process, the analytics layer can consume it. Snowflake,
    Google BigQuery, Microsoft Azure Synapse and the IBM and Oracle data warehouse offerings are
    commonly used for Data Warehousing. Azure Data Lake Storage, AWS Lake Formation, Qubole and
    Infor Data Lake are commonly used tools to build the Data Lake.
  3. Data Visualisation/Analytics: Once the data has been made available in the warehouse/data
    lake, this third step of Data Engineering opens up several possibilities:

    1. Applications – Special-purpose applications that need data from various sources together,
      in relationship, can consume it.
    2. Data Analytics – Tools like Power BI, Qlik and Tableau can now be implemented on top of
      the collected data in the warehouse to perform the necessary analytics activities.
    3. Data Scientists – The collected data can be used and analysed by Data Scientists to infer
      patterns, understand what the data is saying and recommend decisions.

    Data visualisation is the representation of data through the use of common graphics such as
    charts, plots, infographics and even animations. These visual displays of information
    communicate complex data relationships and data-driven insights in a way that is easy to
    understand. (A small visualisation sketch follows the ETL sketch below.)
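
To make the ETL/ELT step concrete, below is a minimal, illustrative sketch in Python using pandas
and SQLite. The file name orders.csv, the table name fact_orders and the column names are
assumptions made for illustration only; SQLite simply stands in for a real warehouse such as
Snowflake or BigQuery.

    import pandas as pd
    import sqlite3

    # Extract: read raw data from an assumed CSV export (hypothetical file and columns)
    raw = pd.read_csv("orders.csv")  # e.g. order_id, customer_id, amount, order_date

    # Transform: basic cleansing, formatting and de-duplication
    raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")  # normalise dates
    raw["amount"] = pd.to_numeric(raw["amount"], errors="coerce")           # coerce bad values
    clean = raw.dropna(subset=["order_id", "customer_id"])  # drop rows missing key fields
    clean = clean.drop_duplicates(subset=["order_id"])      # remove duplicate orders

    # Load: write the cleansed data into a local "warehouse" table
    with sqlite3.connect("warehouse.db") as conn:
        clean.to_sql("fact_orders", conn, if_exists="replace", index=False)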
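
And for the visualisation/analytics step, a small sketch that reads the same assumed fact_orders
table back and charts monthly revenue with matplotlib; again, the names and the query are
illustrative rather than a prescribed design.

    import sqlite3
    import pandas as pd
    import matplotlib.pyplot as plt

    # Query the assumed warehouse table created in the ETL sketch above
    with sqlite3.connect("warehouse.db") as conn:
        monthly = pd.read_sql_query(
            "SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS revenue "
            "FROM fact_orders GROUP BY month ORDER BY month",
            conn,
        )

    # A simple chart that communicates the trend behind the raw numbers
    monthly.plot(x="month", y="revenue", kind="bar", legend=False)
    plt.ylabel("Revenue")
    plt.title("Monthly revenue (illustrative)")
    plt.tight_layout()
    plt.show()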

By 2025, organisations worldwide are expected to generate some 463 exabytes of data every day. If
you want to utilise that data and build a data-driven culture in your organisation, you need to
understand the challenges in data analytics and the methods to overcome them. Based on my
experience, here is what may work better to address these challenges:

  1. Collecting meaningful data: The sheer volume of data being generated can overwhelm
    employees. They may then analyse whatever data is readily available rather than the data that
    is really critical, which will certainly not help.
    Solution: Where possible, deploy a Data Analyst and also improve data literacy so that
    employees know what to work on and which data is critical to the business.
  2. Selecting the right tools: With so many tools available, deciding which to select for ETL,
    Data Warehousing and Analytics is difficult, and a choice whose pros and cons have not been
    debated may not give the desired results.
    Solution: Use expert consulting advice or form a core group of business and IT leadership to
    evaluate the right tools. The design must not be viewed in silos but thought through end to
    end. The Data Architect has to consider what the tool chain will be, how the handshake across
    technologies and tools will happen, and what the right combination of tools is across the
    ETL, data warehouse and analytics chain.
  3. Consolidating data from multiple sources: Data comes from scattered and disjointed sources.
    For instance, you will need to pull data from your website, social media pages, CRM portals,
    financial reports, e-mails, competitors’ websites, etc. The formats of most of these sources
    will obviously vary, and putting them in one common place and analysing them is a challenge.
    Solution: A central data hub or Data Warehouse can be created to hold the data in one
    location, with relationships established as needed. This decision has to be made by the Data
    Architect up front, in the ETL/ELT phase of software engineering. (A small consolidation
    sketch is shown after this list.)

  4. Data quality: This is the most important issue, and it affects all downstream activities.
    When data is updated in one application but not everywhere else, data consistency errors are
    created. Manual data entry also introduces quality errors, and without validation logic in
    data uploads and data creation, wrong or corrupt data can end up in storage.
    Solution: As far as possible, ensure there are no manual data entry points, and put data
    validation in place at various stages so the data stays in line with the design. Wherever
    possible, automate data uploads, and design data synchronisation with checks and balances. (A
    minimal validation sketch is shown after this list.)
  5. Building a data culture among employees: According to one study, the biggest obstacle to
    becoming a data-driven company lies in an organisation’s culture, not in its technology; only
    a meagre 9.1% of executives pointed to technology as a challenge on the path to data
    analysis. Often, even though top-level management understands the importance of data
    analysis, they do not extend the desired support to their employees. Constant pressure and a
    lack of support between top-level and lower-level employees are among the most significant
    data analytics challenges.
    Solution: Up-skilling employees on data and tooling, training them on the importance of data,
    and recognising the innovative solutions they come up with all help improve the data culture
    of the organisation.
  6. Data security: Different types of data are being collated in one place, including
    business-sensitive data, employee data, etc. Unprotected data sources can become an easy
    entry point for hackers, and uncontrolled access to sensitive information can create huge
    issues and affect the business.
    Solution: A data privacy and protection policy needs to be defined and implemented across the
    organisation. Data must be encrypted while being transmitted across networks, and data access
    must be authenticated with company-defined security measures. There have to be frequent
    audits of the data security measures and of the current state, and any violation must be
    dealt with strictly. Beyond this, physical access control, strict adherence to system access
    control measures, no access to developer machines via external devices such as USB drives and
    disks, and no uncontrolled cloud data uploads should all be enforced. (A small data-masking
    sketch is shown after this list.)
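
To illustrate the consolidation challenge (point 3), here is a minimal sketch that merges two
assumed source extracts, a CRM export and a finance export, on a shared customer_id key using
pandas. The file and column names are hypothetical.

    import pandas as pd

    # Two assumed source extracts with different shapes (hypothetical files and columns)
    crm = pd.read_csv("crm_contacts.csv")           # e.g. customer_id, name, segment
    finance = pd.read_csv("finance_invoices.csv")   # e.g. customer_id, invoice_total

    # Consolidate into one view keyed on customer_id, as a central hub or warehouse would
    combined = crm.merge(
        finance.groupby("customer_id", as_index=False)["invoice_total"].sum(),
        on="customer_id",
        how="left",
    )
    combined.to_csv("customer_360.csv", index=False)  # one common place to analyse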
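
For the data quality challenge (point 4), a minimal sketch of validation logic applied before a
load; the rules, file and column names are assumptions chosen only to show the idea of rejecting
bad data before it reaches storage.

    import pandas as pd

    def validate_orders(df: pd.DataFrame) -> pd.DataFrame:
        """Reject data that fails basic quality rules before it is loaded."""
        errors = []
        if df["order_id"].duplicated().any():
            errors.append("duplicate order_id values")
        if df["customer_id"].isna().any():
            errors.append("missing customer_id values")
        if (df["amount"] < 0).any():
            errors.append("negative amounts")
        if errors:
            raise ValueError("validation failed: " + "; ".join(errors))
        return df

    clean = validate_orders(pd.read_csv("orders.csv"))  # only validated data moves downstream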
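
And for the data security challenge (point 6), one small illustrative control: replacing a
sensitive column with a salted hash before the extract leaves the source system, so records can
still be joined without exposing the raw value. The column name, file name and salt handling are
assumptions, and this complements rather than replaces encryption in transit and access control.

    import hashlib
    import pandas as pd

    SALT = "load-this-from-a-secrets-vault"  # assumption: the salt is managed securely elsewhere

    def mask_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
        """Replace a sensitive column with a salted SHA-256 hash, keeping joinability."""
        df = df.copy()
        df[column] = df[column].astype(str).map(
            lambda value: hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()
        )
        return df

    employees = pd.read_csv("employees.csv")      # hypothetical sensitive extract
    safe = mask_column(employees, "national_id")  # masked before upload to the central hub
    safe.to_csv("employees_masked.csv", index=False)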
