Let’s talk about six common challenges I see people face in data acquisition projects:
- Setup
- Banning
- Scaling
- Maintaining
- Cost
- Quality (of service and/or of data)
Before diving in, a note on scope: by “data acquisition project” I mean a project where the data you need lives on a website or another HTTP-accessible resource. This type of project is also commonly known as a web scraping, web data, or web automation project. The idea is that you need structured data from one or more unstructured web-based sources.
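To make “structured data from an unstructured source” concrete, here is a minimal sketch using only Python’s standard library. The page fragment, the class names (`product`, `name`, `price`), and the helper `extract_products` are all invented for illustration; a real project would fetch the page over HTTP and would likely use a more forgiving parser.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the markup and class names are assumptions.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">24.90</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">39.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects one dict per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field currently being read: "name" or "price"

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            # A new product starts: open an empty record for it.
            self.records.append({})
        elif tag == "span" and css_class in ("name", "price") and self.records:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a span we care about.
        if self._field and data.strip():
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Turn the HTML fragment into a list of typed records."""
    parser = ProductParser()
    parser.feed(html)
    return [{"name": r["name"], "price": float(r["price"])} for r in parser.records]


print(extract_products(SAMPLE_HTML))
```

Even in this toy version, the extraction logic is coupled to the page’s markup, which is one reason challenges such as maintaining and quality show up later in a project.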
This is worth calling out because there are multiple data sources to consider when you explore using data to support your business goals. First, there is the data you own: internal databases, exhaust data such as logs and metadata, or even data from websites you operate. Then there is external data: data you know exists outside your systems and would like to acquire and integrate into your setup for further processing and use.
This lens helps you know what to consider when launching a data acquisition project, or identify problems accurately and navigate potential solutions in an ongoing one.
These challenges are technology- and team-agnostic. They are also interrelated, and anyone who manages any part of the project life cycle will inevitably need to consider them. I’ve written more about the data acquisition project life cycle here.
We’ll dedicate the next couple of posts to going into each issue in more easily digestible chunks.
That said, some technical understanding is needed to fully digest this breakdown. Still, each challenge can be tied intuitively to business impact (in a nutshell: more challenges, more cost), so it should be accessible to most stakeholders in your organisation. I hope this guide helps you communicate what kind of support you need to make the project a success, in a way that’s meaningful to the business.
I would love to hear your stories if any of these challenges resonate with you, and please do share other challenges you have seen in your experience.