Code, Culture & Scale: A Small Team with Big Data
Prologue
The discipline of data science, as the name suggests, is built on the foundations of empiricism: the observability and repetition of patterns in data that help us make informed decisions. Building a team, however, especially one assembled from the ground up for a data science research and consultation service, takes far more than applying rules deduced from observed phenomena. The human element that delivers the service introduces variance that is difficult to estimate or predict. One could argue this random error is too significant to ignore, making team building at any ambitious organization more of an art than a science (at least until a Neuralink with a built-in scoring module for service delivery arrives).
Humor aside, this is precisely the challenge we at DSRS had to contend with. How do we build a team that leverages state-of-the-art tools and is adept at addressing today's research needs? How do we do this while minimizing variance in service delivery and planning for employee turnover as a predominantly student-led organization? These are the travails of DSRS in a nutshell.
While some of this may feel like broad strokes, I will try to be as specific as possible whenever the details matter. This post documents our journey for the researchers who intend to work with us and for those we have worked with in the past. It hopes to inspire practitioners interested in building scalable data teams and, most importantly, our interns who are keen to push data science research frontiers.
Code
"A leader leads by example, not by force." — Sun Tzu
If the question comes down to how we lead and inspire a team in the agentic era, the answer may sound so simple that it risks being overlooked: lead by example, in the code itself. A case in point is our response to the increasingly frequent generative AI model launches.
Our initial approach at DSRS was to observe and deploy the latest open-source model releases that could fit on our hardware. We soon realized, however, that the more scalable path was to build harnesses around these releases rather than chase each one individually. In that spirit, we are building Atlas, our multi-agent framework designed to run multiple agents in parallel and to be callable from any interface.
One of the first implementations is Atlas for PitchBook, where we deployed the framework in service of researchers exploring the dataset. We have also built the harness so that, depending on the need, our interns can construct an agent with custom rules on this framework simply by inheriting from the base agent class.
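To make the inheritance idea concrete, here is a minimal sketch in Python. It is not Atlas's actual code: the names BaseAgent, DatasetAgent, build_prompt, run and run_all are hypothetical, the model call is replaced by a placeholder that echoes the prompt, and asyncio is assumed as one possible way to run several agents at once (strictly speaking it gives concurrency rather than true parallelism).

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class BaseAgent:
    """Hypothetical base class; Atlas's real interface is not shown in this post."""
    name: str
    rules: list[str] = field(default_factory=list)

    def build_prompt(self, query: str) -> str:
        # Prepend the agent's rules (if any) to the incoming query.
        header = "\n".join(f"- {rule}" for rule in self.rules)
        return f"{header}\n{query}" if header else query

    async def run(self, query: str) -> str:
        # Placeholder for a real model call: we only echo the prompt back.
        await asyncio.sleep(0)
        return f"[{self.name}] {self.build_prompt(query)}"


@dataclass
class DatasetAgent(BaseAgent):
    """Illustrative subclass: inherits all behavior and only adds custom rules."""
    rules: list[str] = field(default_factory=lambda: ["Cite the table you used."])


async def run_all(agents: list[BaseAgent], query: str) -> list[str]:
    # asyncio.gather schedules the agents concurrently and keeps input order.
    return await asyncio.gather(*(agent.run(query) for agent in agents))


def main() -> list[str]:
    agents = [BaseAgent("general"), DatasetAgent("dataset")]
    return asyncio.run(run_all(agents, "List deals from 2020."))


if __name__ == "__main__":
    for line in main():
        print(line)
```

The point of the design is that a new agent only declares what differs, here its rules, while the shared run logic lives in one place.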
Code, in this sense, has a direct impact on how we approach culture and scale. The way we architect a solution becomes a way to influence the organization itself.
Culture
"Culture does not make people. People make culture." — Chimamanda Ngozi Adichie
Internally at DSRS, we have always referred to ourselves as a fledgling startup, highlighting our penchant for experimentation and improvement based on observation. Thinking back, this was the bedrock of all our ambition: creating an environment for our student interns to learn, thrive, and excel while keeping the big picture in mind — improving research output at Gies.
Building an incredible culture at DSRS begins with hiring interns who are not just talented, but who can also think on their feet and communicate clearly. Given the significance of what we are trying to build, it has been extremely important to make the hiring process both interesting and inclusive. This is why every applicant takes a screening test, and why all hiring evaluations are made by human reviewers, with AI used only to help identify candidates' strengths and weaknesses.
Creating a great culture is not just about code. It also depends on writing down the mundane. Building a team whose members come from many cultural backgrounds sometimes requires stating the obvious. To that end, we have a code of conduct for our interns that sets out basic workplace expectations and describes how to communicate with our stakeholders.
It is hard to ignore the impact of the agentic era on an organization that rests on data. To keep up with a rapidly changing environment, we are working towards a Data Governance Policy that classifies all assets created, managed, and owned by DSRS.
Scale
"Sunlight is the best disinfectant." — Louis Dembitz Brandeis
At times, our ambition to deliver services and products with impeccable standards has led us down a path where stakeholders struggle to fully comprehend our skills and capabilities. To help address this, we are working on a blockchain-based platform that improves our transparency by showcasing all active projects along with their current status and expected delivery timelines.
We believe this approach will also go a long way toward addressing a challenge that has persisted since our inception: deploying our limited funds toward the most impactful endeavors, with the ability to estimate the cost of each undertaking.
Epilogue
Our journey has been arduous and enlightening. In hindsight, we have had big wins and incredible learning opportunities over the years, including the ones that came from falling short. This is possibly the elusive nature of perfection. To paraphrase our director, Matias Carrasco Kind: we are a small, determined team driven by an infinite ambition to make an impact.
May the force be with us.
What are your thoughts on this blog post? You can write to us at dsrs@business.illinois.edu.
