Demystifying Data Science

How would you explain data science to someone with no prior knowledge? Maybe you could start by talking about data sets, explaining that large amounts of data need to be analysed in clever ways. You could talk about what an algorithm is and how it works, and how we can use algorithms to find patterns and information in our data sets. You could go into data structures, clustering and machine learning techniques. By now, though, the eyes of your audience have likely glazed over.

In an increasingly technical world, public engagement is arguably more important than ever. We should aim not only to inspire a diverse range of future STEM researchers, but also to give the public enough understanding to accurately identify the rewards and potential risks of a data-driven society. The principles of data science now underpin much of our day-to-day lives, from browsing the web to transport logistics, but they can be difficult to communicate to those starting with little or no background. Unlike a physically demonstrable subject such as robotics, public outreach in computationally heavy areas like data science needs to get creative. This was the task facing Outreach Ambassador Sarah Taylor-Knight when she set about designing a data science workshop for school children with little to no experience of computing.

“I’d never coded before university,” Sarah tells me. “I loved computers, but I had the stereotypical IT teacher who said that girls can’t code. So, I made a workshop that I thought the current generation of Sarahs would want to see. I wanted the workshop to be an easy introduction to data science, and to be able to say: well, actually, you’re already using data science techniques in your head; this is how we get a computer to do it for us.”

Part of the problem in engaging people, particularly children, is that typical data science work requires programming experience and an established understanding of computational systems and processes. Rather than try to cram an introduction to programming as well as the basic principles of data science into a single day’s workshop, Sarah opted to remove computers from the equation entirely.

“The primary version of my workshop is purely pen and paper,” she explains. “No computers involved. The idea is just to give kids a sense of: what is feature recognition? What is a decision tree? How would you use these classification rules in your day-to-day life? Then, how would you use a computer to do that? But all without touching a computer.”


By focusing on principles and applications rather than on a specific toolset, it becomes much easier to engage an audience: they are given a reason to think the tools are worth the effort of learning.

The workshop pitch is that WWF need a new mascot, as pandas are no longer in danger. The group is presented with 48 images of animals, each marked with a tick or a cross indicating, from (invented) WWF data, whether people would or wouldn’t donate based on that image. The participants must then work out whether they can find any logic in the data set provided.

“You’re starting to do feature recognition, but you don’t necessarily tell them that,” Sarah notes. “You can say, what are some things you can spot in these images? For example, it turns out that in the data set provided, people always donate if there’s a lion in the picture.”


An interesting hurdle that emerged across workshops was conveying the idea that the decision trees the group produced didn’t need to classify every element of the given data set perfectly. Says Sarah, “At school you get taught that you either get it 100% right, or it’s wrong. With maths, anyway, it’s right or wrong, and that’s definitely something they struggled with. But if you talk to a professional data scientist, they’re not getting everything right, and if they are, there’s probably something slightly corrupt in your data set! It was something I had to fight in the workshops, especially the first, which was all girls. I told them to mess around and experiment: if you can get a decision tree that’s 80% accurate, that’s great, that’s a really good way of analysing this data. Anything better than dumb luck is pretty good!”
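For readers curious what the “how would you use a computer to do that?” step might look like, here is a minimal Python sketch of the same idea. The ten animal pictures, their features (`lion`, `fluffy`, `big_eyes`) and the donation labels are entirely hypothetical and are not the workshop’s actual data set. The hand-built decision tree checks for a lion first and then for big eyes, gets most pictures right but not all, and is compared against the “dumb luck” baseline of always guessing the most common answer.

```python
# Hypothetical toy data: each animal picture is described by a few hand-picked
# features plus whether (in this invented data) people would donate.
images = [
    {"lion": True,  "fluffy": True,  "big_eyes": False, "donate": True},
    {"lion": True,  "fluffy": False, "big_eyes": False, "donate": True},
    {"lion": False, "fluffy": True,  "big_eyes": True,  "donate": True},
    {"lion": False, "fluffy": True,  "big_eyes": False, "donate": False},
    {"lion": False, "fluffy": False, "big_eyes": True,  "donate": True},
    {"lion": False, "fluffy": False, "big_eyes": False, "donate": False},
    {"lion": False, "fluffy": True,  "big_eyes": True,  "donate": False},
    {"lion": False, "fluffy": False, "big_eyes": False, "donate": False},
    {"lion": False, "fluffy": False, "big_eyes": True,  "donate": True},
    {"lion": False, "fluffy": True,  "big_eyes": False, "donate": True},
]


def decision_tree(image):
    """A hand-built tree: ask 'is there a lion?' first, then 'big eyes?'."""
    if image["lion"]:
        return True  # lion in the picture: predict people will donate
    if image["big_eyes"]:
        return True  # no lion, but big eyes: predict donate
    return False     # neither feature: predict no donation


def accuracy(classifier, data):
    """Fraction of pictures where the prediction matches the label."""
    correct = sum(classifier(img) == img["donate"] for img in data)
    return correct / len(data)


def majority_baseline(data):
    """Accuracy of always guessing whichever answer is most common."""
    yes = sum(img["donate"] for img in data)
    return max(yes, len(data) - yes) / len(data)


print(f"decision tree accuracy: {accuracy(decision_tree, images):.0%}")
print(f"always-guess baseline:  {majority_baseline(images):.0%}")
```

On this toy data the tree classifies 8 of the 10 pictures correctly (80%), while always guessing “donate” manages only 60%. The tree is imperfect, and it ignores the `fluffy` feature entirely, but it is clearly better than luck, which is exactly the point Sarah makes to her groups.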

Tackling misconceptions such as these is an important step in all outreach work: conveying the realistic expectations, standards and limitations of a field can help to demystify it. The apprehension surrounding AI, for example, can often be eased by a better understanding of just how difficult it currently is to build a system capable of the sort of tasks a baby carries out instinctively.

By separating principles from practical application, important concepts like feature recognition, decision trees and classifiers could be introduced to the workshop participants in a way that got them engaged and wanting to learn more. While these events were run with schoolchildren, this approach has value across all ages and outreach styles. For many people, a first encounter with programming or computational mathematics is not immediately enjoyable, but they may come to feel more positively about it when motivated by interest in a field where these are critical tools.

The key to getting people involved, according to Sarah, is simple: make it enjoyable and be ready to improvise. “If you make something that’s fun but then sneak in the information and the technical side, people still learn. I didn’t stick to the plan; you never stick to the plan when you’re running a workshop. You’ll improvise and read the crowd. One of the groups really enjoyed the technical side and so they benefited from us going around and introducing more technical aspects. Often they will give you a good idea, or ask a question that derails the workshop in a really good way and that’s how you learn from them.”


Sarah Taylor-Knight is an Engineering Mathematics master’s student at the University of Bristol. She also works as an Outreach Ambassador for the School of Computer Science, Electrical and Electronic Engineering, and Engineering Maths (SCEEM).