Transcript: CSPS Data Demo Week: Leveraging Data to Track Public Health Research
[The CSPS logo appears on screen.]
[Vanessa Vermette appears in a video chat panel.]
Vanessa Vermette, Canada School of Public Service: Hi, everyone. Welcome to the Canada School of Public Service. Before we begin today's event, I just want to remind all our viewers that the event is being held in English, and simultaneous interpretation is available. Also, for participants, there will be a Q&A period, and you can submit your questions for a panellist at any time during the webcast using the raise-hand icon at the top right of the video player. We'll get through as many of them as we can following the demonstration.
So, today is the final event in the Data Demo Week Series. And we are very excited to close out the week with our partners at the Public Health Agency of Canada and Evidence Partners. Since you are all watching this webcast, you know how crucial it is for researchers and policy makers to have access to high quality and timely evidence to advance knowledge and support decision making.
At the same time, the volume of studies and data that researchers have to sort through and stay on top of is enormous. It's unstructured, and it's growing by the day. So, Evidence Partners is an Ottawa-based company that set out to address these common pain points for researchers involved in systematic reviews and literature surveillance, and they're doing it in a very practical way using automation and artificial intelligence. So, we're going to learn about how they've done this and hear from the Public Health Agency of Canada about how they're using Evidence Partners' DistillerSR tool to collect, screen, and extract data from scientific research studies to produce that high-quality evidence in less time and with less manual intervention.
So, I'd like to first introduce our panellists, beginning with our colleagues from the Public Health Agency.
[Three more panelists join.]
We have with us first Alejandra Jaramillo. She's a senior scientific manager with the agency. She has more than 18 years of experience in the healthcare field. And for the last 12 years, she has been focused on the design and conduct of systematic reviews to develop evidence-based clinical practice guidelines and public health guidance and policy.
Alejandra is joined by Alex Zuckerman. Alex is a research analyst also with the Public Health Agency who is the systematic review lead with the Evidence Synthesis Unit and has become the team expert for using the DistillerSR tool from Evidence Partners.
Now, from Evidence Partners, we're very pleased to have with us today the CEO, Peter O'Blenis. Peter is a pioneer in lit review automation and has been working on these challenges since 2001, so for 20 years now, engaging with hundreds of research groups around the world. We also have with us Derek Lord, who is the team lead for Solution Engineers at Evidence Partners. Derek's role is to be directly involved in implementing solutions for clients and helping them follow best practices in the space.
So, Alejandra, I'm going to invite you now to provide your introductory remarks on the work that you've been doing and your involvement with the tool. Welcome. Thank you.
[Alejandra speaks, but her microphone is muted.]
Oh, Alejandra, if you could just unmute your mic.
Alejandra Jaramillo, Public Health Agency of Canada: Thank you. Sorry. Can everybody hear me okay?
Vanessa Vermette: Yes.
Alejandra Jaramillo: So, good morning, everybody. Thank you for inviting us to be part of the Data Demo Week. I will just take a minute to tell you a little bit about our team. As Vanessa mentioned, I have the pleasure of leading a group of scientists and methodologists here at the agency. We are part of the applied research division in the health promotion and chronic disease prevention branch.
And our main job is to conduct systematic reviews of the evidence and to develop evidence-based public health recommendations and clinical practice guidelines. Currently, our systematic reviews and guidelines focus on topics related to the wider impact of COVID-19, such as the impact of COVID on mental health, food insecurity, family violence, long COVID and so forth. So, the data that we produce, all our findings, feed into policy development here at the agency. And as you can imagine, the pace of our work is very, very fast.
New studies on COVID are published almost on a daily basis. And so, it's really important for us to keep the body of evidence and our recommendations to senior management up to date. And we have very little room for error. DistillerSR has been instrumental in allowing us to deliver what is expected of us. Peter and his team will tell you more about the tool, and I'll be happy to address questions afterwards. Thank you, Vanessa.
Vanessa Vermette: Thank you so much. Yeah, I think COVID is a great example of how quickly we need to adapt and shift and be able to stay on top of the evidence that's out there. So, thank you for that example. So, now, Peter, this will be your turn now to provide the short presentation about the tool. And then, we're going to go immediately into the demonstration after your presentation. So, Peter, over to you.
Peter O'Blenis, Evidence Partners: Perfect. Thank you very much. Just a few slides to sort of set the context, and we'll actually let Derek run with the fun stuff. So, let me just quickly share my screen, so many screens to choose from here. One moment, please. Here we go.
[Peter shares his screen, presenting a slideshow. He flits through slides showing pictures of the Evidence Partners team, statistics about the company and key quotes from Peter.]
All right. Let me know if the slides don't look right to you.
So, a quick overview of Evidence Partners and Distiller. Of course, there's myself and Derek, who you'll be hearing from momentarily. The company itself: it's been around since really the end of 2008. We initially focused primarily on systematic literature reviews in the academic space. That was our primary market back in the day. But that has really expanded to pharmaceutical companies, medical device companies, government agencies in Canada, the US, Europe, Australia and the UK, and really a broad range of types of literature reviews.
It used to be primarily systematic reviews. But now, we're seeing more and more types of literature reviews that people are doing. And so, we've got about 300 customers worldwide. Those would be institutions that use the platform. And the things we're proud of are the very high renewal rate and customer satisfaction. It's a subscription-based platform, and it has close to a 100% renewal rate and a very high customer satisfaction rating.
And we like to think that's because we really care about this. Our mandate is to improve the quality of research globally. When we first got into doing literature review software, there wasn't anything else out there that did it. And we recognized that this was very important research. People were using Excel spreadsheets and SharePoint and whatnot to try to get this work done. It was an error-prone, time-consuming process that was not letting researchers focus on the research. They were focusing on all the logistics involved in the process.
So, our mandate is to always be leading. We were disruptive in this market. We were the first product of its kind. And we're constantly focused on reinvestment in the platform. The idea is to continuously innovate: basically all of the profits that the company generates go back into innovation on an ongoing basis. Today, we're at about 75 people and growing quickly.
And the goal as well is to basically remain the trusted partner of the people that we serve. Ultimately, that's what people want. They want to be able to trust us to deliver the platform, and they want to be able to trust the evidence that comes out of it because it's critical. It's very, very important information that's coming through that system.
So, really quickly, what is a systematic review or a structured literature review? If you were to describe it at the 50,000-foot level, you've got a question. It could be a clinical question. It could be any question that you need answered. You will search the entire corpus of knowledge that's out there for information on that. These would be scientific papers, or papers of just about any kind, depending on the types of research you're doing. You'll retrieve the references that look promising. You'll then screen out the ones that are not relevant.
And then, you'll go and extract the key data elements from the ones that are relevant. And you'll use that data for analysis and reporting and to inform decision making. That's essentially what a literature review is. Again, the issue is that those processes involve tens of thousands of cells of data. In the case of COVID, potentially hundreds of thousands of papers over time. It could be as few as a hundred, but it really spans the whole gamut. You've got multiple people trying to collaborate in real time, people collating data, people dealing with deduplication, people validating each other's responses. This is all overhead that makes the process more time-consuming and more error-prone. And that's really what we're trying to address.
When you look at who does these, again, it's pharmaceutical companies and medical device companies, as part of their regulatory processes and their safety monitoring processes, and government agencies like Alejandra's, for example, using it to inform public guidelines and so on. It's used in food safety. It's used in environmental research, toxicology, even in social sciences. I know one group that was looking at the impact of class sizes on student learning, for example. And they used a systematic literature review process to come up with evidence to inform their decisions on it.
So, there's a lot of different areas that this process applies to. It's a fairly generic process really. And you can use it for a lot of things to generate high quality evidence. And, of course, when you generate evidence this way, you're aggregating research that's already been done. So, you end up with something that's typically statistically more useful to you than a single study or a small group of studies. So, it serves its purpose quite well.
[A new slide shows cons of "ad-hoc unreliable review processes" that Peter lists, and an arrow points to a circular graphic reading "Standardized Automated Processes." Three even sections within the circle read: Transparent Management, Automated Processes, and Structured Workflows. A smaller circle in the center reads "Trusted Evidence."]
Again, the reason that folks are looking to use a platform like Distiller is because the process is logistically intensive. It's time-consuming. It's costly. It is very error-prone, especially if you're using things like spreadsheets, with multiple people using the same spreadsheets. It's very easy to key in bad data. We had a group at Medtronic, for example, that said it takes them on average three days to find an error in a spreadsheet, because these are huge spreadsheets.
And so, that's just not a good use of researcher time. And as I think Alejandra will attest to, it's really hard to find people who are good at this. These researchers are not easy to come across. So, we want to make sure that we're using them as effectively as possible.
[A new slide shows a complex diagram with small writing. It features various cloud document sources feeding into DistillerSR, which exchanges data with DAISY. The documents then feed out to more cloud sources.]
This is an eye chart, so I'm not going to go into too much detail. But to give you an idea of what Derek's going to show you, envision Distiller as an AI-enabled workflow engine: a team-based workflow engine that handles most of the logistical elements of doing your literature review. You feed references into the front end of Distiller. Those references can be in just about any format you can find, and they can come from all sorts of different data sources.
So, it becomes very important to pick out duplicates. You're always going to get duplicates if you use multiple data sources or if you have an ongoing literature review. It seems like a simple problem, but it's actually not that simple, and it takes a ton of work. But Distiller has a natural language processing-based deduplication engine, which is very efficient at finding and removing duplicates without accidentally removing stuff it shouldn't. It then takes humans through the screening process, the assessment process for papers, and basically any workflow processes that you define. So, you can define whatever your research protocol is in the platform, and it manages that process. It assigns work to people. They go and do the work. And it validates it and so on.
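Peter doesn't describe how Distiller's NLP-based deduplication engine works, so the following is only a minimal sketch of the general idea, not Distiller's method: it swaps in plain string similarity from Python's standard `difflib` on normalized titles. The function names `normalize` and `candidate_duplicates` and the 0.85 cut-off are hypothetical. Normalizing first means differences in capitals or punctuation, such as a hyphen one person typed and another didn't, don't hide a duplicate.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title: str) -> str:
    """Lowercase, replace anything that isn't an ASCII letter, digit or space
    (including hyphens and colons) with a space, then collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def candidate_duplicates(refs: dict[int, str], min_score: float = 0.85) -> list[tuple[int, int, float]]:
    """Return (ref_a, ref_b, score) for every pair of references whose normalized
    titles are similar, highest score first. All pairs are compared (quadratic cost)."""
    pairs = []
    for (id_a, title_a), (id_b, title_b) in combinations(refs.items(), 2):
        score = SequenceMatcher(None, normalize(title_a), normalize(title_b)).ratio()
        if score >= min_score:
            pairs.append((id_a, id_b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


refs = {
    6: "Cost of Rheumatoid Arthritis: A Review",
    226: "Cost of rheumatoid-arthritis - a review",
    40: "Exercise therapy for knee pain",
}
print(candidate_duplicates(refs))  # [(6, 226, 1.0)]
```

Comparing every pair is fine for a demonstration, but it grows quadratically, so a corpus of hundreds of thousands of papers would need some form of blocking or indexing first.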
There's another component, which is the artificial intelligence. It can be used to accelerate the screening process, for example. The AI will monitor your humans as they're screening, learn what they like and what they don't like, and bubble the important papers to the top of the pile, which can save a considerable amount of time.
You can also build your own classifier. So, if you've screened a whole pile of references, you can basically tell it, okay, that's the kind of paper I'm looking for. And it can build a classifier that can run autonomously against papers to assess them for validity. So, it's essentially a DIY AI classifier engine as well. And we've got organizations that are building classifiers specific to their needs.
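As a rough illustration of what building a classifier from references you've already screened can mean, here is a minimal multinomial naive Bayes sketch trained on include/exclude decisions. Naive Bayes is a stand-in chosen for brevity; the talk doesn't say which model Distiller uses, and the function names and the tokenization (lowercase, split on whitespace) are assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train(labelled: list[tuple[str, str]]):
    """labelled: (text, label) pairs, label being "include" or "exclude".
    Both labels must appear at least once, otherwise the prior is undefined."""
    word_counts = {"include": Counter(), "exclude": Counter()}
    doc_counts = Counter()
    for text, label in labelled:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(word_counts["include"]) | set(word_counts["exclude"])
    return word_counts, doc_counts, vocab


def classify(model, text: str) -> str:
    """Return the label with the highest naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("include", "exclude"):
        total_words = sum(word_counts[label].values())
        # Log prior plus Laplace-smoothed (add-one) log likelihood of each word
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("arthritis treatment costs", "include"),
    ("economic burden of arthritis", "include"),
    ("mouse model of sepsis", "exclude"),
    ("sepsis in cell culture", "exclude"),
])
print(classify(model, "costs of arthritis care"))  # include
```

Before letting any classifier like this run autonomously, you would want to check its decisions against a held-out set of human screening decisions.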
And then, on what comes out the other side: you can, of course, monitor the whole process as it's going. You can generate customized exports, and there are a bunch of canned exports as well. It also has an API, so you can feed literature into it and extract materials out of it programmatically if you want to build it into a larger system. We have a lot of government clients, particularly in the US, that are doing that right now.
So, again, just think of this as an AI-enabled workflow engine. It's completely web-based. It's hosted in the cloud. It's regulatory compliant, which is pretty much a prerequisite for any of the customers we have. And in terms of why people use it, it's a fairly dramatic improvement. People are reporting somewhere between 40 and 90% time savings. Philips says they save a million dollars a year in one department just by using it. And it's not putting people out of work, because you still need really smart people to do all this analysis. It's just taking the drudgery out of it for them and letting them focus on the stuff that's really, really useful to them.
So, again, there's really the cost savings and time savings, and the error reduction. Alejandra already alluded to that: you don't want to publish something that's wrong, especially if it's a health guideline. That's a career-limiting move, and dangerous. We want results that are truly transparent and auditable. So, if somebody asks, "How did you come to that decision?", the provenance of every single cell of data should be readily available: click on that cell of data. Where did we find it? What paper did it come from? Who extracted it? That's all right there at the surface. You can see it. The configurability is really important because no two groups that we work with do things exactly the same way. You don't want the software to dictate your protocol. You should be able to define the protocol, and the software should adapt to it.
And lastly, and Derek works closely with our customer support team as well, it's really important in terms of onboarding to have that level of customer support and get people up and running quickly. Typically, it's a couple of weeks from the time you sign up, depending on your availability, to the time we can actually go live with projects. I don't want to take anything away from the more important stuff, but that's a high-level overview of Distiller and what we do. And I think we can go right on to AJ or to Derek.
[The screen share ends.]
Vanessa Vermette: Thank you so much, Peter. That's great. I think you said it best: it's the ideal marriage between human and machine when the machine does all the drudgery, right? And the humans are freed up to do the high-value, interesting work. So, thank you. Over to you, Derek.
Derek Lord, Evidence Partners: Oh, thanks very much. Let me turn on my screen share here for everyone. Just give me a quick confirmation that you can see this popping up.
[Derek shares his screen showing a Distiller interface. A dashboard reads "Welcome to Arthritis Economic Impact Evaluation!" and lists tasks such as screenings and data extractions.]
Peter O'Blenis: That's good.
Vanessa Vermette: Yup.
Derek Lord: Excellent. Thank you very much, and thanks to Peter for the overview there. You covered a lot of great details, and I'll touch on some of them again here that are really important. But really, I want to spend time looking at the actual software. How does it work? How can we use it? What are some of the benefits of using it?
So, at a really high level, I'm going to talk about three main components here, or four, really: the software overall, and then, how do we actually get data into it? How do we process that data? And how do we get data back out of it? All of these are, of course, key to success, but each has its own unique challenges. So, like Peter said, it's software as a service. It's all online. There's no installation, and it's supported by all major browsers. That makes collaboration around the world and across time zones very easy, especially if you're used to using Excel sheets and trying to send those between teams. There are a lot of challenges and a lot of burden there, which I'm sure a lot of you guys have run into, where the data doesn't quite line up. One person puts a hyphen, one person uses capitals, some other person doesn't, right? It just becomes a lot messier. And there's a lot of effort just in cleaning that, let alone actually getting information out of it.
So, using a tool like DistillerSR, all online, really takes care of that administrative stuff. It puts everything in one centralized location and lets the team focus on the important, interesting, and value-creating work as opposed to all those administrative tasks. Now, DistillerSR is incredibly configurable. This is something we've heard over and over again from everyone. It's very flexible. Even on this project that I've opened right now, Arthritis Economic Impact Evaluation, we can see I've got a couple of levels in my workflow: title/abstract screening, full-text screening, data extraction, and then safety and efficacy extraction. But everything we see here is completely definable by you guys.
So, what steps should actually be taken? What work needs to be done to make sure that we get the appropriate information out of here? And this is just one project. We can even have multiple projects created and jump between those very easily up here.
[Derek expands a dropdown menu, showing a list of projects.]
And just like in the real world, no two things are identical. So, you can set each of these up uniquely to make sure that whatever the challenges of that project and that team are, they're going to be met. And we can even set up real time validation for the team. So, while they're collecting that data, it's actually going to be validated by the system and again, taking care of that administrative work right out of the gate for them.
So, what I want to look at first here is importing references, and then we'll start to circle back to a few of these other things. We've got the intuitive menu along the top here. If I want to bring references into my project, I'm going to go to "References," then "Find and Import References."
[He navigates to the "find and import references" page. On it, a dropdown menu reads "select a source." He expands it, highlighting the "From File" option.]
Right away, we've got a few options. I'm just going to focus on "from file." And we have a lot of different groups. So, some of you use PubMed. Some of you may not. We have a direct connection to the PubMed database.
[Tabs expand, offering different tagging formatting options.]
But I'm going to jump straight into "From File." Now, DistillerSR is configurable. And that means you can make it as simple or as complex as you want it to, in a lot of cases. So, for example, right here, we have lots of bells and whistles. But realistically, I can scroll right down past that. And we can even just drag and drop a CSV file in there that we'd want to import into the project, click that...
[He drags a file into a "drag and drop" space under the formatting options. He clicks a button reading "Import References."]
Apologies. Big green button. Making tongue twisters up as we go here.
And that's going to import all those references into the start of the project, where they'll be ready for the team to come in and take a look. So, very easy to use. And, of course, you can make that more complex as needed, depending on how many sources you have and how they differ, to make sure those differences are accommodated.
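Derek drags a CSV file into the importer without showing its layout. A minimal sketch of what parsing such a file might involve, assuming hypothetical `Title`, `Authors` and `Year` columns (the columns Distiller actually expects aren't shown in the demo) and using only Python's standard `csv` module:

```python
import csv
import io


def import_references(csv_text: str, start_id: int = 1) -> list[dict]:
    """Parse CSV text into reference records with sequential RefIDs.
    Assumes hypothetical columns Title, Authors and Year; rows with no title are skipped
    so that they don't consume a RefID."""
    refs = []
    next_id = start_id
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = (row.get("Title") or "").strip()
        if not title:
            continue
        year_text = (row.get("Year") or "").strip()
        refs.append({
            "ref_id": next_id,
            "title": title,
            "authors": (row.get("Authors") or "").strip(),
            "year": int(year_text) if year_text else None,  # missing year stays empty
        })
        next_id += 1
    return refs


sample = "Title,Authors,Year\nCost of arthritis care,Smith J,2019\n,,\nKnee pain outcomes,Lee K,\n"
for ref in import_references(sample):
    print(ref)
```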
Now, we also have a feature called LitConnect, which is really interesting. A lot of different spaces are starting to adopt it. Databases like PubMed or Ovid can have auto-alerts or notifications set up, so you don't have to go out and rerun your search every day or every week. The database will actually run it for you on your conditions, your criteria, and send the results to Distiller, where that data's automatically imported into your project. So, just like if we were doing a manual import here, all of that data is now going to be ready back here at the dashboard, starting at our level one screen, ready for the team to take a look at. Again, that removes administrative effort and lets the team focus on what's interesting and what actually needs to get done, as opposed to running out and scanning the web for things and clicking 20 times just so they can get to the real work.
So, another thing I want to mention here real quick is that in addition to being configurable, everything in Distiller is permission driven. So, as a team, you can set it up so that different users have access to the different projects we spoke about. Different users have different things they're working on, and you don't want to see everything all at once. You can set up different tasks that different users can do in each project. Not everyone should be able to do everything in every project, right? That's just asking for trouble. And a lot of people don't want that access if they don't need it. You can even get really specific. So, if you're bringing in contractors to help, for example, you might say, "Look, I just want the contractors to help with our two screening levels," and they shouldn't see anything else in the project, right? No problem there at all. You can have all sorts of specific configurations set up to make sure that your needs for that project are being met.
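The per-project, per-task permissions Derek describes can be pictured as a deny-by-default lookup table. This is a hypothetical sketch: the project name comes from the demo, but the user names, level identifiers and the `can_access` function are illustrative assumptions, not Distiller's configuration format.

```python
# Hypothetical permission table: project -> user -> workflow levels that user may work on
PERMISSIONS: dict[str, dict[str, set[str]]] = {
    "Arthritis Economic Impact Evaluation": {
        "derek_lord": {"title_abstract", "full_text", "data_extraction", "safety_efficacy"},
        # A contractor limited to the two screening levels, as in Derek's example
        "contractor_1": {"title_abstract", "full_text"},
    },
}


def can_access(user: str, project: str, level: str) -> bool:
    """Deny by default: access only if the level is explicitly granted to this user."""
    return level in PERMISSIONS.get(project, {}).get(user, set())


print(can_access("contractor_1", "Arthritis Economic Impact Evaluation", "full_text"))        # True
print(can_access("contractor_1", "Arthritis Economic Impact Evaluation", "data_extraction"))  # False
```

Defaulting to "no access" means a forgotten entry hides a level rather than exposing it, which matches the point that people shouldn't have access they don't need.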
[On the dashboard, he scrolls.]
Now, on the landing page here, as a reviewer I can see really quickly where work needs to be done: how many unreviewed references I have at each stage of this workflow. But we can also jump over to the Project Progress tab, and this is going to give me a quick high-level overview of what's happening in the project right now.
[He navigates to the "Project Progress" tab. It shows bar graphs denoting reference status and reference progress.]
So, I can see, for example, we're about 30% of the way through title/abstract screening. And I can get more details on that: how many references have been included or excluded at each of those levels.
It's really high-level information, with some quick details on which users have been active in the system recently and what kind of effort they've been putting into this particular project. We can, of course, get much more detailed in our reports, but if you want a quick high-level glance, this can be a great tab for getting that initial assessment of where the project is at and what's happening currently.
So, we're going to jump over now to actually processing some of these references. We've imported data into the system; let's take a look at how to actually get it through the system. So, I'm going to jump into our screening level here... or sorry, before I do that, I'm going to take one more step back. Once we've imported references, the next thing anyone knows you need to do is deduplicate them, right? Nobody wants to spend time and money having different team members look at the same thing two or three times. So, I'm going to go back into our menu here and jump into our Duplicate Detection tool.
[In the "Duplicate Detection" tool, many filters and tagging options appear. He highlights them, showing a button reading "Run Dupe Check."]
I'll just look at this briefly. Again, lots of bells and whistles, but you can jump straight to that big green button. Click that, and the system's going to bring back a bunch of potential duplicates.
Now, we find it does a great job. Even after using reliable tools like EndNote, most teams still find extra duplicates using Distiller.
And so, in this case, we can see it's brought back RefID 6 and RefID 226. Just glancing at those with the eyes, they look pretty much identical to me. And the system agrees. It's giving us a 97% match confidence on those two duplicates. So, once we found these and evaluated them, we can pick which one we want to quarantine so that we're not processing the same thing multiple times. But a quarantine's not a delete. You still need that transparency, that auditability. So, you can then go into the project and say, "Show me all the references that we didn't process because they were quarantined. And why?"
Now, on this screen, I've only got about six potential duplicates. But in a lot of projects, you might have hundreds or even thousands of these. That's a lot of time and effort. So, we also have a feature called Smart Quarantine, and that can let you actually set what you're comfortable with the computer helping with. So, we could set a threshold and say, "Look, if the computer is 97% confident, then so am I." And I want the computer to automatically quarantine all of the newest references that meet that threshold or more. That's going to take care of, let's say, 80% of the duplicates that you guys find and leave just the ones you're a little less sure about for the human team to actually look at. And you can set up what sort of thresholds you want to have there.
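The Smart Quarantine idea, auto-quarantining only the pairs the computer is at least as confident about as you are, can be sketched as a simple threshold split. This assumes pairs arrive as hypothetical `(older_id, newer_id, score)` tuples from a duplicate check; the 0.97 default mirrors the example threshold Derek mentions, and the function name is illustrative. As in the demo, a quarantined reference is set aside with its reason recorded, not deleted.

```python
def smart_quarantine(pairs: list[tuple[int, int, float]], threshold: float = 0.97):
    """Split duplicate candidates into auto-quarantined entries and pairs left for humans.
    Each pair is (older_id, newer_id, score); the newer reference is the one quarantined.
    Nothing is deleted: each quarantine entry records which reference it duplicates and why."""
    quarantined, for_review = [], []
    for older_id, newer_id, score in pairs:
        if score >= threshold:
            quarantined.append({"ref_id": newer_id, "duplicate_of": older_id, "score": score})
        else:
            for_review.append((older_id, newer_id, score))
    return quarantined, for_review


auto, manual = smart_quarantine([(6, 226, 0.97), (12, 300, 0.88)])
print(auto)    # [{'ref_id': 226, 'duplicate_of': 6, 'score': 0.97}]
print(manual)  # [(12, 300, 0.88)]
```

Raising the threshold sends more pairs to the human team; lowering it automates more of the work at the cost of more risk.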
But really, a big thing we try to do is the computer's there to help you when and where you want it to. So, it's up to you how much it does and where it's going to do that. And, otherwise, you guys are still in complete control of the process.
So, now, we're going to jump back to the dashboard again here, and I'm going to look at actually reviewing one of these references. We're going to jump into our title/abstract screening page here and open up my unreviewed references. Again, there are some filtering options at the top there, but I'm just going to grab the top reference in this list to see what that looks like.
[Derek navigates to a list of unreviewed references and selects one. It takes him to an abstract page with a warning at the top that reads "You are editing data on forms originally submitted by derek_lord. Any changes will be tracked in the Audit Log."]
And so, we can see that this is actually something that I already reviewed myself. So, apologies. Let me jump back out here and go into an unreviewed reference, just so you guys can see what a new one looks like.
[He selects a new reference, but the same warning pops up.]
All right. So-
Peter O'Blenis: Derek, you still had the reviewed dropdown selected, I think.
Derek Lord: I did, didn't I? Apologies.
Peter O'Blenis: Yup.
Derek Lord: All right. Let me jump back here. I think it went into the wrong part.
[Derek adjusts some filters on the screening page.]
All right. Let me just go into full text screening instead here.
Vanessa Vermette: And maybe you're so efficient, you just reviewed them all.
Derek Lord: Apparently, I did. Apologies, everyone. All right. Well, you know what? For the interest of time, what I'm going to do is just jump in here.
Alejandra Jaramillo: Derek.
Derek Lord: Yup.
Alejandra Jaramillo: You have answers selected for sponsors for you. That's why it's coming up with all of the ones you've reviewed.
Derek Lord: Yes. Sorry. I got that cleared here now. So, we've got our unreviewed references. And so, I'm just going to open up the top one of this list. Thank you for the troubleshooting, everyone. Appreciate that. As we can see, humans aren't perfect. Machines aren't perfect. But we're all stumbling through it together.
[A new reference loads without the warning. He mouses over components as he explains.]
So, I have an unreviewed reference now open in front of us that we can all take a look at together. Up here at the top, we've got a bibliographic format for that reference. Again, configurable, you guys can define what information should be displayed up here and what shouldn't. Down here to the left, we have an abstract for that reference. And we'll notice a whole bunch of colours in here, words like arthritis, costs, treatment. You can define what keywords are important to your project and set that up quickly and easily in the project. And that's just going to help draw the eye there to those critical details to help them get through that screening even faster and easier.
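Keyword highlighting of the kind shown in the abstract pane can be approximated with a single case-insensitive, whole-word regular expression. A minimal sketch using Python's `re` module; the `**` marker stands in for whatever colour styling the interface applies, and the function name is hypothetical.

```python
import re


def highlight(text: str, keywords: list[str], marker: str = "**") -> str:
    """Wrap each whole-word, case-insensitive keyword match in `marker`."""
    if not keywords:
        return text  # an empty alternation would match everywhere
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)


print(highlight("Treatment costs for arthritis rose.", ["arthritis", "costs", "treatment"]))
# **Treatment** **costs** for **arthritis** rose.
```

Whole-word matching keeps "arthritis" from lighting up inside a different word such as "arthritic", which may or may not be what a given team wants.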
Over here to the right, then, we have the actual review form. So, as a reviewer, these are the questions I need to answer for this part of the project. And we can have real-time validation put in here. So, for example, if I try to submit this, the system says, "We can't go yet." There are mandatory questions that need to be answered. So, are we including this reference or not?
If I include it, that's fine; now I can submit. Or if I exclude it, more questions are going to pop up. So, the forms can even be dynamic. Instead of going through a hundred rows in an Excel sheet trying to figure out which questions apply and which ones don't, spending time on that, the system will bring you the information you need to collect based on the information you've already given it.
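The dynamic form behaviour (a mandatory include/exclude question, with a follow-up that appears only on exclusion) can be sketched as a small rule table plus a validator. The question IDs, choices and `show_if` convention below are hypothetical and only illustrate the technique, not how Distiller stores forms.

```python
# Hypothetical form definition; "show_if" makes a question conditional on another answer
FORM = [
    {"id": "decision", "required": True, "choices": {"include", "exclude"}},
    {"id": "exclusion_reason", "required": True,
     "choices": {"wrong population", "wrong outcome", "not primary research"},
     "show_if": ("decision", "exclude")},
]


def visible_questions(answers: dict) -> list[dict]:
    """Return only the questions that apply, given the answers so far."""
    shown = []
    for question in FORM:
        condition = question.get("show_if")
        if condition is None or answers.get(condition[0]) == condition[1]:
            shown.append(question)
    return shown


def validate(answers: dict) -> list[str]:
    """Return error messages; an empty list means the form can be submitted."""
    errors = []
    for question in visible_questions(answers):
        value = answers.get(question["id"])
        if value is None:
            if question["required"]:
                errors.append(f"{question['id']} is required")
        elif value not in question["choices"]:
            errors.append(f"{question['id']}: invalid choice {value!r}")
    return errors


print(validate({}))                       # ['decision is required']
print(validate({"decision": "exclude"}))  # ['exclusion_reason is required']
```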
And so then, we can answer that as well, processing these references, answering whatever questions we need to. And everything on this form, completely configurable. How many questions you have? Are they text questions, multiple choice questions? Where that validation should be applied? It's all things that you guys get to define for your project and that step in the workflow that you're currently looking at.
Now, if we hit submit here, a few things are going to happen behind the scenes. As soon as I hit submit, that reference is going to go away, and it's automatically going to load the next one so we can keep our train of thought going and process things quickly. But it's also going to create an audit log. So, we know who looked at the reference, on what date, at what time, what answers they gave, all of that information. It's going to keep a version history, so that if I change my answers or a manager comes by and changes my answers, we know not just the current data that was selected but, like Peter said, the complete provenance: the complete history of all of the information that's been associated with that reference.
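An audit trail with version history is essentially an append-only log: every submission is recorded, and the current answer is simply the latest entry. A minimal sketch under that assumption; the `AuditLog` class and its fields are hypothetical, not Distiller's schema.

```python
from datetime import datetime, timezone
from typing import Optional


class AuditLog:
    """Append-only record of every answer submitted; earlier answers are never overwritten."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def record(self, ref_id: int, user: str, question: str, answer: str,
               when: Optional[datetime] = None) -> None:
        self._entries.append({
            "ref_id": ref_id, "user": user, "question": question,
            "answer": answer, "when": when or datetime.now(timezone.utc),
        })

    def history(self, ref_id: int, question: str) -> list[dict]:
        """Every version of one answer, oldest first: who, what and when."""
        return [e for e in self._entries if e["ref_id"] == ref_id and e["question"] == question]

    def current(self, ref_id: int, question: str) -> Optional[str]:
        """The latest answer, or None if the question was never answered."""
        versions = self.history(ref_id, question)
        return versions[-1]["answer"] if versions else None


log = AuditLog()
log.record(42, "derek_lord", "decision", "include")
log.record(42, "manager", "decision", "exclude")  # a manager changes the answer
print(log.current(42, "decision"))                           # exclude
print([e["user"] for e in log.history(42, "decision")])      # ['derek_lord', 'manager']
```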
So, you can go as far back through that as you need to, and it's all available to you. The system tracks all of it automatically. It's even going to make sure that any workflow requirements you have are being enforced. So, if we need to have two reviewers look at every reference, the system will take care of that and make sure that we have two independent people looking at them.
And then, of course, what if they disagree? One user says, "Include." One user says, "Exclude." The system can flag those "conflicts" as we call them, bring them up to your attention right away. So, you're actually finding those and resolving them right away while it's fresh in memory and instead out of six months down the road when you find out from reviewing the Excel sheet that all these two people disagreed, let's reread the whole paper and see why they disagreed.
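As a rough illustration of the two checks described above (requiring two independent reviews per reference, and flagging disagreements as conflicts), here is a minimal Python sketch. It is not how DistillerSR implements these features; the record layout and the function names `needs_second_review` and `find_conflicts` are hypothetical.

```python
from collections import defaultdict

# Hypothetical record layout: (reference_id, reviewer, decision).
# Illustrative only; this is not DistillerSR's data model or API.

def needs_second_review(records, required=2):
    """Return reference IDs reviewed by fewer than `required` people."""
    reviewers = defaultdict(set)
    for ref_id, reviewer, _decision in records:
        reviewers[ref_id].add(reviewer)
    return sorted(r for r, who in reviewers.items() if len(who) < required)

def find_conflicts(records):
    """Return reference IDs where reviewers gave different decisions."""
    decisions = defaultdict(set)
    for ref_id, _reviewer, decision in records:
        decisions[ref_id].add(decision)
    # Any reference with more than one distinct decision is a conflict.
    return sorted(r for r, seen in decisions.items() if len(seen) > 1)

records = [
    (101, "reviewer_a", "include"),
    (101, "reviewer_b", "exclude"),
    (102, "reviewer_a", "exclude"),
    (102, "reviewer_b", "exclude"),
    (103, "reviewer_a", "include"),
]
print(find_conflicts(records))       # [101]
print(needs_second_review(records))  # [103]
```

Checking conflicts as each second review is submitted, rather than at the end of the project, is what lets disagreements surface while the paper is still fresh in the reviewers' minds.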
So, there's a lot the system does behind the scenes to make your life that much easier and let you focus on the actual review and on getting that data out. Now that we're done with this one, I'm going to jump back over here, and we're going to look at a different part of the workflow.
[In the reference filters form, Derek highlights search filters.]
And actually, before we do that, I'm just going to talk real quick about DAISY Rank too. We have what's called DAISY Rank selected here right now. We have automation and AI built into the system in a lot of places, but it's up to you how much you want to use it and where. It ranges from really simple to really advanced in terms of what it does, and DAISY Rank is a great introduction to automation and AI.
In this case, like Peter said, the AI watches what the team does, finds similarities among what you've included and what you've excluded, and starts bringing the most likely includes to the top of the pile, so that the team finds 90 to 95% of the includes after screening only a small percentage of the total references in the project. You can then move those through to full-text screening, finding that information faster and reaching the data-driven decisions that you're going to turn into policy or regulations that much faster.
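The idea of bringing likely includes to the top of the pile can be sketched with a deliberately simple scoring rule. The Python below is a toy built on assumptions, not DAISY Rank: it scores each unscreened title by its word overlap (Jaccard similarity) with past includes minus its overlap with past excludes. The function names and example titles are made up.

```python
import re

# A toy prioritization sketch, NOT DAISY Rank's actual model.
# Each unscreened reference is scored by its word overlap (Jaccard
# similarity) with past includes, minus its overlap with past excludes.

def tokens(text):
    """Return the set of lowercase words in a title or abstract."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Size of the intersection over size of the union (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_unscreened(included, excluded, unscreened):
    """Sort unscreened texts so the most include-like come first."""
    inc = [tokens(t) for t in included]
    exc = [tokens(t) for t in excluded]

    def score(text):
        t = tokens(text)
        best_inc = max((jaccard(t, i) for i in inc), default=0.0)
        best_exc = max((jaccard(t, e) for e in exc), default=0.0)
        return best_inc - best_exc

    return sorted(unscreened, key=score, reverse=True)

included = ["arthritis treatment costs in adults"]
excluded = ["mouse model of liver disease"]
unscreened = ["liver disease in mouse studies", "costs of arthritis treatment"]
print(rank_unscreened(included, excluded, unscreened))
# ['costs of arthritis treatment', 'liver disease in mouse studies']
```

Re-running the ranking as reviewers make more decisions lets include-like references keep moving up. A production tool would more likely use a trained text classifier than raw word overlap, and, as Derek notes, the humans still make every include and exclude decision.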
It's great because the AI helps, but the humans make all the decisions. You don't have to rely on the computer's accuracy or build trust in it. That's sort of the entry point. We've got a bunch of AI tools like that: they help, but they don't force anything. And it goes right up to much more complex things, where you can actually build your own custom classifiers. No coding is required, and you don't need data scientists on the team, but you can have a classifier answer closed-ended questions like "Is this an RCT?" or "Is it a child population?" Whatever questions of that kind you want asked, you can train custom classifiers to help with that classification yourself.
So, there are a lot of different options there. I won't get too deep into the rest of it, but just so you know, there are many choices in terms of how AI can be incorporated. So, from the dashboard here, I'm going to jump back, because I just want to show one thing at the full-text screening level.
[Derek navigates back to the dashboard, highlighting elements.]
So, we'll note here that this is still at 30, even though I reviewed one reference, because that reference was excluded, so it hasn't gone on through the process. If we exclude something, we don't need to look at it anymore. If something gets included, it progresses through the workflow as we need it to. This one here has an attachment on it, so let's jump into that one.
[Derek heads back into the "Screening" task and scrolls through references. As he mentions links, he opens them.]
So, in this reference, we can see there's a PDF attached for more context. I can have the PDF available in the system and open it up in another tab, so we can read it and use that context for the rest of our data extraction or full-text screening.
There are a few other ways we can get full-text documents in here. I'm just going to go through a few of them, but there are even more than what I'll talk about. We've got partnerships with RightFind and Article Galaxy. We have links back to the original DOI source, if that information has come through on the reference. We can also search PubMed Central for all the free PDFs associated with those references and attach them automatically for you. And if you're using EndNote, we can pull in references from EndNote as well.
And again, there's more beyond that. There are a lot of ways we can get that information and context into your system. Even if you have an internal shared library or an online source of these PDFs, you might be able to link directly to that as well. So, there are a lot of choices, because we know that gathering the full-text documents is not the best part of the job. But it's a very important part of the job, right? Without it, everything else kind of comes to a halt.
So, last but not least here, I want to take a quick look at reporting. How do we get information back out of the system?
[Derek shows a dropdown menu from a top menu bar, showing a variety of reports. He navigates to one.]
So, we've got a bunch of premade reports in here. In the interest of time, I'm just going to show you one: our exclusions report. I know we have a lot of groups on here, so I'm going to try to keep this high level. We have our screening details: how many references were excluded at title and abstract screening, and even how many were excluded for each specific reason, right? And we do that for each screening level.
We're giving you those counts and that information right away here. If you use PRISMA, that's great; PRISMA is one of the most popular formats. If you need other formats, there are a few different ways you can download this exclusion information. I'm going to download PRISMA with details here.
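The per-level and per-reason counts behind an exclusions report (and behind the exclusion boxes of a PRISMA flow diagram) are simple tallies. Here is a minimal Python sketch, assuming exclusion records are available as hypothetical (level, reason) pairs; the levels and reasons below are invented examples, not DistillerSR output.

```python
from collections import Counter

# Hypothetical exclusion records: (screening_level, exclusion_reason).
exclusions = [
    ("title/abstract", "wrong population"),
    ("title/abstract", "wrong population"),
    ("title/abstract", "not primary research"),
    ("full text", "no relevant outcome"),
]

def exclusion_counts(records):
    """Tally exclusions per screening level and per (level, reason) pair."""
    by_level = Counter(level for level, _reason in records)
    by_reason = Counter(records)
    return by_level, by_reason

by_level, by_reason = exclusion_counts(exclusions)
print(by_level["title/abstract"])                         # 3
print(by_reason[("title/abstract", "wrong population")])  # 2
```

Because the tallies are computed from the same records reviewers create while screening, the report stays in sync with the project instead of being rebuilt by hand.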
[He downloads a PRISMA file and it opens, showing a complex flow chart.]
And anyone who's familiar with PRISMA is going to recognize this. Hopefully, you can see the Word document on my screen there. Thank you. We can see it has prepopulated all of our PRISMA details for us. And anyone who's created these knows that on a good day it takes a couple of hours to build one, and on a bad day it can take north of a day to get the information collected, organized, and processed in a way you can present here. So, doing it at the click of a button beats either of those by a pretty good stretch. That could be a huge time savings for you.
Now, the exclusions report is just one thing, though. We need more. So, we also have Datarama, which is a custom reporting tool.
[On the Distiller platform, Derek moves to a "Datarama" page, showing tabs of report settings, options and criteria.]
It lets you build your own reports, with your own choices about what information you want to pull out of the system and send off to wherever it needs to go next.
In this case, I'm going to create a quick report on the screening data we were just looking at, for consistency's sake. If I run that report, we'll see a bunch of data come back here: the references that have been processed and the answers to those questions.
[He clicks a button, and a table of info populates. He selects one and is taken to the reference page.]
And I can even click in here, and it takes me right back to the original reference, just like we were looking at earlier. Even if someone has already completed it, you can still open the data and see it in the original format they saw it in, with their answers. If I change something, I can do that, no problem, and it feeds right back into the version history we spoke about. So, this is really great for answering "Why did we make that decision?" You can dive into the details and get there quickly and easily.
Now, if we're doing data extraction, for example, you might have a lot of questions, and you don't want all of them in one report at once. So, for now, I'm going to trim this one down, getting rid of our exclusion reason and our comments, let's say, and keeping just a couple of fields, and rerun it. Now, that's all we're getting back. We might also want to filter, say, on specific products or outcomes. So, let's add a filter here. There are a few different filter types.
But I'm going to take a look at question and answer since it's one of the most popular.
[On the "Datarama" page, he uses the filter options.]
I'm just selecting the level, the form, and the question. I'm going to show only the references we flagged as "yes" to keep as background material. Run that report one more time, and now that's all we're getting back. So, it's really flexible in terms of what information we need in this report and how we want to get it out.
We can add a custom bibliographic format as well, say, AMA. But this is still online, which only gets you so far. So, we can send it to Excel, CSV, RIS, whatever format is going to work. I'll put this into a quick Excel sheet here and open that up.
[He downloads an Excel file.]
And that went to my other monitor. Here we go.
[The Excel sheet of the report pops up.]
So, we have that same information available in an Excel sheet, where I can create a graph, do calculations, or maybe just hand it off to the next team, whatever we need to do with it. As Peter mentioned, we also have an API, so you can pull out all kinds of information quickly and easily. We can also save these reports and rerun them easily. So, really, there are a lot of capabilities here, and this is still just the tip of the iceberg, like many other things.
But I think I've covered all the most important stuff I want to make sure I show you guys today. So, thank you very much for your time. And I'll hand the mic back over for some questions, I believe.
[The screenshare ends.]
Vanessa Vermette: Thank you so much, Derek. Yes, indeed, we're starting to get some questions in. So, just a reminder to all our participants: now is the time to send in your questions using the raise-hand icon in the player. I want to start by putting our Public Health Agency colleagues on the spot a little and asking them to tell us more about their experience onboarding to the tool. What was that like: the configuration, all the work that went in? I imagine it's a fair bit of detailed work at the front end to get set up, and then you start to reap the benefits over time. So, maybe, Alejandra or Alex, one of you can tell us a bit more about that.
Alejandra Jaramillo: So, yeah, Vanessa, that's a really good question. I can definitely pass it over to Alex so she can answer some of the more technical pieces. But I did want to let you know that the onboarding is pretty straightforward. From a management perspective, it actually facilitates things quite a bit. There's onboarding us as a team, learning to use a tool. But I'm sure many of you can relate to the current situation with COVID: at least here in the agency, because of all the additional COVID task forces that were created, we have people coming in and out of the team.
It's a pretty regular practice these days. So, from a management perspective, onboarding the team as a whole, and also onboarding new people coming in, has been pretty easy, as has handing things over when somebody needs to go back to a task force, like the vaccine task force, and then comes back again and we have to bring them up to speed on the tool again. There's also the certainty that we have everything documented, because of the audit trail that Derek was mentioning. We know who did what, and handovers between staff are much, much easier. For the technical pieces, I'll pass it over to Alex. Thanks.
Alex Zuckerman, Public Health Agency of Canada: Yeah. Thank you, and thanks for having me. I think Derek touched on a couple of the features that are really useful for these flexible teams where people are coming in and out. Something he didn't touch on is that you can save some of the reports he was showing us in Datarama and then publish them to the task bar. So, if you have somebody coming in who isn't familiar with DistillerSR, you can make the dashboard very user-friendly. Finding the references is obviously very straightforward: as Derek showed us, you just click on the link that says "Unreviewed." But you can also make it easy for them to run preset reports that give them the answers they need.
For instance, they might want to check conflicts with colleagues. So, as people are learning the platform, you can set it up to be very, very friendly, and then, as they become more familiar, you can give them more permissions. You can also set it up so people can only edit their own answers, because even though you have the audit log, people sometimes click on the wrong form by accident. That reduces the potential for error.
And because you can add people to multiple projects, as soon as they're comfortable with one project, or if they want to switch over, it's a click of a button, and you can move people between teams very easily. And a big part of Distiller's value for training and for new members is that the AI gives you some very powerful quality-control options.
So, in addition to doing pilot training with people on forms, as you would with any systematic review, you can use the AI, which, for instance, suggests references that might have been excluded by accident. If you have a new team working on something, you can go in, look at what the AI thinks might have been excluded by mistake, and do a very quick check there without having to run a whole report and look at everything. So, it's very powerful at the experienced-user end, and quite user-friendly at the new-user end.
Vanessa Vermette: That's great. Thank you so much. We have a question here about the intake process: "Does this tool help make sure we're reviewing all of the relevant evidence, or are we only getting what we preselect from certain sources?" I'm not sure who would like to address that one. I know any software tool is garbage in, garbage out, right? So, at what point do you make those selections about what's going to be included in the review process? Is it still a human decision, or is the tool helping you select what's going to be included at the beginning, in the intake process?
Alejandra Jaramillo: So, I assume that by intake process, you mean what sources we're using for citations. For us, we follow very rigorous scientific methods, so we use the standard databases where medical, clinical, and public health studies are published. We always use more than one, so that we can make sure we're not missing anything. That part comes with the expertise of knowing your field: what are the sources you should be using? That's the human aspect of it. But we also use DAISY for ranking citations, which allows you to go through the ones that are most relevant to your work first.
And I should clarify that our team focusing on this mandate is relatively new, just within the last year. But very quickly, we were able to integrate all the AI pieces, including DAISY. Now we're looking at having DAISY do the first level of screening all by itself for some of the reviews we're constantly updating. So, hopefully, DAISY will do all of the first-level screening, and then we'll do the other levels.
But the sources themselves, yes, are sources we choose, knowing the topics and knowing where the best evidence will be coming from. We include the electronic databases where medical or public health studies are published, but we also include sources like government reports or websites. So again, that really is the human deciding. But then, DAISY can help us go through it, and even cover the first level of screening once you've trained the tool.
Vanessa Vermette: Great. I think that answers the question really well. Alex, did you have anything to add or-
Alex Zuckerman: No. I think, because you said garbage in, garbage out, and Derek also touched on deduplication: it's very useful, and we always use it. We have a living systematic review where we're up to about 15,000 references, and obviously there are a lot of duplicates in that. We do deduplication beforehand in our reference software, but there are always things that slip through the cracks, and the tool is very powerful: you can get rid of the bulk of them with the automatic quarantine.
But you can also dig in and modify exactly how you want the tool to match things up. Do you want it to check just the titles, or just the abstracts? Depending on which database you extracted from, you might have missing information. That really helps save time later on, and it catches pretty much 99% of duplicates. It's very rare that we come across an obvious duplicate in screening after we've spent a little bit of time with the deduplication tool.
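To illustrate what choosing which field to match on means, here is a minimal Python sketch of exact-match deduplication on a normalized title or abstract. It is an assumption, not DistillerSR's deduplication algorithm; a real tool would likely also need fuzzy matching to catch near-duplicates. The record structure and IDs are hypothetical.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", (text or "").lower())
    return " ".join(text.split())

def find_duplicates(refs, field="title"):
    """Group reference IDs whose normalized `field` values match exactly.

    Records with an empty field are skipped, since some databases
    omit information such as abstracts.
    """
    groups = {}
    for ref_id, ref in refs.items():
        key = normalize(ref.get(field))
        if key:
            groups.setdefault(key, []).append(ref_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records exported from two databases.
refs = {
    "pubmed-1": {"title": "Long COVID prevalence: a cohort study", "abstract": ""},
    "embase-7": {"title": "Long COVID Prevalence - A Cohort Study.", "abstract": "..."},
    "pubmed-2": {"title": "Vaccine uptake in Ontario", "abstract": ""},
}
print(find_duplicates(refs))                    # [['pubmed-1', 'embase-7']]
print(find_duplicates(refs, field="abstract"))  # []
```

The second call shows why the matching field matters: when one database omits abstracts, matching on abstracts finds nothing, while matching on normalized titles still pairs the two records.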
Vanessa Vermette: Thanks, Alex. A question from a participant, Peter, I'm going to kick this one over to you. So, if somebody wanted to be able to assess the methodological quality of an included article or study, does the system allow for that?
Peter O'Blenis: Yeah. So, you would have a method for assessing quality; it may be a Cochrane standard. That's essentially a series of questions that an expert answers when reviewing the paper, and you score the paper based on how they answered. You would create that form in Distiller. A number of forms are already templated in there that you could just use; a lot of the standard Cochrane-type quality assessment forms are already there.
And to Derek's point earlier, the forms are smart. As you answer questions, the form can calculate your overall quality score for you at the bottom, if you'd like. That's just part of being able to adapt to whatever protocol you want to use.
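A form that calculates a score as questions are answered can be sketched as a lookup from each answer to a number of points. The checklist items and point values below are hypothetical placeholders, not a Cochrane instrument, and whether a summed score is appropriate at all depends on the protocol you follow.

```python
# Hypothetical appraisal checklist: item -> points for each allowed answer.
# Placeholder items and weights, not any published instrument.
CHECKLIST = {
    "random_sequence_generation": {"yes": 1, "no": 0, "unclear": 0},
    "allocation_concealment": {"yes": 1, "no": 0, "unclear": 0},
    "blinded_outcome_assessment": {"yes": 1, "no": 0, "unclear": 0},
}

def quality_score(answers):
    """Sum the points for each item; refuse to score an incomplete form."""
    missing = [item for item in CHECKLIST if item not in answers]
    if missing:
        raise ValueError(f"unanswered items: {missing}")
    return sum(CHECKLIST[item][answers[item]] for item in CHECKLIST)

answers = {
    "random_sequence_generation": "yes",
    "allocation_concealment": "unclear",
    "blinded_outcome_assessment": "yes",
}
print(quality_score(answers))  # 2
```

Refusing to score an incomplete form mirrors the mandatory-question validation Derek showed: the total is only produced once every item has an answer.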
Vanessa Vermette: Great. Thank you.
Alejandra Jaramillo: Vanessa, if I may-
Vanessa Vermette: Yes, please go ahead.
Alejandra Jaramillo: ... Just to add a little bit to that. In our case, risk of bias is really the first step in getting to the quality of the studies or citations. We actually built the risk of bias tools. There are many different tools used to assess risk of bias, depending on the type of study, and we have the flexibility of building them in Distiller. So, we can capture that, and it becomes part of the database where we keep information on all these studies. We're also applying GRADE. For those of you who know about rigorous methods for guideline development, for public health guidance, and for clinical practice guidelines, we use GRADE. That can also be built into Distiller, and, just as Peter said, you respond to a set of questions.
And then, we have the full history of why we made decisions, because some of these are judgement calls. So, the human aspect is still there: we're making judgements as we appraise the evidence, let's say, or assess its quality and our certainty in it.
But we always need the ability to go back. If Dr. Tam's team comes and asks us, "Why did you miss this study?" or "Why didn't it get included?", we need to be able to stand behind our decisions: why a study got included or excluded, or why we didn't place as much certainty in that evidence. And we have the ability to capture that in the system.
Vanessa Vermette: Thank you. Yeah, that's very important. That auditability really supports the trust you have in the end result and in the process overall. Another question, around quantitative analysis: "Is it possible to do a quantitative approach for a review, such as a meta-analysis or meta-regression, in the system? Or how would the system support that with respect to the data you can extract?"
Alejandra Jaramillo: I can take that to start, and then feel free to jump in, Alex, Peter, or Derek. We use statistical software, so we don't actually conduct the analysis in Distiller. But being able to extract the data in a consistent way is a tremendous help. When doing this in Excel, which I've also done in the past, the possibility of introducing errors is much, much higher. Inconsistency in how the data are extracted can introduce errors, and it causes delays: you'd probably have to go through some QC, quality control, before doing the analysis.
But no, the analysis is not done in Distiller. It's the ability to extract data and to build forms so that you're extracting exactly the data you need, the outcome data you need. With clinical data, that tends to be quite straightforward, but we also work on public health topics, and those outcomes can be reported in many different ways.
So, for us to do this in a consistent manner, the form can actually help the data extractors identify which outcomes, data, and study characteristics to extract, and how to enter them into the form. It's just user-friendly. Then that data is used for the analysis, but the analysis is run separately. We use SAS, and we've used other software, like SPSS, in the past. So, yeah, for meta-analysis or any kind of regression, we do that separately.
Peter O'Blenis: Just to chip in, the thinking behind that was that there's really good statistical software out there, and the intent was not to reinvent that wheel within Distiller, just to make the data analysis-ready. So, the output is ready to go into one of those packages.
Vanessa Vermette: Great. Thank you. A few of you have now mentioned the reduction in data errors. Peter, maybe I'll ask you to expand on reducing data errors through the use of DistillerSR. Is it really just human cut-and-paste errors that are being reduced, or are there other types of errors that using the software can help reduce?
Peter O'Blenis: I'll open it up to my colleagues here too. But there are a number of places where you can reduce errors. One is by asking closed-ended questions wherever possible. If you have an Excel spreadsheet and people are typing into fields, as was alluded to earlier, they can enter data in different formats, and it's easy to make a keying error and so on.
If you can create closed-ended questions with dropdowns or pickers and that kind of thing, you get much more consistently formatted results. You can also build error checking into your forms on the fly, so it's harder to submit errors in the first place. And, of course, there's the data collation aspect: if you're using Google Sheets or whatever to move data around and merge data from different sources, it's very easy to accidentally shift a cell over by one, or something like that.
These are very manual, simplistic errors; fundamentally, they just happen by accident. So, there are errors at many different stages, and an automation tool that validates the data at the point of entry, helps you structure it so it's all formatted the same way, and then does all of the collation and reporting without you even touching it puts a lot of safety mechanisms in place to keep errors from creeping in.
Derek Lord: Absolutely. Just to expand on that with-
Vanessa Vermette: Yeah. Go ahead.
Derek Lord: ... a really small example: there's the real-time validation you can put in there, and you can configure a fair bit of how it works. For example, if you're collecting a number as a percentage, you can make sure people only enter something between 0 and 100. If they type anything outside that range, it throws an error and catches it, so you don't end up with a missing decimal point or something in the data that really throws things off.
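The 0 to 100 range check Derek describes might look like the following minimal Python sketch. The function name `validate_percentage` is hypothetical; per Derek's description, in DistillerSR this kind of rule is configured on the form rather than written as code.

```python
def validate_percentage(raw):
    """Parse a form entry as a percentage, rejecting values outside 0-100."""
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{raw!r} is not a number") from None
    # Comparisons with NaN are False, so "nan" is rejected here too.
    if not 0 <= value <= 100:
        raise ValueError(f"{value} is outside the range 0 to 100")
    return value

print(validate_percentage("45.5"))  # 45.5
for entry in ["455", "abc"]:
    try:
        validate_percentage(entry)
    except ValueError as err:
        print("rejected:", err)
```

An entry like "455" (a likely missing decimal point for 45.5) is caught at the point of entry instead of surfacing later as an outlier in the analysis.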
Vanessa Vermette: Thanks, Derek. I saw you nodding, Alejandra. Did you want to chime in on the data error question?
Alejandra Jaramillo: Yeah, I just wanted to say that what both Peter and Derek said is right on. It really helps us with the data extraction piece. But I also wanted to mention that when you think about a review of the evidence, whether it's a full systematic review, a rapid review, or a full living systematic review where you're doing it on a regular basis, at each step of the process, if you were doing this without Distiller, you would have to carry the studies from one step to the next, and to the next.
It's not the best thing to admit, but it does happen: we've actually included a bunch of studies, and then, when we got to the study characteristics, we realized we'd forgotten one study. It never got carried over, because we were doing this outside of Distiller. So, even basic things like that matter when you're talking about hundreds of included studies. A good example is prevalence of long COVID: we have over a hundred outcomes being reported across many different studies.
Just carrying things over from one step to the next, even in that little piece, you can make mistakes and drop a study, which is not ideal. With Distiller, we're able to avoid that. So, it's the data extraction, for sure, but it's also carrying things from one step to the next, and making sure everything was screened twice so that we're meeting the highest standards. It really allows you to see all of that.
Every time we do this, we run pilots in the team to make sure we're all doing things consistently. During the pilots, the reports Distiller gives us allow us to see who is generating most of the conflicts. We can go back to that person and say, "Okay, what is it that's not clear? Is it the questions we're asking in the forms? Is it the concept? Is it too clinical, and this is a new topic for you? Do we need to do more training on what long COVID is about?"
So, you really try to tackle those issues at the beginning, so that when the team goes out and starts screening, the level of conflicts is minimal, because everybody has the same understanding of the concepts, the definitions we're applying, and the topic we're tackling.
Alex Zuckerman: May I jump in with one more error source? One of the big things we found with a couple of our projects is that we get to the data extraction phase and realize that what we thought was going to be one systematic review is actually going to be multiple systematic reviews, because of the types of data we're finding.
In Distiller, you can go back and change, for instance, something on a form from an include to an exclude, and then ask the system to reconcile that. It changes it over, so you're essentially re-filtering everything with the answers you already have, except that now a group of papers is included or excluded, or goes into a different bucket, so to speak. That would be a huge error source if you were trying to swap things around in an Excel spreadsheet, whereas in Distiller, you click a few buttons and it does it for you, and you know it's following the answers you've already given. So there's no way an error gets introduced there, because you're literally just changing how things are sorted, which has been really helpful.
Vanessa Vermette: Thank you. I think this is going to be our last question before we have to close off, and it's a bit of a governance question. As you're all very aware, the government is paying more attention to how departments use automation and artificial intelligence tools. Treasury Board now has a Directive on Automated Decision-Making in place, and there are tools for algorithmic impact assessment that they're asking us to use.
How does this use of automation and AI stack up against the assessment tools that government is now mandated to use? Is this something you had to look at when you were implementing this tool in your department, Alejandra and Alex?
Alejandra Jaramillo: So, that's actually a really good question. I can tell you right now, we haven't been able to find any other tool that has been proposed that actually does the work we need to get done. So, right now, we haven't had any issues with that. Whatever other tools and AI are being proposed, this is the tool we're using. I guess there's some competition, as there is for any tool. But right now, we feel this one really meets our requirements and allows us to deliver what's expected from us. So, nothing coming from the government, no other tool; there's no comparison there.
And right now, among other products outside government, we also feel this is the only one that truly gives us what we need, especially DAISY. I like the name, by the way, Peter. DAISY, the artificial intelligence tool, is really good. We use DAISY quite a bit, and that's something we're not getting from any of the other tools, even though there's some free software out there intended to do some of what DistillerSR does. For us, this tool has really met our requirements. So, no, Vanessa, we haven't run into any issues. It's software as a service, so we don't have any issues with the IT piece or with maintaining the software in-house, and that has also made it easier for us to get approval for it.
Vanessa Vermette: Yeah. And you're also not collecting any information about individual people or making decisions that affect individual people. It's really at the population level, with established research studies. And the auditability and record keeping in the system really help, I think, to avoid the black-box syndrome that these algorithms are often criticized for.
So, I'm going to just maybe hand it over to Peter for some closing thoughts before we actually let everybody go. Thank you so much, Peter, for being here with us. And we just wanted to give you a last chance to say some parting words.
[Peter speaks silently.]
Oh, Peter, you're on mute.
Peter O'Blenis: I'd like to actually say parting words when I'm not on mute as well. I would just like to thank you for having us. Really thrilled to have the opportunity to present to this group and to thank Alejandra and Alex as well. I know that our team has worked closely with them. And it's been a really great relationship. So, I really appreciate that. And, yeah, if folks are interested in learning more about the software, just reach out. And we'd be very happy to give you a deeper dive.
Vanessa Vermette: Awesome. So, thank you to all of our panellists today and to all of the participants joining us online. We hope you enjoyed the final event of Data Demo Week. In case you missed earlier events, or you want to share this one with a colleague, we will post all of the recordings from Data Demo Week to our YouTube channel very soon. So, keep an eye out for that.
We also wanted to give a special thank you to our partners at Innovation, Science and Economic Development who helped us identify this great lineup of Canadian companies to include in this week's events and profile some of the ways that they're helping us do our work better and more efficiently. Watch your inboxes for an evaluation form. We really do want to hear what you think and use your feedback to shape future events because there will be more of these.
And we invite you all to sign up for the GC Data Community mailing list to keep up to date with events and resources, job opportunities for data practitioners, and all of that good stuff. Also, don't forget to visit our website regularly and register for the next GC Data Community event, which is happening on November 1st. It's called The Power of Linking Data. So, we'll see you next time. Thank you again to everybody, and have a great rest of your day.
[The video chat fades to CSPS logo.]
[The Government of Canada logo appears and fades to black.]