Ask Me Anything! With Lai Yi Ohlsen and Dustin Loup


Jun 17, 2022


About Our Distinguished Guests

Lai Yi is a data scientist with Measurement Lab (M-Lab), an open-source project committed to providing a transparent, verifiable measurement platform for global network performance.

Dustin is an expert on internet governance and policy, and program manager for the Marconi Society's National Broadband Mapping Coalition. Much of his work centers on improving digital inclusion and establishing transparent, open-source, and openly verifiable mapping methodologies and standards.

Event Transcript

Drew Clark: Good afternoon and welcome to Broadband.Money's Ask Me Anything. I'm Drew Clark, editor and publisher of Broadband Breakfast, and I am joined today by Lai Yi Ohlsen of Measurement Lab and Dustin Loup of the National Broadband Mapping Coalition. This is an exciting topic, one that is really near and dear to my heart. I've been working closely on different aspects of broadband mapping for more than 15 years, and we're at an extremely important point for addressing broadband mapping and data, so I wanted to make sure we had this opportunity to get the perspective of some real experts on this topic. Lai Yi, let's turn to you first. Tell us a little bit about yourself and about Measurement Lab, and what got you interested in broadband mapping and data.

Lai Yi Ohlsen: Yeah, definitely. Well, first, thanks for having us; I've been looking forward to this. My background is in technical project management. I'm really interested in the ways that we communicate about technology and the ways that we can talk past one another. So throughout my career, I've tried to understand how to bring together folks from different backgrounds and areas of expertise to talk about the same thing and solve problems together. That background led me to the internet freedom world, working on censorship circumvention and DDoS mitigation for media and human rights websites. Through that work, I became really exposed to the open-source world and the challenges of software development in the open-source ecosystem. And what drew me to Measurement Lab was thinking, with that background, about what we could do with data about the areas of the internet that we are trying to improve.

Lai Yi: A lot of the time it felt like we were just shooting in the dark and hoping that something would make the improvements we wanted to see, but it was hard to say that we had improved something without a measurement. That's what drew me to Measurement Lab's mission: the idea that we could actually track our progress the way project managers often do, and see actionable metrics come out of the way our tools interact with the larger network. After coming to Measurement Lab, I became really interested in the ways data can work as a tool for local communities, giving those communities numbers to back up what they already know to be true from their day-to-day experience, and in the way that can play into how our government and our policy work.

Lai Yi: So here I am. I've been director of M-Lab for three years, and I'm actually transitioning into a data scientist role, because through the director work I've become very invested in working with the data and creating visualizations and analyses with it. So I'm really looking forward to continuing to work with Measurement Lab in that new capacity. Oh, Drew, you're muted.

Drew: Happens to every one of us sometimes. Thank you for that great background and description. We'll talk in a little more detail about Measurement Lab's history and so forth as we get into our discussion, but we wanna make sure to hear from Dustin. Tell us a little bit about yourself, Dustin, your interest in this mapping and data topic, and in particular, the formation of the National Broadband Mapping Coalition through the Marconi Society.

Dustin Loup: Yeah, thanks, Drew. Hey everyone, I'm Dustin Loup, and I've been involved in the internet governance and policy space for some time now. I started off in the ICANN world, with the fun alphabet soup of the domain name system and everything that comes with its governance. Then I branched out into a broader focus on internet governance through our work facilitating the Internet Governance Forum USA, where we focus on a wide range of issues and really try to drive the discussion forward. Through that, out of all the topics available to me, I found myself consistently drawn to the broadband access and digital equity discussions, in part because I really believe in the power of communities to make the right decisions, and broadband is an issue that is very much local and has a very direct impact on the communities involved.

Dustin Loup: That has led me to get involved in a number of efforts working with local communities and coalitions on their broadband planning, and part of that is my work with the National Broadband Mapping Coalition. It's a program spearheaded by The Marconi Society alongside several other founding members, all of whom are dedicated to improving the ecosystem around broadband performance measurement and the mapping of availability and adoption of those services. The Marconi Society has a broader focus on digital equity in general, with programs to support the digital inclusion workforce through a digital inclusion leadership certificate, and its 2030 Digital Equity Project to drive conversations forward around the many issue areas affecting digital equity.

Dustin Loup: The National Broadband Mapping Coalition is my primary focus. We're a coalition of researchers, advocates, practitioners, community leaders, and others who all want to see better data, and in particular open data and methodologies, used to inform our understanding of where broadband is and where it isn't, and where it is and isn't meeting communities' needs. We want to ensure that as communities plan, and as decisions about policy and funding are made at various levels, those decisions are made using open and transparent data, methodologies, and processes.

Drew: Yeah, we're gonna get into that aspect of open data very soon, but say just another word or two about your experience with ICANN and the Arab Spring. Dustin, what about those global governance efforts got you interested in internet data and broadband data?

Dustin Loup: Yeah, so when I was going through school, that was around the time when we were seeing a lot of uprisings fueled by the openness and empowerment that the internet brought. That shifted my interest as I was finding my path toward the internet, because of its power to enable communities to take things into their own hands and make their circumstances better.

Dustin Loup: In that process I found myself looking into the world of internet governance, because it has this unique multi-stakeholder model in which decisions are driven through a consensus-making process where different stakeholders, in theory, have an even playing field and input into decision-making about the infrastructure they rely on. That being said, the DNS is a very important piece of internet infrastructure, but a very abstract one. It gave me a deep appreciation for the role of these communities and processes, but it also reinforced my desire to work more directly with communities, a little closer to that direct impact. That's how I shifted from a focus on DNS infrastructure to broadband infrastructure.

Drew: Wonderful. Well, let me dive in with the foundational question here for both of you. Why is it important to measure broadband performance? And what are some of the most important metrics to measure? We often think about speeds, download speeds, upload speeds. What does the speed mean anyway? What is the difference, for example, between a speed and throughput or capacity? What is latency and why is that important? What is jitter and why is that important? What other metrics should we be measuring? So let me toss this question to both of you for your brief answer on why and what should we be measuring.

Lai Yi: In terms of the importance, I think digital inclusion advocates are always very good about talking about this: the internet is the way to participate in the world that we have today, and if you can't have performant internet, then even if you have access to it, it's essentially the same as not having access. So it's important to measure so that we can understand an individual's ability to participate in basic activities such as paying bills, getting information about their democracy, things like this. The importance is really to make sure that everyone has an equal ability to use the service that digital inclusion advocates agree is important for participating in society. In terms of what to measure, I'll just name this as an open question that internet measurement experts are consistently asking.

Lai Yi: You mentioned speed. I think we're moving toward a consensus in the internet measurement space that speed is not the only metric we should be measuring, and there may even have been an over-optimization for it. I want to point out, too, that this is not specific to the internet measurement world. It's very easy for groups of experts to get hung up on one measurement, one data point, because it is just...

Lai Yi: It's easy to point to. I think body mass index might be a good example, or even GDP, which some might argue is not necessarily the best way to measure how our economy is doing or how people are able to participate in it. There are a lot of examples of this. So just to say it's not just us. It has been something that we have fallen, I don't wanna say victim to, that sounds extreme, but speed is not the only metric, and yet our conversations around broadband performance have become very speed-centric. So while it is an important metric, it is not the only one, and it's also not super well-defined. Part of what Dustin and I talked about in a panel at Mountain Connect was that there are actually multiple ways to measure speed that rely on different definitions. I'm of the thought, and maybe some others will disagree, that that's not a bad thing as long as you're clear about what you're measuring or what data you're using.

Lai Yi: To me, I use the analogy of getting your vision tested: you do several different tests, right? They're all measuring something different, but they're all asking the same question: can you see well? And so...

Drew: Can you give us a quick snapshot of the different ways of measuring speed? What are two or three of them, and how do they do things differently?

Lai Yi: Yeah, happy to, although we could be here for hours, so I'll try...

Drew: No, that's why I say brief. That's why I say brief.

Lai Yi: I'll give the... So two of the most well-known measurement services are the one M-Lab provides, called the Network Diagnostic Tool (NDT), which is typically accessed through Google search, though we have other integrations, notably local communities that integrate it for data collection campaigns, and Ookla's Speedtest. There are many differences between the two, but one example is that our servers are placed in different parts of the network. I'd describe this as a difference in what you're measuring; I think both are valid things to measure. Ookla's servers are often in the access network, meaning within the ISP network that you or I are connected to, whereas Measurement Lab's servers are placed in off-net locations where ISPs peer with one another.

Lai Yi: So we're just getting different vantage points of the network. In my ideal world, these are used in a complementary way and just telling us sort of different signals of internet performance. But that's one example of how we both measure speed, but we're actually measuring something quite different when you think about those differences in network topology. So all to say, I think there's multiple metrics that are indicative of user or broadband performance, and again, I think we can use all of them, it's just about being clear about what they mean and how they fit together.

Drew: Thank you. That's really, really crucial to get at. And on the point of different ways to do a vision test, I love that analogy, Lai Yi. What about these other elements I flagged, like latency, jitter? Are there other things we should be looking at that are completely different from speed everyone recognizes?

Lai Yi: Yeah, yeah, yeah, latency is a big one that we've been talking about a lot, "we" being the internet measurement world. I think latency under load is something people are pushing for more measurement of, since it takes things like bufferbloat into consideration. So just this idea...

Drew: And sorry, I've gotta stop you. What is bufferbloat?

Lai Yi: Yeah. Well, I wish I could channel Dave Taht into the call right now, because he would give a more robust explanation. I'll just go ahead and recommend the Bufferbloat project that he runs; it can share a lot more than I can right now. But in short, it's about the delay caused by queuing. When packets are sent, they all line up in a queue, and bufferbloat and latency under load are just different ways of talking about that line getting too long and affecting the way we interact with the internet. We've all experienced video lag, and that is often what's behind it.
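To make "latency under load" a little more concrete, one simple way to express bufferbloat is to compare round-trip times (RTTs) measured on an idle link with RTTs measured while the same link is saturated by a large download or upload. The sketch below assumes you already have two lists of RTT samples in milliseconds; the function name and sample values are hypothetical, and real measurement tools each define and collect these numbers in their own way.

```python
from statistics import median


# Hypothetical helper: quantifies bufferbloat as the rise in median round-trip
# time (RTT) between an idle link and the same link under a saturating load.
def latency_under_load_ms(idle_rtts_ms: list[float], loaded_rtts_ms: list[float]) -> float:
    """Return median loaded RTT minus median idle RTT, in milliseconds."""
    if not idle_rtts_ms or not loaded_rtts_ms:
        raise ValueError("need at least one RTT sample for each condition")
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


# Illustrative, made-up samples: the queue fills while a large download runs.
idle = [20.0, 22.0, 21.0, 19.0, 23.0]
loaded = [180.0, 210.0, 95.0, 240.0, 150.0]
print(latency_under_load_ms(idle, loaded))  # 159.0
```

Using the median rather than the mean keeps a single unusually slow packet from dominating the result.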

Drew: Got it.

Lai Yi: So those types of metrics are more representative of how we interact with applications like video conferencing and gaming, and jitter is a very important metric for both of those. Over the pandemic, obviously, we all became very aware of when we were not able to video conference. There are instances where you can have a great speed, depending on the definition of speed you're using, but still have a horrible time video conferencing. So to your question of why it's important to measure more metrics, I think it's exactly because of that. If the whole point of measuring performance is to speak to your experience, and the speed measurement isn't doing it, then we need to look elsewhere.
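Two connections with similar speed results can feel very different on a video call if their latency swings around from one moment to the next. A minimal sketch of that idea, assuming a list of consecutive latency samples in milliseconds: jitter is taken here as the mean absolute difference between consecutive samples. This is a simplified, unsmoothed variant chosen for readability (RFC 3550 defines a smoothed interarrival-jitter estimator for RTP), and the helper name and sample data are hypothetical.

```python
# Hypothetical helper: a simplified jitter estimate, taken as the mean absolute
# difference between consecutive latency samples (unsmoothed, unlike RFC 3550).
def mean_jitter_ms(rtts_ms: list[float]) -> float:
    if len(rtts_ms) < 2:
        raise ValueError("need at least two samples to measure variation")
    # Pair each sample with the one that follows it and measure the change.
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return sum(diffs) / len(diffs)


# Made-up samples: a steady connection versus a choppy one.
steady = [30.0, 31.0, 30.0, 32.0, 31.0]
choppy = [30.0, 80.0, 25.0, 90.0, 40.0]
print(mean_jitter_ms(steady))  # 1.25
print(mean_jitter_ms(choppy))  # 55.0
```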

Drew: Dustin, what would you add to this discussion Lai Yi's laid out so well?

Dustin Loup: Yeah, agreeing with everything that Lai Yi said, I would add a few things. Going back to your original question of why it's important to measure broadband performance and how that shapes what we measure, I think it's important to level set here and say that the individuals, communities, and policymakers undertaking these measurement efforts are doing so for a variety of reasons. A lot of focus is on the FCC mapping process and determining whether or not service is actually available where the federal data says it is. I'm sure we'll get into that later, but there's also a desire to better understand local needs and gaps based on the performance people are seeing in different communities, and whether that is actually meeting their needs for what they want to use it for. And as Lai Yi mentioned, the over-reliance on speed as a metric has perhaps skewed deployment and funding decisions in a way that doesn't quite reflect how an ideal network would look and operate to serve what individuals need it for, like accessing telehealth and education.

Dustin Loup: And this over-reliance on speed, and on the speed of a particular network, might lead to services hosted on, say, streaming and social media platforms performing relatively well, and then that data might skew our perception of what people are using, or would like to use, the internet for. It's like if my city has toxic water, I'm not going to drink it, and then the stats say, "Well, nobody in the city is drinking the water, so we don't need to purify it and make it healthy to drink." Well, that's kind of missing the point, right?

Drew: Right, right.

Dustin Loup: So I think that's an important thing to keep in mind as we have this conversation. The data is useful to identify those needs and gaps, determine potential solutions, and then pursue the funding and support needed for those solutions, and that latter step is where the FCC process comes into play. One other thing I'd like to add, which might not be entirely captured by the Measurement Lab test, the Ookla test, or other tests, is reliability. I don't think that gets enough focus in the conversation, in our stories...

Drew: How do we... Is that latency? Is that like jitter? Or is it something that combines a variety of measures?

Dustin Loup: I would say, to kind of simplify it, the up-time, the amount of time in which somebody can actually connect to the internet. So there are accounts of 45-day internet outages...

Drew: Oh yeah, those stories from Detroit are just incredible. We heard about those at Mountain Connect, just... You're absolutely right that if we can't... If you can't get access to the internet for 45 days, Joshua Edmonds made this point over and over again, that if we can't get access, then how can you even say that's there, right?

Dustin Loup: Right, exactly.

Lai Yi: Just to say, that household might have gotten a speed test of an acceptable amount at some point and that data point could then be used to say, "Oh, it's served," but is that really served if you can't access it for 45 days?
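Dustin's simplified definition of reliability, the share of time in which someone can actually connect, is easy to express as arithmetic, and it shows why one acceptable speed test can hide a long outage. A minimal sketch, assuming the outage durations are already known (for example from household reports or a hardwired monitoring device) and do not overlap; the function name and numbers are hypothetical.

```python
# Hypothetical helper: reliability expressed as the percentage of a reporting
# window during which the connection was usable. Assumes outages don't overlap.
def uptime_percent(window_days: float, outage_days: list[float]) -> float:
    total_down = sum(outage_days)
    if not 0 <= total_down <= window_days:
        raise ValueError("outages must fit inside the reporting window")
    return 100.0 * (window_days - total_down) / window_days


# A 45-day outage inside a 90-day window: one passing speed test taken before
# the outage would say nothing about the other half of the period.
print(uptime_percent(90, [45]))  # 50.0
```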

Drew: Something we've kind of implied, but let's make it explicit: it is important to link this data about broadband network performance to a physical geography, isn't it? When I take a speed test, I may just pull one up on Google or on Ookla's site and see some measurement, but that's a discrete data point. When we link it to a geography, what do we get from that? And let me piggyback on a question from Scott Woods, Vice President of Community Engagement at Broadband.Money. He asks, "Can you provide an overview of what and how M-Lab measures, and how the result aids our understanding of broadband access and availability?" So let's start with Lai Yi, and also get Dustin's perspective on linking performance to geography.

Lai Yi: Yeah. From a zoomed-out perspective, M-Lab was founded back in 2008 as a research consortium wanting to look at longitudinal data, meaning data points over time, so we could see how things change on a large scale. I think that's more or less what we've accomplished, with something like nine billion rows of NDT data alone. That said, a lot of what we talk about in terms of differences between tests is not only the question of what we measure, but also how we collect it. The way M-Lab has amassed all of those data points is through crowdsourced testing. With that, we get a really large number of test points across a lot of different geographies; it's not limited, and our platform is able to handle massive numbers of tests. So, like I said, with crowdsourcing you get all of it.

Lai Yi: You just get a ton of people testing for whatever reason, at whatever time, from whatever region. A way to get a more specific sample set would be to go into an area and either place devices or run a campaign to get results from that specific area. So I see crowdsourcing as a complement to that methodology for collecting data. You're not just...

Lai Yi: You're not specifying what kind of user takes the test, you're just saying anyone and everyone, whereas with the more sort of spelled-out scientific campaign, you'd be getting specific results from the users in specific areas. So to answer the question of geography or to that point, we just sort of get as much as we can from wherever we can, and I think that can provide a really good signal for where you need to look more. And that I think is when data collection campaigns where you're working with local partners, community anchor institutions and maybe even going door-to-door, that kind of thing, then you can get even more specific data points. But the crowdsourced way that M-Lab has amassed all of this data, I think can give you a really good sort of cursory glance and also tell you where you need more information.

Drew: Amanda Lee asked a question; let me just get this one in here too, Lai Yi: "Is there a way for communities to help out with providing active, accurate measurements to get to this point?" What do you try to do to encourage communities? Is that what you want to do? Does that skew the results, so to speak?

Lai Yi: No, I think this is... Yeah, this is what I was thinking about with complementary approaches. With crowdsourcing, like I said, there's such a low barrier to entry to taking that kind of test, like Googling "how fast is my internet" or "speed test" and taking one of Ookla's. It's such an easy way to collect data that, with the massive number of measurements you can get, it provides a good starting point and a high-level representation of what's going on. From there, you can work with communities to complement that data. For example, if in M-Lab's data you're only getting results from a specific part of the county, then you know you need to go deeper into specific census blocks, maybe do a campaign with the local library there, maybe work with the schools there, and fill the holes in the crowdsourced data. And I think Dustin might have more to say about this too.
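The step Lai Yi describes, spotting where crowdsourced results are thin so a local campaign can fill the gap, can be sketched as a simple count of tests per census block. This assumes each test has already been geolocated to a block ID; the block IDs, the helper name, and the threshold of five tests are hypothetical placeholders, not an M-Lab rule.

```python
from collections import Counter


# Hypothetical helper: flags census blocks where crowdsourced tests are too
# sparse to say much, so a targeted campaign (library, schools, door-to-door)
# could fill the gap. The default threshold of 5 tests is an arbitrary placeholder.
def blocks_needing_campaigns(test_block_ids, all_block_ids, min_tests=5):
    counts = Counter(test_block_ids)  # missing blocks count as 0
    return sorted(block for block in all_block_ids if counts[block] < min_tests)


# Made-up example: one block tests heavily, one barely, one not at all.
tests = ["block_A"] * 12 + ["block_B"] * 2
print(blocks_needing_campaigns(tests, ["block_A", "block_B", "block_C"]))
# ['block_B', 'block_C']
```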

Drew: Please Dustin.

Dustin Loup: Yeah, just to build on that, the role of the community is pretty critical here, as Lai Yi was mentioning. At the zoomed-out aggregate level, it's hard to know with precision where a test came from, where it was initiated, or the purpose for taking it. I'm taking this Zoom call from the basement, and I may have taken a speed test to make sure I had adequate bandwidth, considering the signal was traveling from a router on the next floor. That can be difficult to filter out at the zoomed-out level. So in addition to getting more granular, address-level data through specifically designed speed test campaigns, we're seeing a lot of communities and states include speed test surveys as part of their data collection and planning process, which also gives them an opportunity to provide guidance to those taking the test so the data they get is more reliable.

Dustin Loup: So for example, I have seen that the University of Missouri's extension office has a speed test survey with directions that include: "While you take this test, please make sure that you're not streaming other services on your device through the same access point or router," to make sure there aren't other factors. You can also suggest connecting by Ethernet if you're able to, and if you're not, being as close to the router as possible, because the purpose of the test is to gather data on the performance people are getting from their internet connection.
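Guidance like this also makes it possible to sort survey responses afterward. A minimal sketch, assuming the survey asks respondents how they connected and whether anything else was streaming at the time; the field names and category labels are hypothetical, and a real campaign might flag such tests for separate analysis rather than discard them.

```python
# Hypothetical survey records; field names and categories are invented.
responses = [
    {"address_id": 1, "mbps_down": 42.0, "connection": "ethernet", "other_streaming": False},
    {"address_id": 2, "mbps_down": 8.0, "connection": "wifi_far", "other_streaming": True},
    {"address_id": 3, "mbps_down": 15.0, "connection": "wifi_near", "other_streaming": False},
]


def follows_directions(r):
    # Keep tests taken over Ethernet or close to the router, with no other
    # streaming on the same network, per the kind of survey guidance described above.
    return r["connection"] in {"ethernet", "wifi_near"} and not r["other_streaming"]


usable = [r for r in responses if follows_directions(r)]
print([r["address_id"] for r in usable])  # [1, 3]
```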

Dustin Loup: The other thing these surveys can capture, which the broad crowdsourced M-Lab dataset cannot tell you, is who does not have internet access. Granted, respondents still need a device or a place to access the survey, or to know where to find the paper form, and I've seen some communities implement a text message survey as well so people can indicate that they don't have any service at all. So there are a lot of ways in which the gaps in the broad aggregate data can be filled in with more clarity by a well-tailored, community-led broadband mapping...

Drew: ...campaign. On our way to talking about the FCC, the NTIA, and broadband maps, I wanna address the skepticism, or criticism, that we hear a lot. It's well expressed by Rick Zimmerman in the comments here. He says, "It's well known, or at least it should be, that consumer speed tests say nothing about broadband availability, yet we often see reporters and others conflate the two. What can be done to help the general public understand that internet speed tests offer very little useful information as to what speeds are available in a particular area or a particular household?" Now, I don't agree with that question, but I wanna ask it, and the point that is maybe worth pulling out a little more specifically is that routers are oftentimes a weak link in the process. Dustin, you were getting at this when you said to take the test close to the router or over an Ethernet cable rather than Wi-Fi, because Wi-Fi can degrade the performance.

Drew: So if I'm gonna test Verizon's speed capability, I may not be getting a true reflection of it if I'm taking the test 200 feet from the router. So let's address this question head-on. Rick says consumer speed tests say nothing about broadband availability. Do you agree or disagree with that, and why? Lai Yi and then Dustin.

Lai Yi: I think it goes back to the question of where. In one way, I agree with the premise of the question: you cannot take a speed test if you do not have access, so we're not capturing data about that specific notion of availability. That said, if you don't have the performance you need to reasonably access the tools you use on the internet, then I would argue that the service is not available. The same goes for reliability: even if you got that performance for, say, a month, but then you didn't for a month or longer, I would argue that is not availability either. So it depends on your notion of what availability entails, but I would say it includes notions of performance.

Lai Yi: But I do want to recognize that we do not collect data about when you cannot take the test, and I think that is an important note for journalists and others to consider when they're using this data; I have also seen the two conflated. In terms of what we can do about it, and this might be simplistic, I do think defining our terms as much as possible is a start.

Drew: Do you wanna add anything on the question of routers, Dustin?

Dustin Loup: Yeah, just to add on to that, I would disagree with the question insofar as it states that speed tests say nothing about the speeds that are available. As we've discussed, there are certainly limitations to what we can learn from speed tests, especially in the aggregate, but they are helpful in identifying trends, and well-tailored campaigns can mitigate some of those limitations. There are also efforts, and I won't say too much about this because we're in a pilot phase, to hardwire devices that can more reliably capture data from a single connection over time, so that we get data that addresses several of the biases that might affect perceptions of how useful speed tests are. One more thing I'd like to build on in addressing this question is that current FCC maps also tell us very little about broadband availability. It's more widely recognized that there are flaws in the Form 477 data that has informed how broadband has been deployed and funded for quite some time now, and equipping communities with the tools they need to push back on that, even if those tools have flaws or limitations of their own, is an important thing for us to continue to do.

Drew: Well, absolutely, and the FCC is central to this dilemma. Let me give a 60-second summary of the history of broadband mapping; again, I've been following this issue closely. In fact, Broadband Census, the sister company that gave rise to Broadband Breakfast, started off with an NDT speed test in an effort to crowdsource broadband data. That led, in a way, through many people's efforts, to the Broadband Census of America Act that Ed Markey, then a Representative and now a Senator, proposed, which basically required reporting broadband availability by carrier in each census block. That led to the very first broadband map, which the Federal Communications Commission and the NTIA produced together from 2011 through 2015. The fundamental problem with it, as I see it, is that it basically presumed that if one person in a census block was served, the whole census block was served.

Drew: And this has created an exaggeration of who's covered by broadband that I think has undermined the reliability of these broadband maps. And of course, they were dependent on the form you mentioned, Dustin, the FCC's Form 477. The other problem is the speeds. Again, I know speed is a limited proxy, or maybe not even the best proxy, but the speeds reflected in the FCC's map were simply whatever the carrier said, right? "We serve 1,000 megabits per second" or "100 megabits per second," and the truth is that the performance was never at that level. So I want both of you to weigh in: why did the FCC and/or the NTIA, to the extent they were linked, mess up so badly with broadband maps? And what can we do to make sure that round two, version two of broadband mapping, doesn't suffer from the same problems as round one? Let's again hear from Lai Yi first and then Dustin.
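The arithmetic behind the over-counting Drew describes shows up clearly in a toy example. The numbers below are invented, and the sketch only contrasts a block-level rule ("one served location means the whole block is served") with counting served locations directly; it is not a model of the actual Form 477 reporting process.

```python
# Hypothetical data: per census block, (total locations, locations actually served).
blocks = {
    "block_1": (100, 1),  # a single served location
    "block_2": (50, 50),  # fully served
    "block_3": (80, 0),   # unserved
}


def coverage_estimates(blocks):
    total = sum(locations for locations, _ in blocks.values())
    # Block-level presumption: any served location marks the whole block served.
    block_level = sum(locations for locations, served in blocks.values() if served > 0)
    # Location-level count: only the locations that are actually served.
    location_level = sum(served for _, served in blocks.values())
    return block_level / total, location_level / total


block_share, location_share = coverage_estimates(blocks)
print(f"{block_share:.1%} vs {location_share:.1%}")  # 65.2% vs 22.2%
```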

Lai Yi: Goodness. The why is a great question. I don't think I have an answer for it, but I will say: the more open data, the better. And I think you had touched on this a bit, asking why open data is important; I think it's exactly for instances like this, where we all need to be looking at the same numbers, and not only the same numbers, but with an understanding of where those numbers came from and how they were calculated. And to the theme I've been bringing up of defining our terms, a number only means something if I'm able to say it corresponds to this definition and this equation for how it was calculated.

Lai Yi: And so I think in terms of how to get it better next time, I think it's about exposing as much as possible, being as transparent as possible about the methodologies that the datasets are using. And I would also push again for a sort of complementary, coordinated effort where it's not sort of choosing one dataset or choosing one metric and saying, "This is the one, this is the one that we're going to look at and to use and uphold and anything that contradicts this is wrong," but rather to say, "Okay, what are the different signals that these different datasets are giving us? What are the different ways that we can interpret all of the different data points that we have in this region?"

Lai Yi: You can imagine that if three different methodologies are saying the same thing, that's a really strong case for that being the reality in that area. But if one is telling us something different, instead of reacting with "Well, that's an invalid methodology" or "It's wrong for X, Y, Z," we can ask whether we have the whole picture and what more we need to look into. Again, not upholding any one source as the end-all be-all, but asking what more there is to the story and how we can ask better questions to get a better sense of it. That matters especially, and this brings in the communities, when the communities themselves are saying, "This isn't the complete story."

Lai Yi: If the communities themselves are saying, "We need more data here, the data that you have is not representative," then asking what other datasets, what other methodologies can kind of fill in the gaps that clearly are there?
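To make the "complementary signals" idea concrete, here is a minimal Python sketch. It assumes each source has already been reduced to a single download-speed estimate for one area, and it uses 25 Mbps (the FCC's long-standing download benchmark) purely as an illustrative threshold; the function, source names and numbers are hypothetical. Agreement across sources strengthens a conclusion, while disagreement is flagged for a closer look rather than settled by discarding a source.

```python
# Illustrative threshold only: 25 Mbps download was the FCC's benchmark
# for broadband at the time of this conversation.
THRESHOLD_MBPS = 25.0


def classify_area(estimates: dict[str, float], threshold: float = THRESHOLD_MBPS) -> str:
    """Compare several sources' speed estimates for one area against a threshold.

    estimates maps a (hypothetical) source name to its download estimate in Mbps.
    No source is treated as authoritative: unanimous results are reported as
    agreement, and split results are flagged for further investigation.
    """
    if not estimates:
        raise ValueError("need at least one source")
    meets = [speed >= threshold for speed in estimates.values()]
    if all(meets):
        return "agree: meets threshold"
    if not any(meets):
        return "agree: below threshold"
    return "disagree: look closer"


# Hypothetical area where the provider-reported speed and two crowdsourced
# datasets tell different stories.
area = {"provider_reported": 100.0, "crowdsourced_a": 12.4, "crowdsourced_b": 18.0}
print(classify_area(area))  # disagree: look closer
```

A "disagree" result is where, in the spirit of the discussion above, more data collection or community input would be sought.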

Drew: This is not the time to go through in detail the tools that Broadband.Money has available, but we were chatting about that a little before we began, Lai Yi: Broadband.Money is assembling maps with data on where particular speed results are, and it includes data that Measurement Lab makes available. And you're being very modest here. I mean, Measurement Lab is the gold standard in open data, because you've been collecting this for so long and because the data is open. Is it immediately available, or is there any delay on the data? And what other datasets are out there? Like you mentioned, if you can get three sources of truth saying something: is Ookla data available as well? Are there other sources of data? Lai Yi and then Dustin.

Lai Yi: Yeah, I wanna be careful, since I know the most about NDT data and M-Lab, so if I say anything wrong about other datasets, feel free to correct me. But I would give a shout-out to the Indicators of Broadband Need map that the NTIA created in the past year, I believe. It includes Ookla data, M-Lab data and Microsoft data, as well as the Form 477 data. So that's a good example of what I'm talking about: you can see what they all tell you, and if they're contradicting one another, you can ask the question, "Well, what does that tell us?"

Lai Yi: So I would recommend that as a way to get a sense of what's out there. In terms of what other data is available, Ookla does provide an Ookla for Good dataset. I myself haven't looked too much into it, but as I understand it, it's aggregated; please correct me if I'm wrong. What M-Lab does with NDT data is publish all of it, and I mean all of it. It goes into a Google Cloud Storage archive, where you can go through the raw data, and then it gets annotated and parsed into a tool called BigQuery, where it's free to access; you just have to sign up for an account. So in terms of delay, I would say 24 to 48 hours to go completely through the pipeline...

Drew: Oh, my gosh. Nothing like six months. [laughter]

Lai Yi: Yeah, relatively short in comparison. And so you're able to go through each individual test result, and I think that affords a bit more nuance in how you break the data out by, say, the time the test was taken or the carrier, meaning the ISP. And I believe, though I could be wrong again, that all of this is available in Ookla's commercial dataset for a fee.

Drew: So the time, the carrier and the location, right? At what level: address, census block, county?

Lai Yi: Yeah, that's a really good question, and the answer is around the city or county level; that's what we recommend for accuracy. Because it's a public dataset, we only collect the IP address with users' test results. We don't want any PII. We don't wanna know who you are, we just wanna get your test results. So we only use the IP address to geolocate tests, and that has very limited accuracy. That is also why it's imperative that we have other, supplemental data collection efforts, because a lot of those can be more specific about location than M-Lab, whether by actually collecting the user's address or through HTML5 geolocation or mobile GPS.

Lai Yi: So like I said at the top, I think that's why it's important to understand what you're working with. M-Lab data can give you a really good sense of an area, because it often has an overwhelming number of test results from a given region, but you're not gonna get down to the census block level, which the FCC has made the norm for mapping. And I believe Ookla uses mobile GPS within their commercial dataset.
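Because M-Lab geolocates tests from IP addresses only, the recommendation above is to analyze at roughly the city or county level. A minimal Python sketch of that kind of coarse aggregation follows. It assumes each test result has already been tagged with a county name (a hypothetical preprocessing step not described here), and the 30-test minimum is an arbitrary illustrative cutoff, not an M-Lab recommendation.

```python
from collections import defaultdict
from statistics import median


def summarize_by_region(tests, min_tests=30):
    """Summarize individual test results per coarse region (e.g. county).

    tests: iterable of (region, download_mbps) pairs, where region is assumed
    to come from a coarse IP-based lookup. Regions with fewer than min_tests
    results are skipped rather than summarized from thin data.
    """
    groups = defaultdict(list)
    for region, mbps in tests:
        groups[region].append(mbps)
    return {
        region: {"tests": len(speeds), "median_mbps": median(speeds)}
        for region, speeds in groups.items()
        if len(speeds) >= min_tests
    }


# Hypothetical results; the tiny cutoff is only so the example prints something.
sample = [("County A", 10.0), ("County A", 20.0), ("County A", 30.0), ("County B", 50.0)]
print(summarize_by_region(sample, min_tests=2))
# {'County A': {'tests': 3, 'median_mbps': 20.0}}
```

Reporting nothing for a region with too few tests, rather than a number built on one or two results, is the design choice that keeps coarse data from being read more precisely than it can support.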

Drew: Dustin, I'm not sure we've given you the opportunity to explain who some of the other partners in the National Broadband Mapping Coalition are, right? I mean, obviously M-Lab is one of them. Who are some of the others? And what do you want the National Broadband Mapping Coalition to be able to do and speak to, particularly as we've just started talking about open data and the importance of open datasets for fact-checking particular claims?

Dustin Loup: Yeah, so I'm gonna weave this answer in with the previous one about the FCC mapping, and I'll focus, hopefully more productively, on how to fix it rather than why it went wrong.

Drew: Why it got screwed up, yeah.

Dustin Loup: So the coalition includes different buckets of members. There's a group that's really focused on research around networks and the particulars of measurement, like we've talked about a lot today. We also have groups that are focused on broadband and digital equity policy and support at the federal level, such as New America, Public Knowledge, Next Century Cities, the Internet Society, the Institute for Local Self-Reliance, and folks like that. We've also increasingly been reaching out to local community leaders to join these discussions, both so they have an opportunity to ask questions and learn, and so they can help inform the way we approach our work. And then there are a variety of others, including similar regional or local coalitions, and implementers and practitioners who might be helping communities directly with their data collection efforts.

Dustin Loup: And so we aim to take this cross-section of expertise and do a number of things. One is to provide resources and educate communities on how they can carry out data collection and mapping efforts in the best way possible, given the current set of tools, available resources and necessary timelines.

Dustin Loup: We of course have a vision for what an ideal world looks like in terms of having all of the data on all of these networks, but we want to help pave the way given the parameters that communities are facing. We also take the conversations we have as a coalition and the experiences of local communities and use them to inform policy and decision-making at a higher level and on a broader scale. Part of that ties into the FCC mapping process and the ways in which the new Broadband Data Collection program will improve upon the Form 477 that we're currently relying on. Of course, it will address the geographic granularity issue by focusing on the location level rather than the census block, but you had also alluded to speeds being based on whatever the provider or filer claimed them to be or advertised them as.

Dustin Loup: And so part of this Broadband Data Collection process, and the legislation that established it, requires the FCC to establish a user-friendly challenge process (that's easier said than done) that individuals, third parties, communities, states and others can use to challenge the data within those maps. And I think that's one area where we'd like to see a lot of improvement: more accountability and verification around the data that's being reported and used to guide policy and funding decisions. We currently have some concerns about a lack of clarity around how the challenge process will be carried out. It's been this...

Drew: And let me just... I wanna hear this, I definitely do, but let's introduce one more element into the discussion, which is this broadband fabric. Okay? About two or three years ago, there was renewed attention to broadband mapping. Obviously, there was that long period where there was a National Broadband Map, and it basically got stale after 2015. The data wasn't actively collected by anyone with an interest [chuckle] in collecting broadband data, and it just got worse and worse. Then there was a lot of discussion leading up to the passage of the Broadband DATA Act in 2020, and the FCC in particular said, "We want to go with this thing called the broadband fabric."

Drew: They selected a contractor to build this fabric. As I understand it, a fabric is basically a guide to every address, or every serviceable address, in the country. That fabric has now been turned over to the FCC, and they're about to kick off... Just yesterday or two days ago, the FCC issued their public notice about webinars on using the Broadband Data Collection system. Starting June 30th, the providers are gonna start to feed data into this new system, and then we're gonna have some challenges. So with that background of the fabric, and then after the fabric some kind of availability map, what are the challenges that we're going to see, Dustin? What do we know about them?

Dustin Loup: Yeah, and thank you for describing the fabric. It's worth noting here that challenges can be based on the fabric itself, which identifies all of the serviceable locations. So whether or not a location is serviceable, or whether a serviceable location is missing, is something that can be challenged in this process. That's worth setting aside for a second. Then there are also challenges to the data being reported about the broadband service available at each serviceable location. This is an opportunity to provide more verification and accountability in the reporting of the speeds that are being advertised. But one of our concerns is the lack of clarity around how that process will be carried out. What we do know is that these maps will be made publicly available, hopefully later this year, and that will open up a public challenge process through which individuals, communities, states, third parties and tribal governments can challenge the accuracy of the maps. That process notifies the provider of the challenge and gives them 60 days to respond.

Dustin Loup: Once they've responded, they have 60 days to resolve it with the challenger. If it's not resolved by then, the FCC is tasked with reviewing the challenge and the provider's data and determining whether or not the challenge is successful and whether the, I guess, authoritative maps need to be updated. It's that last step, and I guess the first step as well, that we would like to see more clarity on, and we're in the process of drafting a letter asking for that clarity. As the FCC thinks through how they will evaluate challenges that reach the level of FCC review and what inputs will be accepted, we've yet to see that clearly defined. What I mean by that is: in the challenge process, will the FCC accept speed test data as valid supporting documentation for a challenge?
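The challenge timeline just described reduces to simple date arithmetic. The Python sketch below assumes both windows are 60 calendar days, as described in the conversation, and ignores extensions, tolling and the timing of FCC review, which the actual rules may handle differently; the function name and dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Windows as described in the conversation; calendar days assumed.
RESPONSE_WINDOW = timedelta(days=60)    # provider responds to the challenge
RESOLUTION_WINDOW = timedelta(days=60)  # provider and challenger try to settle it


def challenge_timeline(notified: date, responded: Optional[date] = None) -> dict:
    """Return the latest date for each stage of a (simplified) map challenge.

    If the provider's actual response date is unknown, assume it responds on
    the last allowed day. A challenge still unresolved after resolution_due
    would move to FCC review.
    """
    response_due = notified + RESPONSE_WINDOW
    if responded is not None and responded > response_due:
        raise ValueError("response falls outside the 60-day window")
    start = responded if responded is not None else response_due
    return {"response_due": response_due, "resolution_due": start + RESOLUTION_WINDOW}


t = challenge_timeline(date(2022, 11, 1))
print(t["response_due"], t["resolution_due"])  # 2022-12-31 2023-03-01
```

Under these assumptions, a challenge filed when the maps open could wait roughly four months before it even reaches the FCC review step whose criteria Dustin is asking the FCC to clarify.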

Drew: Why would they not? Or how could they not accept that as valid supporting data for a challenge?

Dustin Loup: Well, we don't think that that should be excluded but we just want...

Drew: You wanna be clear on it.

Dustin Loup: Well, obviously, some have expressed in this chat alone a belief that speed tests are widely recognized as not capturing the delivered speed, so there is a push for them not to be considered valid evidence in a challenge process. So we wanna see clear confirmation that they will be allowed, and if so, more information about the tests and methodologies that will need to be employed for that data to be accepted. And then, assuming a challenge gets to the point where the FCC is reviewing it and deciding whether it's successful, what does that review process look like? How are different methodologies weighed against each other? How is the determination ultimately made? We want openness and transparency around the challenger's data, the provider's data, and the process through which the decision is made. That is critical because communities right now are collecting this data, and each week we're seeing new communities launch new data collection efforts...

Drew: There's a lot of data being collected, but how is it gonna be synchronized? Sarah, for example, asked this question, noting that the Federal Communications Commission is asking consumers to download a speed test app and take tests, while at the same time it's going to the providers and saying, "Hey, you've gotta submit your data." How is it gonna be synchronized? What database is it gonna sit in? What is the FCC gonna do to see, "Oh yeah, this is what M-Lab and Ookla and Microsoft tests show, and this is what the carriers say"? Not so much how it's gonna happen, but what would you like to happen? Let's hear from Lai Yi and Dustin. What's the ideal for reconciling the conflicting claims that are inevitable when it comes to measuring speeds, reliability and performance?

Lai Yi: Yeah, I think the ideal goes back to, and I feel like I'm beating a dead horse, the complementary, coordinated notion that all of these datasets tell us something. We can treat them as different signals and follow those as indications that maybe we need to look here more, or maybe we need to look there more; maybe there's just more to the story. I guess the short answer is: use them all. That's my ideal. But the further ideal down the road, and this is something M-Lab is working on leading, is a standard for what metrics should be collected, one that's upheld by the FCC but also by providers and other measurement groups, and that we're all able to sign on to and agree is a good standard.

Lai Yi: I think that's too far down the road for this mapping process or this challenge process, but I think that is the ideal. And that ideal would include not only standards for metrics but also for data collection efforts, so crowdsourced and targeted samples. And furthermore, and this is something some of the questions got into that we didn't get to, standards for analyses as well. Are we using the mean? Are we using the median? Are we looking at distributions? These all have different effects on the story that gets told with the data. So the goal is, again, not to be editorial about which ones we use or to discredit certain methods, but to be clear and transparent about the effects they have on the outputs. You can take the mean for a certain area and get a far lower or higher number than, say, the median, and then the question is: what else can we do to get more of the story?
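To illustrate how the choice of statistic changes the story, the Python sketch below summarizes one set of test results three ways. The numbers are invented for illustration: a few very fast results pull the mean far above the median, while the share of tests below an illustrative 25 Mbps threshold tells yet another part of the story.

```python
from statistics import mean, median


def summarize(speeds_mbps, threshold=25.0):
    """Summarize the same download results (Mbps) with several statistics."""
    below = sum(1 for s in speeds_mbps if s < threshold)
    return {
        "mean": mean(speeds_mbps),
        "median": median(speeds_mbps),
        "share_below_threshold": below / len(speeds_mbps),
    }


# Hypothetical results for one area: mostly slow tests, plus two very fast ones.
speeds = [3.0, 4.5, 5.0, 6.0, 7.5, 8.0, 250.0, 300.0]
print(summarize(speeds))
# {'mean': 73.0, 'median': 6.75, 'share_below_threshold': 0.75}
```

None of these numbers is "the" answer; reporting them side by side, with the method stated, is what makes the differences visible.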

Drew: Well, we've got a lot of dialogue back and forth in the chat; you're welcome to see it all. Rick Zimmerman does point out that the FCC speed test is for mobile only, not for wireline. Is there anything that either of you would like to say briefly? We're gonna wrap up soon, but I wanna get some more points in about the wired versus wireless speed test element, 'cause we haven't really talked about that at all. Is that important? Is that going to be important as we get to measuring broadband performance and speed, Lai Yi and Dustin?

Lai Yi: Yeah, the short answer is yes. I think a big part of what the current frameworks don't capture, NDT included, is whether or not the Wi-Fi network or the router is the bottleneck, and that has a huge impact on how we think about addressing some of these issues. Is the answer for users to have access to better hardware, and/or improvements in the access network, and/or improvements at the interconnection point? Being clear about where those bottlenecks lie is a big part of what we would ideally be able to measure, and, as has probably been pointed out, measuring from only one kind of device limits our ability to do that.
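The bottleneck point can be sketched with an idealized model in which end-to-end throughput is capped by the slowest segment of the path. This is a simplification, since real throughput also depends on latency, packet loss and congestion control, and the segment names and capacities below are hypothetical. The sketch shows why a single end-to-end number cannot, by itself, say which segment is responsible.

```python
def find_bottleneck(segments: dict[str, float]) -> tuple[str, float]:
    """Return the name and capacity (Mbps) of the slowest path segment.

    Idealized model: end-to-end throughput cannot exceed the minimum
    capacity along the path, so an end-to-end test observes at most this
    value without revealing which segment produced it.
    """
    name = min(segments, key=segments.get)
    return name, segments[name]


# Hypothetical household where in-home Wi-Fi, not the access network, is the limit.
path = {"home_wifi": 40.0, "access_network": 200.0, "interconnection": 500.0}
print(find_bottleneck(path))  # ('home_wifi', 40.0)
```

In practice the per-segment capacities are exactly what a single test cannot see, which is why the choice of device and test setup matters.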

Drew: There is this repeated thread that Jim Partridge raises about NDT, the Network Diagnostic Tool. He's saying it's a diagnostic tool and not a speed test. I just wanna give you an opportunity to address this point that he's made repeatedly here in the comments, Lai Yi and Dustin.

Lai Yi: Yeah, I think it's a fair point in terms of naming. I would say, though, that discrediting it as a data source is the wrong direction. We have a lot of data, and there's a lot to be looked at over the years. In terms of how it should be used, I think it should be considered another signal, the way that every other dataset is. And the last thing I would say is that the fact that it is open and available for free is not trivial, and, as you said, it's not something that's available elsewhere. So I think that's important. But can we be clearer about the recommended analyses and the ways in which it's used? Absolutely. He's brought that up before, and I agree.

Drew: And just to throw one more element into this puzzle, we've talked about the NTIA and their role and reliance on the FCC maps. The NTIA has their own separate map called the National Broadband Availability Map, which we haven't talked a lot about, but it has a lot of tools. It's not an open map, it's not publicly available, but it is a resource for many state broadband officials. Meanwhile, multiple states are developing their own broadband map. So in some ways we're gonna have like Mapzilla. And so Dustin, I know you've made the point that you didn't wanna create another map, you wanted to help guide people through what to pay attention to. And so let's give you an opportunity to address that and anything else that you'd like to address as we kind of wrap up our conversation here in the next few minutes and then we'll get some final thoughts from both of you. So Dustin, could you address this point about multiple places and sources of maps?

Dustin Loup: Yeah, absolutely. So you mentioned the state maps, and that's a place where we're seeing a lot of variety in methods, which is a pretty easy example of the different, sometimes incompatible datasets we might have to navigate as we try to figure out how to analyze these things. We're seeing states use speed tests: some are using Ookla, some are using M-Lab. We're also seeing states work directly with ISPs to collect data from them, often made available in an aggregate format in which some details are kept private for proprietary reasons. Some are relying on federal data and then adding their own layers on top of it. So how to treat those state maps really depends on the way the individual state is carrying it out. I would recommend looking at some research the Institute for Local Self-Reliance did to break this down, called the United States of Broadband. As for the National Broadband Availability Map, I haven't had much of a chance to look at it, so I won't speak to that.

Dustin Loup: But in terms of other federal maps, 'cause I know I saw that somewhere in the questions, there are also maps for different funding programs through the USDA and the FCC, where projects have already been funded but may not be reflected in availability data because they haven't actually been built out yet. So look at those areas where there might be funding through the USDA or through the FCC's RDOF program, where that money has already been allocated. And I will say, be careful with the RDOF maps and how you treat them, because there are a lot of nuances there that we don't have time to get into.

Dustin Loup: The one I will point out is that, with respect to BEAD funding, satellite broadband is not considered served for the purposes of determining funding through the BEAD program, and a lot of RDOF funding did go to Starlink. There have also been a number of RDOF bids that defaulted. So be careful with that dataset. Nevertheless, it's important to keep in mind that funding has already been allocated there, and that may not be reflected in speed tests, local data collection efforts or the federal datasets.

Drew: Well, it's been a pleasure to have both of you on to talk about broadband mapping and data, which is a very, very central element. I wanna give Lai Yi a chance to offer some closing thoughts about what we should be paying attention to, and really, I guess, what's your vision for the best-case scenario for how open data helps in the rollout of the Infrastructure Investment and Jobs Act funding?

Lai Yi: Yeah, the vision is that it's working for all. Something we also didn't touch on is that we are providing open data as a third party, but providers can also provide this data. Whether or not they will is a question, but if we can all agree that the more data we have about how our networks are performing, the better, I think that's a starting point. In terms of ideals, I think what's needed here is cooperation and coordination. And to the question of what happened with the FCC, I wanna at least point out that these are incredibly complex topics. We're dealing with one of the most fast-moving, ephemeral systems we've ever created in the internet, and it's not trivial to just do.

Lai Yi: And so, by saying that, I wanna point out that we all need an approach where we prioritize being very clear with our definitions and our methodologies, but are also cooperative in a way that isn't trying to whittle things down to one single primary data source, but rather understands what we can do with everything at our disposal and works together in that way. So I think the ideal is one where organizations with various incentives and from various industries are able to coordinate on the understanding that we need more data and we need to work together to provide it.

Drew: Well, it's been a pleasure to be with both of you for this hour plus seven minutes of bonus time. Don't miss our Ask Me Anything next Friday at 2:30 PM with Shirley Bloomfield, the head of NTCA, the Rural Broadband Association. On behalf of our wonderful guests, Dustin Loup and Lai Yi Ohlsen, I'm Drew Clark of Broadband Breakfast. See you next week. Take care.