Operating system use on the top one million domains in 2024
Back in the .com days of the internet, a company called Alexa started collecting statistics on the websites that users visited, using a plugin people installed in their browsers. As part of that effort, Alexa aggregated the data into a list of the “Top 1 Million” sites on the internet: the domains most requested by users of their plugin. They then gave that list away for use by the Internet community. Because it was available, and especially because it was free, the “Alexa Top Million” list became widely used across the internet. In the security world, being listed on the Alexa Top Million was often used as a proxy for whether a domain should be considered “safe” by default. The reasoning behind that typically went something like this:
DomainTools has historically provided the Alexa rank of a domain in Iris as one indicator to help investigators make this sort of calculation themselves.
The Conflict
Alexa, now owned by Amazon, announced that it is discontinuing the Alexa Top Million list as of 1 May 2022. This leaves us with a bit of a quandary: do we keep using a “frozen”, outdated list? Do we switch to someone else’s list? Do we drop the Alexa ranking entirely? Or do we try to generate our own ranking of the top domains on the Internet? We went with the last option: generating our own. We recently acquired Farsight Security, whose DNSDB has a great deal of information about DNS requests, so we were confident we could build a good replacement. Of course, it’s never that easy.
What Does “Top Million” Even Mean?
When we started researching whether to generate our own list, we ran into a fundamental question: top million domains by what criteria? That question has a lot of answers, and each of them has an interesting bias:
With all of those options, which one to choose? The best answer from our point of view is “all of the above.”
Enter Tranco
We are not the first people to think about this problem. In 2019, a group of researchers looked into building lists of the top domains for research purposes, as well as identifying problems with these sorts of lists (churn, misclassifying a popular but malicious domain, etc.). Their paper analyzed the overlap of the various “top domain” lists with each other and with Alexa, and concluded that a combination approach was best suited to their purposes. We agree, and think it’s well suited to ours as well. The approach they put forward uses the position of a domain on each list to generate a “score” for that domain, then takes the average of the scores from each list to determine the position of the domain in the final list. (In practice it’s a bit more complicated than that, but that’s the core idea.) The practical effect of this averaging is that domains that are missing from one or more lists will be pushed down in the final list, since they’ll get “0” votes from lists that don’t have the domain. Conversely, domains that are in all of the lists will be pushed up. This rewards domains that appear consistently across all the collection types, which we feel is a good thing: a domain that is ranked highly across multiple sampling methods is likely to be legitimately popular. The researchers have put up a website that automatically does this combination of multiple lists, and in theory we could have just used their list. We chose not to, mostly because we wanted to control our own fate. We are going to be working with the Tranco team, but the actual list that appears in Iris will be generated by DomainTools internally.
Homegrown Popularity
Having decided to build the list ourselves, the next question becomes: which data sources are we using?
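As an aside on the averaging idea described in the Tranco discussion above, here is a minimal sketch of how position-based scores from several lists could be combined. It is an illustration only, not the Tranco implementation or the DomainTools pipeline. The reciprocal-rank score (1 divided by position) is an assumption, since the text does not say which scoring function is used, and the function name `combine_rankings` and the toy domain lists are hypothetical.

```python
from collections import defaultdict


def combine_rankings(ranked_lists):
    """Combine several ranked domain lists into one ranking.

    Each input list is ordered from most to least popular. A domain at
    1-based position r in a list receives a score of 1/r from that list
    (an assumed scoring rule); a domain missing from a list receives 0.
    The final order is by average score across all lists, highest first.
    """
    totals = defaultdict(float)
    for ranking in ranked_lists:
        for position, domain in enumerate(ranking, start=1):
            totals[domain] += 1.0 / position

    n_lists = len(ranked_lists)
    averaged = {domain: total / n_lists for domain, total in totals.items()}

    # Sort by descending average score; break ties alphabetically.
    return sorted(averaged, key=lambda d: (-averaged[d], d))


# Hypothetical toy lists standing in for different sampling methods.
passive_dns = ["example.com", "example.org", "example.net"]
web_crawl = ["example.org", "example.com", "example.edu"]
browser_panel = ["example.com", "example.edu", "example.org"]

combined = combine_rankings([passive_dns, web_crawl, browser_panel])
print(combined)
```

In this toy input, `example.net` appears in only one list, so it receives zero from the other two and ends up last, while `example.com` and `example.org` appear in all three and rise to the top. That matches the effect described above.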
We already knew we wanted to use the Farsight Security dataset, and that we wanted to average it against multiple other datasets to address blind spots in that data. But what would we be averaging the Farsight Security dataset against? When making this decision, we wanted a mix of sampling methods to ensure we got a good cross-section of the different ways of looking at this problem. We also needed to consider the license terms for each dataset to make sure we were allowed to use it. In the end, we chose four datasets to use for our “top” list:
We feel that this combination of lists is a good, broad mix of sampling methods, and the Tranco averaging methodology gives us a good way to combine them.
Why Should You Care?
By the end of Q2 this year, DomainTools will be shifting the rank scores presented in our API and Iris to use this newly generated ranking. What does this mean for you, our customers? Practically, it means:
If you are using the Iris API and rely on the Alexa rank field in your queries, we recommend that you shift to the new “rank” field soon. Beyond that, we do not anticipate any other changes to the user experience. We are confident that the data generated in this list will be fairly stable and will be a transparent replacement for the Alexa Top Million list.
Which OS is run on most servers?
Overview of Linux and its popularity in the server world: Linux is an open source kernel known for its versatility and power, which make it the industry standard for running web servers, applications, and other intensive workloads across a wide variety of Linux distributions.
Which is the most used operating system?
Microsoft's Windows is the most widely used computer operating system in the world, accounting for a 68.15 percent share of the desktop, tablet, and console OS market in February 2024.
What operating system can big companies use?
Best operating systems for enterprise businesses: Windows 11, Windows 10, Red Hat Enterprise Linux, Chrome OS, Windows 7, Ubuntu, and macOS Sierra.
Which operating system do most web servers in the world use?
Linux has completely dominated the supercomputer field since 2017, with all of the top 500 most powerful supercomputers in the world running a Linux distribution. Linux is also the most used operating system for web servers, and the most common Linux distribution is Ubuntu, followed by Debian.