# 6 Data and Statistics

### Learning Objectives

• Learn the key statistical agencies and publications
• Become familiar with other sources of government statistical data
• Become aware of the most important statistical databases and meta sites
• Learn strategies for finding government-produced statistics and data

# Introduction

Questions about statistics can strike fear in the most experienced librarian. By the time a patron contacts us with a question, they’ve already plumbed the depths of Google and have found the statistics that are easy to discover, so librarians usually get really tough questions. The availability of statistics is constantly changing, but knowing the most common government statistical sources is a first step. This chapter will cover general concepts and strategies that you can use to find data and statistics.

When you’re helping patrons find data or statistics, it’s good to have a firm understanding of the differences between these terms. Data refers to the raw data that is collected through three primary means: enumeration, surveys, and administrative records. Enumerations are usually accomplished through censuses, which count all of the target population, be it people, houses, businesses, or animals. Surveys collect data on a sample of the population and present the data as estimates. The sample needs to be representative and large enough to be valid. Because they don’t count the entire population, surveys have a margin of error that grows as the sample gets smaller; roughly speaking, it is inversely proportional to the square root of the sample size. Administrative records such as tax returns, death certificates, or school enrollment are another source of government data. Microdata is the actual raw data, or a sample thereof, with personally identifiable information removed. The term PUMS usually stands for Public Use Microdata Sample, which allows researchers to perform their own analyses rather than rely on the published aggregate statistics. While some microdata is available for free, other data files are available for a (sometimes hefty) fee.
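To make the relationship between sample size and margin of error concrete, here is a minimal sketch. It assumes a simple random sample and the usual normal approximation for a proportion at a 95 percent confidence level; real government surveys use more complex sample designs, so their published margins of error will differ. The function name and the sample sizes are illustrative only.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to a 95 percent confidence level.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Illustrative sample sizes: each one is four times the previous one.
for n in (100, 400, 1600):
    moe = margin_of_error(n)
    print(f"n={n}: about +/- {moe * 100:.2f} percentage points")
```

Under these assumptions, quadrupling the sample size cuts the margin of error in half, which is why estimates for small geographic areas tend to be much less precise than national estimates.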

Statistics are derived from the raw data. In the old days, statistics were compiled in either tabular or textual form and published in printed reports. With the common availability of computers, most patrons nowadays want downloadable data that they can manipulate. The hope is that someone else has either scanned the tabular data and converted it to text or laboriously transcribed it and posted it on the internet for free. Sadly, the availability of downloadable electronic data files often does not match patron expectations.

When statistics are available electronically, they may simply be in an Excel spreadsheet. For example, the Hawaiʻi Data Book allows users to download statistics in PDF or Excel format. Increasingly, agencies have developed query interfaces so that researchers can conduct their own queries by selecting the rows and columns they need. For example, the Centers for Disease Control and Prevention has a data visualization tool that allows the user to select which data they want to display in a chart or graph.

# Reference Strategies

As with any reference question, a librarian needs to nail down exactly what the patron wants. The geographic area is critically important, as are the time span and the reporting period. Patrons often want data at a frequency or a level of granularity that is not available. It frequently happens that patrons are looking for monthly data, but it may only have been published in annual releases. Or, they need county data, but can only find it for the state as a whole. They don’t always believe it when informed that the data hasn’t been published in their preferred form of output. Librarians need to manage patron expectations regarding statistics, just as we would for any other type of information.

Next, we need to determine whether the patron wants aggregate data or microdata. Microdata files are enormous and unwieldy, so if the person is not practiced in using this type of data, it is probably best to look for aggregate data.
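The difference between microdata and aggregate data can be sketched in a few lines. The records, field names, and county labels below are entirely made up for illustration; real microdata files contain thousands or millions of records and usually include weights that must be applied before tabulating.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent, identifiers removed.
microdata = [
    {"county": "A", "age": 34, "employed": True},
    {"county": "A", "age": 51, "employed": False},
    {"county": "A", "age": 29, "employed": True},
    {"county": "B", "age": 42, "employed": True},
    {"county": "B", "age": 67, "employed": False},
]

def count_by(records, field):
    """Aggregate statistic: number of records for each value of a field."""
    return Counter(record[field] for record in records)

def employment_rate_by_county(records):
    """Aggregate statistic: share of respondents employed in each county."""
    totals = Counter()
    employed = Counter()
    for record in records:
        totals[record["county"]] += 1
        if record["employed"]:
            employed[record["county"]] += 1
    return {county: employed[county] / totals[county] for county in totals}

print(count_by(microdata, "county"))
print(employment_rate_by_county(microdata))
```

A published table gives the patron only the results of calculations like these. With microdata, the patron can choose their own groupings, but they also take on the work of tabulating.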

Determining whether the statistics are collected by a government agency, a nonprofit organization, or a private company can be tricky and depends in part on your knowledge of what kind of information the government collects. A little preliminary searching in Google or USA.gov might help. You can also consult Statistical Programs of the United States Government. Topics not generally covered by government statistics include religion, shopping habits, and leisure activities.

Let’s look at a question that was posted on Govdoc-l. A patron wanted to find statistics on crimes committed by immigrants in two counties in Texas that would also tell him the interval between the date of immigration and the date of the crime. If we think about where this data might be collected, the demographic data might be on the arrest record, for starters. When we consult the arrest report form for Texas, it is apparent that the form does not include a space to record an arrestee’s immigration status. Where else might we find the immigration status of a person convicted of a crime? It might be mentioned in court records or newspaper accounts, but it might not be recorded consistently. Determining the period of time between when a crime was committed and when the person immigrated could also be very difficult. Immigration and Customs Enforcement collects statistics on immigrants who have been convicted of crimes, but they don’t release the data by state, much less by county.[1] As you can see, this is a question that could, at best, be answered using anecdotal evidence or possibly a survey of convicts, but it’s unlikely that the patron could find this precise data in a published form.

Sometimes the data is available on an agency’s computer, but it may not be publicly posted. In such cases, a patron might need to make a request through the open records process prescribed by the state or federal government, which will be covered in Chapter 17.

# General Sources

## Statistical Compendia

One of the strategies we can use when trying to figure out where to find statistics is to consult statistical compendia. The most well-known such work is Statistical Abstract of the United States (StatAb), which was formerly produced by the Census Bureau and is now a printed reference book available from Bernan or a searchable database from ProQuest. Now that most government-produced statistics are published online, it’s not as critical to be able to consult the StatAb for current statistics, but when you are looking for historical statistics, it can point you in the right direction.

Most federal agencies produce some kind of data book. For example, the IRS Data Book has tables with numbers of returns, tax collections, and other data by state. States also publish data books like the Hawaiʻi Data Book that can help you to determine whether the statistics being sought are available from the state or the federal government.

Government agencies have also published guides to their statistical works or programs, such as the EPA’s A Guide to Selected National Environmental Statistics in the U.S. Government and Data on Health and Well-Being of American Indians, Alaska Natives, and Other Native Americans: Data Catalog by the U.S. Department of Health and Human Services.

## Statistics and Data Portals

Portals are becoming increasingly important since there are so many datasets and databases. Data.gov provides access to public databases and data sets, but it also lists non-public data like the Foreign Visitor Information System. One helpful feature of Data.gov is that it provides metadata about the datasets as well as contact information for the person responsible.

Most federal agencies have a data and statistics portal, like ICE Statistics on the Immigration and Customs Enforcement website. Note that the data only goes back to 2011. Some agencies have 20 or more years of statistics on their websites, but others provide limited access to historical statistics online.

A data catalog is a type of data portal that allows you to search for and download datasets. An example is the Department of Labor’s data catalog, which provides access to various enforcement datasets from Department of Labor subagencies.

Another important category of data is geospatial data, which we will explore in chapters 14-15. Most states have a portal that provides access to a number of downloadable files that can be used in a geographic information system (GIS). Some also have online mapping capabilities.

## Cooperation in Statistics

State and local statistics may be collected in cooperation with a federal agency:

• State Data Centers—Census of Population and Housing; Economic Census
• State departments of agriculture—Census of Agriculture, agricultural production, prices
• State departments of health—health surveys, vital statistics

Similarly, national governments report statistics to intergovernmental organizations.

## University Research Centers

Many university-operated research centers collect, analyze, and publish data and data tools. Here are some university research center projects that provide access to data sets and aggregate data from a variety of government sources.

• StatsAmerica is a service of the Indiana Business Research Center at Indiana University. It is primarily focused on economic and labor statistics.
• ICPSR is a consortium of universities that provides data curation and management for data from government agencies, private organizations, and individual researchers. Some of the data is free, but membership is required to access some data products.
• Minnesota Population Center (MPC), as the name implies, provides access to data on population, among other things, through its Integrated Public Use Microdata Series. MPC has done some processing of the data to make it more usable. It’s another way to access Census data, health surveys, and education data.

Universities also receive federal funds for statistical programs, like the UH Mānoa Center on the Family, which compiles demographic and social statistics about Hawaiʻi families.[2]

## Industry Associations, Think Tanks, and Non-Profit Organizations

Aside from government sources, some statistics may be gathered or republished by industry associations, research organizations, or non-profit organizations. Industry associations typically compile statistics for their members, and some charge a fee for access or restrict access to members only. An example is the School Nutrition Association, which publishes data for its members about school breakfast and lunch prices and participation, among other things.[3] The Pew Research Center, a non-profit research organization, uses government statistics to perform its own analyses of, among other things, wage stagnation.[4] RAND Corporation is a well-known think tank that publishes research in a number of subject areas, including education and public policy. Its report on student outcomes in Louisiana was based in part on U.S. Department of Education data.[5]

Testimony given in congressional hearings by representatives of universities, research organizations, and think tanks is another source of statistics. For instance, the hearing Comprehensive Immigration Reform: Government Perspectives on Immigration Statistics held before the House Subcommittee on Immigration, Citizenship, Refugees, Border Security, and International Law includes testimony containing statistics from the Heritage Foundation and the Center for Immigration Studies.

## Commercial Databases

Some common commercial databases for government and non-government statistical data include ProQuest Statistical Insight (SI), Data-Planet, and Geolytics. SI started out as three printed indexes accompanied by microfiche sets: American Statistics Index (ASI), Statistical Reference Index (SRI), and Index to International Statistics (IIS). ASI is compiled by contacting all of the federal statistical agencies and acquiring copies of their statistical publications and data. The list of sources includes hundreds of federal government agencies, both current and past.[6] The original publisher, Congressional Information Service, created abstracts of the publications and issued microfiche reproductions of them. Libraries could choose to purchase the index/abstract volumes alone or as a package with the microfiche. ASI provides indexing and abstracting from 1973 forward and full text from 2004 on. Now published by ProQuest as an online database, SI also indexes individual statistical tables within larger publications, so it provides a level of granularity that can be helpful when one is trying to find very precise data. SRI is an indexing and abstracting source for non-federal statistical publications, with coverage beginning in 1980 and full text coverage from 2007. Its sources include professional associations, non-profit organizations, research institutes, state governments, and commercial sources.[7] IIS covers intergovernmental organizations. Statistics of intergovernmental organizations and foreign countries will be covered in chapters 12 and 13.

Even if a library does not subscribe to the full text modules of SI, it is frequently possible to find the publications online. Let’s say we are looking for statistics on homeless veterans and we want to compare Hawaiʻi to another state. Using SI, we can search for homeless veterans, then narrow our results to sources that have a breakdown for states. The table State of Homelessness looks promising. If we visit the home page of the National Alliance to End Homelessness, which is the source of the table, we can download the source publication and find the table.

Data-Planet, published by SAGE, is a collection of datasets, many of which are from U.S. government sources. It also contains data from commercial sources and intergovernmental organizations such as the Organisation for Economic Co-operation and Development and the Food and Agriculture Organization of the United Nations. Geolytics provides an easy-to-use interface for current and historical census data.

It happens fairly often that we cannot find the data requested by the patron in any published sources. In these cases, the patron might have to contact the agency directly to request the data, possibly through a FOIA request. Another approach is to find an expert who may have additional unpublished data. Journal articles and dissertations can point you to these individuals. Finally, our colleagues have often developed very specific guides for statistics in their areas of interest. For example, a librarian at the Naval Postgraduate School has an extensive set of guides that includes military statistics.

## Interpreting Data

There are several types of documentation that may be helpful when you are assisting a patron who is trying to understand statistics or data. Manuals such as Uniform Crime Reporting Handbook: How to Prepare Uniform Crime Reports explain to the people conducting the survey how they are supposed to complete it. Often, there is a code book that explains the coding system for the survey. For example, the Behavioral Risk Factor Surveillance System (BRFSS) has a codebook that explains variable names and locations. Technical documentation includes definitions of terminology, as shown in the Table Definitions, Sources and Explanatory Notes for petroleum statistics.
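As a rough illustration of what a codebook does, the sketch below translates coded survey responses into readable labels. The variable names, codes, and labels are hypothetical and are not taken from BRFSS or any other real codebook; always consult the codebook that accompanies the specific data file and year.

```python
# Hypothetical codebook: variable name -> {code: label}.
CODEBOOK = {
    "SEX_CODE": {1: "Male", 2: "Female", 9: "Refused"},
    "HEALTH_CODE": {1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor"},
}

def decode_record(record, codebook):
    """Replace coded values with their labels.

    Variables or codes that are not in the codebook are left unchanged.
    """
    decoded = {}
    for variable, value in record.items():
        labels = codebook.get(variable, {})
        decoded[variable] = labels.get(value, value)
    return decoded

# One made-up survey response; AGE is not coded, so it passes through.
raw = {"SEX_CODE": 2, "HEALTH_CODE": 3, "AGE": 45}
print(decode_record(raw, CODEBOOK))
```

Without the codebook, a patron looking at the raw file sees only the numbers, which is why tracking down the documentation is often the first step in helping someone use microdata.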

# Statistical Resources by Topic

The U.S. government compiles and publishes a wide variety of statistics and data related to trade, economic activity, and commercial enterprises, not only for the U.S. as a whole, but also for states, counties, and even foreign countries.

Some of the most important statistics issued by the U.S. government are economic indicators. The Council of Economic Advisers publishes the annual Economic Report of the President and the monthly Economic Indicators, which includes such indicators as gross domestic product (GDP), labor force statistics, interest rates, federal receipts and expenditures, and consumer prices. The data is also available in the Federal Reserve Bank of St. Louis’ FRED database, which has coverage back to 1948. In addition to national statistics, FRED also includes some international and regional data series.[8]

Frequently, we receive questions about imports or exports. Important databases for this data are USA Trade and UN Comtrade. In USA Trade you can search by commodity, state, or port, and use either the North American Industry Classification System (NAICS) code or the Harmonized Tariff Schedule code. So, for example, you can create a report showing the total value of seafood exports from Hawaiʻi to the world or to particular countries. USA Trade requires users to register for a free account.

The freely available United Nations Commodity Trade Statistics Database (UN Comtrade) “contains detailed goods imports and exports statistics reported by statistical authorities of close to 200 countries or areas. It concerns annual trade data from 1962 to the most recent year. UN Comtrade is considered the most comprehensive trade database available with more than 3 billion records.”[9]

The International Trade Commission’s Dataweb provides access to U.S. import and export statistics, U.S. tariffs, U.S. future tariffs and U.S. tariff preference information from 1989 to the present. Access is free, but it requires the user to register for an account. ITC also publishes the Harmonized Tariff Schedule, a vast list of tariffs based on a classification scheme. Using HTS, we find that the code for macadamia nuts is 2008.19.90.10. Using that code number, we can search Dataweb for export data.
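It can help to know how a ten-digit code like the one for macadamia nuts is put together. The sketch below splits a code into its nested levels, assuming the general structure of the Harmonized System: a two-digit chapter, a four-digit heading, and a six-digit subheading that are shared internationally, followed by U.S.-specific digits for the tariff line and the statistical reporting number. The function and its field names are illustrative, not part of any official tool.

```python
def split_hts_code(code):
    """Split a 10-digit HTS code into its nested classification levels."""
    digits = code.replace(".", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"expected a 10-digit code, got {code!r}")
    return {
        "chapter": digits[:2],             # broad product group
        "heading": digits[:4],             # narrower group within the chapter
        "subheading": digits[:6],          # most detailed international level
        "tariff_line": digits[:8],         # U.S. tariff rate line
        "statistical_number": digits,      # full U.S. statistical reporting number
    }

# The macadamia nut code mentioned in the text.
for level, value in split_hts_code("2008.19.90.10").items():
    print(f"{level}: {value}")
```

Because the levels are nested, a search on a shorter prefix returns a broader category, which is useful when the patron's product does not have its own ten-digit code.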

The Bureau of Labor Statistics (BLS) collects data that feeds into economic forecasts. Its press releases are eagerly awaited by analysts who are looking at trends in prices, employment and unemployment, or inflation. Major economic indicators, including the Consumer Price Index (used to measure inflation), Producer Price Index, Employment Situation, and many others are issued periodically. BLS also publishes area wage surveys to measure wages in various industries in different metropolitan areas and states. BLS’s Consumer Expenditure Surveys collect data on consumer characteristics and expenditures. A variety of data tools are available to access the data.
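Patrons often ask what a past dollar amount is worth today. The arithmetic behind that kind of inflation adjustment can be sketched as follows. The index values are placeholders chosen to keep the math easy, not actual CPI figures; for real questions, use the published index values or a calculator provided by the agency.

```python
def adjust_for_inflation(amount, index_then, index_now):
    """Convert a dollar amount from one period's prices to another's."""
    if index_then <= 0:
        raise ValueError("index_then must be positive")
    return amount * index_now / index_then

def percent_change(index_then, index_now):
    """Percentage change in the price index between two periods."""
    return (index_now - index_then) / index_then * 100

# Placeholder index values, NOT real CPI data.
index_then, index_now = 100.0, 250.0

print(adjust_for_inflation(40.0, index_then, index_now))  # 40.0 * 250 / 100
print(percent_change(index_then, index_now))
```

The same ratio works in either direction, so swapping the two index values converts a present-day amount back into the prices of the earlier period.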

Patrons sometimes need 30 years of economic data to examine trends or compare to other indicators over time. ICPSR and other data sources contain downloadable data sets. If you’re looking for economic publications, FRASER has digitized long runs of historical and current publications such as Bureau of Labor Statistics bulletins, which cover a wide variety of topics.

You might think that consular reports would involve statistics about visas issued, diplomatic relations, or similar matters. In fact, consular officials throughout the world have promoted U.S. business interests by collecting data and filing reports on local industries and markets. These reports provide very detailed historical information about industries and commerce such as leather tanneries, blacksmiths, bicycles in China, or interisland travel in the Pacific. Figure 1 shows an example of a report prepared by the Bureau of Foreign and Domestic Commerce from information collected by consular officials.

Publicly traded companies file 10-K (annual) and 10-Q (quarterly) reports of financial data as required by the Securities and Exchange Commission (SEC). These reports, and many others, can be found in SEC’s EDGAR database, which covers the mid-1990s to the present.[11]

The Bureau of Economic Analysis (BEA) is another federal agency that collects and publishes economic data including GDP for the U.S. and states, income, consumer spending, personal savings, healthcare, and purchasing power. Most data tables can be downloaded in Excel format.

The Small Business Administration (SBA) exists to administer programs that support small businesses. It offers counseling, research assistance, loan programs, disaster assistance, and support for businesses that want to bid on federal contracts. Among its programs and services are the following:

• Small Business Development Centers serve local small businesses and provide market research assistance and information about loans, grants, disaster assistance, and other programs. The Hawaiʻi SBDC includes the Hawaiʻi Business Research Library, which provides research assistance for business development.
• HUBZone is a certification program for small businesses in underutilized business zones that allows qualified businesses to bid on federal contracts.
• SBA also provides export assistance and helps veterans, women, and minorities develop their small businesses and acquire certification to do business with the government.
• Statistics such as the dollar amount of disaster loans can be found on the website of each headquarters office or in the Agency Financial Report.

Census Bureau surveys related to business are discussed in Chapter 7.

## Transportation

Roads, airports, bridges, ports, and other transportation facilities and infrastructure are especially confusing for patrons because they don’t know who controls them. Generally, the federal government controls interstate travel and produces statistics on the highway system, airports and air travel, rail transport, and shipping. The U.S. Department of Transportation’s Bureau of Transportation Statistics is a gateway to national statistics. Accident data and reports of accident investigations are available from the National Highway Traffic Safety Administration, the Federal Aviation Administration, and the Pipeline and Hazardous Materials Safety Administration.

States or cities maintain statistics about state- or municipality-controlled roads and modes such as ferry systems, as exemplified by the statistics page for Washington State Ferries. States are required to report some statistics to federal agencies. For instance, the Federal Highway Administration publishes statistics about the condition of bridges in each state based on data reported by the states according to National Bridge Inspection Standards.[12]

The Census Bureau also collects transportation-related statistics in its surveys, including commute times and local employment dynamics, which explores where workers live and where they travel to work. The U.S. Coast Guard, which does not have a centralized statistical branch, maintains statistics about shipping in coastal and inland waterways, maritime law enforcement, numbers of boats, marine pollution, and boating accidents, among other things.

## Education

The National Center for Education Statistics produces some data for states and school districts, accessible through its tool Search for Public School Districts. It also publishes data about the number of schools, employment in education, school finance, and other characteristics. Some of this data is reported by states, which compile data on educational achievement, enrollment, employment, and other indicators. Similarly, school districts publish statistics about student demographics, achievement, and other characteristics.

Two of the most commonly sought statistics are the number or percentage of students eligible for free or reduced-price lunches, which is considered a key indicator of poverty, and the number of students who actually receive free lunches. These statistics are usually maintained by school districts, and you can find state-level data through the U.S. Department of Agriculture, which administers the school lunch program.

The National Center for Science and Engineering Statistics (NCSES) of the National Science Foundation tracks trends in STEM education and publishes data on STEM education, advanced degrees by institution, and employment statistics in science and engineering fields. It conducts several surveys of universities and students and makes PUMS available through its website.[13]

Commonly sought historical statistics include educational attainment and literacy, which can usually be found in state board of education reports or in StatAb. Educational attainment is also recorded in Census data.

## Agriculture, Forestry, Fisheries

The National Agricultural Statistics Service (NASS) collects and publishes statistics on state agricultural production, farm animals, acreage in production, and farm employment. Much of the data is derived from the Census of Agriculture, a survey-based data collection that is specific to each state. For example, the Hawaiʻi state statistics don’t include berries because the most common types of berries, such as blueberries and raspberries, are either not grown in the state or the production is minimal. Be aware that, like the Census of Population and Housing and the Economic Census, the Census of Agriculture is subject to data suppression to protect privacy. In addition, the data is only available at the county level and not by island.
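Data suppression has a practical consequence for patrons who download tables: some cells contain a flag instead of a number, so totals computed from the file may be incomplete. The sketch below shows one cautious way to handle this. The flag string "(D)" and the sample values are assumptions for illustration; check the notes that accompany each table to see which symbols the agency actually uses and what they mean.

```python
# Assumed suppression flag; confirm against the table's own notes.
SUPPRESSION_FLAGS = {"(D)"}

def parse_cell(text):
    """Return the cell as an integer, or None if the value is suppressed."""
    text = text.strip()
    if text in SUPPRESSION_FLAGS:
        return None
    return int(text.replace(",", ""))

def sum_known(cells):
    """Sum the published values and report how many cells were suppressed."""
    values = [parse_cell(cell) for cell in cells]
    known = [value for value in values if value is not None]
    return sum(known), len(values) - len(known)

# Made-up county values for a single commodity.
county_cells = ["1,250", "(D)", "430", "(D)", "2,015"]
total, suppressed = sum_known(county_cells)
print(f"Sum of published values: {total} ({suppressed} cells suppressed)")
```

Reporting the number of suppressed cells alongside the sum makes it clear to the patron that the figure is a lower bound rather than a true total.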

The National Agricultural Library (NAL) hosts the Ag Data Commons, a repository for USDA-funded research data. Researchers can find data by topic or through full-text searching.

The Economic Research Service (ERS) publishes a wealth of data about agricultural commodities, aquaculture, exports and imports, rural development, food security, and other topics in agricultural economics.

## Energy and the Environment

The Energy Information Administration publishes statistics on energy generation, use, and prices for the U.S. and states. State energy offices like the Utah Office of Energy Development compile similar statistics related to production, consumption, and energy development.

Water can be a confusing topic when one is searching for statistics. Water resources and consumption are monitored by the US Geological Survey (USGS), which produces reports for all of the states. The USGS also publishes data on flooding, volcanoes, and earthquakes. Water quality data is reported by local water systems to the Environmental Protection Agency (EPA) and is published in the Safe Drinking Water Information System. EPA also issues the Toxics Release Inventory (TRI), a report of toxic chemicals released by industrial facilities. EPA provides several data tools to examine TRI data.

The National Climatic Data Center (NCDC) collects temperature, pressure, rainfall, and other climate data. Over 100 years of data is available online for states and local areas through NCDC. Note that access to some reports is free only if you are connected through a .edu domain.

The U.S. Fish and Wildlife Service administers federal wildlife refuges and the Endangered Species Act in collaboration with state agencies. When it comes to marine species, the National Marine Fisheries Service publishes statistics on fisheries, marine mammals, and other ocean life. State departments of fish and wildlife typically produce statistics on game, endangered species, and invasive species like the Little Fire Ant or zebra mussels.

## Criminal Justice

The FBI produces a number of reports about crimes. Its Uniform Crime Reporting (UCR) Program consists of four data collections: the National Incident-Based Reporting System (NIBRS), the Summary Reporting System (SRS), the Law Enforcement Officers Killed and Assaulted (LEOKA) Program, and the Hate Crime Statistics Program. The data is available through the Crime Data Explorer (CDE). The FBI’s main statistical publication, Crime in the United States, is compiled from statistics provided by states and covers broad categories of violent and property crimes. The FBI also compiles reports on specific types of crimes such as active shooter incidents and financial crimes. The Bureau of Justice Statistics is the gateway to statistics about criminal characteristics, prisoners, types of crimes, law enforcement, and other topics.

Statistics on crime, criminals, and prison populations at the level of detail desired by patrons are often more difficult to find than most other types of statistics. Bear in mind that there is a federal criminal justice system with federal prisons, and parallel state systems. The federal Bureau of Prisons maintains statistics on its prison population, while states maintain statistics on state prisons. Looking at the Arizona Department of Corrections Inmate Ethnic Distribution by Unit, it is evident that Arizona is most interested in tracking Caucasians, Latinos, and African Americans. There are many prisoners from Hawaiʻi in Arizona prisons, but because they are a relatively small portion of the total prison population, Native Hawaiians, Asians, and Pacific Islanders are not reported separately.

Many states, such as Texas and California, have state police agencies that issue statistical reports. For instance, the Texas Department of Public Safety issues reports on crime that feature data supplied to the Department of Justice for Uniform Crime Reports. Municipal police departments have varying amounts of information on the web, and some police departments have fallen behind on reporting to the Department of Justice.

## Military and Veterans

Statistics on military personnel and veterans can be difficult to find with the granularity desired by patrons. Patrons want both current and historical information about numbers of service members, casualties, ethnicities, and service in particular conflicts or locations. This information might not be readily available online and often requires some sleuthing. The Department of Defense used to publish Atlas/Data Abstract for the United States and Selected Areas, but it is no longer being issued. Statistics on servicemembers and veterans by ethnicity can be really hard to find, although you can use Census data to get some numbers. Figure 2 is a Census Bureau infographic about veterans.

# Historical Statistics

In addition to StatAb, there are several other sources for historical statistics. Annual reports of federal and state agencies often include the most important measures of the agency’s activities in the preceding year and sometimes include several years’ worth of data. Congressional publications are an often-overlooked source of statistics. In the course of budget allocations and program review, Congress is often provided with statistics by federal agencies and programs. During the 19th and early 20th centuries, the annual reports of federal agencies were published as congressional documents. These annual reports are packed with statistics about everything from legumes to lighthouses. For instance, Report of the Honorable Roland S. Morris on Japanese Immigration and Alleged Discriminatory Legislation Against Japanese Residents in the United States includes statistics about Japanese and Chinese immigration, land ownership by immigrants, and related topics.[14] Now that many of these documents are available in HathiTrust and other repositories, it is relatively easy to locate historical statistics.

