How-to: Connecting Grant Funding in USA Spending and Federal RePORTER

Categories: Big Data, Quantitative, Tools and Resources

Every summer, the Biocomplexity Institute’s Social and Decision Analytics Division’s Data Science for the Public Good (DSPG) Young Scholars program draws university students from around the country to work together on projects that use computational expertise to address critical social issues faced by local, regional, state or federal governments. The students conduct research at the intersection of statistics, computation, and the social sciences to determine how information generated within every community can be leveraged to improve quality of life and inform public policy. The program, held at the University of Virginia’s Arlington offices, runs for 10 weeks for undergraduate interns and 11 weeks for graduate fellows who work in teams collaborating with postdoctoral associates and research faculty from the division, and project stakeholders.

The 2019 cohort conducted nine research projects, and their methodologies and discoveries will be presented at MethodSpace over the next three weeks as part of our examinations of Methods In Action. The descriptions of the projects were penned by the students themselves, and their names, mentors and sponsors appear under the DSPG logo in the text.

ABSTRACT - Measuring the Public Funding of R&D: A Feasibility Study
Federal (public) funding of research and development (R&D) accounts for 25% of total R&D funding. Using publicly available sources of administrative data, the project is documenting the quality and availability of public funding of R&D to universities and nonprofit organizations. The goal is to assess the feasibility of enhancing and supplementing the current collection of these data through the NSF Survey of Federal Science and Engineering Support to Universities, Colleges, and Nonprofit Institutions (Fed Support) (FSS).

Federal grant funding data have become increasingly transparent and available to the public. However, federal agencies (and divisions within the same agency) format their data differently, making cross-agency comparisons difficult. Yet each database usually contains unique information that would be useful to combine into a single source. How can these data sources be combined? This brief outlines an attempt to combine two such databases and the challenges faced.

Spotlight on Data Sources

We identified 10 public federal funding databases, most of which focus on only a single agency (see Table 1). We put our initial focus on two databases, Federal RePORTER and USA Spending, based on their wide coverage of six major funding sources of science and engineering spending: the Department of Energy (DOE), Department of Defense (DOD), National Science Foundation (NSF), National Institutes of Health (NIH), National Aeronautics and Space Administration (NASA), and Department of Agriculture (USDA).

Table 1: Coverage of six major federal agencies across the ten identified data sources of federal funding.

Source | DOE | DOD | NASA | NIH | NSF | USDA
Federal RePORTER 
USA Spending
Current Research Information System     
NIH Project Exporter     
Conservation Innovation Grants All Project List     
NSF Award Search     
Office of Science Awards     
DOE Public Access Gateway for Energy & Science (PAGES)     
ARPA-E Projects     
CDMRP Awards Search     
NOTE: “yes” only indicates that there are at least some data from the department in the data source, not that the data are complete

 USA Spending is the unified source for federal spending data in the United States, including both grants and contracts. The database has been maintained by the Treasury Department since 2008. Data are recorded at the transaction level, that is, individual amounts at the time that the transaction occurs, in addition to the overall project totals. USA Spending is particularly appealing because it has a publicly available data dictionary, unlike some of the other databases evaluated. USA Spending outlines a number of variables, including:

  • Assistance type code: The type of assistance provided by the award.
  • Business funds indicator code: The Business Funds Indicator (BFI), indicating the award’s applicability to the Recovery Act.
  • Business types: A collection of indicators of different types of recipients based on socio-economic status and organization / business areas.
  • CFDA number: The number assigned to a federal area of work in the Catalog of Federal Domestic Assistance (CFDA). Identifies the department and specific assistance program. The CFDA handbook specifies a rich variety of information about each CFDA code (see also https://beta.sam.gov/).
  • DUNS: The unique identification number for an awardee or recipient. A nine-digit number assigned by Dun and Bradstreet (D&B) referred to as the DUNS® number. Note that in our analyses, a DUNS number would often vary on the same URI (see below).
  • FAIN: The Federal Award Identification Number (FAIN) is the unique ID within the Federal agency for each (non-aggregate) financial assistance award.
  • Record type: Code indicating aggregation status.
  • SAI number: A number assigned by state review agencies to the award during the grant application process.
  • URI: Unique Record Identifier. An agency-defined identifier that (when provided) is unique for every reported action.
[Data Science for the Public Good logo]
DSPG Students: Sean Pietrowicz, Alyssa Fowers
Mentors: Samantha Cohen, Joel Thurston, and Stephanie Shipp
Sponsor: National Science Foundation’s National Center for Science and Engineering Statistics

Federal RePORTER is an award database. Unlike USA Spending, Federal RePORTER reportedly contains only research and development (R&D) grant funding, excluding contracts and non-R&D grants. Project data reaching back to 2000 exist in the database, but only in 2014 is there at least one record from each of the NIH, NSF, USDA, DOD, and NASA. Federal RePORTER lacks funding by the DOE but does cover other agencies such as the Environmental Protection Agency and the Department of Veterans Affairs. Federal RePORTER reports grants at the project level, that is, the total amount of money awarded in a grant for a particular project, regardless of when the funding was disbursed. Federal RePORTER, unlike USA Spending, includes project terms based on the title and abstract information, as well as match scores indicating the similarity of two projects.

Combining Project-Level Data in USA Spending and Federal RePORTER

Our goal was to merge these two databases with different fields together into a central, comprehensive database of project-level data. By merging the two, this aggregated source would include:

  • Institutional affiliations
  • Project terms (i.e., keywords)
  • All allocated payments to date
  • The total amount of the award
  • Amounts and dates of specific transactions
  • R&D status (if the grant is in Federal RePORTER)
  • Descriptions of areas of work (via CFDA number)
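
As a rough illustration of the target record, the fields listed above could be held in a structure like the following. This is a minimal sketch, not the schema used in the project: the class name `LinkedProject` and every field name are hypothetical, chosen only to mirror the list.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class LinkedProject:
    """Hypothetical project-level record combining both sources."""

    project_id: str                       # shared identifier after standardization
    institution: str                      # institutional affiliation
    total_award: Optional[float] = None   # total amount of the award
    project_terms: List[str] = field(default_factory=list)  # keywords
    # (date, amount) pairs for individual transactions
    transactions: List[Tuple[date, float]] = field(default_factory=list)
    cfda_number: Optional[str] = None     # points to the area-of-work description
    in_federal_reporter: bool = False     # proxy for R&D status

    @property
    def paid_to_date(self) -> float:
        """Sum of all transaction amounts recorded so far."""
        return sum(amount for _, amount in self.transactions)


example = LinkedProject(
    project_id="GRANT-A",                 # made-up identifier
    institution="Example University",     # made-up institution
    transactions=[(date(2013, 1, 15), 10.0), (date(2014, 1, 15), 5.0)],
)
```

Deriving `paid_to_date` from the transaction list, rather than storing it separately, keeps the two payment-related fields from drifting apart.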

However, creating this dataset is not straightforward, because within USA Spending project identifiers are structured differently across departments and even across agencies within a department. They also differ in structure from the formats used in Federal RePORTER (which similarly vary by department and agency). As a result, the same project could have different identifiers in Federal RePORTER than in USA Spending. To link these two databases together, we examined how unique project identifiers were structured for the main funders of R&D in Federal RePORTER and USA Spending and found ways to link most of these agencies together. The exact mapping is shown in Table 2 below. The conversion process of identifiers by agency or subagency from USA Spending to Federal RePORTER is shown in Figure 1.

Some agencies could not be linked. The DOE is not included in Federal RePORTER. Within the DOD, most Federal RePORTER grants were related to just one agency, the Congressionally Directed Medical Research Programs. Within the USDA, we were unable to find consistent project identifier mappings for two agencies: the Forest Service and the Agricultural Research Service.

Figure 1: Diagram of how USA Spending project identifiers were standardized to the same format as Federal RePORTER to link projects across the two data sources.

Table 2: Mapping of project identifiers between Federal RePORTER and USA Spending.

Direct linkage between data sets with no modification

Sub-Agency | USA Spending Notation | Federal RePORTER Notation | Linkage procedure
NSF | 7-digit number (e.g., 1624124) | 7-digit number (e.g., 1624124) | 1:1 match: 1624124 = 1624124
NASA | 10-character alphanumeric string (e.g., NNX15AC14G) | 10-character alphanumeric string (e.g., NNX15AC14G) | 1:1 match: NNX15AC14G = NNX15AC14G

Direct linkage between data sets with modification

Sub-Agency | USA Spending Notation | Federal RePORTER Notation | Linkage procedure
NIH | 11-character alphanumeric string; last 6 characters are always numeric (e.g., P30CA016087) | 12-character (minimum) alphanumeric string* (e.g., 4P30CA016087-36) | USA Spending string matches characters 2-12 in Federal RePORTER: P30CA016087 = 4P30CA016087-36
USDA-NIFA | 15-digit number with last digit separated by a decimal point (e.g., 20114700230882.6) | 14-digit hyphenated number; first 4 digits are fiscal year of award (e.g., 2011-47002-30882) | First 14 digits of USA Spending string match unhyphenated Federal RePORTER string: 20114700230882.6 = 20114700230882
DOD-CDMRP | 6-character alphanumeric string followed by 7-digit number** (e.g., W81XWH1620050) | 8-character alphanumeric string*** (e.g., GW150034) | Entries connected using the dataset extracted from this site
* Some of the strings are hyphenated with additional characters. This is also the NIH Grant Number.
** The first 6 characters in USA Spending identifiers are the DoDAAC.
*** If the first two characters are letters, they specify a research program.

No direct linkage established between data sets at this time

Sub-Agency | USA Spending Notation | Federal RePORTER Notation | Linkage procedure
USDA-ARS | 15-character hyphenated number* (e.g., 58-5030-5-080-1) | “ARS” followed by a hyphen and 7-digit USDA numeric accession code (e.g., ARS-0429067) | Some projects linked by matching USDA accession code to USA Spending projects via this site URL produce similar but not always identical project titles; testing is needed to establish the exact relationship between USA Spending notation and ARS codes
USDA-National Forest | 15-character string; 3rd and 4th characters are letters, the rest are digits (e.g., 11CA11132543158) | “USFS-” followed by 7-digit number (e.g., USFS-0000048) | No linkage found at this time
* The 4-digit portion indicates the research lab (preliminary list here).
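
The direct linkage rules in Table 2 can be sketched in code. The following is a minimal illustration rather than the project's actual implementation: the function names and agency labels are assumptions, and instead of converting one format into the other it reduces both identifiers to a common key, which is one of several ways to apply the same rules. DOD-CDMRP is left out because its linkage relies on an external lookup dataset.

```python
def usaspending_key(agency: str, award_id: str) -> str:
    """Reduce a USA Spending identifier to a linkage key (hypothetical helper)."""
    award_id = award_id.strip().upper()
    if agency in ("NSF", "NASA", "NIH"):
        # Used as-is: NSF and NASA match 1:1; for NIH the USA Spending
        # string is already the 11-character core of the grant number.
        return award_id
    if agency == "USDA-NIFA":
        # Drop the decimal suffix: 20114700230882.6 -> 20114700230882
        return award_id.split(".")[0]
    raise ValueError(f"No direct linkage rule for agency: {agency}")


def federal_reporter_key(agency: str, project_id: str) -> str:
    """Reduce a Federal RePORTER identifier to the same linkage key."""
    project_id = project_id.strip().upper()
    if agency in ("NSF", "NASA"):
        return project_id
    if agency == "NIH":
        # Characters 2-12 (1-indexed): 4P30CA016087-36 -> P30CA016087
        return project_id[1:12]
    if agency == "USDA-NIFA":
        # Remove hyphens: 2011-47002-30882 -> 20114700230882
        return project_id.replace("-", "")
    raise ValueError(f"No direct linkage rule for agency: {agency}")


def same_project(agency: str, usaspending_id: str, reporter_id: str) -> bool:
    """True when both identifiers reduce to the same key."""
    return usaspending_key(agency, usaspending_id) == federal_reporter_key(agency, reporter_id)
```

With the examples from Table 2, `same_project("NIH", "P30CA016087", "4P30CA016087-36")` and `same_project("USDA-NIFA", "20114700230882.6", "2011-47002-30882")` both evaluate to `True`.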

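In addition to connecting the two databases, project identifiers often encode additional information about projects. For example, the first four digits of a USDA-NIFA identifier in Federal RePORTER give the fiscal year of the award, and the first six characters of a DOD-CDMRP identifier in USA Spending are the DoDAAC (see Table 2).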

Overlapping projects

With common identifiers, we can link projects found in both USA Spending and Federal RePORTER. However, identifiers are not enough for one-to-one comparisons, as the data are reported at different timescales. Federal RePORTER reports overall project awards at the time when they are awarded, while USA Spending reports individual disbursements of those funds. A grant disbursed over multiple years would have only one record in Federal RePORTER, but would appear multiple times in USA Spending. Consider a grant awarded in 2012, with payments in FY 2013, 2014, and 2015. The project ID would appear in USA Spending in FY 2013, 2014, and 2015. In Federal RePORTER, the project would appear only in FY 2012.
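
One way to put the two sources on the same footing is to roll USA Spending transactions up to one row per project before comparing. The sketch below assumes a simplified row layout of (project ID, fiscal year, amount); the identifiers and dollar amounts are made up for illustration and are not taken from either database.

```python
from collections import defaultdict

# Made-up transaction-level rows: (project_id, fiscal_year, amount)
transactions = [
    ("GRANT-A", 2013, 100_000.0),
    ("GRANT-A", 2014, 150_000.0),
    ("GRANT-A", 2015, 50_000.0),
    ("GRANT-B", 2015, 75_000.0),
]


def summarize_projects(rows):
    """Collapse transaction rows into one summary per project ID."""
    totals = defaultdict(float)
    years = defaultdict(set)
    for project_id, fiscal_year, amount in rows:
        totals[project_id] += amount
        years[project_id].add(fiscal_year)
    return {
        pid: {"total_paid": totals[pid], "fiscal_years": sorted(years[pid])}
        for pid in totals
    }


summary = summarize_projects(transactions)
```

Here `summary["GRANT-A"]` holds a single total together with the three fiscal years in which payments occurred, which is the shape that can be set against a single project-level record.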

Thus, to gather all the additional data found in Federal RePORTER and join it to our data in USA Spending, we would need to acquire all Federal RePORTER records prior to and including our comparison year in USA Spending. Some projects in USA Spending were originally awarded prior to Federal RePORTER’s range of projects (as evidenced by CFDA numbers no longer in common use by 2008), suggesting that we would be less successful in matching USA Spending records for grants awarded prior to 2008 to the original grant record in Federal RePORTER. In addition, not all grants from USA Spending are expected to appear in Federal RePORTER, which is supposed to consist only of R&D grants. Overall, we expect fewer than 100% of the grants in a single year of USA Spending to be found in Federal RePORTER.

Table 3: Match rate by department

Department | Percent of USA Spending (2016) projects found in Federal RePORTER (2014-2016)
DOD | 7 percent
NASA | 59 percent
NIH | 99.7 percent
NSF | 85 percent
USDA | 28 percent

Overall, these findings indicate that the coverage of Federal RePORTER and USA Spending is not identical. This is as expected, since projects in USA Spending 2016 may have begun funding prior to 2014, or may have been deemed not to fit the R&D criteria of Federal RePORTER.
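
A match rate like those in Table 3 can be computed by pooling Federal RePORTER identifiers across the lookup years and checking what share of one year's USA Spending projects appears in that pool. This is a minimal sketch under assumptions: the function name and inputs are hypothetical, identifiers are taken to be already standardized, and counting unique projects (rather than transactions) is our choice for the illustration.

```python
def match_rate(usaspending_ids, reporter_ids_by_year, years):
    """Percent of unique USA Spending project IDs found in Federal RePORTER.

    usaspending_ids: iterable of standardized project IDs for one year
    reporter_ids_by_year: dict mapping fiscal year -> iterable of project IDs
    years: Federal RePORTER fiscal years to pool for the lookup
    """
    reporter_pool = set()
    for year in years:
        reporter_pool |= set(reporter_ids_by_year.get(year, ()))
    spending = set(usaspending_ids)
    if not spending:
        return 0.0
    return 100.0 * len(spending & reporter_pool) / len(spending)


# Made-up identifiers for illustration only.
spending_2016 = ["A", "B", "C", "D", "A"]   # duplicates = repeated transactions
reporter = {2014: ["A"], 2015: ["B", "X"], 2016: ["C"]}
rate = match_rate(spending_2016, reporter, years=[2014, 2015, 2016])
```

In this toy example three of the four unique USA Spending projects are found, so `rate` is 75.0. Widening `years` to earlier Federal RePORTER data would be the natural way to pick up grants awarded before the lookup window.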

Conclusion

In this brief, we’ve outlined how a citizen scientist can use existing federal funding databases to create a more comprehensive understanding of grants funded by the federal government. We have provided a protocol for combining two widely used databases, USA Spending and Federal RePORTER, and outlined the additional information that can be gleaned from each, if you know what you’re looking for.
