The Digging into Data Challenge aims to address how "big data" changes the research landscape for the humanities and social sciences. Now that we have massive databases of materials available for research in the humanities and the social sciences--ranging from digitized books, newspapers, and music to information generated by Internet-based activities and mobile communications, administrative data from public agencies, and customer databases from private sector organizations--what new, computationally-based research methods might we apply? As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st-century scholarship.
IATH is a research unit of the University of Virginia, established in 1992. Our goal is to explore and develop information technology as a tool for scholarly humanities research. To that end, we provide our Fellows with consulting, technical support, applications development, and networked publishing facilities. We also cultivate partnerships and participate in humanities computing initiatives with libraries, publishers, information technology companies, scholarly organizations, and other groups residing at the intersection of computers and cultural heritage.
The Project for American and French Research on the Treasury of the French Language (ARTFL) is a cooperative enterprise of the Laboratoire ATILF (Analyse et Traitement Informatique de la Langue Française) of the Centre National de la Recherche Scientifique (CNRS), the Division of the Humanities, and Electronic Text Services (ETS) of the University of Chicago.
The Association of Religion Data Archives (ARDA) strives to democratize access to the best data on religion. Founded as the American Religion Data Archive in 1997 and going online in 1998, the initial archive was targeted at researchers interested in American religion. The targeted audience and the data collection have both greatly expanded since 1998, now including American and international collections and developing features for educators, journalists, religious congregations, and researchers.
Search America's historic newspaper pages from 1789 to 1963 or use the U.S. Newspaper Directory to find information about American newspapers published from 1690 to the present. Chronicling America is sponsored jointly by the National Endowment for the Humanities and the Library of Congress.
HathiTrust is a partnership of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world. While copyright-protected texts are not available for download from HathiTrust, fruitful research can still be performed on the basis of non-consumptive analysis of features extracted from full text. These features include volume-level metadata, page-level metadata, part-of-speech-tagged tokens, and token counts. Additionally, HTRC has partnered with advanced researchers to release a derived dataset, Word Frequencies in English-Language Literature, 1700-1922.
HathiTrust Research Center (HTRC) enables computational analysis of works in the HathiTrust Digital Library (HTDL) to facilitate non-profit research and educational uses of the collection. HTRC engages in research and development for computational text analysis of massive digital libraries.
Housed within the UK Data Archive at the University of Essex, the History Data Service (HDS) collects, preserves, and promotes the use of digital resources that result from or support historical research, learning, and teaching. The History Data Service is a successor service to AHDS History, which from 1996 to March 2008 was one of the five centres of the Arts and Humanities Data Service.
The Magazine of Early American Datasets (MEAD) is an online repository of datasets compiled by historians of early North America. MEAD preserves and makes available these datasets in their original format and as comma-separated-value files (.csv). Each body of data is also accompanied by a codebook. MEAD provides sweet, intoxicating data for your investigations of early North America and the Atlantic World.
The Oxford Text Archive develops, collects, catalogues and preserves electronic literary and linguistic resources for use in Higher Education, in research, teaching and learning. The OTA also gives advice on the creation and use of these resources, and is involved in the development of standards and infrastructure for electronic language resources.
Since planning began in 1985, the Perseus Digital Library Project has explored what happens when libraries move online. Perseus is a practical experiment in which we explore possibilities and challenges of digital collections in a networked world. Our flagship collection, under development since 1987, covers the history, literature and culture of the Greco-Roman world. We are applying what we have learned from Classics to other subjects within the humanities and beyond.