How to download data from the Human Protein Atlas

The Human Protein Atlas (HPA) is a Sweden-based program that was initiated in 2003 and has since become an important resource for both basic and clinical research.

It is one of the largest and most comprehensive databases available today, with more than 300,000 visitors per month. The atlas provides information about the spatial distribution of about 78% of known human protein-coding genes. The current version consists of twelve separate sub-atlases, each providing novel insights into a different aspect of the human proteome. Protein expression profiles are derived from antibody-based approaches combined with transcriptomics data. Expression patterns are shown at single-cell resolution as immunohistochemically stained images, of which more than 10 million have been uploaded.

Access to the HPA data is mainly via a web-based interface allowing views of individual proteins; all data is publicly available at www.proteinatlas.org, including different downloadable files with the complete data set for each gene. Most of the data (complete or partial) is available as CSV/TSV, RDF, or XML files.
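As a rough illustration of working with one of the tab-separated downloads, the Python sketch below fetches a plain-text file and filters its rows. It is a minimal sketch under stated assumptions: the file URL is left as a parameter and should be copied from the download page at www.proteinatlas.org, the helper names are hypothetical, and the column names and sample rows are made up for the example rather than taken from an actual HPA file.

```python
import csv
import io
import urllib.request


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download an uncompressed text file and decode it as UTF-8.

    The URL is supplied by the caller; copy it from the HPA download page.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_tsv(text: str) -> list[dict[str, str]]:
    """Turn tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def rows_matching(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]


# Made-up sample in an assumed layout; real files have other columns and rows.
SAMPLE = (
    "Gene\tGene name\tTissue\tLevel\n"
    "ID_EXAMPLE_1\tGENE_A\tliver\tHigh\n"
    "ID_EXAMPLE_1\tGENE_A\tkidney\tLow\n"
    "ID_EXAMPLE_2\tGENE_B\tliver\tMedium\n"
)

rows = parse_tsv(SAMPLE)
liver_rows = rows_matching(rows, "Tissue", "liver")
for row in liver_rows:
    print(row["Gene name"], row["Level"])
```

With a real file, the call would look like parse_tsv(fetch_text(url)). Compressed downloads would first need to be unpacked, for example with Python's zipfile module.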

Additionally, information about any gene can be looked up by using the search option. The data from the search result can be downloaded in different formats, including XML, RDF, and TAB by using the links located at the far right in the table header of the search result.
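The same search can in principle be scripted instead of clicked. The sketch below only builds a download URL for a search term. The endpoint path, the parameter names, and the column codes are assumptions that should be checked against the help pages at www.proteinatlas.org before use, and build_search_url is a hypothetical helper, not part of any HPA client library.

```python
from urllib.parse import urlencode

# Assumed endpoint for downloading search results; verify it on the HPA site.
SEARCH_DOWNLOAD_URL = "https://www.proteinatlas.org/api/search_download.php"


def build_search_url(query: str, columns=("g", "gs"), fmt: str = "json") -> str:
    """Build a search-download URL for a query string.

    `columns` holds assumed column codes and `fmt` an assumed format name;
    both are placeholders to adapt once checked against the HPA help pages.
    """
    params = {
        "search": query,
        "format": fmt,
        "columns": ",".join(columns),
        "compress": "no",
    }
    # urlencode escapes the comma and any spaces in the query.
    return f"{SEARCH_DOWNLOAD_URL}?{urlencode(params)}"


url = build_search_url("TP53")
print(url)
```

The resulting URL can be opened in a browser or passed to urllib.request.urlopen. Keeping URL construction separate from the network call makes it easy to inspect the request before sending it.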

Future directions

One of the major aims of the Human Protein Atlas project is to systematically study protein expression and localization based on spatial proteomics.

The last decade has seen the emergence of several spatial technologies that can detect both protein and mRNA expression with high-throughput, high-plex profiling. These technologies offer unprecedented insights into the spatial organization of tissues and into how fundamental cellular processes are orchestrated in multicellular organisms.

The Human Protein Atlas is updated every year and continues to evolve as new data is generated. Future versions of the database will likely integrate data generated on multiomics platforms. The knowledge gained could advance both basic research and medicine, with applications in translational research such as mining and evaluating biomarkers for disease prognosis and developing novel therapeutics.

Examples of such multiomics spatial technologies include the GeoMx® Digital Spatial Profiler (DSP) and the CosMx Spatial Molecular Imager (SMI), two platforms developed by NanoString. Both are compatible with archival formalin-fixed, paraffin-embedded (FFPE) tissue, a valuable source of DNA, RNA, and proteins.

The GeoMx DSP can profile 18,000+ protein-coding genes while maintaining a wide dynamic range for detecting low- to high-expressing genes. Its whole transcriptome profiling capabilities make the GeoMx DSP well suited to unbiased biomarker discovery, and it has been instrumental in generating a spatial atlas for six human organs: kidney, brain, intestine, lymph node, liver, and pancreas.

The CosMx SMI complements the GeoMx DSP technology to facilitate spatial biology at the single-cell and subcellular levels with the ability to detect more than 1,000 RNA and over 64 protein analytes. Therefore, CosMx SMI is ideal for creating a spatial cell atlas, mapping cell types, and examining cellular interactions.

By Nirupama Deshpande
For research use only. Not for use in diagnostic procedures.