API kegg - IndexError: list index out of range
4 months ago

Hi,

I have an Excel table with gene symbols listed under the column "Symbol" (e.g., AKR1A1, adh) and their corresponding functions listed under the column "Function" (e.g., alcohol dehydrogenase (NADP+) [EC:1.1.1.2]). These symbols represent genes across various organisms. I would like to use the KEGG API to retrieve the cellular communities associated with these gene symbols and add the results back to the Excel table, but I encounter this error:

IndexError                                Traceback (most recent call last)

<ipython-input-17-1bf6ed03288a> in <cell line: 43>()
     41 
     42 # Add a new column for cellular community information
---> 43 df['Cellular Community'] = df['Symbol'].apply(get_cellular_community)
     44 
     45 # Save the updated DataFrame to a new Excel file

4 frames

<ipython-input-17-1bf6ed03288a> in get_cellular_community(symbol)
     24     if result:
     25         if len(result) > 0:  # Check if the result list is not empty
---> 26             kegg_id = result[0].split(':')[1]
     27             # Get the pathways associated with the gene from KEGG
     28             pathways = k.get_pathway_by_gene(kegg_id)

IndexError: list index out of range

Here is the script I used:

import pandas as pd
from bioservices import KEGG

# Initialize the KEGG object
k = KEGG()

# Read the Excel file
try:
    df = pd.read_excel('/content/sample_data/function 1-5.xlsx')
except FileNotFoundError:
    print("Error: Excel file not found.")
    exit()

# Check if the 'Symbol' column exists in the DataFrame
if 'Symbol' not in df.columns:
    print("Error: 'Symbol' column not found in the Excel file.")
    exit()

# Function to get cellular community information for a gene symbol
def get_cellular_community(symbol):
    cellular_community = ""
    # Search for the gene symbol in KEGG
    result = k.find('genes', symbol)
    if result:
        if len(result) > 0:  # Check if the result list is not empty
            kegg_id = result[0].split(':')[1]
            # Get the pathways associated with the gene from KEGG
            pathways = k.get_pathway_by_gene(kegg_id)
            # Extract cellular community information from pathways
            for pathway_id, pathway_info in pathways.items():
                if 'Categories' in pathway_info:
                    categories = pathway_info['Categories']
                    if 'Cellular community - eukaryotes' in categories:
                        cellular_community = 'eukaryotes'
                        break
                    elif 'Cellular community - prokaryotes' in categories:
                        cellular_community = 'prokaryotes'
                        break
    return cellular_community


# Add a new column for cellular community information
df['Cellular Community'] = df['Symbol'].apply(get_cellular_community)

# Save the updated DataFrame to a new Excel file
output_file = 'output_with_cellular_community.xlsx'
df.to_excel(output_file, index=False)
print(f"Updated Excel file saved as: {output_file}")
kegg API • 320 views

It might be because the gene symbol is not found in the KEGG database. Add some extra error handling before parsing the results, or check the symbol manually in KEGG first.
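As a rough sketch of that error handling: the helper below assumes that `k.find` hands back the raw tab-delimited text of the KEGG `find` operation (one `organism:gene_id<TAB>description` entry per line) and may return a non-string value such as a status code when the request fails. That is an assumption worth checking by printing `result` for one symbol. If it holds, it would also explain the traceback, because `result[0]` on a string is a single character, and splitting one character on `':'` never gives a second element. The name `first_kegg_entry` is made up for illustration and is not part of bioservices.

```python
def first_kegg_entry(raw):
    """Return (organism, gene_id) for the first usable line of a KEGG
    `find` response, or None when nothing can be parsed.

    Assumes `raw` is tab-delimited text such as
    "hsa:10327\tAKR1A1; aldo-keto reductase ...".
    """
    # A failed request may not give a string at all (assumption: e.g. a status code)
    if not isinstance(raw, str):
        return None
    for line in raw.splitlines():
        # The identifier is the first tab-separated field on the line
        entry = line.split("\t", 1)[0].strip()
        organism, sep, gene_id = entry.partition(":")
        # Accept only well-formed "organism:gene_id" identifiers
        if sep and organism and gene_id:
            return organism, gene_id
    return None


# Hypothetical use inside get_cellular_community(symbol):
#     hit = first_kegg_entry(k.find('genes', symbol))
#     if hit is None:
#         return ""          # symbol not found, leave the cell empty
#     organism, gene_id = hit
```

Returning `None` instead of raising keeps `df['Symbol'].apply(...)` running when a symbol has no match, so the missing symbols can be checked by hand afterwards.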
