Oxylabs' Amazon Scraper API allows users to easily scrape publicly-available data from any page on Amazon, such as reviews, pricing, product information and more. If you're interested in testing out this powerful tool, you can sign up for a free trial on the Oxylabs website.
Below is a quick overview of all the available data source values we support with Amazon.
Source | Description | Structured data |
---|---|---|
amazon | Submit any Amazon URL you like. | Depends on the URL. |
amazon_bestsellers | List of best seller items in a taxonomy node of your choice. | Yes. |
amazon_pricing | List of offers available for an ASIN of your choice. | Yes. |
amazon_product | Product page of an ASIN of your choice. | Yes. |
amazon_questions | Q&A page of an ASIN of your choice. | Yes. |
amazon_reviews | Reviews page of an ASIN of your choice. | Yes. |
amazon_search | Search results for a search term of your choice. | Yes. |
amazon_sellers | Seller information of a seller of your choice. | Yes. |
The amazon source is designed to retrieve the content from various Amazon URLs. Instead of sending multiple parameters, you can provide us with a direct URL to the required Amazon page. We do not strip any parameters or alter your URLs in any way.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | N/A |
url | Direct URL (link) to Amazon page | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data, as long as the URL submitted is for one of the page types we can parse. | false |
- required parameter
In the code example below, we make a request to retrieve the Amazon product page for B0BDJ279KF.
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon',
    'url': 'https://www.amazon.co.uk/dp/B0BDJ279KF',
    'parse': True,
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('YOUR_USERNAME', 'YOUR_PASSWORD'),  # Your credentials go here.
    json=payload,
)
# Instead of a response with job status and results URL, this returns the
# JSON response with results.
pprint(response.json())
To see the response example with retrieved data, download this sample output in JSON format.
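Because the amazon source forwards the URL unchanged, it can be useful to check a URL on the client side before submitting it. The sketch below shows one possible way to do that. The helper name `build_amazon_url_payload` and its host check are illustrative assumptions and are not part of the API.

```python
from urllib.parse import urlparse


def build_amazon_url_payload(url, parse=False, user_agent_type="desktop"):
    """Build a payload for the `amazon` source (hypothetical helper)."""
    host = urlparse(url).hostname or ""
    # Assumption: an Amazon host has "amazon" as one of its dot-separated
    # labels, e.g. www.amazon.co.uk or amazon.com.
    if "amazon" not in host.split("."):
        raise ValueError(f"Not an Amazon URL: {url!r}")
    return {
        "source": "amazon",
        "url": url,  # Passed through unchanged.
        "user_agent_type": user_agent_type,
        "parse": parse,
    }


payload = build_amazon_url_payload(
    "https://www.amazon.co.uk/dp/B0BDJ279KF", parse=True
)
print(payload)
```

The resulting dictionary can be passed as the `json` argument of the request, in the same way as the hand-written payload.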
The amazon_search source is designed to retrieve Amazon search result pages.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | amazon_search |
domain | Domain localization for Amazon. The full list of available domains can be found here. | com |
query | UTF-encoded keyword | - |
start_page | Starting page number | 1 |
pages | Number of pages to retrieve | 1 |
geo_location | The Deliver to location. See our guide to using this parameter here. | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data. | - |
context: category_id | Search for items in a particular browse node (product category). | - |
context: merchant_id | Search for items sold by a particular seller. | - |
- required parameter
In the code example below, we make a request to retrieve 10 pages of amazon.nl search results for the search term adidas, starting from page 11. The search is additionally limited to category ID 16391843031 and merchant ID 3AA17D2BRD4YMT0X.
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon_search',
    'domain': 'nl',
    'query': 'adidas',
    'start_page': 11,
    'pages': 10,
    'parse': True,
    'context': [
        {'key': 'category_id', 'value': 16391843031},
        {'key': 'merchant_id', 'value': '3AA17D2BRD4YMT0X'},
    ],
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),  # Replace with your credentials.
    json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
To see the response example with retrieved data, download this sample output file in JSON format.
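The category and seller filters are passed as key/value entries in the `context` list rather than as top-level parameters. As a rough sketch, a small helper could assemble that list only when a filter is actually set. The function `build_search_payload` is a hypothetical name used for illustration.

```python
def build_search_payload(query, domain="com", start_page=1, pages=1,
                         category_id=None, merchant_id=None, parse=True):
    """Build an `amazon_search` payload (hypothetical helper)."""
    payload = {
        "source": "amazon_search",
        "domain": domain,
        "query": query,
        "start_page": start_page,
        "pages": pages,
        "parse": parse,
    }
    # Optional filters go into the `context` list as key/value pairs.
    context = []
    if category_id is not None:
        context.append({"key": "category_id", "value": category_id})
    if merchant_id is not None:
        context.append({"key": "merchant_id", "value": merchant_id})
    if context:
        payload["context"] = context
    return payload


payload = build_search_payload(
    "adidas",
    domain="nl",
    start_page=11,
    pages=10,
    category_id=16391843031,
    merchant_id="3AA17D2BRD4YMT0X",
)
print(payload)
```

Called with these arguments, the helper produces the same payload as the hand-written example for the adidas search on amazon.nl.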
The amazon_product data source is designed to retrieve Amazon product pages.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | amazon_product |
domain | Domain localization for Amazon. The full list of available domains can be found here. | com |
query | 10-symbol ASIN code | - |
geo_location | The Deliver to location. See our guide to using this parameter here. | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data. | - |
context: autoselect_variant | To get accurate pricing/buybox data, set this parameter to true (which tells us to append the th=1&psc=1 URL parameters to the end of the product URL). To get an accurate representation of the parent ASIN's product page, omit this parameter or set it to false. | false |
- required parameter
In the code example below, we make a request to retrieve the product page for ASIN B09RX4KS1G on the amazon.nl marketplace. If the ASIN provided is a parent ASIN, we ask Amazon to return a product page of an automatically-selected variation.
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon_product',
    'domain': 'nl',
    'query': 'B09RX4KS1G',
    'parse': True,
    'context': [
        {'key': 'autoselect_variant', 'value': True},
    ],
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),  # Replace with your credentials.
    json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
To see the response example with retrieved data, download this sample output file in JSON format.
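To make the effect of autoselect_variant more concrete, the sketch below illustrates what appending th=1&psc=1 to a product URL looks like. It is only an illustration: the function name is hypothetical, and the `/dp/<ASIN>` URL shape is an assumption borrowed from the amazon source example, so the URL the service actually requests may differ.

```python
from urllib.parse import urlencode


def illustrate_product_url(asin, domain="com", autoselect_variant=False):
    """Show the kind of URL the autoselect_variant flag describes.

    Hypothetical helper: the /dp/<ASIN> shape is an assumption.
    """
    url = f"https://www.amazon.{domain}/dp/{asin}"
    if autoselect_variant:
        # Append the th=1&psc=1 query parameters mentioned in the table.
        url += "?" + urlencode({"th": 1, "psc": 1})
    return url


parent_url = illustrate_product_url("B09RX4KS1G", domain="nl")
variant_url = illustrate_product_url(
    "B09RX4KS1G", domain="nl", autoselect_variant=True
)
print(parent_url)
print(variant_url)
```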
The amazon_pricing data source is designed to retrieve Amazon product offer listings.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | amazon_pricing |
domain | Domain localization for Amazon. The full list of available domains can be found here. | com |
query | 10-symbol ASIN code | - |
start_page | Starting page number | 1 |
pages | Number of pages to retrieve | 1 |
geo_location | The Deliver to location. See our guide to using this parameter here. | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data. | - |
- required parameter
In the code example below, we make a request to retrieve the product offer listing page for ASIN B09RX4KS1G on the amazon.nl marketplace.
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon_pricing',
    'domain': 'nl',
    'query': 'B09RX4KS1G',
    'parse': True,
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),  # Replace with your credentials.
    json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
To see what the parsed output looks like, download this JSON file.
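Several sources (amazon_product, amazon_pricing, amazon_reviews and amazon_questions) take a 10-symbol ASIN as the query. A minimal sketch of a shared payload builder with a basic ASIN check might look like the following. The helper name and the pattern are assumptions: the table only states that an ASIN has 10 symbols, and the sketch additionally assumes uppercase letters and digits.

```python
import re

# Sources that take an ASIN as their `query` value.
ASIN_SOURCES = {
    "amazon_product",
    "amazon_pricing",
    "amazon_reviews",
    "amazon_questions",
}
# Assumption: 10 characters, uppercase letters and digits only.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def build_asin_payload(source, asin, domain="com", parse=True):
    """Build a payload for an ASIN-based source (hypothetical helper)."""
    if source not in ASIN_SOURCES:
        raise ValueError(f"{source!r} does not take an ASIN query")
    if not ASIN_PATTERN.fullmatch(asin):
        raise ValueError(f"{asin!r} does not look like a 10-symbol ASIN")
    return {"source": source, "domain": domain, "query": asin, "parse": parse}


payload = build_asin_payload("amazon_pricing", "B09RX4KS1G", domain="nl")
print(payload)
```

Rejecting malformed ASINs locally avoids spending a request on a query that cannot match a product.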
The amazon_reviews data source is designed to retrieve Amazon product review pages of an ASIN of your choice.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | amazon_reviews |
domain | Domain localization for Amazon. The full list of available domains can be found here. | com |
query | 10-symbol ASIN code | - |
geo_location | The Deliver to location. See our guide to using this parameter here. | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
start_page | Starting page number | 1 |
pages | Number of pages to retrieve | 1 |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data. | - |
- required parameter
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon_reviews',
    'domain': 'nl',
    'query': 'B09RX4KS1G',
    'parse': True,
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),  # Replace with your credentials.
    json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
To see the response example with retrieved data, download this sample output file in JSON format.
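The start_page and pages parameters together select a contiguous range of pages. If you prefer to fetch a long range of review pages in several smaller requests, one possible approach is to split the range into batches, as sketched below. The helper `page_batches` and the batch size of 10 are illustrative assumptions and do not reflect a documented limit.

```python
def page_batches(first_page, last_page, batch_size):
    """Yield (start_page, pages) pairs covering first_page..last_page."""
    if batch_size < 1 or last_page < first_page:
        raise ValueError("invalid page range or batch size")
    start = first_page
    while start <= last_page:
        # The last batch may be shorter than batch_size.
        pages = min(batch_size, last_page - start + 1)
        yield start, pages
        start += pages


# One payload per batch: review pages 1 to 25 in chunks of at most 10.
payloads = [
    {
        "source": "amazon_reviews",
        "domain": "nl",
        "query": "B09RX4KS1G",
        "start_page": start,
        "pages": pages,
        "parse": True,
    }
    for start, pages in page_batches(1, 25, 10)
]
for p in payloads:
    print(p["start_page"], p["pages"])
```

Each payload in the list can then be submitted as a separate request.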
The amazon_questions data source is designed to retrieve any particular product's Questions & Answers pages.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | amazon_questions |
domain | Domain localization for Amazon. The full list of available domains can be found here. | com |
query | 10-symbol ASIN code | - |
geo_location | The Deliver to location. See our guide to using this parameter here. | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data. | - |
- required parameter
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon_questions',
    'domain': 'nl',
    'query': 'B09RX4KS1G',
    'parse': True,
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),  # Replace with your credentials.
    json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
To see the response example with retrieved data, download this sample output file in JSON format.
The amazon_bestsellers data source is designed to retrieve Amazon Best Sellers pages.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | amazon_bestsellers |
domain | Domain localization for Amazon. The full list of available domains can be found here. | com |
query | Department name. Example: Clothing, Shoes & Jewelry | - |
start_page | Starting page number | 1 |
pages | Number of pages to retrieve | 1 |
geo_location | The Deliver to location. See our guide to using this parameter here. | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data. | - |
context: category_id | Search for items in a particular browse node (product category). | - |
- required parameter
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon_bestsellers',
    'domain': 'de',
    'query': 'automotive',
    'start_page': 2,
    'parse': True,
    'context': [
        {'key': 'category_id', 'value': 82400031},
    ],
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),  # Replace with your credentials.
    json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
To see the response example with retrieved data, download this sample output file in JSON format.
The amazon_sellers data source is designed to retrieve Amazon Sellers pages.
Parameter | Description | Default Value |
---|---|---|
source | Data source. More info. | amazon_sellers |
domain | Domain localization for Amazon. The full list of available domains can be found here. | com |
query | 13-character seller ID | - |
geo_location | The Deliver to location. See our guide to using this parameter here. | - |
user_agent_type | Device type and browser. The full list can be found here. | desktop |
render | Enables JavaScript rendering. More info. | - |
callback_url | URL to your callback endpoint. More info. | - |
parse | true will return structured data. Please note that right now we only support parsed output for desktop device type. However, there is no apparent reason to get sellers pages with any other device type, as seller data is going to be exactly the same across all devices. | - |
- required parameter
In the code example below, we make a request to retrieve the seller page for seller ID ABNP0A7Y0QWBN on the amazon.de marketplace.
import requests
from pprint import pprint
# Structure payload.
payload = {
    'source': 'amazon_sellers',
    'domain': 'de',
    'query': 'ABNP0A7Y0QWBN',
    'parse': True,
}
# Get response.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('user', 'pass1'),  # Replace with your credentials.
    json=payload,
)
# Print prettified response to stdout.
pprint(response.json())
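Since parsed output for amazon_sellers is only supported for the desktop device type, a client-side guard can catch an unsupported combination before a request is sent. The following is a minimal sketch under that assumption. `build_sellers_payload` is a hypothetical helper name.

```python
def build_sellers_payload(seller_id, domain="com", parse=True,
                          user_agent_type="desktop"):
    """Build an `amazon_sellers` payload (hypothetical helper)."""
    # Parsed output is described as available for the desktop type only.
    if parse and user_agent_type != "desktop":
        raise ValueError(
            "parse=True requires user_agent_type='desktop' for amazon_sellers"
        )
    return {
        "source": "amazon_sellers",
        "domain": domain,
        "query": seller_id,
        "user_agent_type": user_agent_type,
        "parse": parse,
    }


payload = build_sellers_payload("ABNP0A7Y0QWBN", domain="de")
print(payload)
```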
Also, check this tutorial on PyPI.