Loop through a list to find values that are present in a dataframe
I’m attempting to perform a loop in a list to check if any item is present on specific row of a dataframe if so then a new column is created on the dataframe with the item matched in the list and row. In case my logic isn't clear, what I want is basically to extract the brand and model from a given column with the products description. I apologize if this question is trivial.
I tried running it through two methods, one using a lambda and regex function and the other with a simple loop. The problem with the method using regex is that the loop does not return the third string "Ti" or "Super".
The problem with the method using a simple loop is when in the dataframe row there are, for example, the models "Ti" or "Super" because there are "RTX 3060" and "RTX 3060 Ti" as well on the list.
I can imagine that the problem lies in the fact that my list contains both "RTX 3060" and "RTX 3060 Ti" items. So in the case of logic using the regex method when the first item in the list satisfies the condition then it returns the corresponding string. In the case of the second method it is the same problem, however both items that satisfy the condition are returned.
Method 1:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.options import Options
import pandas as pd
import datetime as dt
import os
import time
from numpy import loadtxt
import numpy as np
import re
df = pd.read_csv("./tables/KABUM - VGA_NVIDIA 20221205.csv",index_col=0)
class header:
txt_brands = open("./lists/list_brands_vga_nvidia.txt","r")
txt_models = open("./lists/list_models_vga.txt","r")
txt_lines = open("./lists/list_lines_vga.txt","r")
开发者_如何学Go list_brands = txt_brands.read().splitlines()
list_models = txt_models.read().splitlines()
list_lines = txt_lines.read().splitlines()
pattern_brands= '|'.join(list_brands)
pattern_model = '|'.join(list_models)
pattern_line = '|'.join(list_lines)
df_nvidia = pd.read_csv('./tables/KABUM - VGA_NVIDIA 20221205.csv')
headers = header()
def model_searcher(search_str:str,search_list:str):
search_obj = re.search(search_list,search_str,re.IGNORECASE)
if search_obj:
return_str = search_str[search_obj.start():search_obj.end()]
else:
return_str = 'NA'
return return_str
df['Fabricante'] = df['Produto'].apply(lambda x:model_searcher(search_str=x,search_list=getattr(headers,'pattern_brands')))
df['Modelo'] = df['Produto'].apply(lambda x:model_searcher(search_str=x,search_list=getattr(headers,'pattern_model')))
df['Linha'] = df['Produto'].apply(lambda x:model_searcher(search_str=x,search_list=getattr(headers,'pattern_line')))
df['Modelo'] = df['Modelo'].str.cat(df['Linha'],sep=' ')
Method2:
import pandas as pd
import re
list_brands = open('./lists/list_models_nvidia.txt','r')
df = pd.read_csv('./tables/KABUM - VGA_NVIDIA 20221206.csv',index_col=0)
df['Produto'] = df['Produto'].str.upper()
brands = list_brands.read().splitlines()
tmp = []
for produto in df['Produto']:
x = [brand.upper() for brand in brands if brand.upper() in produto]
tmp.append(x)
df['Marca'] = tmp
df.drop_duplicates(subset=['Marca'])
df.to_csv("./tests/Test01 - loop brands.csv",encoding="utf-8")
I'll leave the github repository in case it's helpful, again I apologize for the trivial question.
精彩评论