Loop through a list to find values that are present in a dataframe

2022-12-07 22:21 问答作者：

I’m attempting to perform a loop in a list to check if any item is present on specific row of a dataframe if so then a new column is created on the dataframe with the item matched in the list and row. In case my logic isn't clear, what I want is basically to extract the brand and model from a given column with the products description. I apologize if this question is trivial.

I tried running it through two methods, one using a lambda and regex function and the other with a simple loop. The problem with the method using regex is that the loop does not return the third string "Ti" or "Super".

The problem with the method using a simple loop is when in the dataframe row there are, for example, the models "Ti" or "Super" because there are "RTX 3060" and "RTX 3060 Ti" as well on the list.

I can imagine that the problem lies in the fact that my list contains both "RTX 3060" and "RTX 3060 Ti" items. So in the case of logic using the regex method when the first item in the list satisfies the condition then it returns the corresponding string. In the case of the second method it is the same problem, however both items that satisfy the condition are returned.

Method 1:

from selenium import webdriver
from selenium.webdriver.common.by import By 
from selenium.webdriver.common.keys import Keys 
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.options import Options
import pandas as pd
import datetime as dt
import os
import time
from numpy import loadtxt
import numpy as np 
import re

df = pd.read_csv("./tables/KABUM - VGA_NVIDIA 20221205.csv",index_col=0)

class header:
    
    txt_brands = open("./lists/list_brands_vga_nvidia.txt","r")
    txt_models = open("./lists/list_models_vga.txt","r")
    txt_lines = open("./lists/list_lines_vga.txt","r")
    
   开发者_如何学Go list_brands = txt_brands.read().splitlines() 
    list_models = txt_models.read().splitlines()
    list_lines = txt_lines.read().splitlines()

    pattern_brands= '|'.join(list_brands)
    pattern_model = '|'.join(list_models)
    pattern_line = '|'.join(list_lines)

    df_nvidia = pd.read_csv('./tables/KABUM - VGA_NVIDIA 20221205.csv')

headers = header()
    
def model_searcher(search_str:str,search_list:str):
    search_obj = re.search(search_list,search_str,re.IGNORECASE)

    if search_obj:
        return_str = search_str[search_obj.start():search_obj.end()]
    else:
        return_str = 'NA'
    return return_str

df['Fabricante'] = df['Produto'].apply(lambda x:model_searcher(search_str=x,search_list=getattr(headers,'pattern_brands')))
df['Modelo'] = df['Produto'].apply(lambda x:model_searcher(search_str=x,search_list=getattr(headers,'pattern_model')))
df['Linha'] = df['Produto'].apply(lambda x:model_searcher(search_str=x,search_list=getattr(headers,'pattern_line')))

df['Modelo'] = df['Modelo'].str.cat(df['Linha'],sep=' ')

Method2:

import pandas as pd
import re

list_brands = open('./lists/list_models_nvidia.txt','r')
df = pd.read_csv('./tables/KABUM - VGA_NVIDIA 20221206.csv',index_col=0)
df['Produto'] = df['Produto'].str.upper()

brands = list_brands.read().splitlines()

tmp = []

for produto in df['Produto']:
    x = [brand.upper() for brand in brands if brand.upper() in produto]
    tmp.append(x)

df['Marca'] = tmp
df.drop_duplicates(subset=['Marca'])

df.to_csv("./tests/Test01 - loop brands.csv",encoding="utf-8")

I'll leave the github repository in case it's helpful, again I apologize for the trivial question.

继续阅读：dataframe for-loop lambda python python-regex

Loop through a list to find values that are present in a dataframe

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？