IMDB to MySQL: Insert IMDB data into MySQL database

I’m looking for a solution to import all the IMDB data into my own MySQL database. I’ve downloaded all the IMDB data files from their homepage which are all in the file format *.list (in Windows).

I want to retrieve that information and insert it correctly into my MySQL database so I can run some tests and search queries.

I followed a guide, but about halfway through I realized that it was written in 2004 and that the tools from seven years ago no longer fit the way things work now.

I’ve browsed the net for applications, PHP scripts, Python scripts and whatnot to find a solution, but with no luck. The W32 tool that IMDB themselves refer to doesn’t work either.

Is there anyone who knows a solution or a way to do this task?


There is a nice Python script which helped me. Just set up the connection and run it. It takes about 1 hour to work through everything.

EDIT: Use this readme file when setting up the script.


Changes to IMDbPY and the IMDb data files format mean that the existing answers no longer work (as of January 2018).

I am using Ubuntu 17.10 and MariaDB 10.1 (not MySQL, but the following will also work with MySQL).

Changes to IMDbPY

The latest version of IMDbPY is 6.2. It is implemented in Python 3, and the dependencies on gcc and SQLObject have been removed. Also, the Python package MySQL-python is not available for Python 3, so we install mysqlclient instead; see below. (The API of mysqlclient is compatible with MySQL-python.)

Changes to the IMDb data files format

Changes to the format of the IMDb data files were introduced in December 2017, and IMDbPY 6.2 (the current version) does not yet work with the new file format. (See this GitHub issue.)

Until this is fixed, use the most recent version of the IMDb data published in the old format, which is available at ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/. Download all *.list.gz files (excluding files from subdirectories).
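
Before starting a multi-hour import, it can help to confirm that the download directory really contains the top-level *.list.gz files and that they decompress. The following is only a minimal sketch: the function names are hypothetical, and the latin-1 encoding is an assumption about the old-format files rather than something stated above.

```python
import gzip
from itertools import islice
from pathlib import Path


def find_list_archives(data_dir):
    """Return the *.list.gz files directly inside data_dir, sorted by name.

    A plain glob pattern (no '**') does not descend into subdirectories,
    so files in subdirectories are left out.
    """
    return sorted(p for p in Path(data_dir).glob("*.list.gz") if p.is_file())


def peek(archive, n=5, encoding="latin-1"):
    """Return the first n lines of a gzip-compressed text file.

    The encoding is an assumption; undecodable bytes are replaced
    instead of raising an error.
    """
    with gzip.open(archive, "rt", encoding=encoding, errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]
```

Calling `find_list_archives("[imdb_dataset_directory]")` and then `peek(...)` on one of the results should show the header lines of that file. An empty list means the files are not where imdbpy2sql.py will look for them.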

New steps to follow

  1. Install Python 3 and required packages:

    sudo apt install python3
    pip3 install mysqlclient
    
  2. In MariaDB, create a database "imdb", and grant all privileges to "user" with password "password".

    CREATE DATABASE imdb;
    GRANT ALL PRIVILEGES ON imdb.* TO 'user'@'localhost' IDENTIFIED BY 'password';
    FLUSH PRIVILEGES;
    
  3. Get IMDbPY 6.2:

    wget https://github.com/alberanid/imdbpy/archive/6.2.zip
    unzip 6.2.zip
    cd imdbpy-6.2
    python3 setup.py install
    
  4. Load IMDb data into MariaDB:

    cd bin
    python3 imdbpy2sql.py -d [imdb_dataset_directory] -u 'mysql://user:password@localhost/imdb'
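
If the real password contains characters such as `@` or `/`, the URI passed to `-u` becomes ambiguous. A minimal sketch of building the URI with percent-encoded credentials is shown here. `build_db_uri` is a hypothetical helper, and it assumes the URI is parsed by a library that percent-decodes credentials (SQLAlchemy does this); I have not checked how imdbpy2sql.py itself handles such passwords.

```python
from urllib.parse import quote


def build_db_uri(user, password, host="localhost", database="imdb", scheme="mysql"):
    """Build a database URI of the form scheme://user:password@host/database.

    User and password are percent-encoded with safe="" so that characters
    like '@', ':' and '/' cannot be mistaken for URI delimiters.
    """
    return "{}://{}:{}@{}/{}".format(
        scheme, quote(user, safe=""), quote(password, safe=""), host, database
    )
```

For the example credentials above, `build_db_uri("user", "password")` gives the same `mysql://user:password@localhost/imdb` string used in step 4. Keep the result wrapped in single quotes on the command line so the shell does not interpret any characters in it.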
    

Edit: Version 6.2 of IMDbPY does not create foreign keys. See this GitHub issue. You will need to use an older version of IMDbPY if you need foreign keys to be created, but there are reported issues with the generation of foreign keys in old versions too (see linked GitHub issue).

Update: It took 4.5 hours to import, and I had no problems using InnoDB tables.
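
After an import this long, a quick sanity check is to count the rows in a few tables. This is a sketch that works with any Python DB-API connection, including one from mysqlclient. The function name is hypothetical, and the table names in the usage note are my assumption about the schema imdbpy2sql.py generates, so adjust them to what `SHOW TABLES` reports.

```python
import re

# Table names cannot be passed as query parameters, so they are validated
# against a conservative identifier pattern before being put into the SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def table_row_counts(conn, tables):
    """Return {table_name: row_count} for each table, using a DB-API connection."""
    counts = {}
    cur = conn.cursor()
    try:
        for table in tables:
            if not _IDENTIFIER.match(table):
                raise ValueError("unsafe table name: {!r}".format(table))
            cur.execute("SELECT COUNT(*) FROM {}".format(table))
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts
```

With mysqlclient this could be called as `table_row_counts(MySQLdb.connect(host="localhost", user="user", passwd="password", db="imdb"), ["title", "name", "cast_info"])`. Tables that report zero rows suggest that the corresponding *.list.gz file was missing or was not parsed.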

Edit: If you wish to use version 6.2 of IMDbPY and require foreign keys, then you will need to add them manually to the database after it is generated. A very small amount of cleanup of the data is required before foreign keys can be added. This cleanup and the foreign keys that need to be added are described in this GitHub issue.
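
A foreign key cannot be added while the child table still has rows pointing at missing parent rows, so it is worth counting those first. The sketch below is a generic check and not the cleanup from the GitHub issue. `count_orphans` is a hypothetical helper, and the table and column names in the usage note are assumptions about the generated schema.

```python
import re

# Identifiers are interpolated into the SQL text, so they are validated first.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def count_orphans(conn, child, fk_column, parent, pk_column="id"):
    """Count rows in child whose fk_column has no matching parent.pk_column.

    Rows where fk_column is NULL are not counted, because a NULL value
    does not violate a foreign key constraint.
    """
    for ident in (child, fk_column, parent, pk_column):
        if not _IDENTIFIER.match(ident):
            raise ValueError("unsafe identifier: {!r}".format(ident))
    sql = (
        "SELECT COUNT(*) FROM {child} c "
        "LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        "WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    ).format(child=child, parent=parent, fk=fk_column, pk=pk_column)
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchone()[0]
    finally:
        cur.close()
```

For example, `count_orphans(conn, "cast_info", "movie_id", "title")` would report how many cast_info rows refer to a title that does not exist, if the schema uses those names. A non-zero result means those rows need to be fixed or removed before the constraint can be added.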


On Ubuntu

1) Install all the required packages.

sudo apt-get install -y gcc python python-dev libssl-dev libxml2-dev libxslt1-dev zlib1g-dev python-setuptools python-pip
easy_install -U SQLObject
pip install MySQL-python

2) Install IMDBPY.

cd [IMDBPY_parent_directory]
wget http://prdownloads.sourceforge.net/imdbpy/IMDbPY-5.1.tar.gz
tar -xzf IMDbPY-5.1.tar.gz
cd IMDbPY-5.1
python setup.py install

3) In mysql, create a database "imdb", and grant all privileges to "user" with password "password".

CREATE DATABASE imdb;
GRANT ALL PRIVILEGES ON imdb.* TO 'user'@'localhost' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

4) Download all IMDB data.

mkdir [imdb_data_directory]
cd [imdb_data_directory]
wget -r --accept="*.gz" --no-directories --no-host-directories --level 1 ftp://ftp.fu-berlin.de/pub/misc/movies/database/

5) Load IMDB data to mysql (use myisam as the storage engine).

cd [IMDBPY_parent_directory]/IMDbPY-5.1/bin
python imdbpy2sql.py -d [imdb_data_directory] -u 'mysql://user:password@localhost/imdb' --mysql-force-myisam

Borrowed from "Import IMDb Data Set from Plain Text Files To MySQL Database" with some minor fixes.


There has been an update to the IMDb client, and some documentation has been added, which makes some of this outdated. Refer to the updated docs for the latest instructions.
