Mixed content (float || unicode) for database column
For sake of simplicity let's say I have a questionare.
Each answer gains a score.
Some questions are qualitative so user must choose between one of the text answers.
Q: what's your fav pet?
- cat [1 point]
- dog [2 points]
- caiman [3 points]
Answering dog I get 2 points.
Some questions are quantitative so user inputs a number and gets scoring with linear interpolation:
How much beer liters do you drink in a day?
- 0 [0 points]
- 1 [1 point]
- 3 [5 points]
If I answer 2 liters I get 3 points.
Now I use sqlalchemy and have a table with an answer each row:
questions
id PK
name String
quantitative Bool
answers
id Integer PK
id_question Integer FK
value String
and cast answers.value
to float each time I have to deal with it as a number for interpolation and so on.
I could change column name
value
to_value
and make getter and setter functions foranswer.value
that casts each time to floatanswer._value
if question is numeric (answer.question.quantitative
isTrue
)I could have separate columns in answer for textual and numeric values (like
value
andtext
, I will have no millions of records anyway)Or...
What is supposed to be more efficient and easy to use?
Please consider SQLAlchemy magic that takes care of a lot of the dirty work, I'd like to keep it that simple.
Edit:
Since beer example could be misleading I integrate with another:
Q: h开发者_如何学运维ow many money do you give in charity in USD?
- 0 [0 point]
- 10 [1 point]
- 100 [2 points]
Like for pets & beer question I have answer values "0"
, "10"
, "100"
stored in database as strings in answers.value
column so to interpolate values to get score for answer 50
I have all the time to cast answers.value
to float.
Here's where I have mixed content type in the same db column.
What makes this needlessly complex is trying to optimize the quantitive answers.
It's multiple choice. Treat the quantitative answer as if it were qualitative. Carry the "points" as a separate attribute of each answer.
And yes, there will be ("3 liters", 3) in the database. And yes, to a thinking person it may seem redundant.
But for software purposes, it works out well to consider all answers qualitative and keep any quantitative mapping entirely separated.
Edit. Don't store the answer as a number. It's simply wrong.
Like for pets & beer question I have answer values "0", "10", "100" stored in database as strings in answers.value column.
Correct.
to interpolate values to get score for answer 50 I have all the time to cast answers.value to float.
Incorrect.
Look them up just the same way you handle the pets. It's a simple join. Do everything the way you do pets. Treat all data as "qualitative". One simple rule; not two rules. That's the correct and standard solution.
For a quick and dirty solution I would suggest at least using two different columns to store different answers. You can also add a CHECK constraint to the database to ensure that exactly one of them is used for any row and the other is NULL. Than do the quick-n-dirty code to calculate total Test
score.
The alternative
The idea is build the proper object model, map it to RDMBS and the question does not need to be asked. Also I expect that when using Single Table Inheritance, the resulting DB schema would be almost identical to the current implementation (you can see the model when you run the script with the option echo=True
):
CREATE TABLE questions (
id INTEGER NOT NULL,
text VARCHAR NOT NULL,
type VARCHAR(10) NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE answer_options (
id INTEGER NOT NULL,
question_id INTEGER NOT NULL,
value INTEGER NOT NULL,
type VARCHAR(10) NOT NULL,
text VARCHAR,
input INTEGER,
PRIMARY KEY (id),
FOREIGN KEY(question_id) REFERENCES questions (id)
)
CREATE TABLE answers (
id INTEGER NOT NULL,
type VARCHAR(10) NOT NULL,
question_id INTEGER,
test_id INTEGER,
answer_option_id INTEGER,
answer_input INTEGER,
PRIMARY KEY (id),
FOREIGN KEY(question_id) REFERENCES questions (id),
FOREIGN KEY(answer_option_id) REFERENCES answer_options (id),
--FOREIGN KEY(test_id) REFERENCES tests (id)
)
The code below is a complete working script that shows both the object model, its mapping to the database and the usage scenarios. As it is designed, the model is easily extendable with other types of questions/answers without any impact on existing classes. Basically you get less hacky and more flexible code simply because you have an object model which properly reflects your case. The code is below:
from sqlalchemy import create_engine, Column, Integer, SmallInteger, String, ForeignKey, Table, Index
from sqlalchemy.orm import relationship, scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
# Configure test data SA
engine = create_engine('sqlite:///:memory:', echo=True)
session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()
Base.query = session.query_property()
class _BaseMixin(object):
""" Just a helper mixin class to set properties on object creation.
Also provides a convenient default __repr__() function, but be aware that
also relationships are printed, which might result in loading relations.
"""
def __init__(self, **kwargs):
for k,v in kwargs.items():
setattr(self, k, v)
def __repr__(self):
return "<%s(%s)>" % (self.__class__.__name__,
', '.join('%s=%r' % (k, self.__dict__[k])
for k in sorted(self.__dict__) if '_sa_' != k[:4] and '_backref_' != k[:9])
)
### AnswerOption hierarchy
class AnswerOption(Base, _BaseMixin):
""" Possible answer options (choice or any other configuration). """
__tablename__ = u'answer_options'
id = Column(Integer, primary_key=True)
question_id = Column(Integer, ForeignKey('questions.id'), nullable=False)
value = Column(Integer, nullable=False)
type = Column(String(10), nullable=False)
__mapper_args__ = {'polymorphic_on': type}
class AnswerOptionChoice(AnswerOption):
""" A possible answer choice for the question. """
text = Column(String, nullable=True) # when mapped to single-table, must be NULL in the DB
__mapper_args__ = {'polymorphic_identity': 'choice'}
class AnswerOptionInput(AnswerOption):
""" A configuration entry for the input-type of questions. """
input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB
__mapper_args__ = {'polymorphic_identity': 'input'}
### Question hierarchy
class Question(Base, _BaseMixin):
""" Base class for all types of questions. """
__tablename__ = u'questions'
id = Column(Integer, primary_key=True)
text = Column(String, nullable=False)
type = Column(String(10), nullable=False)
answer_options = relationship(AnswerOption, backref='question')
__mapper_args__ = {'polymorphic_on': type}
def get_answer_value(self, answer):
""" function to get a value of the answer to the question. """
raise Exception('must be implemented in a subclass')
class QuestionChoice(Question):
""" Single-choice question. """
__mapper_args__ = {'polymorphic_identity': 'choice'}
def get_answer_value(self, answer):
assert isinstance(answer, AnswerChoice)
assert answer.answer_option in self.answer_options, "Incorrect choice"
return answer.answer_option.value
class QuestionInput(Question):
""" Input type question. """
__mapper_args__ = {'polymorphic_identity': 'input'}
def get_answer_value(self, answer):
assert isinstance(answer, AnswerInput)
value_list = sorted([(_i.input, _i.value) for _i in self.answer_options])
if not value_list:
raise Exception("no input is specified for the question {0}".format(self))
if answer.answer_input <= value_list[0][0]:
return value_list[0][1]
elif answer.answer_input >= value_list[-1][0]:
return value_list[-1][1]
else: # interpolate in the range:
for _pos in range(len(value_list)-1):
if answer.answer_input == value_list[_pos+1][0]:
return value_list[_pos+1][1]
elif answer.answer_input < value_list[_pos+1][0]:
# interpolate between (_pos, _pos+1)
assert (value_list[_pos][0] != value_list[_pos+1][0])
return value_list[_pos][1] + (value_list[_pos+1][1] - value_list[_pos][1]) * (answer.answer_input - value_list[_pos][0]) / (value_list[_pos+1][0] - value_list[_pos][0])
assert False, "should never reach here"
### Answer hierarchy
class Answer(Base, _BaseMixin):
""" Represents an answer to the question. """
__tablename__ = u'answers'
id = Column(Integer, primary_key=True)
type = Column(String(10), nullable=False)
question_id = Column(Integer, ForeignKey('questions.id'), nullable=True) # when mapped to single-table, must be NULL in the DB
question = relationship(Question)
test_id = Column(Integer, ForeignKey('tests.id'), nullable=True) # @todo: decide if allow answers without a Test
__mapper_args__ = {'polymorphic_on': type}
def get_value(self):
return self.question.get_answer_value(self)
class AnswerChoice(Answer):
""" Represents an answer to the *Choice* question. """
__mapper_args__ = {'polymorphic_identity': 'choice'}
answer_option_id = Column(Integer, ForeignKey('answer_options.id'), nullable=True)
answer_option = relationship(AnswerOption, single_parent=True)
class AnswerInput(Answer):
""" Represents an answer to the *Choice* question. """
__mapper_args__ = {'polymorphic_identity': 'input'}
answer_input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB
### other classes (Questionnaire, Test) and helper tables
association_table = Table('questionnaire_question', Base.metadata,
Column('id', Integer, primary_key=True),
Column('questionnaire_id', Integer, ForeignKey('questions.id')),
Column('question_id', Integer, ForeignKey('questionnaires.id'))
)
_idx = Index('questionnaire_question_u_nci',
association_table.c.questionnaire_id,
association_table.c.question_id,
unique=True)
class Questionnaire(Base, _BaseMixin):
""" Questionnaire is a compilation of questions. """
__tablename__ = u'questionnaires'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
# @note: could use relationship with order or even add question number
questions = relationship(Question, secondary=association_table)
class Test(Base, _BaseMixin):
""" Test is a 'test' - set of answers for a given questionnaire. """
__tablename__ = u'tests'
id = Column(Integer, primary_key=True)
# @todo: add user name or reference
questionnaire_id = Column(Integer, ForeignKey('questionnaires.id'), nullable=False)
questionnaire = relationship(Questionnaire, single_parent=True)
answers = relationship(Answer, backref='test')
def total_points(self):
return sum(ans.get_value() for ans in self.answers)
# -- end of model definition --
Base.metadata.create_all(engine)
# -- insert test data --
print '-' * 20 + ' Insert TEST DATA ...'
q1 = QuestionChoice(text="What is your fav pet?")
q1c1 = AnswerOptionChoice(text="cat", value=1, question=q1)
q1c2 = AnswerOptionChoice(text="dog", value=2, question=q1)
q1c3 = AnswerOptionChoice(text="caiman", value=3)
q1.answer_options.append(q1c3)
a1 = AnswerChoice(question=q1, answer_option=q1c2)
assert a1.get_value() == 2
session.add(a1)
session.flush()
q2 = QuestionInput(text="How many liters of beer do you drink a day?")
q2i1 = AnswerOptionInput(input=0, value=0, question=q2)
q2i2 = AnswerOptionInput(input=1, value=1, question=q2)
q2i3 = AnswerOptionInput(input=3, value=5)
q2.answer_options.append(q2i3)
# test interpolation routine
_test_ip = ((-100, 0),
(0, 0),
(0.5, 0.5),
(1, 1),
(2, 3),
(3, 5),
(100, 5)
)
a2 = AnswerInput(question=q2, answer_input=None)
for _inp, _exp in _test_ip:
a2.answer_input = _inp
_res = a2.get_value()
assert _res == _exp, "{0}: {1} != {2}".format(_inp, _res, _exp)
a2.answer_input = 2
session.add(a2)
session.flush()
# create a Questionnaire and a Test
qn = Questionnaire(name='test questionnaire')
qn.questions.append(q1)
qn.questions.append(q2)
session.add(qn)
te = Test(questionnaire=qn)
te.answers.append(a1)
te.answers.append(a2)
assert te.total_points() == 5
session.add(te)
session.flush()
# -- other tests --
print '-' * 20 + ' TEST QUERIES ...'
session.expunge_all() # clear the session cache
a1 = session.query(Answer).get(1)
assert a1.get_value() == 2 # @note: will load all dependant objects (question and answer_options) automatically to compute the value
a2 = session.query(Answer).get(2)
assert a2.get_value() == 3 # @note: will load all dependant objects (question and answer_options) automatically to compute the value
te = session.query(Test).get(1)
assert te.total_points() == 5
I hope that this version of the code answers all the questions asked in the comments.
精彩评论