This one kind of rambles because I’m in a ramblin’ kinda mood.
We have lies, damn lies and statistics. One of the things about stats is that very few people really understand the theory behind them. I certainly don’t and can only think of a few of my friends who I honestly think get it. And by “get it” I mean in the same way that Einstein “got” relativity. They clearly see what it is and what it isn’t and the line between the two. They own it.
We are now hard at work on the manuscript for the second study. It’s been a messy process because this is complicated, multilevel data. But that also means there are myriad ways we can analyze it. Our first paper was a null result, so it would be nice to have a positive result in this paper. However, we have to be very careful about hunting for significance. If you want to find it – you can and you will. With complicated data there is almost always a way to find significance.
But it may not be honest. Your result could be statistically significant (real) yet practically insignificant (no impact). Or it could be significant but arbitrary. Or significant but not germane. How do you know when you are being honest with your analysis?
Parsimony is one way – keeping it as simple as you can. But there are others. I personally tend to worship at the altar of the research question. That is, when things get murky I look to the research question to act as a lighthouse to guide me home.
In our current database, we have a set of data (one of our research conditions) that is not showing a result. So we had to struggle with whether to include it in the manuscript or not. My first response was to include it, because if we leave it out we could be cherry-picking our data. But it would make the paper incredibly complicated to write, and we are under strict page limits. So the question is, why would it make it complicated? Because it would require a ton of explanation and ultimately have no impact on the final result.
Bingo. The final research question was about science learning in 2D vs. stereo. This particular set of data was dirty enough that it could not be used to address that question. The only thing it has in common with the rest of the data is that it was collected as part of the same study. But it ultimately did not meet the criteria needed to address our research question. The data showed that no learning took place under this particular condition – thus the condition could not be used to test the research question because the question itself is about learning – it presumes that learning is taking place.
Say you have two thermometers with a range of 0-100 degrees (F or C – doesn’t matter) and you want to compare their accuracy and precision. If you place them in an oven at 110 degrees (or -10 degrees) your data would be useless.
These are the kinds of tough questions that take days of thinking to answer honestly. They also require, for me at least, talking it out with other people to make sure we really are being as objective as possible. It’s also one reason why science requires independent verification of results. One of the happiest days of my life was when my first ed research paper was verified by someone else.
So, onward. The goal is still to have this paper done by the end of this month.
Maybe later we’ll analyze this separate set of data closer to see why learning didn’t take place in that condition and write our own manuscript about it.
The first paper has been published and you can access it here. Unfortunately, I can’t e-mail copies upon request. Some journals allow this, but this one does not. Journal article licensing is a complicated issue based on decisions made by the journal, the author, and the funding agencies. Typically, smaller journals are more open because they want their work to be circulated and cited. Larger journals, and those run by the main journal companies, have more restrictions to protect their revenue source. Almost all allow the author to pay a fee to retain copyright and therefore openly publish the article. I’ve seen that fee range from $300 to $5000, depending upon the prestige of the journal. The US government now requires work funded by certain agencies (such as NIH) to be published under one of those open licenses. However, my funding agency (NSF) does not require that yet. I suspect it will in a few years. It’s a big complicated mess. The next paper, which has a surprising and intriguing result, will be published under an open-access license. Our current plans are to submit it in about a month.
We are happy to report that our journal revisions were accepted for publication in the Journal of Science Education and Technology. It took about 3 weeks to make our revisions and prepare our cover letter. After we submitted them, they responded within an hour accepting the paper without further revision. We are very grateful to the editors and referees for such efficient processing of the manuscript.
Now we’re in the publication process. The first thing we needed to do was sign a contract granting them copyright of the manuscript. We also needed to decide whether to publish this as an open-access article. Normally, an article is only available to those who subscribe to the journal or those who will pay a fee to download it. However, authors have the option to pay extra to make the article free to the world immediately. We were hoping to do that with this paper. However, the fee is large and we only have budget to do it for one article. So we are going to save that for our second article, which we are preparing now. That article has some tantalizing results (I can’t wait to write a blog post teasing it!), so we think it is best to save our open-access card for that one.
But that doesn’t mean we can’t share the basic results, or even send a copy to anyone who requests it personally. Once the editorial process is complete, I’ll write a short summary of the results and post it here. If anyone wants a full copy of the final manuscript, you can e-mail me if your school/organization does not subscribe to the journal.
Soon we should get some proofs of the article from the editors. This will show us what it will look like in print. Our job, as authors, will be to read it very carefully and make sure all the grammatical and formatting/design changes they made do not alter the meaning of anything we were trying to say. The copy editors and layout staff of these journals are professionals, and I tend to not have many comments to make. They are much better at those things than I.
I expect the publication process to take 1-3 months, based on past experience. But it can vary widely based on work load of the editors and publication staff.
In the meantime, we are starting work on the next manuscript – which will be about the two astronomy films we showed at the Adler Planetarium. A goal is to have a rough draft prepared in about 3 weeks and a final draft submitted to a journal by mid August. We have high hopes for this one. More on that coming soon…
We have received initial feedback from the first paper we submitted to a journal. I’d like to use it to explore how the peer review process works in science education research (it is similar to many of the social sciences).
The paper was submitted a few weeks ago. The editors reviewed it to see if the topic and scope was appropriate for the journal (and to make sure it doesn’t have any glaring problems). They then sent it to two referees, who are anonymous to us.
After a couple of weeks, we received responses from the referees, forwarded to us by the editors. This is very quick in the science education research field! It usually takes 3-6 months.
The comments consisted of a series of Likert-style ratings in which the referees judged the quality of various aspects of the paper. They also included an open-ended section where the referees could write specific comments, concerns, suggestions, etc. This latter section is of most use to us.
For our paper, which was on the results of the Boston Museum of Science study, the referees’ comments were quite different. The first referee rated the paper very well and had no substantial change requests. They suggested some ways the literature review could be improved and offered a couple of new takes on how to frame the results. Overall, they were happy with it.
The second referee had many more concerns, none of which involved the research methodology, analysis, or results. They were focused more on how the paper was written and, mostly, on its importance to the field. But the referee did not simply criticize the paper. They explained the rationale behind their thoughts, cited examples, and offered suggestions to improve the writing.
While discussing the suggestions with my coauthors, we realized a pretty significant aspect of this research was practically ignored in the paper. That is – this is a study about kids in a real-world setting. The vast majority of the stereoscopic literature involves studying adults (often college students, because they are an easy population to access). This study was designed for kids only, yet we did not frame the results within that context. We mentioned it, of course, but we didn’t discuss how age might explain why our results differ from the literature. This is an important point because this is a science education study, not a psychological lab experiment. Our context is learning, specifically among pre-adolescent children. So we need to talk about that: how do these results affect science education with children?
The editors considered both responses and asked us to make “major revisions” to the article. In science education, a paper is never accepted outright. The usual response to an initial submission is a request for “minor revisions” or “major revisions”, or an outright rejection. Different journals use different terminology, but that’s basically what it comes down to.
We are in the process of revising the paper. When done, we’ll send it in with a cover letter that will list all the comments from the referees and describe, in painstaking detail, what modifications we made in response. Our original paper is about 22 pages long. In the past, I’ve turned in revision cover letters that were 15 pages in length. It can be a major endeavor! In this case, it will probably be 7-10 pages.
The editors will then decide whether we made enough changes to accept the paper outright, accept it with an additional round of changes, or reject it. Sometimes the changes are sent back to the original referees, sometimes they are sent to new referees, and sometimes the editors will make the call themselves. It depends on the journal and situation.
So this is an example of thoughtful referee comments that have helped us improve the paper. In a sense, we needed this kind of third-party expert attention to remind us of a glaring contextual omission in our original manuscript. It doesn’t always go like this. Sometimes you get a referee who is grumpy, rude, or didn’t even read the entire paper. But in my experience that is pretty rare. And it is one reason why journals have more than one referee review manuscripts.
Hopefully we will hear back in a month or two. If it gets published, we plan to do so under an open-access license so anyone can read it. If that happens, we’ll link to the paper from here.
With the first paper now submitted to a journal, we are working on analyzing the data from the second study. As a reminder: that study involved watching one of two stereoscopic films at the Adler Planetarium and taking a pre/post-test.
One of the items on the test was a confidence measure. After the first question, we asked “How confident are you of your answer?” and provided a 3-point scale they could choose from (very, somewhat, not at all). I was interested in whether stereoscopy would affect confidence/self-efficacy¹. And remember we also had them take a short spatial cognition test.
Below are the responses we received on the two tests compared with the spatial cognition score the person received.
First, you can see an obvious drop-off in confidence among all groups after they watched the film. This is not a surprise, as the films were relatively challenging intellectually, with a lot of material packed into a short amount of time. However, when you look at the trends between the tests, something interesting appears. On the pre-test, spatial ability does not seem to be consistently related to confidence in answering the question. However, on the post-test spatial ability seems to be inversely related to confidence. That is, those who did better on the spatial test felt like they performed worse in answering the question! Why would this only appear on the post-test? There definitely is some effect caused by the film here. The difference is statistically significant (p<.01) according to a repeated measures ANOVA.
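For readers unfamiliar with the repeated-measures idea, here is a minimal sketch in Python of the paired logic underneath it: each person contributes a pre and a post rating, so we test the per-person differences rather than treating the two tests as independent samples. All numbers below are invented and are not our data; with a single two-level within-subject factor, the rm-ANOVA F statistic is simply the square of this paired t.

```python
# A toy paired (repeated-measures) comparison with invented 3-point
# confidence ratings. Each index is one person: pre vs. post.
import math

pre  = [3, 2, 3, 3, 2, 3, 2, 3]   # made-up pre-test confidence
post = [2, 2, 2, 3, 1, 2, 2, 2]   # made-up post-test confidence

diffs = [b - a for a, b in zip(pre, post)]   # per-person change
n = len(diffs)
mean = sum(diffs) / n
var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
t = mean / math.sqrt(var / n)     # paired t statistic; rm-ANOVA with one
                                  # two-level within factor gives F = t**2
print(round(mean, 3), round(t, 3))
```

The point of pairing is that it removes between-person variation from the error term, which is why linked data can detect small effects that an independent-samples comparison would miss.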
The effect size here is small. Our sample is ~1000 people, so it doesn’t take much signal to rise above the noise. And the eyeball test suggests there could simply be more noise in the pre-test that is drowning out the trend. That could be true. One of my statistical mentors drilled into me to always trust your eyes; it’s the best S/N filter out there. But we have a large sample size, the standard errors are small, and this is linked (not independent) data. So if that were the case, we should see the errors of the two datasets differ widely, and they don’t. Nevertheless, stochastic error is not the same as systematic error, so this is a very tough call. My gut tells me it is systematic, but due less to the film content than to the setting. Maybe people were concentrating more intently on the post-test because the room was quieter and everyone was working at the same time (vs. the pre-test, when people came in separately). We will certainly investigate this further in the coming months.
Now the big question is, how is the overall drop in confidence related to whether they saw the film in 2D or in stereo? Do those with lower spatial ability lose more confidence because they feel overwhelmed? Or do those with higher spatial ability lose confidence because they actually “get” the film more and realize what they have yet to learn? Or is there no effect at all?
¹It may also align with a result from an earlier citizen science study I conducted, which found that citizen scientists report lower self-efficacy not because they feel they know less, but because they become aware of how much they do not yet know.
So it’s been a while and in that time we have completed the first draft of a research paper about the first study. In that paper, the results we discussed in the blog post have held up. And we found the difference was indeed related to some of the demographic factors we studied. In fact, the difference was only apparent when we controlled for some of the factors. And one of the factors that was most important was novelty.
A lot of the 3D papers out there show that 3D in the classroom does indeed increase performance and engagement. But those papers do not control for the novelty effect – that kids will be paying more attention simply because something different is going on in the classroom.
I have a perfect example of such an effect. I was in an 8th grade Earth Science classroom at a math and science charter school pilot testing some 3D technology and visuals. When I passed out the glasses to the class the kids got very animated and excited. One kid held his hand in front of his face and went “Oooohhh! Cool!” then some other kids did the same thing after seeing him.
That doesn’t make any logical sense, because the glasses themselves have no effect on your ability to see depth in real-life objects. If anything, they probably harm it! But the students knew what to expect, and the excitement of doing something different in class magnified that expectation. That is the novelty effect.
That studies haven’t accounted for this has bothered me and, to be honest, comes across as somewhat intellectually dishonest. Everyone who uses 3D with kids sees this effect. But I think there is a fear, conscious or subconscious, that the only reason 3D may work is this novelty factor.
But it is important to consider novelty. 3D is everywhere now. A 10-year-old kid has never lived in a world where 3D movies weren’t available, or even commonplace. Now 3D is on consumer televisions and in video games. Soon, if not already, the novelty will wear off and it can no longer be a crutch.
So, the question is, when novelty has worn off will 3D still work?
We have endeavored to control for the novelty effect in this study in a few ways. First, we provided the children with a training period at the start of the study session where they got to use the glasses, look at pictures and answer fake questions (real questions where we discarded the data). That way they got some time to get used to the system. Second, we showed the slides to the children in both 2D and 3D format – but we did not tell them which was which. That is, the children wore their glasses the entire time. Finally, we asked their parents about their child’s experience with 3D movies and video games. That data was entered into our statistical model as a covariate, which controls for that variable.
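To see mechanically what entering a covariate does, here is a toy sketch in Python using residualization (regress the outcome on the covariate, then compare group means of the residuals), a hand-rolled stand-in for adding a covariate to a statistical model. Every number here is invented; in this toy case adjustment erases a raw group difference, though in a real model adjustment can just as easily reveal one.

```python
# Toy covariate adjustment. Made-up data only.
scores  = [4, 4, 5, 5, 6, 6, 7, 7]        # invented drawing scores
novelty = [1, 1, 2, 2, 3, 3, 4, 4]        # invented 3D-experience covariate
group   = ["2D"] * 4 + ["3D"] * 4         # viewing condition

n = len(scores)
mx, my = sum(novelty) / n, sum(scores) / n
# Least-squares slope of score on novelty
beta = sum((x - mx) * (y - my) for x, y in zip(novelty, scores)) / \
       sum((x - mx) ** 2 for x in novelty)
# Residuals: what is left of each score after removing the novelty trend
resid = [y - beta * (x - mx) - my for x, y in zip(novelty, scores)]

def group_mean(values, g):
    picked = [v for v, gg in zip(values, group) if gg == g]
    return sum(picked) / len(picked)

raw_diff = group_mean(scores, "3D") - group_mean(scores, "2D")
adj_diff = group_mean(resid, "3D") - group_mean(resid, "2D")
print(raw_diff, adj_diff)
```

Because the 3D group in this toy data also has higher novelty, the raw 3D-vs-2D gap is entirely explained by the covariate, and the adjusted difference drops to zero.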
When we remove that covariate from our analysis of the drawings, the difference goes away. This tells us that novelty does have an impact on the results of this study. I’ll save the details and interpretation of that for the paper, which we hope to submit to a science education journal in April. If we meet that deadline, then we’re probably looking at publication next Fall – if it is accepted. And that’s a big if. I’ll talk more about that next time.
So one of the items on the slide show test at the MoS Boston is a picture of a hurricane from the International Space Station. Some of the children saw the picture in 2D and some saw it in 3D (both were wearing glasses). After they looked at it, we removed it from the screen and asked them to draw what they saw.
We had two people code the drawings (see last post for details). We gave them two rubrics: one about how the child drew the eye of the hurricane and the other about how they drew the overall shape.
Our analysis, so far (i.e. it’s very early and could change), shows that there is no difference in how the children drew the eye and there is a large difference in how they drew the shape of the hurricane. The latter is mainly through an increased number of children who drew it as a spiral as opposed to a donut. One possible meaning of this could be that 3D, in this experimental condition, increased the ability to denote shape but not structure. This surprises me, and goes against my hypothesis, because structure is more dependent on spatial depth than shape is. Nevertheless, the data is what it is - and the difference seems pretty robust at this point.
Next, I need to check the data and analysis to confirm the result. Then I will start to see if the result is related to any other factors we recorded (covariates) such as prior spatial ability, gender, age, etc.
The slide we showed.
So we are deep into the data analysis phase now. All the data has been processed, stored, backed up, etc. ad infinitum. Now we start looking for signals and stories.
Recall that for both of our studies we had participants answer multiple-choice questions and also draw pictures. The multiple-choice questions will be analyzed as quantitative data, where numbers represent the answers given in a somewhat objective manner. The drawing questions will first be analyzed as qualitative data, which is subjective in nature.
For example, we can ask someone “Is the Sun a star?” with the choice to answer “Yes” or “No”. We can assign a “1” to “Yes” (the correct answer) and a “0” to “No”. Once we set that rule, no matter who analyzes the data, they know what a 1 and a 0 mean. 1 means they got it right, 0 means they got it wrong.
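As a sketch, that coding rule is just a fixed mapping from answers to numbers (the responses below are invented for illustration):

```python
# Hypothetical coding of "Is the Sun a star?" (correct answer: Yes)
answers = ["Yes", "No", "Yes", "Yes"]            # made-up responses
codes = [1 if a == "Yes" else 0 for a in answers]  # 1 = correct, 0 = incorrect
proportion_correct = sum(codes) / len(codes)
print(codes, proportion_correct)   # [1, 0, 1, 1] 0.75
```

Once the rule is fixed, anyone applying it to the same responses produces the same numbers, which is what makes this data objective.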
With our qualitative data, we have drawings. For one set of drawings we asked them to draw the Milky Way galaxy and label the location of the Sun. In order to determine if the person placed the Sun in the correct location we have to judge a number of things about the drawing. We first have to judge whether they drew the galaxy accurately enough to allow us to denote a location. Then, if so, we have to judge whether the location they gave was correct or not. We have a general idea of where the Sun is in our galaxy. But that position, as shown on a drawing, can change based on the viewing angle of the galaxy.
We are interested in this question because it is something talked about in the 3D film they watch. Also, it is a very spatial concept to understand – because we only see the galaxy from within. So how do we know where the Sun is in relation to the rest of the galaxy? It requires some 3D thinking about how a spiral galaxy would look from the inside, something the film attempts to convey using stereoscopy.
Here are some pictures of the galaxy with the Sun drawn or labeled on it by three adult study participants:
For differing reasons for each of those drawings, someone could argue that it is correct and someone could argue that it is wrong. How can we analyze data when we can’t even agree if the answer is right or not?
In education research, the typical procedure is to have multiple people rate (grade) a set of test drawings using a common set of instructions (a rubric). You then compare the results and see if they are the same – this is called inter-rater reliability. If they are the same, you rate all the rest of the drawings. If not, you discuss why each person rated it differently, adjust the rubric to be more precise, and start again with a new set of data. Once the inter-rater reliability reaches a certain level, you stop the iterative process and begin rating the full, real data set. The stopping point can change based on the person and project – in general it depends on the research question. You almost never get 100% reliability. I generally use a statistic called kappa (agreement corrected for chance), and if it is .7 or .8 or higher, we’re good.
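For the curious, Cohen’s kappa for two raters can be computed by hand. This sketch uses invented 0/1 ratings, not our actual coding data: observed agreement is corrected by the agreement you would expect from each rater’s marginal proportions alone.

```python
# Cohen's kappa for two raters on invented binary codes
# (1 = "criterion met", 0 = "not met").
from collections import Counter

rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

n = len(rater_a)
po = sum(a == b for a, b in zip(rater_a, rater_b)) / n       # observed agreement
ca, cb = Counter(rater_a), Counter(rater_b)
pe = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / n ** 2  # chance agreement
kappa = (po - pe) / (1 - pe)
print(round(kappa, 2))   # 0.6 with these made-up ratings
```

Here the raters agree 80% of the time, but chance alone would produce 50% agreement, so kappa lands at 0.6 – below the .7–.8 threshold, meaning this hypothetical rubric would need another refinement round.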
That is the stage we are at now. We are almost finished with six rubrics for the two studies. We have about 2,300 drawings that need to be analyzed, so we’ll create almost 15,000 ratings when we’re done. It will be an amazing data set. But it will take a long time to analyze. We’ll talk about that next…
So this has been a pretty wild summer. Data collection on both projects is at full blast. For the Boston MoS study, we currently have around 275 study participants. For the Adler Planetarium study, we have about 876. We are well on pace for the Adler study and should meet our goal of 1000 by the end of August (we will likely exceed it!).
But we probably won’t reach our goal of 400 for the Boston MoS study. This is despite the fact that we extended data collection more than 3 months longer than originally planned, and that the MoS is graciously donating some of their own intern time to the project. Recruitment is going fine; it just takes a long time to process a family through the kiosk, and we have limited hours on the floor. However, this will likely be OK. Looking at the stats, it appears this will be enough to answer our core research questions about 3D vs. 2D representations.
I have spent a lot of time with the data in the last month, with some help from experts who are also part of the project, and results are starting to take shape. Our original plan was to devote year 3 (which begins in October) to analysis of the data. I think we’ll probably have some results for this study by the end of December and hope to submit a paper by then. The rest of the year will be spent on the second study at the Adler.
All summer we had hoped to submit a paper to an education conference called AERA. However, we just didn’t have enough data or time by their July 22 deadline (I think their deadlines are way too early – the conference is not until April). We were going to submit to another conference – NARST (which is my favorite anyway) on August 15. However, I need to present a work related paper there and they only allow one primary-authored publication. So that won’t “work”. Now we’re looking at the ICLS conference, which is due in November. So the timing is right. And that’s also one of my favorite conferences to attend.
This is what happens when you are rounding third on a research project (it’s the dog days of August and pennant races are afoot; allow me my baseball metaphors!). Now deadlines mix with data to create a weird matrix of what-has-to-be-done-when. At one point I was burning the midnight oil trying to make this work. Then it struck me: I have an entire year to do the analysis and publication for this project. Why am I in such a hurry? Let’s slow down, do it right and actually enjoy it. The best part of a project is the first time you see the data loaded into SPSS/your stats software of choice. Let’s make it last.
By the end of this month data collection should be done for both projects. That will be a huge stress relief. There is a lot of busywork and (for a worrywart like me) stress that goes into running data collection on two studies simultaneously and remotely. At work I manage 4 researchers doing the same type of work, but I’m there every day to interact with them. Being far away is a little nerve-wracking. Am I being too hands-off? Or am I micromanaging? Do they want more input from me or do they want me to back off? Am I being too much of a cheerleader or too pessimistic? One thing is almost definite: I’m overthinking it. :)
Once the data is in my hands (and the hands of my Co-PIs) then the fun begins. But the hard part is by no means over. I’ll explain why next…
The Adler study has begun active data collection. The study uses software run on iPads. The software was expertly produced by Clockwork Active Media Systems and funded by our National Science Foundation grant. We have decided to release the Objective-C source code under the GNU Affero General Public License. You can download the software and learn more via its GitHub page:
This raises the question: why would you want to use software so uniquely designed for this project? As a whole, you probably don’t. However, there are pieces of the code that are somewhat unusual and that someone may want to use for their own software and/or study. So here is a description of how the software works, with pointers at interesting pieces.
The program opens with an admin page for the researcher. This is where they tell the software which of the two films the audience is about to see and whether it will be in 2D or 3D. After that, it moves on to the test – which is customized based on which film they are watching.
The first test page asks some demographic questions; there is nothing interesting here. Then the user is given 5 spatial cognition tasks taken from the Purdue Visualization of Rotations Test. Note the test items themselves are NOT in the public domain. However, our experience is that the author is extremely helpful and was willing to let us use the test with minimum fuss. In return, we will share our results with him.
After those test items are done, the user is given a test item in a format that begins with a question and then lets them click on one of four images to answer it. Then they are asked about their confidence in the answer. That is followed by two multiple choice text questions. The final question is a drawing task, where the user draws with their fingers.
When that is done, the app pauses for five minutes so the user can watch the film. This happens to prevent people from skipping ahead. When the film is over, the app is ready for the post-test. This test is exactly the same as the one before, except without the demographic and spatial cognition questions.
When they are done, the software uploads the data to a server (also available in the source code). The server parses it into an XML file that the researcher downloads later for analysis. If the data upload does not succeed for any reason, the border of the screen turns red, thereby informing the researcher. It will automatically try again after a few minutes and eventually turn blue upon success.
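The retry behavior is simple but worth sketching. This is a hypothetical Python sketch of the logic described above, not the app’s actual Objective-C code; the function name, delay, and retry cap are all assumptions for illustration.

```python
# Sketch of the upload-and-retry behavior: attempt the upload, signal
# failure to the researcher, and retry after a delay until it succeeds.
import time

def upload_with_retry(send, payload, delay_s=1, max_tries=5):
    """Return True once send(payload) succeeds, retrying on failure."""
    for attempt in range(max_tries):
        if send(payload):
            return True            # the app turns the border blue here
        # the app turns the border red here to alert the researcher
        time.sleep(delay_s)
    return False

# Usage with a fake sender that fails twice, then succeeds
calls = {"n": 0}
def flaky_send(payload):
    calls["n"] += 1
    return calls["n"] >= 3

ok = upload_with_retry(flaky_send, {"quiz": "post"}, delay_s=0)
print(ok, calls["n"])   # True 3
```

The key design point is that a failed upload never blocks the participant; the visual border cue pushes the problem to the researcher while the app keeps retrying in the background.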
While all this is happening, the software is recording the time of completion for each task and item. It is also recording the accelerometers in X,Y,Z coordinates based on time. We will analyze that data to see if there is a relationship between how the user held the device and the results.
So, some interesting things that may be of use to other researchers/coders are: (1) the client/server relationship, (2) the accelerometer archiving, and (3) the drawing GUI (which I think is very intuitive and simple – nice job, Clockwork!). Of course, the entire framework may be of use to anyone who wants to do a pre/post test of any type of short, timed intervention.
Once the Adler study is finished, we will also release the two films under a Creative Commons license. Expect that to occur in the Fall.
A screen shot of the drawing task:
The intermission screen:
The multiple choice image format:
The source repository contains the full Xcode project for Two Eyes, 3D, as well as the PHP web services for uploading quiz data, listing it, and updating the quiz remotely.
A word about Clockwork: They were terrific to work with. I’m relatively new to the world of professionally coded software and it was a treat to work with them. They were patient, professional, courteous, organized and very, very smart. I highly recommend them to any researchers needing expert coding help from a professional group.
The second film is almost done. The crack team at the Adler Space Visualization Lab has done a great job. These two films will be shown at the Adler throughout the summer on almost every weekday, free to visitors. We also plan to eventually place them on YouTube (in both 2D and 3D form) as well. And they will be released under a Creative Commons license so almost anyone can use/edit/play with them as they wish.
In about two weeks official data collection will begin at the Adler. Last night I tested the last version of the iPad testing software, created by a similarly crack team at Clockwork Active Media Systems. That too will be released using an open source license and likely posted to SourceForge in the next month or so.
Also in two weeks our RA at the Boston Museum of Science will be resigning to spend summer with her two boys. We are very grateful for her help this year. If anyone is looking for a part-time research assistant in the Boston area for the next year, please contact me and I’ll put you in touch. I cannot recommend her enough. A new intern from the Museum will be taking over data collection duties, which we expect to run through the end of July. Right now we have about 275 of the 400 test responses we planned for.
This summer is really where the rubber meets the road. Lots of data collection and various items to be juggled. But when it is done, the data will be here and the fun part begins: analysis and results.
This is a screen shot from the 2nd film, which is about the shape of the Milky Way galaxy. This is from a scene that describes a Native American story of the Milky Way being a path of animals across the sky.
The following shot is from a scene that describes how some early models of the galaxy involved spheres of stars to explain the band of light we see in the night sky. This will be shown stereoscopically, which we hope will help explain how spheres can be oriented to appear as bands of light when seen from far away.
The project has just passed its halfway mark. We are cruising through our data collection phase. This summer will be very busy for the project and critical to its success.
The Boston Museum of Science study continues to chug along smoothly. The Research Assistant works two shifts per week with the MoS staff out on the Museum floor, collecting data from children viewing the 2D vs. 3D slides. Our goal is to reach 400 test subjects. At the current rate, we should reach that in July. However, it is possible that recruitment will pick up in the summer as the Museum gets busier.
Once we have the data, I’ll begin crunching it quickly. I’ll work closely with one of my Co-PIs, a professor at the University of California, Santa Cruz, to prepare a set of mini papers and presentations for educational research conferences next year. Most of those have submission deadlines late this summer, so we won’t have much time to play with the data. It will be all work. Luckily, my family and I are moving to an apartment next to a beach on Lake Michigan in Chicago this summer. So I hope to set up shop with the laptop and an umbrella, so as not to miss the summer sunshine.
The Adler study is a little behind schedule, but it should still finish on time. Our narrator just finished recording her tracks, and the production team is putting the final touches on the films. We plan to begin showing the two films at the Adler Planetarium in their Space Visualization Lab during the 2nd week of June. Stop by if you want to participate! The plan is to do 3 screenings per day, five days a week. If we are able to get 20 people per screening, then we should reach our data collection goal for this project in mid-July. We are scheduled to work until the end of August, which gives us some room for error (sick days, slow days, equipment trouble, etc.). If we do happen to reach the goal early, then we have a month to get extra data and/or run an entirely new, bonus study.
Basically, we should be done with all data collection by the end of the summer. We have to be, really, since our data collection funding runs out then. That gives us an entire year to analyze results and write papers. For me, that is the fun part. Right now we plan on at least three papers for major science education research journals. I expect we’ll end up with many more, and that doesn’t count conference presentations. At this point, no one has studied this – yet development of 3D in the classroom and informal settings continues at a fast pace. There continues to be a need for this information.
A screen shot from a draft of the film about the Milky Way galaxy.
This is a report on the 2nd study of this project, taking place at the Adler Planetarium.
The first film, about Type Ia supernovae, is just about in the can. It runs about 7-8 minutes in length. I think it is a little dense on content, but we slowed down the narration to compensate, and some small test screenings have gone well. The second film, about galaxy morphology, is currently being written. The plan is to storyboard early next month and develop the film for pilot testing in April. If all goes well, both films will be airing to the public in the summer - and we’ll be collecting data the entire time.
We have looked at the pilot data from the first film and test sessions. For technical reasons, we could only show the 3D film, so we don’t have any 2D vs. 3D comparison data. In the data we do have, we don’t find any relationship between correct answers and the spatial cognition scores (nor with gender or age). But that’s not our core question, which is whether those scores are related to the difference in correct answers between 2D and 3D. Plus, our sample size was tiny (N=33). The pilot was mainly about testing the software, procedures and the test items.
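For readers curious what that kind of check looks like in practice, here is a minimal sketch. All numbers below are fabricated for illustration – the real pilot data came from the iPad testing software – and the column names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 33  # pilot sample size

# Hypothetical pilot records: a spatial cognition score and a count of
# correct answers (out of 4 test items) for each participant.
spatial = rng.normal(50, 10, n)
correct = rng.integers(0, 5, n).astype(float)

# Pearson correlation between spatial score and correct answers.
r = np.corrcoef(spatial, correct)[0, 1]

# With n = 33, the two-tailed critical |r| for p < .05 is roughly 0.34,
# so only a fairly strong correlation would reach significance in a
# pilot this small - one reason a null here is unsurprising.
significant = abs(r) > 0.34
```

The same check can be repeated with gender or age in place of the spatial score; with a sample this small, none of these comparisons has much power.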
We had four test items - two multiple choice questions, each followed by an “Explain Your Choice:” text box, and two “draw and label” questions. We’ve opted to take the “explain” answers and use them to derive 4-5 answer options, which will replace the prior multiple choice options. The reason for this is that it will shorten the pre- and post-tests. Right now each test takes about 15 minutes, meaning 30 minutes in total. That’s unacceptable for an audience that is attending the planetarium for fun! We need to get it down to 8-10 minutes. As a rule, I hate multiple choice tests. I’ll save that rant for another time. But one way to make them slightly more palatable is to use open-ended (“explain your answer”) data to create multiple choice options that reflect authentic thinking and are not artificially generated by an outside person. So I think we have a good compromise here.
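Mechanically, deriving the options is simple once the free-text answers have been hand-coded into categories: keep the most common categories as the new choices. A toy sketch, with made-up category labels standing in for the real coded responses:

```python
from collections import Counter

# Hypothetical hand-coded "Explain Your Choice" responses from the
# pilot; in practice each free-text answer is coded by a researcher.
coded = ["brighter explosion", "black hole forms", "brighter explosion",
         "stars merge", "nothing happens", "brighter explosion",
         "black hole forms", "stars merge", "stars bounce apart"]

# The 4-5 most frequent categories become the new multiple choice
# options, so the distractors reflect real visitor thinking.
options = [category for category, _ in Counter(coded).most_common(4)]
```

The payoff is that every distractor corresponds to an answer real visitors actually gave, rather than one a test writer guessed they might give.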
As for the drawing questions, we decided to drop one of them. Looking at the data, there is a greater difference between the pre- and post-test answers on one item than on the other. This implies it may be more sensitive to differences between groups, so that is the one we kept. Here is a sample pre- and post-test drawing made by one of the pilot participants:
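The “which item is more sensitive” decision boils down to comparing mean pre-to-post gains. A small sketch with invented rubric scores (the item names and 0-4 scale are hypothetical):

```python
import numpy as np

# Hypothetical scored drawings (0-4 rubric) for two items, pre and post,
# for the same five pilot participants.
pre  = {"item_A": np.array([1, 2, 1, 0, 2]),
        "item_B": np.array([1, 1, 2, 2, 1])}
post = {"item_A": np.array([3, 3, 2, 2, 4]),
        "item_B": np.array([1, 2, 2, 3, 1])}

# Mean pre-to-post gain per item; the item with the larger gain is the
# better candidate for detecting a 2D vs. 3D group difference.
gains = {item: float((post[item] - pre[item]).mean()) for item in pre}
best = max(gains, key=gains.get)
```

An item whose scores barely move between pre and post has little room to show a difference between conditions, which is the logic behind dropping it.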
The item asked them to draw two white dwarfs merging. The first drawing shows two stars with a black hole in the middle. It is interesting that the person’s prior knowledge led them to put a black hole between two merging stars (a surprisingly sophisticated idea to unpack). The post-test drawing still has the black hole, but now adds lines to indicate an explosion, or greater luminosity. It also has arrows showing momentum. Are those lines the orbital paths of two stars? Or do they represent the surfaces of two stars? The answer matters because it changes the interpretation of the arrows, which could indicate orbital motion or a spinning sphere or disc. This is why drawing questions are so tough to score. They can often reveal much more nuanced understanding by the participant, but scoring them requires sense-making by the scorer (a.k.a. “grader”), thus introducing a source of noise. In education research, I fall into the “mixed methods” camp - which holds that the best research uses both qualitative and quantitative methods. Hence this study has both traditional test questions and these drawing tasks.